Note

You are viewing the documentation for an older version of boto (boto2).

Boto3, the next version of Boto, is now stable and recommended for general use. It can be used side-by-side with Boto in the same project, so it is easy to start using Boto3 in your existing projects as well as new projects. Going forward, API updates and all new feature work will be focused on Boto3.

For more information, see the documentation for boto3.

EMR¶

boto.emr¶

This module provides an interface to the Elastic MapReduce (EMR) service from AWS.

boto.emr.connect_to_region(region_name, **kw_params)¶

boto.emr.regions()¶

Get all available regions for the Amazon Elastic MapReduce service.

Return type:	list
Returns:	A list of `boto.regioninfo.RegionInfo`

boto.emr.connection¶

Represents a connection to the EMR service

class boto.emr.connection.EmrConnection(aws_access_key_id=None, aws_secret_access_key=None, is_secure=True, port=None, proxy=None, proxy_port=None, proxy_user=None, proxy_pass=None, debug=0, https_connection_factory=None, region=None, path='/', security_token=None, validate_certs=True, profile_name=None)¶

APIVersion = '2009-03-31'¶

DebuggingArgs = 's3://{region_name}.elasticmapreduce/libs/state-pusher/0.1/fetch'¶

DebuggingJar = 's3://{region_name}.elasticmapreduce/libs/script-runner/script-runner.jar'¶
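The `{region_name}` placeholder in DebuggingArgs and DebuggingJar looks like a `str.format` field. A minimal sketch of how such a template could be filled in, assuming plain `str.format` substitution; the helper name and the region `us-west-2` are illustrative only:

```python
# Templates copied from the class attributes above.
DEBUGGING_ARGS = 's3://{region_name}.elasticmapreduce/libs/state-pusher/0.1/fetch'
DEBUGGING_JAR = 's3://{region_name}.elasticmapreduce/libs/script-runner/script-runner.jar'


def resolve_debugging_uris(region_name):
    """Return (jar_uri, args_uri) with the region placeholder filled in."""
    return (DEBUGGING_JAR.format(region_name=region_name),
            DEBUGGING_ARGS.format(region_name=region_name))


# 'us-west-2' is only an example region name.
jar_uri, args_uri = resolve_debugging_uris('us-west-2')
print(jar_uri)
print(args_uri)
```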

DefaultRegionEndpoint = 'elasticmapreduce.us-east-1.amazonaws.com'¶

DefaultRegionName = 'us-east-1'¶

ResponseError¶: alias of boto.exception.EmrResponseError

add_instance_groups(jobflow_id, instance_groups)¶

Adds instance groups to a running cluster.

Parameters:

jobflow_id (str) – The id of the jobflow which will take the new instance groups
instance_groups (list(boto.emr.InstanceGroup)) – A list of instance groups to add to the job

add_jobflow_steps(jobflow_id, steps)¶

Adds steps to a jobflow

Parameters:

jobflow_id (str) – The job flow id
steps (list(boto.emr.Step)) – A list of steps to add to the job

add_tags(resource_id, tags)¶

Create new metadata tags for the specified resource id.

Parameters:

resource_id (str) – The cluster id
tags (dict) – A dictionary containing the name/value pairs. If you want to create only a tag name, the value for that tag should be the empty string (e.g. ‘’) or None.
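A minimal sketch of assembling the tags dictionary, including name-only tags stored with an empty-string value as described above. The helper names are hypothetical, and `conn` is assumed to be any object exposing `add_tags(resource_id, tags)`, such as an EmrConnection:

```python
def build_tags(pairs=None, names_only=()):
    """Build a tags dict; name-only tags get an empty-string value."""
    tags = dict(pairs or {})
    for name in names_only:
        # Do not overwrite a tag that already has an explicit value.
        tags.setdefault(name, '')
    return tags


def tag_cluster(conn, cluster_id, pairs=None, names_only=()):
    """Apply tags through a connection object exposing add_tags()."""
    return conn.add_tags(cluster_id, build_tags(pairs, names_only))
```

For instance, `tag_cluster(conn, cluster_id, {'team': 'data'}, ['temporary'])` would send one name/value tag and one name-only tag.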

describe_cluster(cluster_id)¶

Describes an Elastic MapReduce cluster

Parameters:	cluster_id (str) – The cluster id of interest
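One common use of describe_cluster is polling until a cluster reaches a settled state. The sketch below assumes the returned object exposes the state as `.status.state` and that the state names match the EMR cluster states listed in the comment; neither is stated on this page, so treat both as assumptions. The function name is hypothetical.

```python
import time

# Assumed EMR cluster state names after which polling can stop
# (WAITING means the cluster is idle and ready for steps).
SETTLED_STATES = frozenset(['WAITING', 'TERMINATED', 'TERMINATED_WITH_ERRORS'])


def wait_for_cluster(conn, cluster_id, poll_seconds=30, max_polls=120,
                     sleep=time.sleep):
    """Poll describe_cluster() until the cluster settles; return the state.

    Assumes the returned object exposes its state as `.status.state`.
    Raises RuntimeError if the cluster has not settled after max_polls.
    """
    for _ in range(max_polls):
        state = conn.describe_cluster(cluster_id).status.state
        if state in SETTLED_STATES:
            return state
        sleep(poll_seconds)
    raise RuntimeError('cluster %s did not settle in time' % cluster_id)
```

The `sleep` argument is injectable so the loop can be exercised without real delays.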

describe_jobflow(jobflow_id)¶

This method is deprecated. We recommend you use list_clusters, describe_cluster, list_steps, list_instance_groups and list_bootstrap_actions instead.

Describes a single Elastic MapReduce job flow

Parameters:	jobflow_id (str) – The job flow id of interest

describe_jobflows(states=None, jobflow_ids=None, created_after=None, created_before=None)¶

This method is deprecated. We recommend you use list_clusters, describe_cluster, list_steps, list_instance_groups and list_bootstrap_actions instead.

Retrieve all the Elastic MapReduce job flows on your account

Parameters:

states (list) – A list of strings with job flow states wanted
jobflow_ids (list) – A list of job flow IDs
created_after (datetime) – Bound on job flow creation time
created_before (datetime) – Bound on job flow creation time

describe_step(cluster_id, step_id)¶

Describe an Elastic MapReduce step

Parameters:

cluster_id (str) – The cluster id of interest
step_id (str) – The step id of interest

list_bootstrap_actions(cluster_id, marker=None)¶

Get a list of bootstrap actions for an Elastic MapReduce cluster

Parameters:

cluster_id (str) – The cluster id of interest
marker (str) – Pagination marker

list_clusters(created_after=None, created_before=None, cluster_states=None, marker=None)¶

List Elastic MapReduce clusters with optional filtering

Parameters:

created_after (datetime) – Bound on cluster creation time
created_before (datetime) – Bound on cluster creation time
cluster_states (list) – Bound on cluster states
marker (str) – Pagination marker

list_instance_groups(cluster_id, marker=None)¶

List EC2 instance groups in a cluster

Parameters:

cluster_id (str) – The cluster id of interest
marker (str) – Pagination marker

list_instances(cluster_id, instance_group_id=None, instance_group_types=None, marker=None)¶

List EC2 instances in a cluster

Parameters:

cluster_id (str) – The cluster id of interest
instance_group_id (str) – The EC2 instance group id of interest
instance_group_types (list) – Filter by EC2 instance group type
marker (str) – Pagination marker

list_steps(cluster_id, step_states=None, marker=None)¶

List cluster steps

Parameters:

cluster_id (str) – The cluster id of interest
step_states (list) – Filter by step states
marker (str) – Pagination marker
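The list_* methods above all accept a pagination marker. A hedged sketch of a generic helper that follows markers until the results are exhausted: it assumes each response exposes the next-page token as a `marker` attribute (absent or empty on the last page) and holds its items in an attribute whose name the caller supplies, for example `clusters` or `steps`. Those attribute names are assumptions, not taken from this page.

```python
def paginate(list_call, items_attr, **kwargs):
    """Yield every item from a marker-paginated list_* call.

    list_call  -- a bound method such as conn.list_steps
    items_attr -- name of the attribute holding one page of items
                  (assumed, e.g. 'clusters' or 'steps')
    """
    marker = None
    while True:
        page = list_call(marker=marker, **kwargs)
        for item in getattr(page, items_attr, []):
            yield item
        # Assumption: the next-page token is exposed as `.marker` and is
        # absent or empty once the last page has been returned.
        marker = getattr(page, 'marker', None)
        if not marker:
            break


# Hypothetical usage with a real connection:
#   for step in paginate(conn.list_steps, 'steps', cluster_id=cluster_id):
#       ...
```

Writing the helper as a generator keeps memory use flat and lets the caller stop early without fetching the remaining pages.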

modify_instance_groups(instance_group_ids, new_sizes)¶

Modify the number of nodes and configuration settings in an instance group.

Parameters:

instance_group_ids (list(str)) – A list of the IDs of the instance groups to be modified
new_sizes (list(int)) – A list of the new sizes for each instance group
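Because modify_instance_groups takes two parallel lists, it is easy to misalign IDs and sizes. A small sketch that derives both lists from a single mapping; the wrapper name is hypothetical and `conn` is assumed to expose `modify_instance_groups(instance_group_ids, new_sizes)`:

```python
def resize_instance_groups(conn, sizes_by_group_id):
    """Resize instance groups given a {group_id: new_size} mapping.

    Splits the mapping into the two parallel lists that
    modify_instance_groups() expects, keeping IDs and sizes aligned.
    """
    if not sizes_by_group_id:
        raise ValueError('no instance groups given')
    group_ids = sorted(sizes_by_group_id)
    new_sizes = [int(sizes_by_group_id[g]) for g in group_ids]
    if any(size < 0 for size in new_sizes):
        raise ValueError('instance group sizes must be non-negative')
    return conn.modify_instance_groups(group_ids, new_sizes)
```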

remove_tags(resource_id, tags)¶

Remove metadata tags for the specified resource id.

Parameters:

resource_id (str) – The cluster id
tags (list) – A list of tag names to remove.

run_jobflow(name, log_uri=None, ec2_keyname=None, availability_zone=None, master_instance_type='m1.small', slave_instance_type='m1.small', num_instances=1, action_on_failure='TERMINATE_JOB_FLOW', keep_alive=False, enable_debugging=False, hadoop_version=None, steps=None, bootstrap_actions=[], instance_groups=None, additional_info=None, ami_version=None, api_params=None, visible_to_all_users=None, job_flow_role=None, service_role=None)¶

Runs a job flow.

Parameters:

name (str) – Name of the job flow
log_uri (str) – URI of the S3 bucket to place logs
ec2_keyname (str) – EC2 key used for the instances
availability_zone (str) – EC2 availability zone of the cluster
master_instance_type (str) – EC2 instance type of the master
slave_instance_type (str) – EC2 instance type of the slave nodes
num_instances (int) – Number of instances in the Hadoop cluster
action_on_failure (str) – Action to take if a step terminates
keep_alive (bool) – Denotes whether the cluster should stay alive upon completion
enable_debugging (bool) – Denotes whether AWS console debugging should be enabled.
hadoop_version (str) – Version of Hadoop to use. This no longer defaults to ‘0.20’ and now uses the AMI default.
steps (list(boto.emr.Step)) – List of steps to add with the job
bootstrap_actions (list(boto.emr.BootstrapAction)) – List of bootstrap actions that run before Hadoop starts.
instance_groups (list(boto.emr.InstanceGroup)) – Optional list of instance groups to use when creating this job. NB: When provided, this argument supersedes num_instances and master/slave_instance_type.
ami_version (str) – Amazon Machine Image (AMI) version to use for instances. Values accepted by EMR are ‘1.0’, ‘2.0’, and ‘latest’; EMR currently defaults to ‘1.0’ if you don’t set ‘ami_version’.
additional_info (JSON str) – A JSON string for selecting additional features
api_params (dict) – A dictionary of additional parameters to pass directly to the EMR API (so you don’t have to upgrade boto to use new EMR features). You can also delete an API parameter by setting it to None.
visible_to_all_users (bool) – Whether the job flow is visible to all IAM users of the AWS account associated with the job flow. If this value is set to `True`, all IAM users of that AWS account can view and (if they have the proper policy permissions set) manage the job flow. If it is set to `False`, only the IAM user that created the job flow can view and manage it.
job_flow_role (str) – An IAM role for the job flow. The EC2 instances of the job flow assume this role. The default role is `EMRJobflowDefault`. In order to use the default role, you must have already created it using the CLI.
service_role (str) – The IAM role that will be assumed by the Amazon EMR service to access AWS resources on your behalf.

Return type:	str
Returns:	The jobflow id
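A minimal sketch of wrapping run_jobflow with a handful of the keyword arguments documented above. The wrapper name and its defaults (three instances) are illustrative choices, and `conn` is assumed to be an EmrConnection or any object with a compatible `run_jobflow`:

```python
def launch_jobflow(conn, name, log_uri, steps, num_instances=3,
                   instance_type='m1.small', keep_alive=False):
    """Start a job flow and return whatever run_jobflow() returns (its id).

    Passes fresh lists explicitly rather than relying on defaults, since
    the signature above shows a mutable default (bootstrap_actions=[]).
    """
    return conn.run_jobflow(
        name,
        log_uri=log_uri,
        master_instance_type=instance_type,
        slave_instance_type=instance_type,
        num_instances=num_instances,
        keep_alive=keep_alive,
        steps=list(steps),
        bootstrap_actions=[],
    )
```

Only parameters that appear in the documented signature are passed; anything newer would go through `api_params`, as the parameter description suggests.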

set_termination_protection(jobflow_id, termination_protection_status)¶

Set termination protection on specified Elastic MapReduce job flows

Parameters:

jobflow_id (list or str) – The job flow ID(s)
termination_protection_status (bool) – Termination protection status
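The signature names its argument jobflow_id (singular), while the documented type is "list or str". A cautious sketch that sidesteps the ambiguity by making one call per job flow ID; the helper name is hypothetical:

```python
def set_protection_for_all(conn, jobflow_ids, protected=True):
    """Call set_termination_protection() once per job flow id.

    Accepts a single id string or an iterable of ids.
    Returns a {jobflow_id: result} dict of the individual call results.
    """
    if isinstance(jobflow_ids, str):
        jobflow_ids = [jobflow_ids]
    results = {}
    for jobflow_id in jobflow_ids:
        results[jobflow_id] = conn.set_termination_protection(
            jobflow_id, bool(protected))
    return results
```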

set_visible_to_all_users(jobflow_id, visibility)¶

Set whether specified Elastic MapReduce job flows are visible to all IAM users

Parameters:

jobflow_id (list or str) – The job flow ID(s)
visibility (bool) – Visibility

terminate_jobflow(jobflow_id)¶

Terminate an Elastic MapReduce job flow

Parameters:	jobflow_id (str) – A jobflow id

terminate_jobflows(jobflow_ids)¶

Terminate one or more Elastic MapReduce job flows

Parameters:	jobflow_ids (list) – A list of job flow IDs

boto.emr.step¶

class boto.emr.step.HiveBase(name, **kw)¶

BaseArgs = ['s3n://us-east-1.elasticmapreduce/libs/hive/hive-script', '--base-path', 's3n://us-east-1.elasticmapreduce/libs/hive/']¶

class boto.emr.step.HiveStep(name, hive_file, hive_versions='latest', hive_args=None)¶: Hive script step

class boto.emr.step.InstallHiveStep(hive_versions='latest', hive_site=None)¶

Install Hive on EMR step

InstallHiveName = 'Install Hive'¶

class boto.emr.step.InstallPigStep(pig_versions='latest')¶

Install Pig on EMR step

InstallPigName = 'Install Pig'¶

class boto.emr.step.JarStep(name, jar, main_class=None, action_on_failure='TERMINATE_JOB_FLOW', step_args=None)¶

Custom jar step

An Elastic MapReduce step that executes a jar

Parameters:

name (str) – The name of the step
jar (str) – S3 URI to the Jar file
main_class (str) – The class to execute in the jar
action_on_failure (str) – An action, defined in the EMR docs, to take on failure.
step_args (list(str)) – A list of arguments to pass to the step

args()¶

Return type:	list(str)
Returns:	List of arguments for the step

jar()¶

Return type:	str
Returns:	URI to the jar

main_class()¶

Return type:	str
Returns:	The main class name

class boto.emr.step.PigBase(name, **kw)¶

BaseArgs = ['s3n://us-east-1.elasticmapreduce/libs/pig/pig-script', '--base-path', 's3n://us-east-1.elasticmapreduce/libs/pig/']¶

class boto.emr.step.PigStep(name, pig_file, pig_versions='latest', pig_args=[])¶: Pig script step

class boto.emr.step.ScriptRunnerStep(name, **kw)¶

ScriptRunnerJar = 's3n://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar'¶

class boto.emr.step.Step¶

Jobflow Step base class

args()¶

Return type:	list(str)
Returns:	List of arguments for the step

jar()¶

Return type:	str
Returns:	URI to the jar

main_class()¶

Return type:	str
Returns:	The main class name

class boto.emr.step.StreamingStep(name, mapper, reducer=None, combiner=None, action_on_failure='TERMINATE_JOB_FLOW', cache_files=None, cache_archives=None, step_args=None, input=None, output=None, jar='/home/hadoop/contrib/streaming/hadoop-streaming.jar')¶

Hadoop streaming step

A Hadoop streaming Elastic MapReduce step

Parameters:

name (str) – The name of the step
mapper (str) – The mapper URI
reducer (str) – The reducer URI
combiner (str) – The combiner URI. Only works for Hadoop 0.20 and later!
action_on_failure (str) – An action, defined in the EMR docs to take on failure.
cache_files (list(str)) – A list of cache files to be bundled with the job
cache_archives (list(str)) – A list of jar archives to be bundled with the job
step_args (list(str)) – A list of arguments to pass to the step
input (str or a list of str) – The input uri
output (str) – The output uri
jar (str) – The hadoop streaming jar. This can be either a local path on the master node, or an s3:// URI.
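To illustrate how the parameters above fit together, the sketch below collects keyword arguments that match the documented StreamingStep signature without importing boto. The helper name, bucket, and script paths are hypothetical; `aggregate` is used as an example reducer value.

```python
def streaming_step_kwargs(name, mapper, reducer, input_uri, output_uri,
                          cache_files=None):
    """Collect keyword arguments matching the StreamingStep signature."""
    kwargs = {
        'name': name,
        'mapper': mapper,
        'reducer': reducer,
        'input': input_uri,    # a str or a list of str, per the list above
        'output': output_uri,
    }
    if cache_files:
        kwargs['cache_files'] = list(cache_files)
    return kwargs


# Bucket and script names below are hypothetical.
wordcount = streaming_step_kwargs(
    name='Word count',
    mapper='s3://example-bucket/scripts/mapper.py',
    reducer='aggregate',
    input_uri='s3://example-bucket/input/',
    output_uri='s3://example-bucket/output/',
)
# With boto installed, the step could then be built as:
#   step = StreamingStep(**wordcount)
```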

args()¶

Return type:	list(str)
Returns:	List of arguments for the step

jar()¶

Return type:	str
Returns:	URI to the jar

main_class()¶

Return type:	str
Returns:	The main class name

boto.emr.emrobject¶

This module contains EMR response objects

class boto.emr.emrobject.AddInstanceGroupsResponse(connection=None)¶

Fields = set(['InstanceGroupIds', 'JobFlowId'])¶

class boto.emr.emrobject.Application(connection=None)¶

Fields = set(['Args', 'Version', 'Name', 'AdditionalInfo'])¶

class boto.emr.emrobject.Arg(connection=None)¶

endElement(name, value, connection)¶

class boto.emr.emrobject.BootstrapAction(connection=None)¶

Fields = set(['Path', 'Args', 'Name', 'ScriptPath'])¶

startElement(name, attrs, connection)¶

class boto.emr.emrobject.BootstrapActionList(connection=None)¶

Fields = set(['Marker'])¶

startElement(name, attrs, connection)¶

class boto.emr.emrobject.Cluster(connection=None)¶

Fields = set(['Name', 'ServiceRole', 'TerminationProtected', 'RunningAmiVersion', 'NormalizedInstanceHours', 'Id', 'MasterPublicDnsName', 'VisibleToAllUsers', 'RequestedAmiVersion', 'AutoTerminate', 'LogUri'])¶

startElement(name, attrs, connection)¶

class boto.emr.emrobject.ClusterStateChangeReason(connection=None)¶

Fields = set(['Message', 'Code'])¶

class boto.emr.emrobject.ClusterStatus(connection=None)¶

Fields = set(['Timeline', 'State', 'StateChangeReason'])¶

startElement(name, attrs, connection)¶

class boto.emr.emrobject.ClusterSummary(connection)¶

Fields = set(['NormalizedInstanceHours', 'Id', 'Name'])¶

startElement(name, attrs, connection)¶

class boto.emr.emrobject.ClusterSummaryList(connection)¶

Fields = set(['Marker'])¶

startElement(name, attrs, connection)¶

class boto.emr.emrobject.ClusterTimeline(connection=None)¶

Fields = set(['ReadyDateTime', 'CreationDateTime', 'EndDateTime'])¶

class boto.emr.emrobject.Ec2InstanceAttributes(connection=None)¶

Fields = set(['Ec2SubnetId', 'IamInstanceProfile', 'Ec2KeyName', 'Ec2AvailabilityZone'])¶

class boto.emr.emrobject.EmrObject(connection=None)¶

Fields = set([])¶

endElement(name, value, connection)¶

startElement(name, attrs, connection)¶
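The Fields sets together with startElement and endElement suggest SAX-style parsing, in which each closing XML element is offered to the response object. The following is a stand-in sketch of that pattern, not boto's actual code; in particular, storing each known field under a lower-cased attribute name is an assumption.

```python
class FieldObject(object):
    """Stand-in for a response object that keeps only whitelisted fields."""

    Fields = set()

    def startElement(self, name, attrs, connection):
        # Subclasses could return a child object here for nested elements.
        return None

    def endElement(self, name, value, connection):
        # Keep the value only if the element name is a known field.
        if name in self.Fields:
            setattr(self, name.lower(), value)


class KeyValueLike(FieldObject):
    # Mirrors the Fields shown for KeyValue on this page.
    Fields = set(['Value', 'Key'])


kv = KeyValueLike()
kv.endElement('Key', 'team', None)
kv.endElement('Value', 'data', None)
kv.endElement('Unknown', 'ignored', None)  # not in Fields, so dropped
print(kv.key, kv.value)
```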

class boto.emr.emrobject.HadoopStep(connection=None)¶

Fields = set(['Id', 'ActionOnFailure', 'Name'])¶

startElement(name, attrs, connection)¶

class boto.emr.emrobject.InstanceGroup(connection=None)¶

Fields = set(['ReadyDateTime', 'InstanceType', 'InstanceRole', 'EndDateTime', 'InstanceRunningCount', 'State', 'BidPrice', 'Market', 'StartDateTime', 'Name', 'InstanceGroupId', 'CreationDateTime', 'InstanceRequestCount', 'LastStateChangeReason', 'LaunchGroup'])¶

class boto.emr.emrobject.InstanceGroupInfo(connection=None)¶

Fields = set(['RequestedInstanceCount', 'Name', 'InstanceGroupType', 'Id', 'BidPrice', 'InstanceType', 'Market', 'RunningInstanceCount'])¶

startElement(name, attrs, connection)¶

class boto.emr.emrobject.InstanceGroupList(connection=None)¶

Fields = set(['Marker'])¶

startElement(name, attrs, connection)¶

class boto.emr.emrobject.InstanceInfo(connection=None)¶

Fields = set(['Ec2InstanceId', 'PublicDnsName', 'PrivateDnsName', 'PublicIpAddress', 'Id', 'PrivateIpAddress'])¶

startElement(name, attrs, connection)¶

class boto.emr.emrobject.InstanceList(connection=None)¶

Fields = set(['Marker'])¶

startElement(name, attrs, connection)¶

class boto.emr.emrobject.JobFlow(connection=None)¶

Fields = set(['TerminationProtected', 'MasterInstanceId', 'State', 'HadoopVersion', 'LogUri', 'AmiVersion', 'Ec2KeyName', 'ReadyDateTime', 'Type', 'JobFlowId', 'CreationDateTime', 'LastStateChangeReason', 'Name', 'EndDateTime', 'Value', 'InstanceCount', 'RequestId', 'StartDateTime', 'SlaveInstanceType', 'AvailabilityZone', 'MasterPublicDnsName', 'NormalizedInstanceHours', 'MasterInstanceType', 'VisibleToAllUsers', 'KeepJobFlowAliveWhenNoSteps', 'Id'])¶

startElement(name, attrs, connection)¶

class boto.emr.emrobject.JobFlowStepList(connection=None)¶

startElement(name, attrs, connection)¶

class boto.emr.emrobject.KeyValue(connection=None)¶

Fields = set(['Value', 'Key'])¶

class boto.emr.emrobject.ModifyInstanceGroupsResponse(connection=None)¶

Fields = set(['RequestId'])¶

class boto.emr.emrobject.RunJobFlowResponse(connection=None)¶

Fields = set(['JobFlowId'])¶

class boto.emr.emrobject.Step(connection=None)¶

Fields = set(['Name', 'EndDateTime', 'Jar', 'ActionOnFailure', 'State', 'MainClass', 'StartDateTime', 'CreationDateTime', 'LastStateChangeReason'])¶

startElement(name, attrs, connection)¶

class boto.emr.emrobject.StepConfig(connection=None)¶

Fields = set(['MainClass', 'Jar'])¶

startElement(name, attrs, connection)¶

class boto.emr.emrobject.StepId(connection=None)¶

class boto.emr.emrobject.StepSummary(connection=None)¶

Fields = set(['Id', 'Name'])¶

startElement(name, attrs, connection)¶

class boto.emr.emrobject.StepSummaryList(connection=None)¶

Fields = set(['Marker'])¶

startElement(name, attrs, connection)¶

class boto.emr.emrobject.SupportedProduct(connection=None)¶
