AWS Batch is a regional service that lets developers, scientists, and engineers efficiently run thousands of batch computing jobs on AWS across multiple Availability Zones within a single region. AWS Batch dynamically provisions the optimal quantity and type of compute resources (for example, CPU- or memory-optimized instances) based on the volume and specific resource requirements of the submitted batch jobs, removing capacity constraints, reducing compute costs, and delivering results faster. You can create AWS Batch compute environments in a new or existing VPC.
AWS Batch now supports multi-node parallel jobs, which let you run a single job that spans multiple EC2 instances. With AWS Batch multi-node parallel jobs, you can run large-scale, tightly coupled, high performance computing applications and distributed GPU model training without having to launch, configure, and manage Amazon EC2 resources directly. An AWS Batch multi-node parallel job is compatible with any framework that supports IP-based internode communication, such as Apache MXNet, TensorFlow, Caffe2, or Message Passing Interface (MPI).
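A multi-node parallel job is declared through the `nodeProperties` section of a job definition. The sketch below builds such a definition as it might be passed to boto3's `register_job_definition`; the definition name, node count, and container image URI are illustrative placeholders.

```python
# Sketch: a multi-node parallel (MNP) job definition for AWS Batch.
# The name, node count, and ECR image URI are hypothetical placeholders.

NUM_NODES = 4

mnp_job_definition = {
    "jobDefinitionName": "mpi-training",
    "type": "multinode",               # marks this as a multi-node parallel job
    "nodeProperties": {
        "numNodes": NUM_NODES,
        "mainNode": 0,                 # index of the node that coordinates the job
        "nodeRangeProperties": [
            {
                "targetNodes": "0:",   # apply the same container to every node
                "container": {
                    "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/mpi-app:latest",
                    "resourceRequirements": [
                        {"type": "VCPU", "value": "8"},
                        {"type": "MEMORY", "value": "16384"},
                    ],
                },
            }
        ],
    },
}

# With credentials configured, registration would be:
#   batch.register_job_definition(**mnp_job_definition)
print(mnp_job_definition["type"])
```

The `targetNodes` range syntax lets you give the main node different container settings from the workers when the framework requires it.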
You can bring your own Docker container with your chosen frameworks and libraries, such as Apache MXNet, TensorFlow, Caffe2, or Message Passing Interface (MPI). AWS Batch manages job deployment and compute resource management, so you can focus on analyzing results rather than installing and maintaining infrastructure.
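As one illustration of bringing your own container, a Dockerfile for an MPI workload might look like the following sketch. The base image and the application binary name are assumptions, not part of the original post; at runtime, AWS Batch injects environment variables such as `AWS_BATCH_JOB_MAIN_NODE_INDEX` and `AWS_BATCH_JOB_NODE_INDEX` that an entrypoint script can use to decide which node launches `mpirun`.

```dockerfile
# Hypothetical bring-your-own container image for an MPI job on AWS Batch.
FROM ubuntu:22.04

# Install Open MPI so ranks on different EC2 instances can communicate
RUN apt-get update && \
    apt-get install -y --no-install-recommends openmpi-bin libopenmpi-dev && \
    rm -rf /var/lib/apt/lists/*

# Copy in the compiled MPI application (assumed to exist in the build context)
COPY my_mpi_app /usr/local/bin/my_mpi_app

ENTRYPOINT ["/usr/local/bin/my_mpi_app"]
```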