Saturday, 16 March 2019

AWS Glue Now Supports Running Apache Spark SQL Queries

AWS Glue is a fully managed ETL (extract, transform, and load) service that makes it simple and cost-effective to categorize your data, clean it, enrich it, and move it reliably between various data stores. AWS Glue consists of a central metadata repository known as the AWS Glue Data Catalog, an ETL engine that automatically generates Python or Scala code, and a flexible scheduler that handles dependency resolution, job monitoring, and retries. AWS Glue is serverless, so there is no infrastructure to set up or manage.

The AWS Glue Data Catalog is compatible with the Apache Hive Metastore. You can now configure your AWS Glue jobs and development endpoints to use the Data Catalog as an external Apache Hive Metastore. This lets you run Apache Spark SQL queries directly against the tables stored in the Data Catalog. The feature is available in all regions where AWS Glue is supported. To learn more about this new capability, refer to the AWS Glue documentation.
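As a minimal sketch, the Python below builds the request for boto3's Glue `create_job` call with the `--enable-glue-datacatalog` job parameter, which AWS Glue documents as the switch that makes the Data Catalog act as the Spark Hive metastore. It also shows the kind of job script that could then query a catalog table with Spark SQL. The job name, role ARN, S3 path, Glue version, and the `my_database.my_table` table are hypothetical placeholders. Check the current AWS Glue documentation for the exact value the parameter expects.

```python
"""Sketch: an AWS Glue job that uses the Data Catalog as its Hive metastore.

All names below (job, role, S3 path, database, table) are hypothetical.
"""
from typing import Any, Dict

# Hypothetical job script to upload to S3. When the Data Catalog serves as
# the Hive metastore, a Hive-enabled SparkSession can query catalog tables
# directly with Spark SQL.
JOB_SCRIPT = """\
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
# my_database.my_table is a hypothetical Data Catalog table.
spark.sql("SELECT * FROM my_database.my_table LIMIT 10").show()
"""


def build_job_request(name: str, role_arn: str, script_s3_path: str) -> Dict[str, Any]:
    """Return keyword arguments for boto3's glue client create_job call."""
    return {
        "Name": name,
        "Role": role_arn,
        "GlueVersion": "4.0",  # assumed; pick a version your account supports
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": script_s3_path,
            "PythonVersion": "3",
        },
        # Special job parameter that makes the Data Catalog act as the
        # Apache Spark Hive metastore for this job.
        "DefaultArguments": {"--enable-glue-datacatalog": "true"},
    }


if __name__ == "__main__":
    request = build_job_request(
        "spark-sql-demo",                              # hypothetical
        "arn:aws:iam::123456789012:role/GlueJobRole",  # hypothetical
        "s3://my-bucket/scripts/spark_sql_demo.py",    # hypothetical
    )
    print(request["DefaultArguments"])
    # To actually create the job (requires AWS credentials):
    # import boto3
    # boto3.client("glue").create_job(**request)
```

Keeping the request as a plain dictionary makes it easy to inspect before passing it to `create_job`. The same parameter can also be set as a default argument in the job's settings in the AWS Glue console.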
