Saturday 16 March 2019

AWS Glue Allows Executing Apache Spark SQL Queries

AWS Glue is a fully managed ETL (extract, transform, and load) service that makes it easy and cost-effective to categorize your data, clean it, enrich it, and move it reliably between data stores. AWS Glue consists of a central metadata repository known as the AWS Glue Data Catalog, an ETL engine that automatically generates Python or Scala code, and a flexible scheduler that handles dependency resolution, job monitoring, and retries. AWS Glue is serverless, so there’s no infrastructure to set up or manage. The AWS Glue Data Catalog is compatible with the Apache Hive Metastore. Users can now configure their AWS Glue jobs and development endpoints to use the AWS Glue Data Catalog as an external Apache Hive Metastore, which lets them run Apache Spark SQL queries directly against the tables stored in the Data Catalog. This feature is available in all regions where AWS Glue is supported. To learn more about this capability, refer to the AWS Glue documentation.
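As a rough illustration of the setup described above, here is a minimal Python sketch. It assumes the job is created with the `--enable-glue-datacatalog` special job parameter (passed with an empty value) and that the script builds its Spark session with Hive support enabled. The helper names (`glue_job_arguments`, `build_query`, `run_query`) and the database and table names are hypothetical and chosen only for this example; check the AWS Glue documentation for the exact configuration steps for your job or development endpoint.

```python
def glue_job_arguments():
    """Job arguments asking Glue to use the Data Catalog as the Hive metastore.

    Assumption: the job is configured through the --enable-glue-datacatalog
    special parameter, which is passed with an empty value.
    """
    return {"--enable-glue-datacatalog": ""}


def build_query(database, table, limit=10):
    """Build a simple Spark SQL query against a Data Catalog table.

    The database and table names are placeholders supplied by the caller.
    """
    return f"SELECT * FROM `{database}`.`{table}` LIMIT {int(limit)}"


def run_query(database, table, limit=10):
    """Run the query inside a Glue job or development endpoint.

    PySpark is imported lazily so the helpers above can be used
    without a Spark installation.
    """
    from pyspark.sql import SparkSession

    # Hive support makes Spark resolve table names through the configured
    # metastore, which is the Glue Data Catalog when the job argument is set.
    spark = (
        SparkSession.builder
        .appName("glue-catalog-sql")
        .enableHiveSupport()
        .getOrCreate()
    )
    return spark.sql(build_query(database, table, limit))
```

Inside a job configured this way, something like `run_query("example_db", "example_table").show()` would print the first rows of the table, provided that database and table exist in your Data Catalog.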

