AWS Glue Allows Executing Apache Spark SQL Queries

Saturday, 16 March 2019


AWS Glue is a fully managed ETL (extract, transform, and load) service that makes it simple and cost-effective to categorize your data, clean it, enrich it, and move it reliably between various data stores. AWS Glue consists of a central metadata repository known as the AWS Glue Data Catalog, an ETL engine that automatically generates Python or Scala code, and a flexible scheduler that handles dependency resolution, job monitoring, and retries. AWS Glue is serverless, so there is no infrastructure to set up or manage. The AWS Glue Data Catalog is an Apache Hive Metastore compatible catalog. Users can now configure their AWS Glue jobs and development endpoints to use the AWS Glue Data Catalog as an external Apache Hive Metastore. This lets them run Apache Spark SQL queries directly against the tables stored in the AWS Glue Data Catalog. The feature is available in all regions where AWS Glue is supported. To learn more about this capability, refer to the AWS Glue documentation.
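As a rough illustration of what this looks like in a job script, here is a minimal Python sketch. It assumes the job is created with the `--enable-glue-datacatalog` job parameter, which is how AWS Glue exposes the Data Catalog to Spark as a Hive metastore. The database and table names (`sales_db`, `orders`) and the helper function names are hypothetical and used only for illustration. The Spark import is deferred so the helpers can be read and tried without a Spark installation.

```python
def glue_job_arguments(extra=None):
    """Build job arguments that ask Glue to use the Data Catalog as the Hive metastore."""
    # "--enable-glue-datacatalog" is the Glue job parameter for this feature.
    args = {"--enable-glue-datacatalog": "true"}
    args.update(extra or {})
    return args


def build_query(database, table, limit=10):
    """Build a simple Spark SQL query against a catalog table (hypothetical helper)."""
    # Backtick-quote identifiers so names containing dashes or keywords still parse.
    return f"SELECT * FROM `{database}`.`{table}` LIMIT {int(limit)}"


def run_query(database, table, limit=10):
    """Run the query inside a Glue job or any Spark session with Hive support."""
    # Imported lazily: pyspark is only present in the Glue runtime or a Spark install.
    from pyspark.sql import SparkSession

    # With the job parameter above, Hive support resolves tables via the Data Catalog.
    spark = SparkSession.builder.enableHiveSupport().getOrCreate()
    return spark.sql(build_query(database, table, limit))
```

Inside a Glue job, calling `run_query("sales_db", "orders").show()` would print the first rows of that catalog table, provided the table exists and the job's IAM role can read it.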
