Saturday, 16 March 2019

AWS Glue Allows Executing Apache Spark SQL queries

AWS Glue is a fully managed ETL (extract, transform, and load) service that makes it simple and cost-effective to categorize your data, clean it, enrich it, and move it reliably between various data stores. AWS Glue consists of a central metadata repository known as the AWS Glue Data Catalog, an ETL engine that automatically generates Python or Scala code, and a flexible scheduler that handles dependency resolution, job monitoring, and retries. AWS Glue is serverless, so there is no infrastructure to set up or manage.

The AWS Glue Data Catalog is compatible with the Apache Hive Metastore. You can now configure your AWS Glue jobs and development endpoints to use the Data Catalog as an external Apache Hive Metastore, which lets you run Apache Spark SQL queries directly against the tables stored in the Data Catalog. This feature is available in all regions where AWS Glue is supported. To learn more about this new capability, see the AWS Glue documentation.
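As a minimal sketch, assuming you create jobs and development endpoints with boto3, the feature is switched on by passing the `--enable-glue-datacatalog` argument. The helper functions below only build the request dictionaries for the Glue `create_job` and `create_dev_endpoint` calls, so they run without AWS credentials. The job name, role ARN, and S3 script path are hypothetical placeholders, and the empty-string flag value follows the original documentation examples (newer docs may show `"true"`), so check the current reference before relying on it.

```python
from typing import Any, Dict, Optional

# Argument that tells AWS Glue to use the Data Catalog as Spark's Hive metastore.
# ASSUMPTION: an empty value enables it, as in the original docs examples.
CATALOG_FLAG = "--enable-glue-datacatalog"


def build_create_job_request(
    job_name: str,
    role_arn: str,
    script_location: str,
    extra_args: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Return keyword arguments for boto3's glue.create_job with the catalog flag set."""
    # Copy caller-supplied arguments so the input dict is never mutated.
    default_args: Dict[str, str] = dict(extra_args or {})
    default_args[CATALOG_FLAG] = ""
    return {
        "Name": job_name,
        "Role": role_arn,
        # "glueetl" is the command name for Spark ETL jobs.
        "Command": {"Name": "glueetl", "ScriptLocation": script_location},
        "DefaultArguments": default_args,
    }


def build_create_dev_endpoint_request(endpoint_name: str, role_arn: str) -> Dict[str, Any]:
    """Return keyword arguments for boto3's glue.create_dev_endpoint with the catalog flag set."""
    return {
        "EndpointName": endpoint_name,
        "RoleArn": role_arn,
        "Arguments": {CATALOG_FLAG: ""},
    }


if __name__ == "__main__":
    # Hypothetical names; nothing is sent to AWS here.
    request = build_create_job_request(
        "sales-sql-job",
        "arn:aws:iam::123456789012:role/ExampleGlueRole",
        "s3://example-bucket/scripts/sales_sql.py",
    )
    print(request["DefaultArguments"])
    # To actually create the job (requires boto3 and credentials):
    #   boto3.client("glue").create_job(**request)
```

Inside the job script itself, which only runs in the Glue environment, you would typically obtain the Spark session via `GlueContext(SparkContext.getOrCreate()).spark_session` and then call `spark.sql("SHOW DATABASES").show()` or query a catalog table by its `database.table` name.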

