AWS Glue is a fully managed ETL (Extract, Transform, Load) service that makes it simple to prepare and load your data for analytics. You simply point AWS Glue at your data stored on AWS, and AWS Glue discovers your data and stores the associated metadata (e.g. table definitions and schemas) in the AWS Glue Data Catalog.

A crawler can crawl multiple data stores in a single run. On completion, the crawler creates or updates one or more tables in your Data Catalog, which makes crawlers the usual way to populate the AWS Glue Data Catalog with tables. Once cataloged, your data is immediately searchable, queryable, and available for ETL.

You can now specify a list of existing tables from your AWS Glue Data Catalog as sources in the crawler configuration. Previously, crawlers could only take data paths as sources, scan your data, and add new tables to the Data Catalog. Crawlers can now take existing tables as sources, detect changes to their schemas, update the table definitions, and register new partitions as new data becomes available. This is useful if you want to import existing table definitions from an external Apache Hive Metastore into the AWS Glue Data Catalog and then use crawlers to keep them up to date as your data changes.

This feature is available in every AWS Region where AWS Glue is available. For more details, refer to the AWS Glue documentation.