Monday, 12 August 2019

AWS Glue offers the FindMatches ML Transform to remove duplicate data and find matching records in your dataset

AWS Glue is an extract, transform, and load (ETL) service that makes it easy and cost-effective to classify your data, clean it, enrich it, and move it reliably between various data stores. AWS Glue is serverless, so there’s no infrastructure to set up or manage. It includes a central metadata repository called the AWS Glue Data Catalog, an ETL engine that automatically generates Python or Scala code, and a flexible scheduler that handles dependency resolution, job monitoring, and retries.

AWS Glue can now also find matching records across a dataset with the new FindMatches ML Transform, a custom machine learning transformation that helps you find equivalent records. By adding the FindMatches transformation to your Glue ETL jobs, you can find related products, places, suppliers, customers, and more. You can also use it to remove duplicate data, for example to find customers who have signed up more than once or products that have been inadvertently added to your product catalog more than once. You teach the FindMatches ML Transform your definition of a “duplicate” through examples, and it uses machine learning to find other potential duplicates in your dataset.

This new feature, AWS Glue ML Transforms, is currently available in the US East (Northern Virginia), US East (Ohio), US West (Oregon), EU (Ireland), and Asia Pacific (Tokyo) AWS regions.
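For readers who want to try this from code, here is a minimal sketch of how a FindMatches transform might be registered through boto3's Glue client. The database, table, key column, and IAM role below are hypothetical placeholders, not values from the announcement, and the 0.5 precision/recall setting is only an illustrative starting point. The request is built as a plain dictionary so it can be inspected before anything is sent to AWS.

```python
def build_find_matches_request(database, table, primary_key, role_arn):
    """Build keyword arguments for Glue's CreateMLTransform API call.

    All argument values are supplied by the caller; nothing here is
    specific to a real account or dataset.
    """
    return {
        "Name": f"dedupe-{table}",
        "Role": role_arn,
        # The Data Catalog table whose records should be matched.
        "InputRecordTables": [
            {"DatabaseName": database, "TableName": table}
        ],
        "Parameters": {
            "TransformType": "FIND_MATCHES",
            "FindMatchesParameters": {
                # Column that uniquely identifies each input row.
                "PrimaryKeyColumnName": primary_key,
                # Assumed starting point between 0.0 and 1.0; tune for your data.
                "PrecisionRecallTradeoff": 0.5,
            },
        },
    }


def create_find_matches_transform(request, region="us-east-1"):
    """Send the request to AWS Glue (needs boto3 and valid credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    glue = boto3.client("glue", region_name=region)
    return glue.create_ml_transform(**request)["TransformId"]


# Hypothetical names, for illustration only.
request = build_find_matches_request(
    database="sales_db",
    table="customers",
    primary_key="customer_id",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
)
```

Calling `create_find_matches_transform(request)` would create the transform in the chosen region. It still has to be taught what counts as a duplicate by labeling example records before it is used in an ETL job.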

