Thursday, 6 December 2018

Amazon S3 Inventory offers Apache Parquet output format

Amazon S3 Inventory provides flat-file lists of objects and selected metadata for your bucket or shared prefixes. You can use S3 Inventory to list, audit, and report on the status of your objects, or to simplify and speed up business workflows and big data jobs. The Apache Parquet file format is now available for Amazon S3 Inventory reports. Apache Parquet is a columnar storage file format, similar to ORC (Optimized Row Columnar), and is available to any project in the Hadoop ecosystem regardless of the choice of data processing framework, data model, or programming language. This columnar format lets you read, decompress, and process only the columns needed for the current query. AWS recommends configuring your S3 Inventory report in either Parquet or ORC for faster query performance and lower query costs when querying S3 Inventory with AWS services like Amazon Athena or Amazon Redshift Spectrum, or tools such as Apache Hive, Spark, HBase, or Presto. Apache Parquet format for S3 Inventory is available in all AWS commercial Regions and the AWS GovCloud Regions. You can use the AWS Management Console or the S3 API, CLI, or SDK to set up your S3 Inventory configuration.
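As a rough illustration of the SDK route mentioned above, the sketch below builds an inventory configuration that requests Parquet output. The helper name `build_inventory_config`, the bucket names, the configuration ID, and the chosen optional fields are all hypothetical placeholders. The dictionary layout follows the `InventoryConfiguration` structure accepted by the S3 `PutBucketInventoryConfiguration` operation, but treat this as a starting point under those assumptions, not a definitive setup.

```python
import json

# Output formats that S3 Inventory accepts for the report files.
VALID_FORMATS = {"CSV", "ORC", "Parquet"}


def build_inventory_config(config_id, destination_bucket_arn,
                           output_format="Parquet", prefix="inventory",
                           frequency="Daily"):
    """Build an InventoryConfiguration dict (hypothetical helper).

    All argument values are illustrative; substitute your own bucket ARN,
    prefix, and schedule.
    """
    if output_format not in VALID_FORMATS:
        raise ValueError(f"Unsupported inventory format: {output_format}")
    return {
        "Id": config_id,
        "IsEnabled": True,
        "IncludedObjectVersions": "Current",
        "Schedule": {"Frequency": frequency},
        # Extra metadata columns to include in each report (assumed choice).
        "OptionalFields": ["Size", "LastModifiedDate", "StorageClass"],
        "Destination": {
            "S3BucketDestination": {
                "Bucket": destination_bucket_arn,
                "Format": output_format,
                "Prefix": prefix,
            }
        },
    }


if __name__ == "__main__":
    config = build_inventory_config(
        "parquet-inventory",                           # placeholder ID
        "arn:aws:s3:::example-inventory-destination",  # placeholder ARN
    )
    print(json.dumps(config, indent=2))
    # With boto3 installed and credentials configured, the configuration
    # could be applied to a (placeholder) source bucket like this:
    #   import boto3
    #   boto3.client("s3").put_bucket_inventory_configuration(
    #       Bucket="example-source-bucket",
    #       Id=config["Id"],
    #       InventoryConfiguration=config,
    #   )
```

Keeping the configuration in a plain dictionary makes it easy to review or version before applying it. The destination bucket also needs a bucket policy that allows S3 to write the inventory files to it.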

