Amazon S3 Inventory offers Apache Parquet output format

Thursday, 6 December 2018

Amazon S3 Inventory provides flat-file lists of objects and selected metadata for your bucket or for objects that share a prefix. You can use S3 Inventory to list, audit, and report on the status of your objects, or to simplify and speed up business workflows and big data jobs. S3 Inventory reports are now available in the Apache Parquet file format.

Apache Parquet is a columnar storage file format, similar to ORC (Optimized Row Columnar). It is available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model, or programming language. A columnar format lets you read, decompress, and process only the columns that the current query needs. AWS recommends configuring your S3 Inventory report in either Parquet or ORC for faster query performance and lower query costs when you query S3 Inventory with AWS services such as Amazon Athena or Amazon Redshift Spectrum, or with tools such as Apache Hive, Spark, HBase, or Presto.

Apache Parquet output for S3 Inventory is available in all AWS commercial and AWS GovCloud Regions. You can set up your S3 Inventory configuration in the AWS Management Console, or with the S3 API, CLI, or SDKs.
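
As a rough illustration of the SDK route, the following is a minimal sketch using the AWS SDK for Python (boto3) and its put_bucket_inventory_configuration call. The bucket names, configuration Id, destination prefix, and chosen optional fields are hypothetical placeholders, not values from this announcement. The helper build_inventory_config is also a hypothetical name: it only assembles the request body, so it can be inspected without AWS access. boto3 is imported only when the configuration is actually applied.

```python
"""Sketch: request a daily S3 Inventory report in Apache Parquet format.

Hypothetical placeholders: bucket names, configuration Id, prefix, and
optional fields. Adjust them for your own account.
"""
from typing import Any, Dict, List, Optional

VALID_FORMATS = ("CSV", "ORC", "Parquet")  # output formats S3 Inventory accepts
VALID_FREQUENCIES = ("Daily", "Weekly")


def build_inventory_config(
    config_id: str,
    destination_bucket_arn: str,
    destination_prefix: str = "inventory",
    output_format: str = "Parquet",
    frequency: str = "Daily",
    optional_fields: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Return an InventoryConfiguration dict (hypothetical helper)."""
    if output_format not in VALID_FORMATS:
        raise ValueError(f"unsupported inventory format: {output_format}")
    if frequency not in VALID_FREQUENCIES:
        raise ValueError(f"unsupported frequency: {frequency}")
    if optional_fields is None:
        # Example metadata columns; pick the ones your queries actually use.
        optional_fields = ["Size", "LastModifiedDate", "StorageClass"]
    return {
        "Id": config_id,
        "IsEnabled": True,
        "IncludedObjectVersions": "Current",
        "Schedule": {"Frequency": frequency},
        "OptionalFields": list(optional_fields),
        "Destination": {
            "S3BucketDestination": {
                # Destination is given as a bucket ARN, e.g. arn:aws:s3:::my-bucket
                "Bucket": destination_bucket_arn,
                "Format": output_format,
                "Prefix": destination_prefix,
            }
        },
    }


def enable_inventory(source_bucket: str, config: Dict[str, Any]) -> None:
    """Apply the configuration to source_bucket (requires boto3 and credentials)."""
    import boto3  # lazy import: building the config does not need boto3

    s3 = boto3.client("s3")
    s3.put_bucket_inventory_configuration(
        Bucket=source_bucket,
        Id=config["Id"],  # must match the Id inside the configuration
        InventoryConfiguration=config,
    )
```

For example, `build_inventory_config("parquet-daily", "arn:aws:s3:::example-inventory-destination")` produces a daily Parquet configuration that `enable_inventory("example-source-bucket", config)` would then apply. The destination bucket also needs a bucket policy that allows Amazon S3 to write the reports, which this sketch does not set up.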
