Wednesday, 25 December 2024

Amazon Macie: Identifying Sensitive Information in S3 Objects


Amazon Macie: An Overview

Amazon Macie is an AWS service designed to help detect sensitive information, such as Personally Identifiable Information (PII), credit card numbers, account names, and credentials, within objects stored in S3 buckets. This service is particularly valuable for organizations that use shared object storage, where employees have access to a central repository for storing documents. When your organization’s security policy mandates that no sensitive data or PII should be present in these shared buckets, Amazon Macie provides an automated way to scan for and identify such information, which supports compliance efforts and strengthens data security.

Why Macie?

● Automated Sensitive Data Detection at Scale: Streamline the process of identifying sensitive information across large volumes of data stored in S3.

● Cost-Effective Discovery of Sensitive Data: Efficiently discover and classify sensitive data within S3 buckets without incurring significant overhead costs.

● Enhanced Data Security and Privacy Monitoring: Continuously monitor and safeguard sensitive information, ensuring compliance with data protection regulations.

● Reduced Triage Time: Quickly identify and address security risks or misconfigurations, minimizing the time spent on manual investigation and remediation.

Managed Identifiers, Custom Identifiers, and Allow Lists in Amazon Macie

● Managed Identifiers: These are sets of built-in criteria and techniques, each designed to detect a specific type of sensitive data. Examples include credit card numbers, AWS secret access keys, and passport numbers for particular countries or regions. Together they cover a wide and continuously growing variety of sensitive data types globally.

● Custom Identifiers: These are user-defined patterns tailored to detect specific data types based on organizational needs. Customers configure a custom identifier by specifying a regular expression (regex) for pattern matching and, optionally, keywords that must appear near a match, ignore words that exclude a match, and the maximum distance between a keyword and the matched data. This flexibility enables precise detection of unique data types that are not covered by managed identifiers.

● Allow Lists: These lists let you define specific text or patterns that Macie should ignore during its scans. They are particularly useful for exceptions specific to your organization's environment, such as publicly available names, organizational phone numbers, or test data used in development scenarios. When Macie encounters text that matches an entry or pattern in an allow list, it excludes that occurrence from its findings, so that only relevant sensitive data is flagged for review.
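To make the interplay between the regex, keywords, ignore words, maximum match distance, and allow lists more concrete, here is a minimal local sketch in Python. It is a simplified approximation for illustration only and not Macie's actual matching engine. The function names, the EMP-###### pattern, and the sample text are hypothetical.

```python
import re


def _keyword_nearby(text, match_end, keywords, max_distance):
    """True if any keyword ends at most max_distance characters before match_end."""
    for keyword in keywords:
        # Keywords are matched case-insensitively in this sketch.
        for hit in re.finditer(re.escape(keyword), text, re.IGNORECASE):
            if 0 <= match_end - hit.end() <= max_distance:
                return True
    return False


def find_matches(text, pattern, keywords=(), ignore_words=(),
                 max_distance=50, allow_list=()):
    """Return regex matches that survive the keyword, ignore-word and allow-list checks."""
    results = []
    for match in re.finditer(pattern, text):
        matched = match.group(0)
        # Ignore words: drop a match whose text contains any of them.
        if any(word in matched for word in ignore_words):
            continue
        # Allow list: drop a match that is explicitly permitted.
        if matched in allow_list:
            continue
        # Keywords: when given, one must appear close enough before the match ends.
        if keywords and not _keyword_nearby(text, match.end(), keywords, max_distance):
            continue
        results.append(matched)
    return results


# Hypothetical sample: one ID next to its keyword, one far away from it.
sample = "Employee ID: EMP-123456" + " " * 60 + "EMP-654321"
print(find_matches(sample, r"EMP-\d{6}", keywords=["employee id"]))
```

In this sketch only the ID close to the keyword is reported, which mirrors why proximity keywords tend to reduce false positives for patterns that are otherwise too generic.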

Creating a custom identifier to detect files that construct SQL statements

Let's create a custom identifier to detect files that construct SQL statements:

1. Open the AWS Management Console and search for Macie.

2. In the Macie console, navigate to the left-hand pane and select Custom Identifiers.

3. Click on Create to start setting up a new custom identifier.

4. Configure the custom identifier with the necessary details, such as a name (e.g., SQL-Identifier) and the regular expression to match.

5. Once you've entered all the required information, click Submit to complete the setup.

This custom identifier will enable Macie to detect files containing patterns related to SQL statement construction.
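The console steps above can also be sketched with the AWS SDK for Python (boto3). This is a minimal example under stated assumptions: the name SQL-Identifier and the ?fakeparam= pattern come from this article, while the description text, the helper function names, and the region are hypothetical. The question mark is escaped as \? because an unescaped ? is a quantifier in regex syntax.

```python
import re


def build_identifier_request(name, regex, description=""):
    """Build keyword arguments for the Macie create_custom_data_identifier call."""
    request = {"name": name, "regex": regex}
    if description:
        request["description"] = description
    return request


def create_identifier(request, region_name="us-east-1"):
    """Create the identifier in Macie; needs AWS credentials and Macie enabled."""
    import boto3  # imported lazily so the builder above works without AWS access

    client = boto3.client("macie2", region_name=region_name)
    response = client.create_custom_data_identifier(**request)
    return response["customDataIdentifierId"]


request = build_identifier_request(
    name="SQL-Identifier",
    regex=r"\?fakeparam=",  # literal "?fakeparam="
    description="Hypothetical description: flags SQL-like test parameters",
)

# Local sanity check of the pattern before sending it to Macie.
assert re.search(request["regex"], "/search?fakeparam=1") is not None

# identifier_id = create_identifier(request)  # uncomment to call AWS
```

Checking the pattern locally first is a cheap way to catch escaping mistakes before the identifier is used in a job.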



Creating an Amazon Macie job:

Create an Amazon Macie job to scan S3 buckets for objects containing SQL-like syntax that matches the regex pattern ?fakeparam=. Note that ? is a quantifier in regular expression syntax, so matching a literal question mark typically requires escaping it, as in \?fakeparam=.

This job uses the previously defined custom data identifier to detect and flag relevant objects uploaded to S3.

Follow the steps below to create a job in Amazon Macie that scans S3 buckets for objects containing SQL-like syntax, using a custom data identifier:

1. Access the Macie Console:
Go to the Amazon Macie console. In the left-hand pane, click Get Started.

2. Create a Job:
Under the Analyze Buckets section, click Create Job.

3. Select Buckets to Scan:
Choose the Select specific buckets option and select the S3 buckets you want to scan for SQL syntax. Click Next.

4. Review Buckets:
On the Review Buckets page, verify your selected buckets and proceed by clicking Next.

5. Refine the Scope:
○ Choose One-time job.
○ Expand Additional Settings, and under Object Criteria, specify the file types you want to scan (e.g., .txt).
○ Click Include and then Next.




6. Manage Identifiers:
○ Select Custom for managed identifiers.
○ Click Don't use any managed data identifiers.

7. Add your created Custom Identifier:
○ On the Custom Identifier page, select the custom data identifier you created earlier (e.g., SQL-Identifier).
○ Click Next.

8. Skip Allow Lists:
○ On the Allow List page, click Next.

9. Name the Job:
○ Provide a name for the job (e.g., SQL-Identifier-Info). Click Next.

10. Review and Submit:
○ Review the job’s settings and verify that they’re correct. To help ensure accurate results for audits or investigations, you can’t change these settings after you submit the job.
○ Click Submit.

11. View Findings:

Once the scan completes, Amazon Macie displays any findings in the Findings section. If objects containing the specified SQL-like syntax (?fakeparam=) are detected, they are listed (e.g., rounak testing/robots.txt). You can filter, group, and sort findings by specific fields and field values to retrieve and review them. By using the API, you can also transfer the data to other applications, services, or systems for deeper analysis, long-term storage, or reporting.
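For readers who prefer the API over the console, the job steps above can be approximated with boto3 as follows. This is a sketch, not a definitive implementation: the account ID, bucket name, identifier ID, region, and helper function names are hypothetical placeholders, and the request mirrors the console choices (one-time job, .txt objects only, no managed data identifiers).

```python
import uuid


def build_job_request(name, account_id, buckets, identifier_ids, extensions=("txt",)):
    """Build keyword arguments for the Macie create_classification_job call."""
    return {
        "name": name,
        "jobType": "ONE_TIME",
        "clientToken": str(uuid.uuid4()),  # idempotency token
        "managedDataIdentifierSelector": "NONE",  # custom identifiers only
        "customDataIdentifierIds": list(identifier_ids),
        "s3JobDefinition": {
            "bucketDefinitions": [
                {"accountId": account_id, "buckets": list(buckets)}
            ],
            # Only include objects whose file extension is in `extensions`.
            "scoping": {
                "includes": {
                    "and": [
                        {
                            "simpleScopeTerm": {
                                "comparator": "EQ",
                                "key": "OBJECT_EXTENSION",
                                "values": list(extensions),
                            }
                        }
                    ]
                }
            },
        },
    }


def start_job(request, region_name="us-east-1"):
    """Submit the job; needs AWS credentials and Macie enabled."""
    import boto3  # imported lazily so the builder works without AWS access

    client = boto3.client("macie2", region_name=region_name)
    return client.create_classification_job(**request)["jobId"]


def findings_for_job(job_id, region_name="us-east-1"):
    """Fetch findings produced by one job (call after the job has run)."""
    import boto3

    client = boto3.client("macie2", region_name=region_name)
    criteria = {"criterion": {"classificationDetails.jobId": {"eq": [job_id]}}}
    finding_ids = client.list_findings(findingCriteria=criteria)["findingIds"]
    if not finding_ids:
        return []
    return client.get_findings(findingIds=finding_ids)["findings"]


job_request = build_job_request(
    name="SQL-Identifier-Info",
    account_id="111122223333",            # hypothetical account ID
    buckets=["example-shared-bucket"],    # hypothetical bucket name
    identifier_ids=["example-identifier-id"],  # ID returned when the identifier was created
)

# job_id = start_job(job_request)       # uncomment to call AWS
# findings = findings_for_job(job_id)   # run once the job has completed
```

As in the console, the job definition cannot be edited after submission, so it is worth inspecting the request dictionary before calling start_job.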

Enhancing Amazon Macie findings with Monitoring and Automation Tools:

● Amazon EventBridge: Route findings to other AWS services or external systems and set rules to trigger automated actions.

● AWS Security Hub: Consolidate Macie findings with other security data to gain a unified view of your security posture.

● AWS Lambda: Automate processing of findings, including alerts, remediation, or enrichment tasks.

● Amazon SNS: Send immediate alerts via email, SMS, or other channels for critical findings.

● Amazon CloudWatch: Monitor Macie metrics, set alarms, and track trends through dashboards.

● AWS Step Functions: Automate complex workflows for multi-step responses.

● S3 Event Notifications: Trigger actions, such as Lambda functions, when Macie results are stored.

● AWS Config: Enforce compliance by evaluating findings against custom rules.

● Third-party SIEMs: Integrate findings into your existing SIEM for centralized analysis.

● Custom Dashboards: Use tools like Amazon QuickSight to visualize findings and create detailed reports.
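As an illustration of the EventBridge and Lambda options in the list above, the following sketch shows an event pattern that matches Macie findings and a small Lambda handler that extracts the affected bucket and object. The handler and helper names, the sample event values, and the choice to simply log a summary are assumptions for illustration. A real handler would typically forward the summary to SNS or a remediation workflow.

```python
import json

# EventBridge event pattern that matches findings published by Macie.
MACIE_FINDING_PATTERN = {
    "source": ["aws.macie"],
    "detail-type": ["Macie Finding"],
}


def summarize_finding(event):
    """Pull the fields most useful for triage out of a Macie finding event."""
    detail = event.get("detail", {})
    resources = detail.get("resourcesAffected", {})
    return {
        "type": detail.get("type"),
        "severity": detail.get("severity", {}).get("description"),
        "bucket": resources.get("s3Bucket", {}).get("name"),
        "key": resources.get("s3Object", {}).get("key"),
    }


def lambda_handler(event, context):
    """Hypothetical Lambda entry point: log a one-line summary of the finding."""
    summary = summarize_finding(event)
    print(json.dumps(summary))
    return summary


# Trimmed, hypothetical event for local testing; real events carry many more fields.
sample_event = {
    "source": "aws.macie",
    "detail-type": "Macie Finding",
    "detail": {
        "type": "SensitiveData:S3Object/CustomIdentifier",
        "severity": {"description": "Medium"},
        "resourcesAffected": {
            "s3Bucket": {"name": "example-shared-bucket"},
            "s3Object": {"key": "robots.txt"},
        },
    },
}

summary = lambda_handler(sample_event, None)
```

Keeping the parsing in a separate function makes it easy to test locally with a sample event before wiring the rule and the function together.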

Conclusion

To identify objects stored in S3 that contain specific sensitive data, you can create custom identifiers tailored to your requirements and use them in Macie jobs to scan for that data. Custom identifiers let you align data discovery with organizational needs and compliance requirements, which helps support adherence to data privacy regulations.

Written by Rounak Naik (Cloud Engineer @ Cloud.in)
