CloudifyOps Mini-blog series: Automating Snapshot and AMI Cleanup with AWS Lambda

In today’s fast-paced cloud computing environments, managing resources like Amazon Elastic Block Store (EBS) snapshots and Amazon Machine Images (AMIs) is crucial to optimize costs, improve security, and maintain overall system health. In this blog post, we will explore how to leverage AWS Lambda, CloudWatch Events, and smart tagging strategies to automate the process of identifying and cleaning up unused snapshots and AMIs, leading to better resource utilization and efficient cloud infrastructure.

Understanding the Problem:

  • The Importance of Resource Cleanup: Over time, cloud environments can accumulate a significant number of unused or outdated snapshots and AMIs. These resources not only consume valuable storage space but also lead to unnecessary costs.
  • Challenges of Manual Management: Manually identifying and deleting unused snapshots and AMIs can be time-consuming, error-prone, and might lead to overlooking important resources.
  • The Role of Automation: Automation empowers organizations to streamline resource management, reduce operational overhead, and ensure compliance with best practices.
  • The Importance of Tagging: Resources proliferate at a staggering pace in the cloud, and keeping tabs on every virtual asset becomes a challenge of its own. Tags let you filter resources by attributes such as project or owner, which brings more visibility into resource utilization and makes timely clean-up possible.

Lambda Implementation with Python:

AWS Lambda, a cornerstone of serverless computing, will be our instrument of choice to tackle the problem, and we’ll wield the power of the Python programming language to script our cleanup logic.

1. Environment Setup: Begin by creating a new Lambda function, either in the AWS Management Console or programmatically with the AWS CLI. Make sure to choose Python as the function's runtime.
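
If you prefer to script this step, the sketch below uses Boto3's `create_function` call instead of the CLI. It is a minimal example under stated assumptions. The function name `snapshot-ami-cleanup`, the `python3.12` runtime, the 300-second timeout and the module name `lambda_function` are illustrative choices. `role_arn` must be the ARN of an IAM execution role you have already created. The client is passed in as a parameter so the sketch can be exercised without AWS access.

```python
import io
import zipfile

# Hypothetical names: replace with your own function name and handler module.
FUNCTION_NAME = "snapshot-ami-cleanup"
HANDLER_MODULE = "lambda_function"


def build_zip(source_code: str, module_name: str = HANDLER_MODULE) -> bytes:
    """Package a single Python source string into an in-memory deployment zip."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(f"{module_name}.py", source_code)
    return buffer.getvalue()


def create_cleanup_function(lambda_client, role_arn: str, source_code: str,
                            retention_days: int = 30) -> dict:
    """Create the function with a Python runtime and a RETENTION_DAYS variable."""
    return lambda_client.create_function(
        FunctionName=FUNCTION_NAME,
        Runtime="python3.12",  # assumption: any currently supported Python runtime works
        Role=role_arn,  # ARN of the IAM execution role the function runs as
        Handler=f"{HANDLER_MODULE}.lambda_handler",
        Code={"ZipFile": build_zip(source_code)},
        Timeout=300,  # assumption: listing and deleting many snapshots can take minutes
        Environment={"Variables": {"RETENTION_DAYS": str(retention_days)}},
    )
```

In practice you would call `create_cleanup_function(boto3.client("lambda"), role_arn, source)`, where `source` is the text of the handler file.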

2. Writing the Lambda Function: With the stage set, it’s time to craft the heart of our operation: the Lambda function’s Python code. This code performs the cleanup tasks, identifying unused snapshots and AMIs based on predefined criteria. To interact with AWS services, we use the AWS SDK for Python, better known as Boto3. The code deletes old snapshots and deregisters old AMIs, freeing up storage and reducing costs. Below is the Python code to identify and delete unwanted snapshots and AMIs.

import os
import boto3
import botocore.exceptions
from datetime import datetime, timedelta, timezone

def datetime_handler(x):
    if isinstance(x, datetime):
        return x.isoformat()
    elif isinstance(x, str):
        return x  # Already a string
    raise TypeError("Unknown type")

def lambda_handler(event, context):
    ec2_client = boto3.client('ec2')

    # Get the number of days from environment variable (default to 30)
    retention_days = int(os.environ.get('RETENTION_DAYS', 30))
    
    # Current time in UTC, made naive so it can be compared with the
    # naive UTC creation dates computed below
    current_datetime_naive = datetime.now(timezone.utc).replace(tzinfo=None)

    print("Getting a list of all snapshots...")
    snapshots = ec2_client.describe_snapshots(OwnerIds=['self'])['Snapshots']

    print("Getting a list of all AMIs...")
    amis = ec2_client.describe_images(Owners=['self'])['Images']

    # Separate snapshots into tagged, untagged, and unused lists
    tagged_snapshots = []
    untagged_snapshots = []
    old_snapshots = []
    unused_snapshots = []
    old_amis = []

    # Separate AMIs into used and unused lists
    used_amis = []
    unused_amis = []

    for snapshot in snapshots:
        snapshot_id = snapshot['SnapshotId']
        tags = snapshot.get('Tags', [])

        # Check if the snapshot is tagged with "Project"
        if any(tag.get('Key') == 'Project' for tag in tags):
            tagged_snapshots.append({
                'SnapshotId': snapshot_id,
                'StartTime': datetime_handler(snapshot['StartTime'])
            })
        else:
            # Check if the snapshot is older than the retention days
            creation_date = snapshot['StartTime']
            if isinstance(creation_date, str):
                creation_date = datetime.strptime(creation_date, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
            if (current_datetime_naive - creation_date.replace(tzinfo=None)) > timedelta(days=retention_days):
                old_snapshots.append({
                    'SnapshotId': snapshot_id,
                    'StartTime': datetime_handler(snapshot['StartTime'])
                })
                # Remove the old snapshot
                try:
                    ec2_client.delete_snapshot(SnapshotId=snapshot_id)
                    print(f"Deleted old untagged snapshot: {snapshot_id}")
                except botocore.exceptions.ClientError as e:
                    print(f"Failed to delete snapshot {snapshot_id}: {e}")
            else:
                # Untagged but still within the retention period: report it, keep it
                untagged_snapshots.append({
                    'SnapshotId': snapshot_id,
                    'StartTime': datetime_handler(snapshot['StartTime'])
                })

    for ami in amis:
        ami_id = ami['ImageId']
        # Check if the AMI is older than the retention days
        creation_date = ami['CreationDate']
        if isinstance(creation_date, str):
            creation_date = datetime.strptime(creation_date, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
        if (current_datetime_naive - creation_date.replace(tzinfo=None)) > timedelta(days=retention_days):
            old_amis.append({
                'AmiId': ami_id,
                'CreationDate': datetime_handler(creation_date)
            })
            # Deregister the old AMI (this does not delete its backing snapshots)
            try:
                ec2_client.deregister_image(ImageId=ami_id)
                print(f"Deregistered old AMI: {ami_id}")
            except botocore.exceptions.ClientError as e:
                print(f"Failed to deregister AMI {ami_id}: {e}")
        else:
            used_amis.append({
                'AmiId': ami_id,
                'CreationDate': datetime_handler(creation_date)
            })
            
    # Separate unused snapshots
    for snapshot in snapshots:
        snapshot_id = snapshot['SnapshotId']
        if not any(mapping.get('Ebs', {}).get('SnapshotId') == snapshot_id for ami in amis for mapping in
                   ami.get('BlockDeviceMappings', [])):
            unused_snapshots.append({
                'SnapshotId': snapshot_id,
                'StartTime': datetime_handler(snapshot['StartTime'])
            })        

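    # Separate unused AMIs: every AMI whose ID was not kept in used_amis
    used_ami_ids = {entry['AmiId'] for entry in used_amis}
    for ami in amis:
        ami_id = ami['ImageId']
        if ami_id not in used_ami_ids:
            unused_amis.append({
                'AmiId': ami_id,
                'CreationDate': ami['CreationDate']
            })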

    return {
        'TaggedSnapshots': tagged_snapshots,
        'UntaggedSnapshots': untagged_snapshots,
        'OldSnapshots': old_snapshots,
        'UnusedSnapshots': unused_snapshots,
        'UnusedAMIs': unused_amis,
        'oldAMIs': old_amis,
        'TotalTaggedSnapshots': len(tagged_snapshots),
        'TotalUntaggedSnapshots': len(untagged_snapshots),
        'TotalOldSnapshots': len(old_snapshots),
        'TotalUnusedSnapshots': len(unused_snapshots)
    }

Here, unused snapshots are those that are not referenced by any AMI. Untagged snapshots are those without a tag whose key is Project. Replace Project with whatever tag key your organization mandates.
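
The tag rule reduces to a membership check on each snapshot's `Tags` list. The helper below is a hypothetical refactoring of that check; the names `has_tag_key`, `split_by_tag` and `REQUIRED_TAG_KEY` are illustrative and are not part of the function above. It keeps the mandated key in one place. The input dictionaries follow the shape of a `describe_snapshots` response.

```python
REQUIRED_TAG_KEY = "Project"  # change to whatever key your organization mandates


def has_tag_key(resource: dict, key: str = REQUIRED_TAG_KEY) -> bool:
    """True if the resource's Tags list contains a tag with the given key."""
    return any(tag.get("Key") == key for tag in resource.get("Tags", []))


def split_by_tag(snapshots, key: str = REQUIRED_TAG_KEY):
    """Return (tagged_ids, untagged_ids) for a list of snapshot dictionaries."""
    tagged = [s["SnapshotId"] for s in snapshots if has_tag_key(s, key)]
    untagged = [s["SnapshotId"] for s in snapshots if not has_tag_key(s, key)]
    return tagged, untagged


snapshots = [
    {"SnapshotId": "snap-1", "Tags": [{"Key": "Project", "Value": "billing"}]},
    {"SnapshotId": "snap-2", "Tags": [{"Key": "Owner", "Value": "ops"}]},
    {"SnapshotId": "snap-3"},  # snapshots with no tags have no "Tags" key at all
]
print(split_by_tag(snapshots))  # (['snap-1'], ['snap-2', 'snap-3'])
```

Tag keys are case-sensitive, so a snapshot tagged `project` would count as untagged here.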

We use the Project tag key to generate cost reports, which is why I chose it here. Old snapshots are identified by their age. The RETENTION_DAYS environment variable (default 30) sets the threshold, and any untagged snapshot created more than that many days ago is deleted. The same threshold is used to deregister old AMIs. Because the threshold is an environment variable, you can change it in the Environment variables section of the Lambda function's configuration without editing the code.
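
The age test can be isolated into small helpers. This sketch reads `RETENTION_DAYS` the same way the handler does, and the helper names are hypothetical. Unlike the handler, it compares timezone-aware UTC datetimes instead of stripping time zones. This matters because Boto3 returns a snapshot's `StartTime` as an aware `datetime`, while an AMI's `CreationDate` is an ISO 8601 string ending in `Z`.

```python
import os
from datetime import datetime, timedelta, timezone


def retention_days_from_env(default: int = 30) -> int:
    """Read RETENTION_DAYS from the environment, falling back to a default."""
    return int(os.environ.get("RETENTION_DAYS", default))


def to_utc(value) -> datetime:
    """Normalize a snapshot StartTime (aware datetime) or AMI CreationDate (string) to UTC."""
    if isinstance(value, str):
        return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc)


def is_older_than(created, retention_days: int, now=None) -> bool:
    """True if the resource was created more than retention_days before now."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - to_utc(created) > timedelta(days=retention_days)
```

Passing `now` explicitly keeps the check deterministic. For example, a snapshot from 2024-01-01 is older than 30 days when measured on 2024-03-01.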

The unused_snapshots list is built by iterating through the block device mappings of every AMI and checking whether any mapping has an EBS SnapshotId that matches the current snapshot_id. If a match is found, the snapshot backs that AMI and is considered “used”; otherwise it is reported as unused. The used_amis list, by contrast, simply holds the AMIs that are still within the retention period.
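
The same "used" test can be written with a set, which helps once an account holds thousands of snapshots. The loop in the function re-scans every AMI's mappings for each snapshot. The sketch below instead collects all referenced snapshot IDs once and then uses constant-time lookups. The function names are hypothetical, and the inputs follow the shape of `describe_images` and `describe_snapshots` responses.

```python
def referenced_snapshot_ids(amis) -> set:
    """Collect every EBS snapshot ID referenced by the AMIs' block device mappings."""
    return {
        mapping["Ebs"]["SnapshotId"]
        for ami in amis
        for mapping in ami.get("BlockDeviceMappings", [])
        # Instance-store mappings have no "Ebs" key, so they are skipped
        if "SnapshotId" in mapping.get("Ebs", {})
    }


def unused_snapshot_ids(snapshots, amis) -> list:
    """Snapshot IDs that back no AMI, i.e. the 'unused' snapshots."""
    in_use = referenced_snapshot_ids(amis)
    return [s["SnapshotId"] for s in snapshots if s["SnapshotId"] not in in_use]


amis = [{
    "ImageId": "ami-1",
    "BlockDeviceMappings": [
        {"DeviceName": "/dev/xvda", "Ebs": {"SnapshotId": "snap-a"}},
        {"DeviceName": "/dev/sdb", "VirtualName": "ephemeral0"},
    ],
}]
print(unused_snapshot_ids([{"SnapshotId": "snap-a"}, {"SnapshotId": "snap-b"}], amis))  # ['snap-b']
```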

3. Handling AWS Credentials: Authentication is paramount when dealing with AWS services. Fear not, for Lambda has a solution. Rather than hardcoding credentials in the code or storing access keys in environment variables, attach an AWS Identity and Access Management (IAM) execution role to the function. Lambda supplies temporary credentials for that role automatically, and Boto3 picks them up without any extra code, so no sensitive keys are exposed. Grant the role only the permissions the function requires. I gave the Lambda execution role access to EC2, CloudWatch and EventBridge to perform the required actions.
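
To illustrate least privilege, here is one possible permissions policy for the execution role, written as Python dictionaries. The action list is an assumption based on the calls the function makes (`DescribeSnapshots`, `DescribeImages`, `DeleteSnapshot`, `DeregisterImage`), plus the CloudWatch Logs actions Lambda needs to store `print()` output. EventBridge's right to invoke the function is granted on the function itself through `lambda:AddPermission`, not on this role.

```python
import json

# Permissions the cleanup function needs (an assumption based on its API calls).
CLEANUP_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "SnapshotAndAmiCleanup",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeSnapshots",
                "ec2:DescribeImages",
                "ec2:DeleteSnapshot",
                "ec2:DeregisterImage",
            ],
            "Resource": "*",  # Describe* actions do not support resource-level scoping
        },
        {
            "Sid": "WriteFunctionLogs",
            "Effect": "Allow",
            "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
            "Resource": "*",
        },
    ],
}

# Trust policy that lets the Lambda service assume the role.
LAMBDA_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

print(json.dumps(CLEANUP_POLICY, indent=2))
```

These documents could be attached with Boto3's IAM client. Call `iam.create_role(RoleName=..., AssumeRolePolicyDocument=json.dumps(LAMBDA_TRUST_POLICY))` first, then `iam.put_role_policy(RoleName=..., PolicyName=..., PolicyDocument=json.dumps(CLEANUP_POLICY))`. The role and policy names are yours to choose.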

4. Deploying the Lambda Function: With our function primed and ready, it’s time for deployment. This can be achieved through the AWS Management Console, the command-line prowess of the AWS CLI, or by employing dedicated deployment tools such as AWS SAM. Configure your function’s name, runtime, code location, and other essential parameters during this phase.
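
The deployment step can also be scripted. This sketch packages a handler file, uploads it to an S3 bucket, and points an existing function at that code location with `update_function_code`. The bucket name and object key are hypothetical placeholders, and the function must already exist. The clients are passed in so the flow can be tested without AWS.

```python
import io
import zipfile
from pathlib import Path


def package_handler(source_path: str) -> bytes:
    """Zip one handler file at the archive root, where Lambda expects to find it."""
    path = Path(source_path)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.write(path, arcname=path.name)
    return buffer.getvalue()


def deploy_via_s3(s3_client, lambda_client, source_path: str, function_name: str,
                  bucket: str, key: str) -> dict:
    """Upload the package to S3, then update the function to use that code location."""
    s3_client.put_object(Bucket=bucket, Key=key, Body=package_handler(source_path))
    return lambda_client.update_function_code(
        FunctionName=function_name,
        S3Bucket=bucket,
        S3Key=key,
        Publish=True,  # publish a new version so earlier code stays available for rollback
    )
```

With real clients, this would be `deploy_via_s3(boto3.client("s3"), boto3.client("lambda"), "lambda_function.py", ...)`.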

5. Configuring CloudWatch Events: Automation without a reliable trigger is like a ship without a rudder. Enter CloudWatch Events (now part of Amazon EventBridge), your scheduling maestro. Set up a rule that invokes your Lambda function periodically. Whether it’s a daily, weekly, or custom schedule, the rule ensures your cleanup operation runs at the right time. Provide a schedule expression based on your requirements; I configured mine to run on the first day of every month.
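
A monthly run like this can also be set up with Boto3. The sketch creates an EventBridge rule with the cron expression `cron(0 0 1 * ? *)`, which fires at 00:00 UTC on the first day of every month. It then targets the function and grants EventBridge permission to invoke it. The rule name, target ID and statement ID are hypothetical. The function name and ARN are whatever your deployment produced.

```python
MONTHLY_SCHEDULE = "cron(0 0 1 * ? *)"  # minute hour day-of-month month day-of-week year


def schedule_cleanup(events_client, lambda_client, function_name: str, function_arn: str,
                     rule_name: str = "monthly-snapshot-ami-cleanup",
                     schedule: str = MONTHLY_SCHEDULE) -> str:
    """Create a scheduled rule that invokes the function; return the rule ARN."""
    rule_arn = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression=schedule,
        State="ENABLED",
    )["RuleArn"]
    # Point the rule at the Lambda function.
    response = events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "cleanup-lambda", "Arn": function_arn}],
    )
    if response.get("FailedEntryCount"):
        raise RuntimeError(f"Failed to add target: {response.get('FailedEntries')}")
    # Allow EventBridge to invoke the function, but only on behalf of this rule.
    lambda_client.add_permission(
        FunctionName=function_name,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    return rule_arn
```

EventBridge cron expressions have six fields and are evaluated in UTC. A `?` is required in the day-of-week field when day-of-month is set. A simpler alternative is `rate(30 days)`, at the cost of drifting away from calendar month boundaries.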

6. Continuous Improvement: Our journey doesn’t culminate with a single successful implementation. As your organization’s requirements evolve, so will your Lambda function. You might choose to refine your cleanup logic, experiment with different strategies, or fine-tune retention policies. The true beauty of automation lies in its adaptability and evolution.

Tagging is a cornerstone practice in Amazon Web Services (AWS) that involves attaching metadata, in the form of key-value pairs, to resources. This seemingly simple concept holds immense significance in optimizing resource management, cost allocation, security, compliance, and overall operational efficiency within cloud environments.
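
Applying the tag is as simple as checking for it. The sketch below shows two ways to attach a `Project` tag with Boto3. The first adds the tag at creation time through `TagSpecifications` on `create_snapshot`, so the snapshot is never untagged. The second adds it afterwards through `create_tags`. The helper names and project values are hypothetical. In the cleanup function above, the tag protects snapshots only; AMIs are deregistered based on age alone.

```python
PROJECT_TAG_KEY = "Project"  # must match the key the cleanup function checks


def project_tags(project: str) -> list:
    """Build the Tags list EC2 expects: a list of {'Key': ..., 'Value': ...} dicts."""
    return [{"Key": PROJECT_TAG_KEY, "Value": project}]


def create_tagged_snapshot(ec2_client, volume_id: str, project: str) -> dict:
    """Create a snapshot that carries the Project tag from the moment it exists."""
    return ec2_client.create_snapshot(
        VolumeId=volume_id,
        Description=f"Snapshot for project {project}",
        TagSpecifications=[{"ResourceType": "snapshot", "Tags": project_tags(project)}],
    )


def tag_existing_resources(ec2_client, resource_ids, project: str) -> None:
    """Retroactively tag existing snapshots (or other EC2 resources, such as AMIs)."""
    ec2_client.create_tags(Resources=list(resource_ids), Tags=project_tags(project))
```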

In this blog, we saw how AWS Lambda and CloudWatch Events can be used to automate a resource management process, effectively saving us time and manual effort. We also saw how to apply effective tagging strategies to prevent accidental deletions.
