Automatically Tagging Uploads to S3

Published on 14 July 2019

There is a newer post with a Terraform version here. Feel free to use this original one too though!

Why?

You might want to tag uploads for a variety of reasons: tracking billing, monitoring departmental usage, or providing metadata for any other system or process that interfaces with S3.

Unlike using PowerShell with Lambda, you don't need to install any prerequisites on your system before doing this, but of course you might want to play around with your code locally first, in which case an installation of Python is going to be useful. VS Code or PyCharm are good editors. You will also want to install Boto 3 (pip install boto3). Python is also a hugely popular language, and it has history with AWS. I have read that the AWS CLI is written using Python and Boto 3; I'm not sure if that is true, but Python certainly has a longer history with Lambda than PowerShell does.

And you don't need to install anything to use Python with Lambda; you can just use the AWS online editor. As cool as it is to use PowerShell with Lambda (and it is cooler, come on!), the initial setup there is a little more complex.

So here, we are going to create a simple Lambda function that applies a tag (I am just going to use the current date) to a file that has been uploaded to S3.

Permissions

Before you run your Lambda function, you need to ensure that it has permissions to do what you want it to do. So you need to create an IAM policy to access S3, with the rights Allow: s3:GetBucketLocation and Allow: s3:ListAllMyBuckets, and, on the bucket's resources, Allow: s3:PutObjectTagging. This will allow the Lambda function to do only what you want it to do. Remember to also allow the rights for CloudWatch, so the function can log what it does.
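
If you prefer to script this step, here is a rough sketch of creating that policy with Boto 3. It is only illustrative: the bucket name, policy name, and resource ARNs are placeholders you would replace with your own.

    import json
    import boto3

    # Placeholder policy: tagging rights are scoped to one example bucket,
    # and CloudWatch Logs rights are included so the function can log.
    policy_document = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetBucketLocation", "s3:ListAllMyBuckets"],
                "Resource": "*",
            },
            {
                "Effect": "Allow",
                "Action": "s3:PutObjectTagging",
                "Resource": "arn:aws:s3:::my-example-bucket/*",
            },
            {
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": "*",
            },
        ],
    }

    iam = boto3.client('iam')
    iam.create_policy(PolicyName='AddS3TagsPython',
                      PolicyDocument=json.dumps(policy_document))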

Next, create a new role for the Lambda function and give it permission to use the AddS3TagsPython policy (or whatever you chose to call it) that we just created. Now we can create the function.
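
If you'd rather script the role creation too, a minimal sketch with Boto 3 might look like this (the role name and the policy ARN, including the account ID, are placeholders):

    import json
    import boto3

    iam = boto3.client('iam')

    # Standard trust policy that lets the Lambda service assume the role.
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }

    iam.create_role(RoleName='AddS3TagsRole',
                    AssumeRolePolicyDocument=json.dumps(trust_policy))

    # Attach the policy created earlier (placeholder account ID).
    iam.attach_role_policy(
        RoleName='AddS3TagsRole',
        PolicyArn='arn:aws:iam::123456789012:policy/AddS3TagsPython',
    )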

Create the Function

  1. Open your browser and log on to the AWS Console

  2. Go to Lambda

  3. Press "Create Function"

  4. Leave it as "create from scratch"

  5. Enter a descriptive name for the function

  6. Choose "Python 3.7" as the runtime.

  7. Press "Create function".
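As an aside, this creation step can also be scripted with Boto 3 rather than clicked through. A minimal sketch, assuming your handler lives in a local lambda_function.py; the function name and role ARN are placeholders:

    import io
    import zipfile
    import boto3

    # Zip the handler file in memory; Lambda expects a zip for uploads.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w') as zf:
        zf.write('lambda_function.py')

    client = boto3.client('lambda')
    client.create_function(
        FunctionName='AddS3Tags',  # placeholder name
        Runtime='python3.7',
        Role='arn:aws:iam::123456789012:role/AddS3TagsRole',  # placeholder
        Handler='lambda_function.lambda_handler',
        Code={'ZipFile': buf.getvalue()},
    )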

The next thing is to create the trigger.

Create the Trigger

This determines when the code is going to run. In this case, we want it to run when a file is uploaded to an S3 bucket.

  1. Choose "S3" under the Trigger Configuration

  2. Select your bucket

  3. Under Event Type, choose "All Object Create Events"

  4. Enter something for the prefix/suffix if required, so that you only tag files with a specific extension, for example

  5. Press "Add."

The trigger is now complete; we have told AWS when we want the Lambda function to run.
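
Incidentally, the same trigger can be set up from code instead of the console. A rough Boto 3 sketch follows (bucket name and function ARN are placeholders; note that the console quietly grants S3 permission to invoke your function, which the sketch has to do explicitly):

    import boto3

    BUCKET = 'my-example-bucket'  # placeholder
    FUNCTION_ARN = 'arn:aws:lambda:eu-west-1:123456789012:function:AddS3Tags'  # placeholder

    # Allow S3 to invoke the function (the console does this for you).
    lambda_client = boto3.client('lambda')
    lambda_client.add_permission(
        FunctionName=FUNCTION_ARN,
        StatementId='s3-invoke',
        Action='lambda:InvokeFunction',
        Principal='s3.amazonaws.com',
        SourceArn='arn:aws:s3:::' + BUCKET,
    )

    # Fire on every object-created event, optionally filtered by suffix.
    s3 = boto3.client('s3')
    s3.put_bucket_notification_configuration(
        Bucket=BUCKET,
        NotificationConfiguration={
            'LambdaFunctionConfigurations': [{
                'LambdaFunctionArn': FUNCTION_ARN,
                'Events': ['s3:ObjectCreated:*'],
                'Filter': {'Key': {'FilterRules': [
                    {'Name': 'suffix', 'Value': '.csv'},
                ]}},
            }],
        },
    )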

Create the Code

Finally! Well, it wasn't that bad to be honest.

  1. Click on your function, and that will open the "Function Code" section.

If you look at the "Code Entry Type" box, you can see that you can edit code inline, or upload a file. In this case, use the online editor.

  1. Have a read of the populated code, and delete it.

  2. Replace it with the code below:

         import json
         import urllib.parse
         import boto3
         import datetime
    
         print('Loading function')
    
         s3 = boto3.client('s3')
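         # Note: these run once per cold start, so the tag value is the date
         # the container started, not necessarily the date of each upload.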
         now = datetime.datetime.now()
         tagValue = now.date()
         tagName = "MyTag"
    
    
         def lambda_handler(event, context):
             print("Recevied event: " + json.dumps(event, indent=2))
    
             #Get the object from the event
             bucket = event['Records'][0]['s3']['bucket']['name']
             key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    
             try:
                 response = s3.put_object_tagging(
                     Bucket = bucket,
                     Key = key,
                     Tagging={
                         'TagSet': [
                             {
                                 'Key': tagName,
                                 'Value': str(tagValue)
                             },
                         ]
                     }
                 )
             except Exception as e:
                 print(e)
                 print('Error applying tag {} to {}.'.format(tagName, key))
                 raise e
    
  3. Press "Save"

What the code does is import the Python libraries we need. We then create "s3" as an interface to S3 via Boto 3, and create a tag name and a value. The value is just the current date, retrieved using the datetime library.

Then there is the function that is executed by Lambda. This gets the object that has been uploaded (the bucket and key are part of the information in "event"), and then we "try" to apply the tag.

Like any other Lambda function, anything we "print" goes to the CloudWatch logs. This is incredibly useful when trying to figure out what information you can work with (like the contents of "event", for instance).
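
Before wiring up the trigger, it can also be handy to exercise the handler locally with a hand-rolled event and then read the tag back. A minimal sketch, pasted below the handler in the same file (bucket and key are placeholders for an object you have already uploaded):

    # Fake a trimmed-down S3 event; only the fields the handler reads are here.
    test_event = {
        'Records': [{
            's3': {
                'bucket': {'name': 'my-example-bucket'},
                'object': {'key': 'example.csv'},
            }
        }]
    }

    lambda_handler(test_event, None)

    # Read the tag back to confirm it was applied.
    tags = s3.get_object_tagging(Bucket='my-example-bucket', Key='example.csv')
    print(tags['TagSet'])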

Troubleshooting

Check your CloudWatch logs.

  1. Permissions - You will see an Access Denied error in the logs. Just change your IAM permissions appropriately.

  2. Syntax errors - You will see these in the logs too. The traceback tells you exactly what went wrong; in my case, it showed that a string was expected and wasn't being supplied.

Summary

Python is probably the lingua franca of Lambda, in my limited experience. Depending on your background, you might find other languages easier, PowerShell for instance, but there are just so many examples written for Python. Additionally, the ability to do it all from the browser is great. With the online editor and CloudWatch logs, you can get a lot done for simple functions.

One last thing: if you want to know what else you can do with Boto 3 and S3, you can run the following:

    s3 = boto3.client('s3')
    print(dir(s3))

That will give you a list of the client's properties, methods, etc.

Update

I have created a Terraform script to deploy the lambda function for this, which is here.
