Sample Application - New Films in IMDB Top 250 - Part II

Published on 09 May 2022

In the first part of this series, we looked at the services we could use, installed AWS SAM, and created the first application. This part is about deciding what we need to store, and making some changes to the initial template.yaml to reflect that.

My first objective was to detect changes to the list of top films. To do this, I needed a Lambda function that would retrieve the Top 250 films from IMDB, but I also decided to store the current Top 250 in a table, as having it stored somewhere would let me detect changes the next time the Lambda function ran. DynamoDB seemed like the best place for this, as it is a serverless solution, so I am only paying for what I use.

So I started with a Lambda function that would read the films from IMDB and then write them to a DynamoDB table, whose name would be provided as an environment variable.

There is one thing I needed to do first, and that is to register with IMDB to get API access. This is free, but I needed to do it to get an API key, which is plugged into the URL in the code below.

    import boto3
    import json
    import os
    import requests

    print('Loading function')
    # client created outside of the handler
    region_name = os.environ['REGION_NAME']
    dynamo = boto3.resource('dynamodb', region_name=region_name)
    table_name = os.environ['TABLE_NAME']
    table = dynamo.Table(table_name)

    def respond(err, res=None):
        return {
            'statusCode': '400' if err else '200',
            'body': str(err) if err else json.dumps(res),
            'headers': {
                'Content-Type': 'application/json',
            },
        }

    def lambda_handler(event, context):
        print("Received event: " + json.dumps(event, indent=2))

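        # Replace the trailing "k_..." part of the URL with your own IMDB API key from registering.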
        url = "https://imdb-api.com/en/API/Top250Movies/k_rrXXXXkx"
        payload = {}
        headers= {}
        response = requests.request("GET", url, headers=headers, data=payload)
        print(response.text.encode('utf8'))
        items_json = json.loads(response.text)['items']

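        # Write each film into the DynamoDB table; put_item overwrites any existing item with the same title.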
        for i in items_json:
            #print(i['title'], ',' , i['year'], ',' ,i['imDbRating'])
            table.put_item(
                Item={
                    'title': str(i['title']),
                    'year': str(i['year']),
                    'rating': str(i['imDbRating']),
                    'filmID': str(i['id']),
                    'image': str(i['image'])
                }
            )

        scan_result = "table updated"
        return respond(None,res=scan_result)

We then need to save our Python in a file, GetTop250.py, and put it into a subfolder called "src". The src folder is referenced in the template.yaml, which we will get to in a minute.
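
Before wiring everything into SAM, it can be worth sanity-checking the handler locally. Here is a minimal sketch, assuming you have AWS credentials configured, requests and boto3 installed locally, and an existing test table to point at (the region and table name below are placeholders), run from the project root:

    import os

    # The handler creates its boto3 resource at import time, so set these first.
    # Both values are placeholders; use your own region and an existing table.
    os.environ["REGION_NAME"] = "eu-west-1"
    os.environ["TABLE_NAME"] = "Top250-test"

    from src.GetTop250 import lambda_handler

    # Invoke with an empty event; in AWS the trigger will supply the real one.
    print(lambda_handler({}, None))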

One issue I faced straight away was that "requests" isn't part of the standard library, so it isn't available in the Lambda Python runtime. The workaround for this is to use a layer that contains the module. The easiest way to build one is with Docker, which you should have installed before installing SAM.

So create a file and give it the following contents:

    FROM amazonlinux:2.0.20191016.0
    RUN yum install -y python39 && \
        yum install -y python3-pip && \
        yum install -y zip && \
        yum clean all
    RUN python3.9 -m pip install --upgrade pip && \
        python3.9 -m pip install virtualenv

Save it as requests.Dockerfile, to match the build command below. Then create a Docker image using that Dockerfile:

    docker build -f "requests.Dockerfile" -t requests:latest .

Then run and connect:

    docker run -it --name requests requests:latest bash

Install your module:

    python3.9 -m pip install requests -t ./python

Then zip it up inside the container and, from the host (a second terminal, or after exiting the container), copy the archive back to your PC:

    zip -r requests.zip ./python/
    docker cp requests:requests.zip ./Desktop/

You should now have a ZIP file that contains the module, and we can use it as a Lambda layer by uploading it:

    aws lambda publish-layer-version --layer-name "requests" --description "Lambda Layer for Requests" \
        --zip-file "fileb://requests.zip" --compatible-runtimes "python3.9"

One thing we still don't have, though, is our DynamoDB table, so that is the next step, but thankfully a little easier. We can put that into our template.yaml. Note that the template below also declares the layer as a resource (RequestsLayer) with ContentUri: dependencies/, so the python folder from the ZIP we built should sit inside a dependencies folder next to the template for SAM to package it.

    AWSTemplateFormatVersion: '2010-09-09'
    Transform: 'AWS::Serverless-2016-10-31'
    Description: Top 250 from IMDB.
    Resources:
        GetTop250Function:
            Type: 'AWS::Serverless::Function'
            Properties:
                Handler: GetTop250.lambda_handler
                Runtime: python3.9
                Layers:
                    - !Ref RequestsLayer
                CodeUri: src
                Description: Grab the Top 250 from IMDB and load into DynamoDB.
                MemorySize: 128
                Timeout: 60
                Policies:
                    - DynamoDBCrudPolicy:
                        TableName: !Ref Table
                Environment:
                    Variables:
                        TABLE_NAME: !Ref Table
                        REGION_NAME: !Ref AWS::Region
                AutoPublishAlias: live
                DeploymentPreference:
                    #Type: Canary10Percent10Minutes
                    Type: AllAtOnce
        Table:
            Type: AWS::Serverless::SimpleTable
            Properties:
                PrimaryKey:
                    Name: title
                    Type: String
        RequestsLayer:
            Type: AWS::Serverless::LayerVersion
            Properties:
                LayerName: sam-app-dependency
                Description: 'Contains requests'
                ContentUri: dependencies/
                CompatibleArchitectures:
                    - x86_64
                CompatibleRuntimes:
                    - python3.9
                    - python3.8
                    - python3.7
                    - python3.6
                RetentionPolicy: Delete

    Outputs:
        myTableName:
            Description: 'Name of the DynamoDB Table'
            Value: !Ref Table
            Export:
                Name: !Sub "${AWS::StackName}-Table"

One of the key elements of the file is the DynamoDB table definition:

    Table:
        Type: AWS::Serverless::SimpleTable
        Properties:
            PrimaryKey:
                Name: title
                Type: String

This is where we tell SAM to create the table with a primary key called "title", of type String. We will of course want other attributes (and we will use them), but as this is not a SQL database, we don't need to define them up front.
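
For example, once the table exists, looking a film up by its primary key only needs the "title" attribute. A quick sketch using boto3, with placeholder region and table names:

    import boto3

    dynamo = boto3.resource('dynamodb', region_name='eu-west-1')  # placeholder region
    table = dynamo.Table('sam-app-Table-XXXXXXXX')                # placeholder generated table name

    # Primary key lookups use the "title" key we defined in the template.
    response = table.get_item(Key={'title': 'The Shawshank Redemption'})
    print(response.get('Item'))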

Additionally, we need to set the CRUD policy for the table, and also set a couple of environment variables.

        Policies:
            - DynamoDBCrudPolicy:
                TableName: !Ref Table
        Environment:
            Variables:
                TABLE_NAME: !Ref Table
                REGION_NAME: !Ref AWS::Region

You can see that we have also added the section for the layer that we created, and lastly, we are going to output the name of the DynamoDB table from the CloudFormation template. This is because we will want to use the table elsewhere, without having to worry about things like names.
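
As a rough illustration (not part of the stack itself), another script could resolve that export with boto3 instead of hard-coding the table name; the export name below assumes the stack is called "sam-app":

    import boto3

    cloudformation = boto3.client('cloudformation')

    # Look up the value exported as "${AWS::StackName}-Table" in the Outputs section.
    exports = cloudformation.list_exports()['Exports']
    table_name = next(e['Value'] for e in exports if e['Name'] == 'sam-app-Table')
    print(table_name)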

So, now, how do we create it in AWS? We can use SAM.

    sam deploy --guided

That will walk you through the prompts to create it all in AWS, via CloudFormation.

Phew. I know it is a lot, but Part II is done. We have a table to store the Top 250, and we also have a Lambda function, with a layer, that goes out to retrieve the Top 250 and then writes it to the table. The next step is to have a function that checks for updates, basically comparing what is on the website with what is in the table.
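
To give a rough idea of where that is heading, the comparison boils down to a set difference on the film IDs we stored above. A sketch only, reusing the table handle and the API response format from the function above:

    def find_new_films(items_json, table):
        """Return IMDB ids that are in the latest Top 250 but not yet in our table."""
        latest_ids = {i['id'] for i in items_json}
        stored_ids = {i['filmID'] for i in table.scan()['Items']}
        return latest_ids - stored_ids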
