Using SSM with VPC Endpoints

Published on 16 June 2023

Overview

In this post, we will look at how you can use VPC Endpoints to connect to SSM.

The reason you want to do this is that it keeps your traffic off the Internet.

By default, if you make an API call to SSM from your laptop (let's say you have the AWS CLI installed), your laptop will make a DNS request for a public record, get the IP address of the SSM service, and send the HTTPS traffic across the Internet to Amazon's network, where it will hit the service. Similarly, if an EC2 instance makes a request to SSM, it will resolve a public DNS name to an IP address, and the HTTPS traffic will head out through your NAT gateway, onto the Internet, and back into AWS's network, where it will talk to SSM.

For your laptop, that might be perfectly fine, but for your EC2 instance it is preferable to keep the traffic off the Internet (even though it is HTTPS). It is more secure (you don't even need an Internet Gateway) without adding much complexity, and it can also be essential for compliance.

We can do this with a VPC Endpoint.

What is a VPC Endpoint?

Creating an Interface Endpoint places a network interface in your VPC, and a corresponding DNS record in your VPC's private DNS. This means that any requests for the service resolve to the interface in your VPC, and are then sent through the AWS network, rather than over the Internet, to the service you require.

There are also gateway endpoints, typically used for S3 and DynamoDB, but those work slightly differently, as they add a route to your route tables.
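For contrast, here is a minimal sketch of a gateway endpoint for S3 (it assumes the module.vpc used later in this post; note that it attaches to route tables rather than subnets):

    resource "aws_vpc_endpoint" "s3" {
        service_name      = "com.amazonaws.us-east-1.s3"
        vpc_id            = module.vpc.vpc_id
        vpc_endpoint_type = "Gateway"

        # Gateway endpoints add routes to the given route tables,
        # rather than placing network interfaces in subnets.
        route_table_ids = module.vpc.private_route_table_ids
    }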

For interface endpoints though, you typically want the endpoint to have a network interface in each AZ, so that you can still reach the service if one AZ is unavailable. So if you have three AZs, the endpoint should be associated with a subnet in each of the three. When you create your endpoint, one of the things you need to tell it is which subnets it should be associated with; even if you have many private subnets in an AZ, you only need to associate the endpoint with one of them, as sketched below.
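If you do have several private subnets per AZ, a small for expression can pick one per AZ. A hedged sketch, assuming the module.vpc defined below:

    # Look up each private subnet so we know its AZ.
    data "aws_subnet" "private" {
        for_each = toset(module.vpc.private_subnets)
        id       = each.value
    }

    locals {
        # Group the subnet IDs by AZ, then keep the first subnet in each group.
        subnets_by_az     = { for s in data.aws_subnet.private : s.availability_zone => s.id... }
        one_subnet_per_az = [for az, ids in local.subnets_by_az : ids[0]]
    }

You could then pass local.one_subnet_per_az as the subnet_ids of each endpoint.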

Using SSM Endpoints

So, to test this, you first want a VPC with some subnets, and an EC2 instance that you will use to test connectivity to SSM. You will know it works when your instance is registered in SSM; at that point, hopefully you can also connect using SSM Session Manager.

So first, we create a VPC. The easiest way is with Terraform, using the community VPC module:

    terraform {
        required_version = ">= 0.12.0"
    }

    provider "aws" {
        #version = ">= 2.28.1"
        region = var.region
    }

        data "aws_availability_zones" "available" {
    }

    module "vpc" {
        source = "terraform-aws-modules/vpc/aws"

        name                   = "greg-endpoint-client-vpc"
        cidr                   = "10.0.0.0/16"
        azs                    = data.aws_availability_zones.available.names
        private_subnets        = ["10.0.0.0/19", "10.0.32.0/19"]
        public_subnets         = ["10.0.64.0/19", "10.0.96.0/19"]
        enable_nat_gateway     = false
        single_nat_gateway     = false
        one_nat_gateway_per_az = false

        enable_dns_hostnames = true
        enable_dns_support   = true


        tags = {
            Environment = "Endpoint-client"
        }
    }

That will create a VPC with some public and private subnets. We then need our instance, and a security group:

    resource "aws_security_group" "ec2_instance_sg" {
        name_prefix = "ec2_instance_sg"
        vpc_id      = module.vpc.vpc_id

        egress {
            from_port   = 0
            to_port     = 0
            protocol    = "-1"
            cidr_blocks = ["0.0.0.0/0"]
        }
        ingress {
            from_port   = 0
            to_port     = 0
            protocol    = "-1"
            cidr_blocks = ["10.0.0.0/8"]
        }
    }


    module "ec2_client_instance" {
        source = "terraform-aws-modules/ec2-instance/aws"

        name = "greg-single-instance"

        instance_type = "t3.micro"
        iam_instance_profile   = aws_iam_instance_profile.ConnectSSM.name
        monitoring             = false
        vpc_security_group_ids = [aws_security_group.ec2_instance_sg.id]
        subnet_id              = module.vpc.private_subnets[0]

        metadata_options = {
            http_endpoint               = "enabled"
            http_put_response_hop_limit = 1
            http_tokens                 = "optional"
        }

        tags = {
            Terraform   = "true"
            Environment = "dev"
        }
    }

Note that we are referencing an iam_instance_profile, so we need to create that too.

    resource "aws_iam_instance_profile" "ConnectSSM" {
        name = "ConnectSSM-instance-profile"
        role = aws_iam_role.ConnectSSM-Role.name
    }


    resource "aws_iam_role" "ConnectSSM-Role" {
        name = "ConnectSSM-Role"
        assume_role_policy = jsonencode({
            Version = "2012-10-17"
            Statement = [
                {
                    Action = "sts:AssumeRole"
                    Effect = "Allow"
                    Principal = {
                        Service = "ec2.amazonaws.com"
                    }
                }
            ]
        })
    }

    resource "aws_iam_role_policy_attachment" "SSMRole_attach" {
        role       = aws_iam_role.ConnectSSM-Role.name
        policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
    }

Once we have created that, the last thing we need to do is create the endpoints, and a security group for each of those.

    resource "aws_vpc_endpoint" "ec2messages" {
        service_name      = "com.amazonaws.us-east-1.ec2messages"
        vpc_id            = module.vpc.vpc_id
        vpc_endpoint_type = "Interface"
        security_group_ids = [aws_security_group.ec2messages.id]

        subnet_ids          = module.vpc.private_subnets
        private_dns_enabled = true

        tags = {
            Name = "greg-ec2messages"
        }
    }

    resource "aws_security_group" "ec2messages" {
        name_prefix = "ec2messages_endpoint"
        vpc_id      = module.vpc.vpc_id

        egress {
            from_port   = 0
            to_port     = 0
            protocol    = "-1"
            cidr_blocks = ["0.0.0.0/0"]
        }
        ingress {
            from_port   = 443
            to_port     = 443
            protocol    = "tcp"
            cidr_blocks = ["10.0.0.0/8"]
        }
    }

    resource "aws_vpc_endpoint" "ssmmessages" {
        service_name      = "com.amazonaws.us-east-1.ssmmessages"
        vpc_id            = module.vpc.vpc_id
        vpc_endpoint_type = "Interface"
        security_group_ids = [aws_security_group.ssmmessages.id]

        subnet_ids          = module.vpc.private_subnets
        private_dns_enabled = true

        tags = {
            Name = "greg-ssmmessages"
        }
    }

    resource "aws_security_group" "ssmmessages" {
        name_prefix = "ssmmessages_endpoint"
        vpc_id      = module.vpc.vpc_id

        egress {
            from_port   = 0
            to_port     = 0
            protocol    = "-1"
            cidr_blocks = ["0.0.0.0/0"]
        }
        ingress {
            from_port   = 443
            to_port     = 443
            protocol    = "tcp"
            cidr_blocks = ["10.0.0.0/8"]
        }
    }


    resource "aws_vpc_endpoint" "ssm" {
        service_name      = "com.amazonaws.us-east-1.ssm"
        vpc_id            = module.vpc.vpc_id
        vpc_endpoint_type = "Interface"
        security_group_ids = [aws_security_group.ssm.id]

        subnet_ids          = module.vpc.private_subnets
        private_dns_enabled = true

        tags = {
            Name = "greg-ssm"
        }
    }

    resource "aws_security_group" "ssm" {
        name_prefix = "ssm_endpoint"
        vpc_id      = module.vpc.vpc_id

        egress {
            from_port   = 0
            to_port     = 0
            protocol    = "-1"
            cidr_blocks = ["0.0.0.0/0"]
        }
        ingress {
            from_port   = 443
            to_port     = 443
            protocol    = "tcp"
            cidr_blocks = ["10.0.0.0/8"]
        }
    }

The Terraform above will create three endpoints for us. As I am using us-east-1, they point at the services in that region:

  • com.amazonaws.us-east-1.ssm
  • com.amazonaws.us-east-1.ssmmessages
  • com.amazonaws.us-east-1.ec2messages

Because we gave them the private subnets created in our VPC module, those are passed in and used, as is the ID of our VPC. Each endpoint requires a security group that allows traffic in on port 443, which we also created. We could create just one security group and use it for all of the endpoints, but that might not always be viable.
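If a single shared security group does work for your setup, a for_each over the three service names keeps the code DRY. This is just a sketch that would replace the three endpoint resources and three security groups above; the resource names (endpoints, ssm_family) are illustrative:

    locals {
        ssm_services = toset(["ssm", "ssmmessages", "ec2messages"])
    }

    resource "aws_security_group" "endpoints" {
        name_prefix = "ssm_endpoints"
        vpc_id      = module.vpc.vpc_id

        ingress {
            from_port   = 443
            to_port     = 443
            protocol    = "tcp"
            cidr_blocks = ["10.0.0.0/8"]
        }
    }

    resource "aws_vpc_endpoint" "ssm_family" {
        for_each = local.ssm_services

        service_name       = "com.amazonaws.us-east-1.${each.key}"
        vpc_id             = module.vpc.vpc_id
        vpc_endpoint_type  = "Interface"
        security_group_ids = [aws_security_group.endpoints.id]

        subnet_ids          = module.vpc.private_subnets
        private_dns_enabled = true

        tags = {
            Name = "greg-${each.key}"
        }
    }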

Once we put all of that in our TF file and deploy it, we should have created:

  1. A VPC, with public and private subnets in two AZs
  2. Our instance (which comes with the SSM agent already installed)
  3. A security group that allows traffic out, and traffic in from within our CIDR range
  4. Our IAM instance profile, which is applied to the instance and has the AmazonSSMManagedInstanceCore policy attached.
  5. Our SSM endpoints.
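To make checking easier, you could also add a few outputs (the names here are illustrative, and assume a recent version of the ec2-instance module):

    output "instance_id" {
        value = module.ec2_client_instance.id
    }

    output "ssm_endpoint_id" {
        value = aws_vpc_endpoint.ssm.id
    }

    output "private_subnets" {
        value = module.vpc.private_subnets
    }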

So, if we go to the console, we can click on our instance. There you can see that it is running, that it has an IAM role assigned, and that it is in a private subnet; at the bottom, we have the option to run the Reachability Analyzer.

[Screenshot ssm-2: instance details in the EC2 console]

If we click to run the analyzer, we can do a test to ensure that the instance can connect to our endpoints. Choose "instances" as the source type, then select your instance. Choose VPC Endpoints as the destination, and then select your SSM endpoint as the destination. Then choose "create". You will then be sent to the AWS Network Manager page, and you can see the analysis is pending. If you wait a minute or so and refresh the window, you should see that it is reachable. Success!

[Screenshot ssm-3: Reachability Analyzer result showing the path is reachable]

So at that point, you know you have the required subnets, IAM roles, VPC endpoints, security groups, and an instance. What should have happened is that the SSM Agent on the instance has reached SSM over the endpoint and registered, and you can then connect to it.

To do that, go back to EC2, select the instance, and choose "Connect". Then go to the Session Manager tab. This should then work (woohoo).

Troubleshooting

If it doesn't, you might get something like this:

[Screenshot ssm-4: Session Manager failing to connect to the instance]

That isn't good.

If we go to the SSM documentation, we can see that it suggests using the AmazonSSMManagedInstanceCore policy. If you have verified connectivity as described above, IAM is the most likely cause.

AmazonSSMManagedInstanceCore is a newer policy that replaces AmazonEC2RoleforSSM. It is worth creating a new instance profile from scratch, attaching it to the instance, and rebooting. Then give it five minutes and try connecting again (and refresh your browser window to be sure).

If the connectivity test fails, check your endpoint configurations and ensure each endpoint is associated with a subnet in each AZ.

Costs

One thing to consider is cost. You pay for interface endpoints per AZ, per hour, which is not a huge amount individually: in us-east-1, an endpoint costs approximately USD $7.30/month per AZ. However, if you have three AZs, then for SSM alone you are paying for nine endpoint-AZ combinations (three services across three AZs). Add in multiple accounts, multiple services, and of course traffic sent over the endpoints, and it starts to add up. Once you get to, say, 10 accounts with 3 AZs, each with 8 endpoints, you are looking at over $1,700/month, plus traffic costs.
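As a rough worked example (the USD $0.01/hour rate is the us-east-1 price at the time of writing, and this ignores the per-GB data processing charge):

    locals {
        endpoint_hour_usd = 0.01   # per endpoint, per AZ, per hour (us-east-1)
        accounts          = 10
        azs               = 3
        endpoints         = 8

        # ~730 hours in a month: 10 * 3 * 8 * 0.01 * 730 ≈ USD $1,752/month
        monthly_usd = local.accounts * local.azs * local.endpoints * local.endpoint_hour_usd * 730
    }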

Can I just put all my endpoints in one account/VPC?

Yes you can! You can create a Shared Services account (if you don't have one already) and place your endpoints in that central account. If you are following best practice, you should be considering an account per workload (for isolation), so a Shared Services account makes sense for many reasons.

If you put your endpoints in your Shared Services account, you will need fewer of them, but you will need to do some additional work. First, you need connectivity between the subnets in your Shared Services account that host the endpoints and the subnets in the accounts that will access them. This can be done using VPC Peering or a Transit Gateway attachment; which you select will probably depend on the number of accounts you have and the complexity you want to manage.

Once you have routable connectivity, you then need to modify your endpoints. The TF code above sets private_dns_enabled = true, which automatically registers the endpoints in Route 53 private DNS. When you want to share your interface endpoints between accounts, you need to disable that and add records for them in each account that will access the endpoints.

To do that, you will first need a second VPC.

    module "vpc2" {
        source = "terraform-aws-modules/vpc/aws"
        #version = "3.18.1"

        name                   = "greg-sharedservices-vpc"
        cidr                   = "10.1.0.0/16"
        azs                    = ["us-east-1a", "us-east-1b"]
        private_subnets        = ["10.1.0.0/19", "10.1.32.0/19"]
        public_subnets         = ["10.1.64.0/19", "10.1.96.0/19"]
        enable_nat_gateway     = false
        single_nat_gateway     = false
        one_nat_gateway_per_az = false

        enable_dns_hostnames = true
        enable_dns_support   = true


        tags = {
            Environment = "shareservices-vpc"
        }
    }

We then need to create a Transit Gateway (generally easier to manage than VPC peering) and attach it to both of our VPCs.

    module "tgw" {
        source  = "terraform-aws-modules/transit-gateway/aws"
        version = "~> 2.0"

        name        = "greg-tgw"
        description = "My TGW shared with several other AWS accounts"

        amazon_side_asn = 65432
        share_tgw       = false

        vpc_attachments = {
            vpc1 = {
                vpc_id       = module.vpc.vpc_id
                subnet_ids   = module.vpc.private_subnets
                dns_support  = true
                ipv6_support = false

                tgw_routes = [
                    {
                        destination_cidr_block = "10.0.0.0/16"
                    },
                    {
                        blackhole              = true
                        destination_cidr_block = "10.2.0.0/16"
                    }
                ]
            }

            vpc2 = {
                vpc_id       = module.vpc2.vpc_id
                subnet_ids   = module.vpc2.private_subnets
                dns_support  = true
                ipv6_support = false

                tgw_routes = [
                    {
                        destination_cidr_block = "10.1.0.0/16"
                    },
                    {
                        blackhole              = true
                        destination_cidr_block = "10.3.0.0/16"
                    }
                ]
            }
        }
    }

You then need to add routes to your new VPC's route tables. This ensures that traffic destined for anything outside your own CIDR ranges goes to the Transit Gateway:

    resource "aws_route" "ssm-vpc2" {
        route_table_id = module.vpc2.private_route_table_ids[0]
        destination_cidr_block = "0.0.0.0/0"
        transit_gateway_id = module.tgw.ec2_transit_gateway_id
    }


    resource "aws_route" "ssm-vpc2-2" {
        route_table_id = module.vpc2.private_route_table_ids[1]
        destination_cidr_block = "0.0.0.0/0"
        transit_gateway_id = module.tgw.ec2_transit_gateway_id
    }

And add them in our original VPC:

    resource "aws_route" "ssm" {
        route_table_id = module.vpc.private_route_table_ids[0]
        destination_cidr_block = "0.0.0.0/0"
        transit_gateway_id = module.tgw.ec2_transit_gateway_id
    }

    resource "aws_route" "ssm2" {
        route_table_id = module.vpc.private_route_table_ids[1]
        destination_cidr_block = "0.0.0.0/0"
        transit_gateway_id = module.tgw.ec2_transit_gateway_id
    }

We also need to change the endpoints, which we are now going to place in the new VPC. Delete the previous endpoints and replace each definition with the following (this is ec2messages; note that the security group now lives in the second VPC, and private_dns_enabled is now false):

    resource "aws_vpc_endpoint" "ec2messages" {
        service_name      = "com.amazonaws.us-east-1.ec2messages"
        vpc_id            = module.vpc2.vpc_id
        vpc_endpoint_type = "Interface"
        security_group_ids = [aws_security_group.ec2messages.id]

        subnet_ids          = module.vpc2.private_subnets
        private_dns_enabled = false

        tags = {
            Name = "greg-ec2messages"
        }
    }

    resource "aws_security_group" "ec2messages" {
        name_prefix = "ec2messages_endpoint"
        vpc_id      = module.vpc2.vpc_id

        egress {
            from_port   = 0
            to_port     = 0
            protocol    = "-1"
            cidr_blocks = ["0.0.0.0/0"]
        }
        ingress {
            from_port   = 443
            to_port     = 443
            protocol    = "tcp"
            cidr_blocks = ["10.0.0.0/8"]
        }
    }

    resource "aws_route53_zone" "ec2messages" {
        name = "ec2messages.us-east-1.amazonaws.com"

        vpc {
            vpc_id = module.vpc.vpc_id
        }
        vpc {
            vpc_id = module.vpc2.vpc_id
        }
        }

        resource "aws_route53_record" "ec2messages" {
        #depends_on = [aws_vpc_endpoint.ssm, aws_route53_zone.ssm]
        zone_id    = aws_route53_zone.ec2messages.zone_id
        name       = "ec2messages.us-east-1.amazonaws.com"
        type       = "A"

        alias {
            name                   = aws_vpc_endpoint.ec2messages.dns_entry[0].dns_name
            zone_id                = aws_vpc_endpoint.ec2messages.dns_entry[0].hosted_zone_id
            evaluate_target_health = false
        }
    }

There are a few things to note when hosting your endpoints in a shared VPC. Private DNS is disabled on these endpoints, and for each VPC endpoint we create a new Route 53 private hosted zone (named the same as the service endpoint) containing a single record: an alias A record at the zone apex.

You can repeat the above for the other two endpoints (ssm and ssmmessages), or drive all three from one for_each, as sketched below.
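A sketch along those lines (it would replace the per-service endpoint, zone, and record blocks; the resource names are illustrative, and it reuses the ec2messages security group for brevity):

    locals {
        shared_ssm_services = toset(["ssm", "ssmmessages", "ec2messages"])
    }

    resource "aws_vpc_endpoint" "shared" {
        for_each = local.shared_ssm_services

        service_name        = "com.amazonaws.us-east-1.${each.key}"
        vpc_id              = module.vpc2.vpc_id
        vpc_endpoint_type   = "Interface"
        security_group_ids  = [aws_security_group.ec2messages.id]
        subnet_ids          = module.vpc2.private_subnets
        private_dns_enabled = false
    }

    resource "aws_route53_zone" "shared" {
        for_each = local.shared_ssm_services
        name     = "${each.key}.us-east-1.amazonaws.com"

        vpc {
            vpc_id = module.vpc.vpc_id
        }
        vpc {
            vpc_id = module.vpc2.vpc_id
        }
    }

    resource "aws_route53_record" "shared" {
        for_each = local.shared_ssm_services

        zone_id = aws_route53_zone.shared[each.key].zone_id
        name    = "${each.key}.us-east-1.amazonaws.com"
        type    = "A"

        alias {
            name                   = aws_vpc_endpoint.shared[each.key].dns_entry[0].dns_name
            zone_id                = aws_vpc_endpoint.shared[each.key].dns_entry[0].hosted_zone_id
            evaluate_target_health = false
        }
    }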

Then you should be able to connect via SSM to your instance (although you may need to reboot it after the endpoints have been created!), because the traffic will route out of the VPC, across the Transit Gateway, and to the endpoints in your other VPC. That other VPC could just as easily live in a different account.

You can download the TF for this here.
