CloudFormation updates: what's happened since you started using Terraform

CloudFormation, the AWS-specific infrastructure-as-code service. I don’t use it anymore… but maybe I should. Maybe you should too.

Common Problems

A couple of years ago, if you were automating your infrastructure, you were probably using CloudFormation. AWS looked after the state of your application, handled rollbacks, validated your template; it was the best. You could build multiple AWS services into one JSON template: EC2, ELB, CloudWatch and Route53 all came together to launch an application. Sounds great, but it was actually really difficult to manage.

Hardcoding VPC IDs, subnets, Hosted Zone IDs… it was all a bit non-cloud, if that’s such a thing. Many tools were built to take on this challenge; CFNDSL and Troposphere were two that the BBC used while I worked there. CFNDSL was so heavily used in our team that a colleague, Steve Jack, became the maintainer of the project. We then extended this with another tool for internal use, because we had multiple AWS accounts with multiple environments. We used an environment-specific YAML file which could share VPC, Route53 and ASG variables across multiple stacks, with a directory structure such as:

|____int
| |____Newsbeat.yaml
|____live
| |____Newsbeat.yaml
|____test
| |____Newsbeat.yaml

with our stacks, written in CFNDSL’s Ruby syntax, looking like this:

|____cloudfrontdns
| |____aws
| | |____route_53
| | | |____record_set.rb
|____dns
| |____aws
| | |____route_53
| | | |____record_set.rb
|____main
| |____aws
| | |____auto_scaling
| | | |____group.rb
| | | |____launch_config.rb
| | | |____scaling_policy.rb
| | |____cloud_watch
| | | |____alarm.rb
| | |____cloudfront
| | | |____distribution.rb
| | |____ec2
| | | |____security_group.rb
| | |____elastic_load_balancing
| | | |____load_balancer.rb
| | |____iam
| | | |____instance_profile.rb
| | | |____policy.rb
| | |____route_53
| | | |____record_set.rb
| |____template.rb

As you can see, we were doing a lot to manage CloudFormation: a Ruby DSL with two sets of tooling, all in order to generate some JSON. What we needed was a tool that would manage cross-stack resources, template out stacks that are identical (except for parameters) and make it easy to write infrastructure. JSON is not great for that task.

Turns out Hashicorp figured the same and wrote Terraform.

Terraform

Terraform is a common syntax for multiple “providers”; a provider could be AWS, Azure etc. This means that the basic concepts, servers, load balancers, databases, take vendor-specific parameters while Terraform manages the glue that binds it all together. It also means multi-cloud setups use the same language to describe infrastructure. Terraform is now maturing; a couple of years ago it lacked a lot of the AWS resources needed to make it production viable, but it’s come a long way.
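
To give a flavour of the syntax, here is a minimal sketch of an AWS provider plus a single resource; the region, names and AMI ID are placeholders rather than anything from a real setup.

provider "aws" {
  region = "eu-west-1"
}

# the same block shape is used for load balancers, databases etc.
resource "aws_instance" "web" {
  ami           = "ami-12345678" # placeholder AMI
  instance_type = "t2.micro"
}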

Infrastructure is a living thing

When I started using Terraform, CloudFormation began to look quite primitive. This was especially clear in the concept Terraform embraces of infrastructure as an organism, rather than a collection of CloudFormation stacks with hardcoded parameters.

Terraform also displays changes not just for a single stack, but for changes that impact all of your infrastructure. If you go ahead and try to delete a VPC that was created in a CloudFormation stack, AWS will happily try to do that for you. The problem is that 20 other stacks rely on that VPC, so your operation is going to fail. But you may not know that up front, so you waste time and potentially break infrastructure.

With Terraform, you can see how deleting a VPC will impact the whole infrastructure; this feature was one of the catalysts for many to move from CloudFormation to Terraform.
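
As a rough sketch of what that looks like (the resource name aws_vpc.main is made up), you can ask Terraform for a destroy plan targeted at the VPC and it will list every dependent resource before anything is touched:

terraform plan -destroy -target=aws_vpc.main

# the output lists the VPC along with everything that depends on it,
# such as subnets, route tables and security groups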

Is that JSON?

This is a stack I created to set up a service and, as you can see, there is lots of weirdness going on here. This is because we are generating user data, an EC2 feature that lets scripts and commands run on boot of an instance. In this case, a parameter is being used in conjunction with an RDS instance that is being set up in the same CloudFormation stack. There is some syntax to get the service port, which is joined with that parameter, as well as a newline and loads of spaces. This sucks.

{
  "Fn::Join": [
    "",
    [
      "        - CATTLE_DB_CATTLE_MYSQL_PORT=",
      {
        "Fn::GetAtt": [
          "ComponentRDS",
          "Endpoint.Port"
        ]
      },
      "\n"
    ]
  ]
}

Terraform can use template files that render user data from variables injected into the template, and out comes a block of user data ready for an instance. This means that at runtime I can take any part of my infrastructure and use it to build user data. Further, I can also track changes: if my RDS endpoint changes, the user data for another server will be updated and, by extension, the server will be updated.
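
Here is a minimal sketch of that pattern using the template_file data source; the file name, variable names and resources are illustrative, not from my actual stacks.

data "template_file" "user_data" {
  template = "${file("userdata.tpl")}" # contains e.g. CATTLE_DB_CATTLE_MYSQL_PORT=${db_port}

  vars {
    db_port = "${aws_db_instance.component.port}"
  }
}

resource "aws_instance" "app" {
  ami           = "ami-12345678" # placeholder
  instance_type = "t2.micro"
  user_data     = "${data.template_file.user_data.rendered}"
}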

Terraform is not a silver bullet

I could write an entire post on why I have a love/hate relationship with Terraform (it’s coming soon), but suffice it to say, it has issues. State isn’t handled by a service like AWS, so you have to deal with that yourself, using S3 for example. Your state is an actual file, but you can’t check it into version control if you create AWS credentials, as those credentials will be listed in the state file. It also lacks some CloudFormation features, like rolling updates. One of the great things about CloudFormation is the ability to update an AMI in a stack and have it roll that AMI out in a controlled way. To do this exactly the same way in Terraform, you need to learn how to do… CloudFormation. There are ways around this, but the best is to use CloudFormation from within Terraform to manage rolling updates, which brings us to the point where we ask: haven’t AWS realised what we want?

CloudFormation, it’s about time.

I know the CloudFormation team do a great job, and the service is good, but it could be better if more effort was made to understand what 2016 infrastructure design looks like. In the last few months, AWS have made a big effort to replace most of the hacky solutions people have been using to resolve dependencies in CloudFormation, and probably also to stop people moving to Terraform.

Firstly, you can now use YAML. As someone that only writes stacks in YAML, I think this is a good move. Alongside this, there are some new functions in the AWS CloudFormation syntax.

Mappings:
  RegionMap:
    us-east-1:
      32: "ami-6411e20d"
      64: "ami-7a11e213"
Resources:
  myEC2Instance:
    Type: "AWS::EC2::Instance"
    Properties:
      ImageId: !FindInMap [ RegionMap, !Ref "AWS::Region", 32 ]
      InstanceType: m1.small

The new short form of the !FindInMap function is nice because you no longer need multiple lines. If you didn’t know, the bang (!) is part of the syntax, rather than a logical NOT operation.
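
For comparison, the long form of the same lookup spans several lines; a sketch against the mapping above:

ImageId:
  Fn::FindInMap:
    - RegionMap
    - Ref: "AWS::Region"
    - 32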

Cross Stack Referencing!!

Yes, my stacks can know about each other. This is one of the best things released this year by AWS.

You can get started by using the outputs functionality. In the example below, you can also see the new !Sub syntax, great for interpolation.

Outputs:
  VPCId:
    Description: VPC ID
    Value:
      Ref: VPC
    Export:
      Name:
        !Sub '${AWS::StackName}-VPCID'
  PublicSubnet:
    Description: The subnet ID to use for public web servers
    Value:
      Ref: PublicSubnet
    Export:
      Name:
        !Sub '${AWS::StackName}-SubnetID'
  WebServerSecurityGroup:
    Description: The security group ID to use for public web servers
    Value:
      !GetAtt
        - WebServerSecurityGroup
        - GroupId
    Export:
      Name:
        !Sub '${AWS::StackName}-SecurityGroupID'

Watch as your YAML linter goes crazy with these syntax tags. In any case, we can see the stack name being used to create a “name” for the variable, to be used in another stack. Interestingly, you can’t see the exported name values in the AWS console.

Let’s assume this stack is the first stack you created and it’s called CoreStack. Your second stack will look something like this:

AWSTemplateFormatVersion: '2010-09-09'
Resources:
  AppServerSecurityGroup:
    Type: 'AWS::EC2::SecurityGroup'
    Properties:
      GroupDescription: Enable HTTP ingress
      VpcId: !ImportValue CoreStack-VPCID
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: '80'
          ToPort: '80'
          SourceSecurityGroupId: !ImportValue CoreStack-SecurityGroupID

As you can see, for the life of this stack there will be a relationship between the two stacks. The syntax is quite nice too; the shorthand YAML syntax is simple and has the potential to support strings, arrays etc. There is also a more secure feeling here: in Terraform, you can go ahead and create a reference to anything, anywhere in your infrastructure, whereas with these outputs you are defining what should be accessible to other stacks. Obviously, if you reference lots of things, you are going to build up a huge list of outputs, but that is more likely on core stacks than on application/generic stacks.

Also note that you can use dynamic variables in imports via Fn::ImportValue. For example, importing the value of a security group ID from another stack:

{
  "Fn::ImportValue": {
    "Fn::Sub": "${CoreStack}-SecurityGroupID"
  }
}

Looking good, but that mess from earlier?

CloudFormation doesn’t have baggage; it travels light, and it wants a complete stack at upload time. I still feel the CLI tool could implement a file templating feature, but the CloudFormation team have a solution that does make it easier to substitute variables. Sort of.

{
  "Fn::Join": [
    "\n",
    [
      {
        "Fn::Sub": [
          "        - CATTLE_DB_CATTLE_MYSQL_PORT=${RDSEndpoint}",
          {
            "RDSEndpoint": {
              "Fn::GetAtt": [
                "ComponentRDS",
                "Endpoint.Port"
              ]
            }
          }
        ]
      }
    ]
  ]
}

So we actually have more code than before, but it is clear what is going on: a variable named RDSEndpoint is going to be substituted into the string preceding it. The YAML layout is actually better, but I still feel this is only enough of an enhancement for shorter substitutions.
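
For reference, here is roughly what the same substitution looks like in YAML; this is my own transcription of the JSON above, keeping the nested functions in long form:

Fn::Join:
  - "\n"
  - - Fn::Sub:
        - '        - CATTLE_DB_CATTLE_MYSQL_PORT=${RDSEndpoint}'
        - RDSEndpoint:
            Fn::GetAtt: [ComponentRDS, Endpoint.Port]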

Here is what a simple reference sub looks like, much nicer.

{
  "Fn::Sub": "/opt/aws/bin/CloudFormation-init -v --stack ${AWS::StackName}"
}

OK, it’s never going to be as good as Terraform, but if you have been staying away from cfn-init because of the syntax, you should try this new style.

CloudFormation Plan… I mean Change Sets

CloudFormation has seen the power of terraform plan, the ability to view your changes ahead of time, and created change sets. This is a much, much, much (much * 1000) needed feature.

Change sets work by you indicating what you want to change, and AWS will actually store that change set for you to apply. That means you can create a change set, have AWS store it for review, then decide whether it is good to execute. This is a great workflow choice and works well with pull requests.

In case you are wondering what happens if you delete a stack that is being used by another stack, the answer is nothing. You will get an error message of “Export CoreStack-VPCID cannot be deleted as it is in use by OtherStack”.

If you’re using any automation around CloudFormation, use the change set APIs (CreateChangeSet and friends); they will probably make you wonder how you did infra before.
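
For example, using the AWS CLI, the review-then-execute flow looks roughly like this (stack, template and change set names are placeholders):

aws cloudformation create-change-set --stack-name CoreStack \
    --template-body file://corestack.yml --change-set-name my-review

# inspect the stored change set
aws cloudformation describe-change-set --stack-name CoreStack \
    --change-set-name my-review

# apply it once reviewed
aws cloudformation execute-change-set --stack-name CoreStack \
    --change-set-name my-review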

So, I’m going back to CloudFormation, right?

No. Not yet.

While I have problems with Terraform, it still wins, for now. CloudFormation still has a long way to go to become the best way to orchestrate AWS, which is quite a thing to say given how short a time Terraform has been around.

I would like to see CloudFormation become a broader tool. As an example, say I change 4 CloudFormation stacks at the same time; now I have 4 change sets and I can’t see the combined impact across those 4 stacks, only the changes from the stack in each change set. So yes, reviewing is nicer, but it is a one-at-a-time approach.

This is where ‘infrastructure as a living organism’ comes to the fore. If you need to replace the kidneys in a human, you don’t do 3 surgeries, taking 1 out at a time and then putting 1 back in; you do one surgery. The idea of stacks is great, but as people begin to have dozens of stacks that are dependent on each other, changes are going to become more difficult. What might happen is that people stop referencing other stacks because it becomes difficult to scope the changes, particularly for IAM and S3 policy files. I’m sure the team at AWS see the potential, but it is a concept Terraform got right from very early on, so perhaps CloudFormation needs to become something new, something designed for the post-Lambda, Docker, microservice world.

Consul and service discovery, how it can help

Consul is a tool I’ve been using for the last few months to get a handle on our expanding platform. Consul is built by Hashicorp to help with service discovery and configuration management, featuring a key/value store, health checks and DNS forwarding. The service as a whole is open source, with other tools plugging into the Consul HTTP API. It’s these other tools that have made Consul worthwhile for us; this post will focus on envconsul and consul-template.

Sounds great, why do I need it?

If you are running a single app, you are probably very aware of where it is running, what class of server it is on, how many instances there are, what the uptime is, how to deploy it and so on. There comes a point in a lot of companies, big and small, where a single app just isn’t the right fit anymore.

You might go down the route of building microservices or try a service-oriented architecture; whatever you decide to do, you want another app to complement your monolith. That is to say, your monolith is staying around, but you want to be able to build things that are no longer really in the scope of that project. In a situation where you are completely in the cloud, this isn’t something you can jump into without some thought. Is this new thing you are building an internal service? Is it public facing? Do I need walls between the new app and the old, or does it need to be highly connected?

These sorts of decisions are essential to deciding on whether you need a service like Consul, as not everyone will.

When you decide you need something like Consul, it is probably because you have hit one of the following questions:

  • How do we know when something fails?
  • How do we know when a new server comes up?
  • How do we check multiple parts of our internal apps?
  • How do we update configs from servers coming up and down?
  • How do we store things like database names, S3 bucket names, other variables?
  • How do we update variables when they change, do we need to redeploy the app?

Essentially it boils down to:

  • How do we automatically know and control what our apps and infrastructure are doing at any given time?

Discover all the things

Service discovery is quite a big topic, with whole books written on discovery in distributed systems. To avoid repeating what others have already covered in detail, here is a post by NGINX about microservices and service discovery which will help in understanding the different service discovery approaches.

Some of the takeaway points: knowing the port, IP address, DC/VPC and health of a service is great for automation with other systems, like NGINX.

Some cloud providers let you use service discovery without having to run something like Consul. AWS have the Application Load Balancer (ALB), which joins up nicely with an auto scaling group so that when your servers scale due to demand, they are automatically added to the load balancer. This technique means you can leave your infrastructure to respond to application demand without human intervention.

If you are manually editing NGINX configs, updating server lists or using fixed IP addresses, you could benefit from service discovery via Consul.

OK, but I’m still not sold

It can often be the more practical details that really get people on board with something like Consul, so let’s get to the cool stuff first.

There are 3 tools to talk about: Consul, consul-template and envconsul. Hopefully you know roughly what Consul is and its main features, but here is a quick refresher. Consul is a two-part system, server and agent. The servers run in a cluster of 3 nodes for high availability and are the single point of truth for any node that connects to them. Consul has a K/V store, supports DNS forwarding, has service health checks and a fully featured HTTP API. Consul can host a web UI that is quite nice for viewing the KV store and looking at health checks. The agent is a service that runs on every server in your infrastructure and communicates with the other nodes in your cluster. This means that Consul is constantly seeding data out, keeping Consul API calls fast and up to date.

Consul’s API is core to the other two tools, consul-template and envconsul.

envconsul

envconsul can be used to drive dynamic config changes from the Consul KV store and restart your application to pick up the values. If you currently have to rebuild your server or Docker image because you bake in config, you probably know how a single-character mistake can force a rebuild and redeploy. envconsul is the best solution I have seen to prevent that, giving you lots of options in how the app restarts, what values from Consul are injected into your app and what the behaviour should be if something fails.

envconsul is written in Go and is deployed as a single binary. You wrap the call to your app with envconsul, such as:

envconsul -consul 127.0.0.1:8500 -sanitize -upcase -prefix myvalues/ /opt/my_application

What happens here is that we specify the Consul agent, use envconsul to force the environment variables to a certain case (upcase), then pull in KV pairs from Consul under a folder, myvalues. The /opt/my_application part can be anything, so you can print out all of the values coming from Consul by running env:

envconsul -consul 127.0.0.1:8500 -sanitize -upcase -prefix myvalues env

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
SHLVL=1
HOME=/root
no_proxy=*.local, 169.254/16
_=/usr/local/bin/envconsul
TESTAPP_API_KEY=3234234235sosefse
TESTAPP_S3_BUCKET=testing_bucket

Here is a capture from the Consul UI showing the original form of the values.

[Image: Consul UI showing the original key/value pairs]
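
If you want to recreate those values without the UI, they can be written via the HTTP API. The key layout below is an assumption, chosen to match the envconsul output above:

curl -X PUT -d '3234234235sosefse' http://127.0.0.1:8500/v1/kv/myvalues/testapp/api_key
curl -X PUT -d 'testing_bucket' http://127.0.0.1:8500/v1/kv/myvalues/testapp/s3_bucket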

You can obviously do some simple things to prevent failure if Consul isn’t contactable at some point, such as caching the output into a JSON file. This pattern is used in some Docker systems using a “sidecar” container, but it could equally apply to standard on-server apps.

Service Definition

Consul has covered a few of the questions I posed earlier, but this is the feature that really puts Consul at the core of your service discovery. Consul supports the concept of services, which can be anything you want them to be. A service has a few basics and is described in a JSON file.

{
  "service": {
    "name": "some-service",
    "tags": [
      "golang"
    ],
    "port": 8080,
    "check": {
      "name": "some-api",
      "script": "curl -s localhost:8080/status",
      "interval": "10s"
    }
  }
}

In reality, a service definition can be as small as this:

{
  "service": {
    "name": "some-service",
    "port": 8080
  }
}

Either way, your Consul cluster will know about some-service, what port it’s on and the metadata from the agent. The agent data will include IP address, datacenter etc.

When a Consul agent is started and this file is found in its config directory, it will register the service with the cluster. This happens for every instance of the service, so you will know which servers your app is running on, what their IPs are and what port they are listening on. Taking a closer look at the first example, there is a health check; combined with the service information, this means that if your health check fails, that instance of the app will not be included when queried via consul-template. This is useful when you are reliant on a DB, cache or external service: if it goes down for one instance, you don’t want that instance in service anymore. This type of service availability is much more powerful than just a TCP or HTTP check, allowing much more thorough checking before a service is considered healthy.
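
Registered services are also reachable over Consul’s DNS interface, which makes for a quick sanity check of the definition above (assuming the default DNS port of 8600):

dig @127.0.0.1 -p 8600 some-service.service.consul SRV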

consul-template

consul-template provides a convenient way to populate values from Consul into the file system using the consul-template daemon. In the same way we saw values injected into an app using envconsul, we can dynamically update files and reload applications such as HAProxy.

consul-template will only write out apps with passing health checks, so you can be sure your configs are not going to be full of unhealthy apps. consul-template has a large array of options, driven by either the CLI or a config file.

Here is an example of using consul-template with HAProxy

consul-template -consul 127.0.0.1:8500 -template "/etc/haproxy/haproxy.ctmpl:/etc/haproxy/haproxy.cfg:service haproxy restart" -retry 30s

The ctmpl file uses Go template syntax, so it is quite simple to write complex configs. In my example, the Consul dashboard is being exposed, but the Consul health check operates on port 8300, so the web UI wouldn’t be reachable through it. With a little conditional logic, we can serve the dashboard on a different port, 8500 in this case.

The template is also full of ranges, populated by Consul-defined services, that fill out the server, port and name sections. This is where the service definition file data is used, building a full HAProxy config.

{{range services}}
acl is_{{.Name}} hdr(host) -i {{.Name}}.example.com
{{end}}
{{range services}}
use_backend {{.Name}} if is_{{.Name}}
{{end}}

{{range services}}
{{ if .Name | contains "consul" }}
backend {{.Name}}
        mode http
        balance roundrobin
        {{range service .Name}}
        server {{.Name}} {{.Address}}:8500 {{end}}
{{else}}
backend {{.Name}}
	mode http
	balance roundrobin
	{{range service .Name}}
	server {{.Name}} {{.Address}}:{{.Port}} {{end}}
{{end}}{{end}}

frontend stats
	bind *:1936
	default_backend stats

backend stats
	mode http
	stats enable
	stats uri /

This will generate a file at /etc/haproxy/haproxy.cfg that should be thought of as read-only. If you need to make hard-coded entries, edit the template file and restart the consul-template wrapper service.

Looks good, right?

At this point, you will hopefully be thinking: this all sounds great, centralised health checks, config values for injecting, dynamic config generation, but where does it fit in with my company?

Taking a step back for a moment, this post is aimed at those who don’t have a service like this currently and are thinking about it. Or perhaps you are a company that wants to know whether it is worth the learning time, to justify implementing a new system like Consul. As I’ve mentioned previously, there are a few reasons to run Consul, but it comes down to how much you already have in place. Do you have a health check service or a central config store? How do you do load balancing, and do you want to run HAProxy or NGINX instead of a cloud provider’s load balancer?

I like to think about it like this: once your cluster is up, all you need to do is write a 10-line JSON file and your service can be automatically discovered, load balanced, DNS reachable and health checked. I recently rolled out a Grafana server based on an AMI bake, and once it started I could see it had passed its health check and could visit the app immediately. I could have just as easily created a Route53 entry on an ELB with an ASG and EC2 instance and achieved the same thing, but that assumes a lot of infrastructure knowledge.

I was recently setting up a service with another developer and, once we had the health check file in place, our goal was simply to get a passing check in Consul. Once we did, the app was working and we could hit its API. There are always going to be resources needed for deploying on cloud, which may be more of an infra team task, but once you have them in place, creating a new service can be as easy as creating a CI job, a Docker image or baking an AMI.

Lots More

Consul has a lot more to offer than I’ve mentioned, with most people picking and choosing which features to take advantage of. If you feel like building your own service, have a look at the Watches part of the API; it can really help to run custom scripts based on events, for example sending a Slack message when a service goes critical. There are also lots of things that plug into Consul: Hashicorp’s own Vault, but also some cool projects like Fabio and Traefik.

You can read the full documentation of Consul at https://www.consul.io

Lambda Deploys With Apex

AWS Lambda has been around for a couple of years, and in that time the way you create and deploy functions has been streamlined by several tools; Apex is one of them. Apex may have only hit the scene about 8 months ago, but since then it has become one of the leading tools in Lambda automation, keeping pace with AWS releases and features. Here is how we use Apex and Lambda to create a pipeline of services.

What is Lambda

Amazon say:

AWS Lambda is a serverless compute service that runs your code in response to events and automatically manages the underlying compute resources for you.

The word serverless describes your usage rather than Amazon’s, as everything still runs on their servers. Another way of describing Lambda is functions as a service: running code based on events, or invoked by other AWS service calls. An example would be sending a message to Slack if a CloudWatch alarm stays in the alarm state for more than 5 minutes. Another would be triggering a Lambda function when a file is uploaded to S3.

Just paste the code into the console, right?

Amazon love to demo pasting code into the AWS console as a way to deploy Lambda functions, but this isn’t really practical for teams who use version control. Git, for example, can aid in determining what code is actually running in your function. Things become especially hard if you upload dependencies and have a zip file with 200MB of NPM dependencies; then you have to download the zip file to figure out what is in there.

For any kind of consistency, you want a central build system, a CI server, that controls the packaging and uploading of your functions and, potentially, your deployment. There are many tools to aid in this; Gordon and Apex are two that focus on Lambda. There is of course the Serverless framework, but that is more suited to full-blown web apps, rather than just deploying Lambda functions.

Going Serverless…

Slight pun here, because despite being able to write code in Python, JavaScript (node.js) and Java, I write all my Lambda functions in Go. This is not an officially supported way of running Lambda functions, but it works great nonetheless.
If you want to run any kind of complex function, you will have to upload dependencies; AWS lets you have up to 250MB (uncompressed). This means using just the console is out, and a single static binary works nicely in this situation.

I achieved running a Go binary by wrapping the execution in a JavaScript shim. The only downside is needing node.js involved, but hopefully AWS will support Go natively, which would solve this.
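
To show the shape of the idea, here is a very rough sketch of such a shim; the real Apex shim is more robust (it streams events over stdin), and the binary name here is assumed.

var spawn = require('child_process').spawn;

exports.handle = function(event, context) {
  // hand the event to the Go binary and relay its output
  var proc = spawn('./main', [JSON.stringify(event)]);
  proc.stdout.on('data', function(data) { console.log(data.toString()); });
  proc.stderr.on('data', function(data) { console.error(data.toString()); });
  proc.on('close', function(code) {
    if (code !== 0) return context.fail(new Error('exit code ' + code));
    context.succeed();
  });
};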

When Apex came along, it solved the problems I was facing: managing the JavaScript shim, packaging, uploading and building the Go binary.

Apex

Apex supports all the languages that Lambda supports, as well as Go, and is built as a command line interface (CLI) tool. It makes it easy to create Lambda functions, but also to manage the infrastructure around them; by default it supports Terraform for this. Conversely, Gordon and Serverless both use CloudFormation.

Apex creates the Lambda function and uploads code using the API. This is different to the others, where CloudFormation does everything, allowing the function to be referenced later in other CloudFormation stacks. You can still create the Lambda functions ahead of time and then deploy to them, though, so your workflow should be unaffected by Apex. For my usage, only the Lambda creation, role assumption and upload is controlled by Apex.

Example Project

The project structure for a single Lambda with one environment is quite simple: a functions directory with a function name, function code and a function.json file.

|____project.json
|____functions
| |____firstLambda
| | |____firstLambda.js
| | |____function.json

There is also a project.json file at the root, where the apex CLI is used from. This contains project-wide options so that you don’t need to set the same options for each function. It can be as basic as:

{
  "name": "My First Lambda Project",
  "description": "Service glue together some AWS Services"
}

The function.json includes all the Apex options: runtime, timeout, environment variables and so on. It looks something like this:

{
  "name": "FirstLambda",
  "description": "Some cool Lambda function",
  "memory": 128,
  "timeout": 60,
  "environment": {},
  "runtime": "golang",
  "role": "arn:aws:iam::000000000:role/Lambda-function",
  "vpc": {
    "securityGroups": [
      "sg-acb29383"
    ],
    "subnets": [
      "subnet-cgh5f4e4"
    ]
  }
}

While this works for a single Lambda, if you are running multiple environments or even multiple AWS accounts, your project will look like this:

|____project.dev.json
|____project.prod.json
|____functions
| |____firstLambda
| | |____firstLambda.js
| | |____function.dev.json
| | |____function.prod.json

Multiple Environments and Environment Variables

If you decide to simply prefix your function name with an environment, Dev-FirstLambda for example, that allows you to test your functions before releasing them to production. Another way to achieve this is to use different AWS accounts. As long as the credentials used by Apex are capable, you can control the creation and uploading of Lambdas across multiple AWS accounts.

Environment variables can be placed in the function.json file, but this may be too sensitive for, say, GitHub. For that reason, you can mix usage of function.json and the Apex CLI’s environment variable injection. What actually happens under the hood is that a YAML file is created with your variables in; this is added to the zip file and the values are accessible as normal environment variables in your application, thanks to the JavaScript shim.

By using Apex, you can invoke your code locally and set environment variables to test, rather than waiting to deploy to AWS. This makes for a quick local feedback loop and doesn’t require any Apex-specific code, which prevents too much lock-in. The one exception is the Go functions, which are wrapped in an apex handler, but that’s it.
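
A local test run looks something like this; the event JSON is whatever your function expects (made up here):

echo '{ "value": "test" }' | apex invoke -e dev FirstLambda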

Deploying with Apex

TeamCity, Jenkins, CircleCI: however you centrally build your software, they will all work with Apex, even if you are using Windows agents. What is really nice is that they all support environment variables, so you can create very specific jobs for your Lambda deployments. If you are running your agents on AWS, setting up your build agents for deployment is a case of setting the right IAM policy.

Here is an example, though it grants a lot of permissions initially, so restrict it once you know how you are going to use Apex. This policy also allows for VPC-based Lambdas.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "Lambda:*"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "iam:PassRole"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeSecurityGroups"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeSubnets"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeVpc*"
      ],
      "Resource": "*"
    }
  ]
}

It should be noted that multiple AWS account deploys using Apex with IAM roles are only possible via a fork; the PR is here

Variables Outside of CI

If you are doing config management via something like etcd or Consul, you will want to bring the values in via your CI server. I use Consul, so when the CI job runs, a script fetches the variables with curl and passes them to the Apex CLI. We also need to pass in the region we want to deploy to, unless the region is set in project.json. After the variables, we give the Lambda function name and the environment we want; in this example, we deploy the FirstLambda function with the dev environment configuration to the eu-west-1 region.

RAVEN_DSN=`curl -s $consul_host/v1/kv/dev/RAVEN_DSN\?raw`
S3_BUCKET=`curl -s $consul_host/v1/kv/dev/S3_BUCKET\?raw`

apex deploy -r eu-west-1 -s RAVEN_DSN=$RAVEN_DSN \
                         -s S3_BUCKET=$S3_BUCKET \
                         FirstLambda -e dev

The Full Picture

By using Apex, we can achieve some really complex Lambda management: which AWS account, which environment, what environment variables to use, where those variables are stored and how we manage creation and upload, all in one tool. Below is an example of how one of my Lambda functions is set up.

A GitHub push kicks off the CI build, config is retrieved from Consul, then the Go binaries are built by Apex, all running on TeamCity. Apex assumes the correct role to deploy to the correct AWS account and uploads the zip file it created. In this setup, a cron schedule set up in CloudWatch Events invokes the Lambda function; events are then sent to CloudWatch Logs and CloudWatch Metrics, as well as our exception handling service, Sentry. CloudWatch Logs itself invokes another Lambda function which sends all the log data to Sumologic for analysis.

[Image: diagram of the Apex deployment pipeline]

AWS Lambda is a great tool, but there is much more to using it than just writing code in the AWS console. It requires some finesse to create a pipeline for build, deployment and testing. Apex defines a way of doing things that makes sense, isn’t overly complicated and doesn’t require lots of dependencies to get going, all while fitting in with your current infrastructure.

You can read more about Apex here

How to ELB with ALB

Amazon recently announced its new load balancer, the Application Load Balancer (ALB). This load balancer can replace all Elastic Load Balancers currently in service, as well as offering several new features designed for container-based architectures. While classic load balancers support EC2-based architectures well, container-based architectures have often required HAProxy or NGINX to function effectively. This is in part due to how applications have been broken up for containers, often using microservices.

An example might be one application running on port 8080 which has a single route of /foo and a second application running on port 9292 with routes of /bar and /baz.

If using a single monolithic stack, this wouldn’t be an issue, as routing is achieved on a single port and managed in one place. To achieve this kind of routing for multiple applications, another service is needed that manages the path matching for you, either on the EC2 instance itself or externally on separate EC2 instances. Running Apache on an EC2 instance that is set up to route correctly is quite easy, but requires updating the whole OS image or editing the config on the instance. If running a maximum of, say, 3 applications, this wouldn’t be so difficult to maintain, but more than that across a large fleet of instances can be slow to update.

Cool Features

While many other load balancers have had these features for a while, the ALB supports HTTP/2 and WebSockets out of the box. Sticky sessions are also possible, even with WebSockets, which is a great feature for those who haven’t taken the WebSocket plunge. The ALB also comes with a few new metrics to help you target application performance. Each target group has its own set of metrics, so tracking down which application is performing best or worst should be easier. Four blocks of HTTP status codes now exist: 2xx, 3xx, 4xx and 5xx. These are available on both the ALB and the target groups.

Consul/etcd

Other solutions include running a reverse proxy with a service discovery service such as Consul or etcd, using HAProxy as the load balancer instead of an ELB. This allows simple routing to the correct servers or containers without having to set up new load balancers. While this has its advantages, it is more to maintain and can be difficult to set up. For automatically adding new containers, you would need to run consul-template to dynamically update the HAProxy config.

A common solution is to use subdomains instead of complex routing setups. This isn’t ideal for applications that have very little responsibility, however. There is also URL consistency when building APIs. For example, take an API that can show orders, both by ID and the most recent 10 orders. The most recent 10 might be example.com/api/orders/ with by-ID being example.com/api/order/:id, each being a different application. This could be because the recent orders endpoint is using data pushed into a cache, while by-ID is coming from a database.

Another approach would be to make a single orders application which handles both routes above, then another application that handles customers with a similar API structure, example.com/api/customer/:id.

What is clear is that an ELB, or a Classic ELB as Amazon now calls it, can be difficult to utilise with many separate applications. Where you have a single monolith, which contains everything, attached to a single ELB, routing is not so much of an issue.

AWS have often advised using ELBs as part of service discovery, because you can attach them to containers and then have a single endpoint through which to reach your application. This isn’t cheap though; if you run 20 services, you will pay around $400 a month, and that doesn’t include data charges. Even with ELBs in place, to correctly route your traffic you will still need a service in front of all of them to route to the right load balancer.

This is where the ALB really makes sense.

The Application Load Balancer

The name given to this new load balancer varies depending on which page of AWS documentation you read. ALB is the most common name used so far, but AWS have changed the name of the ELB to the Classic Load Balancer in several places, so does this mean the ALB is the ELB v2? Yes, according to CloudFormation, which had support on day 1. Current ELBs are still ELBs in CFN, not V1, but ALBs are ELBV2. So ELBs are classic load balancers of the V1 variety and ALBs are load balancers of the V2 variety; hope that clears up all the confusion.

New Terms

Target Groups and Listener Rules are new; they are the key differentiators from a Classic ELB. An ALB has listener rules, and listener rules have target groups.

A target group, or target, is a port mapping to a container or server, with health check settings attached. Once set up, the target will look for applications on the port you have selected, then attempt to send traffic once the application is healthy. A target can be spread across lots of instances, or just one. Target groups must have unique ports if you are going to run multiple targets on the same server.

Listener rules are where the path matching occurs. By creating an /api/orders route, this is then tied to a target group, which is managing the health of your application. The pattern matching is quite powerful, supporting wildcard expressions, for example /api/orders/*.

Normal load balancer listeners are still there to take in traffic on, for example, ports 80 and 443. Security groups then allow traffic to pass from these listeners to your EC2 instances or containers.

Root Route?

If you are hoping to serve the root of every application, you are out of luck. Let’s say you want to run an app on port 3000 and it’s a third-party app that requires running at the root, e.g. :3000/. This will only work for one application; after that, routing must match the pattern as specified. If you were hoping to run a Grafana server on :3000/grafana for example, you would have to run a reverse proxy on the server to make this work, which defeats the point of using an ALB.

The first listener rule you register will also assume the default route, so ensure you have your applications bound in the correct order with the correct priority. More details here

CloudFormation on Day 1!!!

AWS shipping CloudFormation support on day 1 is a rare treat. Usually, something like Terraform has support within a few days, or sometimes even a few hours, while CloudFormation can take months to catch up, which makes adopting new features difficult. If you decided to use the new features with the console or API, you may end up playing the “what are we running in production?” game. Hopefully AWS keeps CloudFormation up to date with ALB updates.

As this is such a great day in AWS land, I have created an example stack here which uses 2 Docker containers with 2 different languages in conjunction with an ALB and ASG. If you take away the containers, you have a regular EC2 setup, which I think is a bit more helpful than making this a pure ECS setup.

A few pieces to focus on: the listener rule

{
  "Type": "AWS::ElasticLoadBalancingV2::ListenerRule",
  "Properties": {
    "Actions": [
      {
        "TargetGroupArn": {
          "Ref": "ELBTargetGroup"
        },
        "Type": "forward"
      }
    ],
    "Conditions": [
      {
        "Field": "path-pattern",
        "Values": [
          "/golang"
        ]
      }
    ],
    "ListenerArn": {
      "Ref": "ELBListen"
    },
    "Priority": 1
  }
}

and how to attach your EC2 instances in your ASG to your ALB

{
  "Type": "AWS::AutoScaling::AutoScalingGroup",
  "Properties": {
    "TargetGroupARNs": [
      {
        "Ref": "ELBTargetGroup"
      },
      {
        "Ref": "ELBTargetGroup2"
      }
    ]
  }
}

ALB All the Time

The question is: should ALBs be used everywhere? In short, if your tooling has a nice upgrade path, yes. ALBs offer better metrics and more flexibility down the line, while adding new features like HTTP/2 and WebSockets.

Does the ALB take away the need for HAProxy or NGINX? Maybe, but there are going to be use cases where the ALB doesn’t cover everything and you will need something for central management. A simple example is IP restriction: if you have 5 ALBs behind HAProxy, you only change your restriction in one place, instead of making 5 security group changes.

What Next

Docker recently announced a product called Docker for AWS which can tie in with ELB, but not ALB yet. Hopefully when this is all figured out, you will be able to create a Docker service, publish it on a port and have that map to a route on the ALB. This is better than having apps run on random ports on the ELB, which is what all the demos have shown so far. Docker Cloud only supports container-based load balancing using HAProxy, so that is also out. Kubernetes has good support for the ELB too, but again, not the ALB, so that is still to come. Unsurprisingly, ECS has support, so if you are using vanilla ECS, you can upgrade immediately. For those using Empire, there is an early preview available at the time of writing.

The bottom line is that the ALB is a pretty decent product from AWS, with features that make it easier to run microservice and container architectures.

Read more on the AWS Application Load Balancer

Creating an Alarm Service Using AWS Lambda and Slack

Lambda is an Amazon service launched last year that was in preview until a few weeks ago. It now has general availability, arriving with support for SNS. This opens up a lot of options for connecting AWS services to your workflow, especially for when things break.

The Idea

Consume SNS messages from CloudWatch alarms in a Lambda function, parse them and post the data we want to Slack, notifying us when something is going wrong.

Lambda

According to Amazon,

AWS Lambda is a compute service that runs your code in response to events and automatically manages the compute resources for you, making it easy to build applications that respond quickly to new information.

If you are already using AWS for things like EC2 and CloudWatch, you can use Lambda to achieve, cheaply and easily, things that would normally have been handled by another service.

What Can Lambda Do?

Lambda has native support for node.js and Java, but I don’t know either of those, so I had to find an alternative. Although it isn’t totally clear, you can essentially shell out to the file system underneath a function, which opens up other options such as Python. Things looked good, but our team are Rubyists, so I thought about packaging Ruby and running it in a function. Lambda is capped at 30MB, but I still got Ruby running in a Lambda function, with 800KB to spare…

Things like Ruby gems weren’t usable and it seemed clunky to use Ruby in this way, but it made me think that even node.js isn’t the best solution.

Golang is something I’ve been using in my personal projects; it produces a single binary, meaning you just need to shell out and call the binary from your function. I think this is a nice solution for single-process jobs such as Lambda, so that’s what I ended up going with.

SNS & Lambda

Shortly after SNS support for Lambda was announced, I attended the AWS Summit in London and came away with an idea for helping us know when something is failing:

CloudWatch > SNS > Lambda > Slack

Slack comes with great support for webhooks and advanced formatting, not to mention desktop and mobile apps, as well as emailing you when you get a mention. This would give us a cheap, simple way of getting alerts, so I started putting it together.

Messages

CloudWatch alarms are a good example of event-driven compute: you need to know when your server is melting. How you get that information is actually a big deal, with many companies offering fully featured dashboards and incident management for when something goes wrong. If you decide you want to run your own service, simply to pick up alarms and send a notification to your phone, you might think email and SNS. SNS can send email as well as push notifications to apps and services, so why isn’t this enough?

Email isn’t always the best option; I can’t get my work email on my phone without using the web browser, and there are also companies that don’t allow email outside of the office. There is a question of availability too; there are situations where our work email servers are turned off to stop phishing attacks, so relying on one information source is a mistake.

Third Party Services

If there is one service that knows this, it’s PagerDuty, offering app push messages, phone calls, texts and email alerts. Recently, my team launched a new website, http://www.bbc.co.uk/newsbeat, on AWS, so we are taking turns going on-call. Coming from a company where teams going on-call is relatively new, we still have a lot of people and procedures in place to make it easier for the dev team. When something goes wrong, our operations team try to fix the issue based on our runbook; if they can’t resolve it, we get called.

With this in mind, services like PagerDuty were deemed unnecessary by the higher-ups, so we’re back to emails and in need of something better.

Sonitus

The code for the Lambda function is open source and available on GitHub. It is split into a JavaScript file that calls the Go binary, and the Go file, which posts to Slack. There is also a debug folder with example JSON for those who want to try it out before integrating.

Build the binary as per the readme, zip the binary and the index.js file together, and upload the zip as a new Lambda function. Once you have set up your CloudWatch alarms, create an SNS topic and add a subscription pointing at your new Lambda function; that’s it.
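
In shell form, the packaging steps look roughly like this; the binary name is an assumption, so check the readme for the real one:

# build a Linux binary for the Lambda environment
GOOS=linux GOARCH=amd64 go build -o sonitus
# bundle the JavaScript shim and the binary together for upload
zip sonitus.zip index.js sonitus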

What you get in Slack is something that looks like this:

[Image: example Slack alarm message]

Slack

You can customise the message based on your alarm structure, but the default layout is pretty simple: the alarm state, the alarm name, the description, a link to the alarm in the AWS console and the time of the alarm going off or being resolved. You will also see a colour based on the alarm state.
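
The colour and layout come from Slack’s message attachments; a webhook payload along these lines produces that kind of message (the field values here are illustrative):

{
  "attachments": [
    {
      "color": "danger",
      "title": "ALARM: HighCPUUtilization",
      "title_link": "https://console.aws.amazon.com/cloudwatch/",
      "text": "CPU above 90% for 5 minutes",
      "ts": 1438000000
    }
  ]
}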

The only drawback I found was with alarms going to insufficient data; your application might be doing fine, but you have nothing coming in. CloudWatch will send a message indicating it has insufficient data and the alarm goes into this state. If you have an alarm on something that only emits a metric every other minute, you will get a lot of output in Slack. For that reason, I have ignored insufficient data messages from CloudWatch; you will only ever see messages that are in an OK or Alarm state.

Many More Possibilities

This use case is quite small; Lambda functions can do so much more, replacing EC2 in some cases. What I like about Lambda is that it’s event driven, only being called into service when needed, such as when an alarm goes off. This type of setup would previously have cost a lot more and involved a lot more setup, proving just how time-saving Lambda can be.

The next step would be to tie into other services like goroost, which offers web push notifications, or other services for instant messaging, phone calls etc.

It will be interesting to see how the Lambda service improves over time. For my team though, it is something we will use all day, every day, but only when we need it.