Hacking Instance Automation with AWS Lambda

I have worked at several different software companies as a co-op student; however, this was the first time I was able to participate in a company-wide hackathon. I had always wanted to take part in one, but I always seemed to join a company just after its hackathon had ended or leave just before the next one started.

Last October, Hootsuite ran #SuiteHacks, a company-wide hackathon with the theme of Customers and Tools. I joined a team made up mostly of my fellow Ops teammates and one person from our Security team. The idea we decided to hack on was automating instance startup tasks using AWS Lambdas. The serverless, ephemeral nature of Lambdas is a natural fit for automating relatively infrequent, short-lived tasks. It was also an excuse for us to play around a bit more with one of Amazon’s newer services.

The Design

The idea behind this hack was a system where an instance can announce a change in its state, and the system responds by running tasks for the instance based on that state change. When an instance goes down, we can verify the change and automate cleanup and remediation where needed. When an instance is created, we can verify it and automate various startup tasks. In doing so, we automate tedious tasks that are important to infrastructure health. The diagram below gives a general idea of what the system looks like:

The system starts when an instance sends information to a centralized Enrichment SNS topic. In this model, servers would be able to trigger this data transfer based on different events in the server’s life cycle; in this project, the event used to demo the functionality was server startup. On startup, the server gathers the data assigned to it when the instance was provisioned and pushes it to the Enrichment topic.

The Enricher receives this data and gathers more information about the instance from AWS, using the identifying data sent by the instance and AWS’ DescribeInstances API. The data is then formatted and standardized, ensuring a common structure that the plugins can expect. The Enricher is also the point in the process where the data sent by the instance is verified, in particular the state the instance reports itself to be in. This is done by cross-referencing the data sent by the instance with the data AWS provides about it. If there is a discrepancy, the data does not move forward in the process, and some form of alerting would be needed to surface these errors. Once verified and structured, the data is pushed to the Plugin SNS topic, which triggers the plugins.
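The Enricher step can be sketched as a small Python handler. This is a minimal sketch, not the code from the hackathon. It assumes the instance publishes a JSON message with hypothetical `instance_id`, `private_ip` and `state` fields. It also assumes `state` uses EC2’s own state names (such as `running`), so it can be compared directly with what AWS returns, and that the caller supplies the `describe`, `publish` and `alert` callables. In a real deployment, `describe` might wrap boto3’s `ec2.describe_instances(InstanceIds=[...])` and `publish` might wrap `sns.publish(TopicArn=..., Message=...)` for the Plugin topic. Injecting them keeps the verification logic testable without AWS credentials.

```python
import json
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]


def verify(reported: Record, described: Record) -> Optional[str]:
    """Cross-reference what the instance reported with what AWS says.

    Returns an error message on the first mismatch, or None if all agree.
    """
    checks = [
        ("instance id", reported.get("instance_id"), described.get("InstanceId")),
        ("private IP", reported.get("private_ip"), described.get("PrivateIpAddress")),
        ("state", reported.get("state"), described.get("State", {}).get("Name")),
    ]
    for name, claimed, actual in checks:
        if claimed != actual:
            return f"{name} mismatch: instance reported {claimed!r}, AWS says {actual!r}"
    return None


def standardize(reported: Record, described: Record) -> Record:
    """Produce the common structure every plugin can rely on (hypothetical schema)."""
    return {
        "instance_id": described["InstanceId"],
        "state": reported["state"],
        "private_ip": described.get("PrivateIpAddress"),
        "instance_type": described.get("InstanceType"),
        # EC2 returns tags as [{"Key": ..., "Value": ...}]; flatten to a dict.
        "tags": {t["Key"]: t["Value"] for t in described.get("Tags", [])},
    }


def handle_enrichment_event(
    event: Record,
    describe: Callable[[str], Record],
    publish: Callable[[str], None],
    alert: Callable[[str], None],
) -> List[Record]:
    """Verify, enrich and forward every SNS record from the Enrichment topic."""
    forwarded = []
    for record in event["Records"]:
        reported = json.loads(record["Sns"]["Message"])
        described = describe(reported["instance_id"])
        error = verify(reported, described)
        if error is not None:
            # Discrepancy: stop this record here and surface the problem.
            alert(f"{reported['instance_id']}: {error}")
            continue
        enriched = standardize(reported, described)
        publish(json.dumps(enriched))
        forwarded.append(enriched)
    return forwarded
```

A real `lambda_handler(event, context)` entry point would call `handle_enrichment_event` with boto3-backed callables, and `alert` could post to whatever channel the team already watches.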

The Plugin jobs are triggered by data being published to the Plugin SNS topic. Because SNS is a publish/subscribe system, each plugin subscribed to the topic receives its own copy of the published data. Each plugin then processes the enriched data to perform its specific function, running only for the states it cares about. For example, startup tasks proceed only if the data specifies the proper state.
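The state filtering each plugin does can be sketched as a small wrapper. This is a hypothetical helper that assumes the Enricher publishes JSON with a `state` field. `make_plugin` and the task it wraps are illustrative names, not part of the original project. Because SNS fans every message out to every subscriber, each plugin simply ignores records whose state it does not handle.

```python
import json
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]


def make_plugin(wanted_states: Iterable[str], task: Callable[[Record], None]):
    """Return an SNS-triggered Lambda handler that runs `task` only for wanted states."""
    wanted = frozenset(wanted_states)

    def handler(event: Record, context: Any = None) -> List[str]:
        handled = []
        for record in event.get("Records", []):
            data = json.loads(record["Sns"]["Message"])
            if data.get("state") not in wanted:
                continue  # every plugin receives every message; skip the rest
            task(data)
            handled.append(data["instance_id"])
        return handled

    return handler


# Example: a startup-only plugin that just logs what it would do.
startup_plugin = make_plugin(
    {"running"}, lambda data: print("startup tasks for", data["instance_id"])
)
```

SNS subscription filter policies, which can match on message attributes, could do the same filtering before a Lambda is invoked at all; checking in code keeps this sketch self-contained.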

The POC

During the hackathon we were able to implement a simple version of this model: a system that allows servers to trigger multiple tools on startup. In the proof of concept we focused on some common tasks such as DNS, Reverse DNS and Tag Verification. We also created a simple plugin that calculated the monthly costs of these resources and posted that information to Slack.
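To illustrate what the DNS, Reverse DNS and Tag Verification plugins might compute, here is a sketch. It assumes the records live in Amazon Route 53 and that the enriched data carries `private_ip` (IPv4) and a `tags` dict. The `Name`-tag naming scheme, the `example.internal.` domain and the required-tag list are hypothetical. The functions only build `ChangeBatch` dictionaries; a plugin would pass them to boto3’s `route53.change_resource_record_sets(HostedZoneId=..., ChangeBatch=...)`, once for the forward zone and once for the reverse zone.

```python
import ipaddress
from typing import Any, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


def _change(action: str, name: str, rtype: str, value: str, ttl: int) -> Record:
    """One Route 53 change entry for a single-value record set."""
    return {
        "Action": action,
        "ResourceRecordSet": {
            "Name": name,
            "Type": rtype,
            "TTL": ttl,
            "ResourceRecords": [{"Value": value}],
        },
    }


def dns_change_batches(
    hostname: str, ip: str, ttl: int = 300, action: str = "UPSERT"
) -> Tuple[Record, Record]:
    """Build ChangeBatch dicts for the A record and its matching PTR record.

    The two records belong to different hosted zones, so they are returned
    separately. Assumes an IPv4 address (an IPv6 host would need AAAA).
    """
    # e.g. 10.0.1.5 -> 5.1.0.10.in-addr.arpa. (trailing dot = fully qualified)
    ptr_name = ipaddress.ip_address(ip).reverse_pointer + "."
    forward = {"Changes": [_change(action, hostname, "A", ip, ttl)]}
    reverse = {"Changes": [_change(action, ptr_name, "PTR", hostname, ttl)]}
    return forward, reverse


def missing_tags(tags: Dict[str, str], required: Iterable[str]) -> List[str]:
    """Tag verification: list required tag keys that are absent or empty."""
    return [key for key in required if not tags.get(key)]


enriched = {"instance_id": "i-123", "private_ip": "10.0.1.5", "tags": {"Name": "web-1"}}
hostname = f"{enriched['tags']['Name']}.example.internal."  # hypothetical naming scheme
forward, reverse = dns_change_batches(hostname, enriched["private_ip"])
problems = missing_tags(enriched["tags"], ["Name", "Team", "Environment"])  # hypothetical policy
```

Using `UPSERT` makes the plugin safe to re-run if an instance reboots and announces itself again.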

The Learnings

The hackathon started with the team deciding how to send data to the system. One option was to use AWS Config as the trigger for the Enricher Lambda. AWS Config is a service that provides AWS configuration management and history. Many of the tasks we wanted to automate are tied to configuration changes on instances, so it made sense to investigate this option. At first glance, AWS Config’s integration with AWS Lambda made it look like a good choice; however, the delay between a change happening and that change being registered in Config was too long. In the worst case, it took several minutes before the change was registered and in turn triggered the automation. For some tasks, specifically DNS and Reverse DNS, we wanted to trigger the process as soon as possible. Because of this, we decided to trigger events through a simple script located on the instance. The script gathers information about the instance and then sends it to the Enrichment SNS topic and on through the rest of the automation system.
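The on-instance trigger could look something like the following Python sketch. It is not the original script. It assumes the instance can reach the EC2 instance metadata service using IMDSv2 session tokens, that its IAM role allows `sns:Publish` on the Enrichment topic, and that boto3 is installed. The message fields and the topic ARN passed to `main` are hypothetical.

```python
import json
import urllib.request
from typing import Dict, Iterable

IMDS = "http://169.254.169.254/latest"


def fetch_metadata(keys: Iterable[str], timeout: float = 2.0) -> Dict[str, str]:
    """Read metadata values via IMDSv2: get a session token, then GET each key."""
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    values = {}
    for key in keys:
        req = urllib.request.Request(
            f"{IMDS}/meta-data/{key}", headers={"X-aws-ec2-metadata-token": token}
        )
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            values[key] = resp.read().decode()
    return values


def build_startup_message(metadata: Dict[str, str]) -> str:
    """Shape the identifying data the Enricher will cross-check against AWS."""
    return json.dumps(
        {
            "instance_id": metadata["instance-id"],
            "private_ip": metadata["local-ipv4"],
            "state": "running",  # reported on startup, using EC2's state name
        }
    )


def main(topic_arn: str) -> None:
    """Run once at boot (e.g. from cloud-init or a systemd unit)."""
    import boto3  # imported here so the helpers above work without boto3

    metadata = fetch_metadata(["instance-id", "local-ipv4"])
    # SNS ARN format: arn:aws:sns:<region>:<account-id>:<topic-name>
    region = topic_arn.split(":")[3]
    boto3.client("sns", region_name=region).publish(
        TopicArn=topic_arn, Message=build_startup_message(metadata)
    )
```

Calling `main` with the Enrichment topic’s ARN publishes a single message as soon as the boot script runs, rather than waiting for AWS Config to notice the change.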

The Future

Since it was for a hackathon we kept it simple, but this system could easily be expanded to many other applications. One extension of the DNS plugin would ensure that any DNS and Reverse DNS records are removed on instance termination. Other applications we considered during the hackathon were automatically registering and deregistering instances with monitoring systems or health checks as they start up and shut down. Another advantage of this system is that it is language and workflow agnostic. AWS Lambda lets you write jobs in multiple languages, and all of them can ingest data from SNS topics. Furthermore, the Lambdas themselves do not need to follow a specific workflow to achieve their tasks; each job can be designed to best meet the needs of its task. The architecture is plug and play, allowing for a lot of flexibility. Because the system is built on AWS Lambda, it scales with the needs of your growing infrastructure. With this flexibility, there are a number of tasks around scaling infrastructure up and down that could be automated in this simple and scalable manner.
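The monitoring idea could be another plugin on the same topic. In this sketch, `MonitoringClient` is a hypothetical interface standing in for whatever monitoring or health-check system is in use, and `InMemoryMonitor` is a toy implementation for trying it out. The state names assume the Enricher forwards EC2’s `running`, `shutting-down` and `terminated` states.

```python
import json
from typing import Any, Dict, Protocol


class MonitoringClient(Protocol):
    """Hypothetical interface to a monitoring or health-check system."""

    def register(self, instance_id: str, address: str) -> None: ...

    def deregister(self, instance_id: str) -> None: ...


class InMemoryMonitor:
    """Toy client that keeps targets in a dict instead of calling a real system."""

    def __init__(self) -> None:
        self.targets: Dict[str, str] = {}

    def register(self, instance_id: str, address: str) -> None:
        self.targets[instance_id] = address

    def deregister(self, instance_id: str) -> None:
        self.targets.pop(instance_id, None)


def handle_lifecycle(event: Dict[str, Any], client: MonitoringClient) -> None:
    """Register instances when they start and deregister them when they go away."""
    for record in event.get("Records", []):
        data = json.loads(record["Sns"]["Message"])
        state = data.get("state")
        if state == "running":
            client.register(data["instance_id"], data["private_ip"])
        elif state in ("shutting-down", "terminated"):
            client.deregister(data["instance_id"])
        # Any other state is ignored by this plugin.
```

Because the plugin depends only on the shared message format, it could just as well be written in any other language Lambda supports.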

About the Author

Stuart was a co-op at Hootsuite in the Fall of 2016. He’s currently studying Management Engineering at The University of Waterloo, and graduates in 2017.