Learn about the technology, culture, and processes we use to build Hootsuite.

Everyone — our software developers, co-op students, high school students — 'works out loud' about the tools we use, the experiments we ran, and lessons we learned. We hope our stories and lessons help you.

Recent Posts:

Henry Ford once said, “Coming together is a beginning, staying together is progress, and working together is success.” It is based upon this philosophy of embracing collaboration from start to finish that my team, as well as numerous others at Hootsuite, have adopted an additional role in our Agile methodology.

To provide some background for those unfamiliar with Agile, an Epic is a large unit of work which, when completed, will provide a lot of value to the customer. Epics are then broken down into stories and tickets/tasks which developers will commit to and complete.

Every developer is encouraged to work on whichever task is highest in priority, which keeps work fluid and ensures each developer is well-rounded. However, each sprint there can be numerous Epics in progress, as well as many more being planned in the backlog, which often makes it difficult for product owners to maintain an accurate idea of the current progress of each based on small, fragmented updates from each developer at scrum. Further, the process of conceiving a new feature often gets muddled as it is passed around between design, growth, and management before finally arriving at the engineers. The solution to all these problems and more? The Epic Champion. Read More …

How did this idea get started?

When you are running more than 1500 servers in AWS and there is no consistent standard for creating servers, it is really hard for the Operations Developers to get an insightful view of the system inventory on each machine. How should I know when an instance needs to be patched? What if there is a package with unpatched vulnerabilities installed on several servers? As a result, the Ops team wanted a solution to monitor and gather inventory information on our servers.

Why use AWS Simple Systems Manager (SSM)?

Why not use Puppet or other tools? Although some configuration management tools like Puppet and Chef already gather inventory information on their clients, they just don’t fit with Hootsuite’s Ansible-based ecosystem. Setting up an additional configuration management tool and only using it for a small use case like this is overkill. So what could be a good option that is efficient and requires little configuration?

How about running bash scripts as a cron job to collect the required information (packages, CVEs, OS version, etc.) on each system? First of all, the bash script for gathering CVEs takes at least 20 minutes to run and 90% of the packages on most instances are the same, so it is not ideal to have all instances gathering duplicated information. Secondly, what if the script needs to be changed in the future? Is there a better way than re-deploying to every machine? Eventually, I came across Simple Systems Manager, a service that AWS recently launched to help users automate management tasks at no additional charge.

How to achieve all of these?

Prerequisites of SSM: An agent needs to be installed on every instance and an IAM role needs to be attached to the instance, so that it is granted access to the console in AWS.

AWS SSM has a cool feature called “Send Command” which allows users to run bash scripts on target machines without establishing an SSH connection to them, and the same command can be sent to as many machines as the user wants. Documents define the actions that SSM will perform on the instance, and they can be associated with EC2 instances as scheduled tasks. The bash script for gathering packages and system info will be embedded into SSM documents as parameters and then associated with all instances in the SSM console. The diagram below is a visual representation of the idea.
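To make the idea of a document more concrete, here is a minimal Python sketch of what such a document could look like. It is an illustration under assumptions, not the document used in this project: the helper name `build_ssm_document`, the step name `collectInventory`, and the example shell lines are hypothetical, and the script is embedded directly in the step rather than passed as a parameter.

```python
import json


def build_ssm_document(script_lines, description="Collect package and system info"):
    """Return an SSM Command document (as a dict) that runs the given shell lines.

    Assumption: schema version 2.2 with a single aws:runShellScript step.
    """
    return {
        "schemaVersion": "2.2",
        "description": description,
        "mainSteps": [
            {
                "action": "aws:runShellScript",
                "name": "collectInventory",  # hypothetical step name
                "inputs": {"runCommand": list(script_lines)},
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical inventory commands, only for illustration
    doc = build_ssm_document(["lsb_release -a", "dpkg-query -W"])
    print(json.dumps(doc, indent=2))
```

The resulting JSON could then be registered through Boto3's SSM client with `create_document`, passing the serialized dict as `Content` and `DocumentType='Command'`.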

What do we need to collect?

Besides the basic system info such as installed packages, OS version/name, and uptime, the CVEs (Common Vulnerabilities and Exposures) for all packages also need to be collected from each instance, as they are crucial for the Security team to determine potential vulnerabilities of Hootsuite servers.

Implementation

Workflow Diagram:

Uploading Data to DynamoDB:

After collecting all required information, the bash script will generate a JSON file that contains all the data and upload the file to DynamoDB using the AWS CLI. The JSON object is strictly formatted to match the requirements of DynamoDB, which look like this:

{"Key":{"Data_Type":"Value"},
"Attribute1":{"Data_Type":"Value"},
"Attribute2":{"Data_Type":"Value"},
"Attribute3":{"Data_Type":"Value"},
"Attribute4":{"Data_Type":"Value"},
"Attribute5":{"Data_Type":"Value"}}

The object will also contain a timestamp attribute “TTL” (time to live) for auto-expiration of terminated instances in the DB. This attribute is important as the bash script will run every 5 days to update the information. If the “TTL” attribute is not updated by the 6th day, it likely means that the instance is terminated or stopped, so the database will remove the item to save space.

Sample JSON Object:

{"instance_id":{"S":"instance_id"},
"runstatus": {"S": "True"},
"ttl": {"N": "1493081195"},
"os": {"M":{"name": {"S":"Ubuntu"}, "version": {"S":"14.04.5"}}},
"uptimebydays": {"S":" 84 "},
"pkg": {"M":{
"accountsservice":{"M":{ "pkgversion":{"S": "0.6.35-0ubuntu7.3"}, "status": {"S":"latest"}}}}}}
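The project builds this object in bash, but the shape is easier to see in a few lines of Python. The sketch below is only an illustration: the function name `to_dynamodb_item`, its arguments, and the choice of a 6-day expiry window (`ttl_days=6`, read off the description of the TTL attribute) are assumptions, not the project's actual code.

```python
import time

SECONDS_PER_DAY = 86400


def to_dynamodb_item(instance_id, os_name, os_version, packages, now=None, ttl_days=6):
    """Build a DynamoDB-formatted item with a TTL attribute.

    packages maps a package name to {"version": ..., "status": ...}.
    ttl_days=6 is an assumption: the item expires if no run refreshes it in time.
    """
    now = int(time.time()) if now is None else int(now)
    pkg_map = {
        name: {"M": {"pkgversion": {"S": info["version"]},
                     "status": {"S": info["status"]}}}
        for name, info in packages.items()
    }
    return {
        "instance_id": {"S": instance_id},
        # DynamoDB numbers are sent as strings; TTL is an epoch timestamp in seconds
        "ttl": {"N": str(now + ttl_days * SECONDS_PER_DAY)},
        "os": {"M": {"name": {"S": os_name}, "version": {"S": os_version}}},
        "pkg": {"M": pkg_map},
    }


if __name__ == "__main__":
    item = to_dynamodb_item(
        "i-xxxxxxxx", "Ubuntu", "14.04.5",
        {"accountsservice": {"version": "0.6.35-0ubuntu7.3", "status": "latest"}},
    )
    print(item)
```

Every run pushes the TTL forward, so only instances that stop reporting are expired.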

Create Association:

AWS Config Rules are used to monitor configuration changes in SSM. When an instance is created, it will be automatically added to the SSM console in AWS, and the creation event will be captured by AWS Config Rules to trigger a Lambda function called “ssm_association”. The event is passed into the Lambda in JSON format, and the Lambda can easily retrieve the instance ID and event type to determine if the association needs to be created. The Lambda function then uses Boto3 (the AWS SDK for Python) to create the association.

Instance Creation Event:

{'configRuleId': 'config-rule-g3xyel',
 'version': '1.0',
 'configRuleName': 'createssmassoiciation',
 'configRuleArn': 'arn:aws:config:us-east-1:1111111111:config-rule/config-rule-g3xyel',
 'invokingEvent': '{"configurationItemSummary": {"changeType":"CREATE","configurationItemVersion":"1.2","configurationItemCaptureTime":"2017-1111111111","configurationStateId":12345687654,"awsAccountId":"1111111111","configurationItemStatus":"OK","resourceType":"AWS::SSM::ManagedInstanceInventory","resourceId":"i-06ad1615134baaa2a","resourceName":null,"ARN":"arn:aws:ssm:us-east-1:1111111111:managed-instance-inventory/i-xxxxxxxx","awsRegion":"us-east-1","availabilityZone":null,"configurationStateMd5Hash":"6b1a5634c1f60482767fc239e4422ea4","resourceCreationTime":null},"s3DeliverySummary":null,"notificationCreationTime":"2017-04-10T20:05:02.891Z","recordVersion":"1.0"}',
 'eventLeftScope': False,
 'ruleParameters': '{"type":"ssm_testing"}',
 'executionRoleArn': 'arn:aws:iam::1111111111:role/AWSConfig',
 'accountId': '11111111'}
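Note that `invokingEvent` arrives as a JSON string nested inside the event, so it has to be parsed before the instance ID and change type can be read. Here is a small sketch of that step, assuming an event shaped like the sample above; the helper name `parse_config_event` is hypothetical, and the fallback between `configurationItem` and `configurationItemSummary` is my assumption so that the same helper works with both key layouts.

```python
import json


def parse_config_event(event):
    """Return (instance_id, change_type) from an AWS Config rule event.

    Assumption: the fields live under 'configurationItemSummary' (as in the
    sample event) or under 'configurationItem' / 'configurationItemDiff'.
    """
    invoking_event = json.loads(event["invokingEvent"])  # nested JSON string
    item = (invoking_event.get("configurationItem")
            or invoking_event.get("configurationItemSummary")
            or {})
    diff = invoking_event.get("configurationItemDiff") or {}
    change_type = diff.get("changeType") or item.get("changeType")
    return item.get("resourceId"), change_type


if __name__ == "__main__":
    sample = {
        "invokingEvent": json.dumps({
            "configurationItemSummary": {
                "changeType": "CREATE",
                "resourceType": "AWS::SSM::ManagedInstanceInventory",
                "resourceId": "i-06ad1615134baaa2a",
            }
        })
    }
    print(parse_config_event(sample))
```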

Lambda Function:

# The following code is a simplified version of the Lambda function.
# It takes the instance ID from the creation event, checks that the instance
# is running, and creates the association when the event type is "CREATE".
import json

import boto3


def lambda_handler(event, context):
    # invokingEvent is delivered as a JSON string, so parse it first
    invokingEvent = json.loads(event['invokingEvent'])
    # (the sample event above carries these fields under configurationItemSummary)
    configurationItem = invokingEvent['configurationItem']
    instanceid = configurationItem['resourceId']
    configurationItemDiff = invokingEvent['configurationItemDiff']
    changeType = configurationItemDiff['changeType']

    # make sure the instance is running
    ec2 = boto3.client('ec2')
    response = ec2.describe_instance_status(InstanceIds=[instanceid])
    print(response)
    statuses = response['InstanceStatuses']
    state = statuses[0]['InstanceState']['Name'] if statuses else None

    if state == "running" and changeType == "CREATE":
        print("Executing create_association")
        # associations are created through the SSM API, not EC2
        ssm = boto3.client('ssm')
        response = ssm.create_association(
            Name='uploadpkginfo',
            DocumentVersion='$LATEST',
            Targets=[{'Key': 'InstanceIds', 'Values': [instanceid]}],
            ScheduleExpression='cron(0 0 0/12 1/1 * ? *)'
        )

Gathering CVE:

The security team wants to collect not only unpatched CVEs for all installed packages but also those that are already patched. In fact, gathering CVEs has become the biggest bottleneck of the process, as I could not find any available database or API where I can query all CVEs using package name and version number. The only known method is to use

apt-get changelog PKG_NAME

It might take a few seconds to download each changelog, which adds up to an extremely long running time across all packages.

To solve this problem, another Lambda function is introduced to create a list of instances with the packages installed on them. Then the Lambda function will call SSM to invoke “Send Command” to run the bash script on each instance, so that this task will require minimum time and resources.
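One way to build such a list is to make sure each package is looked up on only one instance, while spreading the lookups across instances. The sketch below is a hedged illustration of that idea, not the Lambda's actual logic: the function name `assign_packages`, the input shape (instance ID mapped to its installed packages), and the least-loaded assignment rule are all assumptions.

```python
def assign_packages(inventory):
    """Assign every package to exactly one instance that has it installed.

    inventory: dict of instance_id -> iterable of package names.
    Returns dict of instance_id -> list of packages to look up on that instance.
    """
    # Which instances have each package installed?
    owners = {}
    for instance_id, pkgs in inventory.items():
        for pkg in pkgs:
            owners.setdefault(pkg, []).append(instance_id)

    load = {instance_id: [] for instance_id in inventory}
    for pkg in sorted(owners):
        # Pick the instance with the fewest packages assigned so far
        target = min(sorted(owners[pkg]), key=lambda i: len(load[i]))
        load[target].append(pkg)

    # Drop instances that ended up with nothing to do
    return {i: pkgs for i, pkgs in load.items() if pkgs}


if __name__ == "__main__":
    inventory = {
        "i-a": ["python3", "apport", "sensu"],
        "i-b": ["python3", "apport"],
    }
    print(assign_packages(inventory))
```

With this split, a package that is installed on hundreds of instances has its changelog downloaded only once.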

# Example of sending the bash script to one of the target instances.
# target_instance is a mapping generated by the Lambda function: each key is an
# instance ID and its value is the list of packages assigned to that instance.
# The list of packages is turned into a bash array and combined with the bash script.
# The bash script cannot be parameterized in this case because the list of
# instances and packages might change every time we run the code.
import boto3

client = boto3.client('ssm')

pkg_list = " ".join(target_instance[key])
# basharray = 'pkgs=( python3 apport sensu ..... )'
basharray = "pkgs=( " + pkg_list + " )"
# raw string so the trailing backslashes reach bash unchanged
get_cve = r'''
for i in "${pkgs[@]}"
do
CVE=`sudo apt-get changelog $i | grep -o "CVE-.*" | cut -c1-13 | sort -u | paste -s -d, -`
if [ -z "$CVE" ]; then
echo "No CVE"
else
aws dynamodb update-item --table-name pkg_CVE \
--key '{"pkg_name":{"S":"'$i'"}}' \
--update-expression "SET cve_list=:y" \
--expression-attribute-values '{":y":{"S":"'$CVE'"}}' \
--return-values ALL_NEW \
--region us-east-1
fi
done'''
# commands must be a list of strings: the array definition, then the script
tmp = [basharray, get_cve]
response = client.send_command(
    InstanceIds=[
        key
    ],
    DocumentName='AWS-RunShellScript',
    TimeoutSeconds=1800,
    Comment='get CVE from ' + key,
    Parameters={'commands': tmp}
)
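The `grep | cut | sort -u | paste` pipeline in the script above boils down to "find every CVE ID in the changelog, deduplicate, and join with commas". For readers who prefer Python, here is an equivalent sketch; `extract_cves` is a hypothetical helper and is not part of the project. One deliberate difference: `cut -c1-13` keeps only the first 13 characters, which truncates CVE IDs whose sequence number has more than four digits, so the regular expression below matches four or more digits instead.

```python
import re

# CVE-YYYY-NNNN, where the sequence number may have more than four digits
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}")


def extract_cves(changelog_text):
    """Return the unique CVE IDs found in a changelog as a comma-separated string."""
    return ",".join(sorted(set(CVE_PATTERN.findall(changelog_text))))


if __name__ == "__main__":
    changelog = (
        "  * SECURITY UPDATE: fix for CVE-2016-1234\n"
        "  * Also addresses CVE-2015-12345 and CVE-2016-1234\n"
    )
    print(extract_cves(changelog))
```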

Limitation/Improvement:

  • Currently there is no API designed for this project, which means people have to pull data directly from the database.
  • An API layer would give more flexibility in designing the data structure: the API would define the data structure for the user, instead of it being strictly formatted in the DB.
  • SSM provides little feedback when creating associations between instances and documents.
  • Lambda functions could be linked with SNS topics to gather error messages and help troubleshoot the system.

Conclusion:

This project is a PoC for gathering system inventory using SSM and it can be optimized in many aspects. This is also a good test case of what SSM is capable of doing and we can clearly see the advantages such as parametrizing the bash script. I feel SSM is a tool that has some good potential and it can be leveraged to a higher level than just using it as a patching tool.

About the Author

Andy Han is an Operations Developer Co-op at Hootsuite. Andy studies Management Engineering at the University of Waterloo. Contact him on LinkedIn.

 

Guy Drut. From hurdler49

Guy and the Gold Medal

The French Olympic hurdler, Guy Drut, found himself in an unenviable position in the early summer of 1976. He was France’s only hope for a track-and-field medal, and the burden of carrying the nation’s pride on his shoulders was getting to him. Drut later told me that he had spoken on several occasions prior to the games with our long-time client Jean-Claude Killy and that he really felt he owed a part of his gold medal to Killy. He explained it as follows: “Jean-Claude told me that I was the only one who knew how to get my body and mind to their ultimate peak for the Olympic Games. He then told me that after I had done this that I should keep saying to myself, ‘I have done everything I can to get ready for this race and if I win, everything will be great, but if I don’t win my friends will still be my friends, my enemies will still be my enemies, and the world will still be the same.’ I repeated this sentence to myself before the qualifying heats and during the break between the semi-finals and finals. I kept saying the sentence over and over, and it blocked out everything else. I was still repeating it to myself when I went up to get my gold medal.”

From the Fear of Failure passage in What They Don’t Teach You at Harvard Business School: Notes from a Street-smart Executive by Mark H. McCormack. Underlining is mine.

This isn’t the fear you’re looking for

Few of us have been in the starting blocks at the Olympics but for many of us a similar level of anxiety can be brought on at the thought of presenting a technical demo in front of our fellow engineers – even our friends and colleagues.

By repeating those words, Drut downplayed the consequences of failure and detached his anxiety from his situation. Every time I go up on stage, I say those same words, for the exact same reason.

Working out loud is a good thing

Every Wednesday morning for the last four years our entire technical staff has gotten together for Demos and an All Hands. Engineers sign up to give 5-minute demonstrations of new product functionality or internal tooling and then take questions from the audience. After the Demos we move on to announcements and awards. Over the last four years we’ve done upwards of 440. I’ve watched almost all of them.

For everyone who attends this session, it celebrates people and accomplishments; it drives alignment around mission, strategy and priorities; and finally, it provides a forum to ask and answer questions. (Thanks for that excellent article, Gokul.)

Each presenter needs to make the most of this opportunity because talking about your work is as important as the work itself. The challenge is to get so good at presenting a technical demo that others feel compelled to celebrate your work, change their outlook, and share your story. That means making it succinct, informative, and relevant.

The leap from paper to the stage is huge – the way our ideas sound in our head is not at all how they sound out loud. Here are five ways to elevate a mediocre technical demo to a great one. Read More …

Do you build microservices in Golang? If so, today is your lucky day as we have just open sourced our Go Health Checks Framework which implements our standard Health Checks API.

What is it?

The Go Health Checks Framework is a declarative, extendable health checking framework written in Go that provides a simple way to register dependencies as status endpoints and integrate them into an existing microservice that uses either the standard net/http package or the Gin Framework.

Monitor From the Inside

The Health Checks API helps you monitor your service health from the inside by exposing a set of standardized endpoints at “/status/…” that can be monitored using any monitoring framework.

Monitor your microservice from the inside.

We have found that the best way to monitor the health of a microservice is from the inside. This is because it is the single source of real truth for its health. If you’re not monitoring from the inside, then you are inferring the health of the microservice and this comes with its own problems. Not convinced? Watch this talk by Kelsey Hightower titled Stop reverse engineering applications and start monitoring from the inside.

Getting Started

Using the Go Health Checks Framework in your Golang microservice is easy:

  1. Define a StatusEndpoint for each dependency in your microservice.
  2. Configure framework options and register the Health Checks framework to respond to all /status/… requests passing a slice of all your StatusEndpoints.

That’s it! As long as you have defined your StatusEndpoints correctly, the framework will take care of the rest.

Not Just Monitoring

The Health Checks API enables more than just microservice monitoring and gives you the power to explore, debug, and document your ever changing architecture. Below is a demo video of a tool we use that displays a dashboard for each microservice in a distributed application and lets developers/ops navigate the microservice graph in real time. This dashboard not only shows information about the services in your graph, but also displays the current status of each microservice in the graph and its dependencies. The open sourcing of this tool is coming soon!

Want to learn more?

Watch my full talk from DevOpsDaysYVR where I go over the Health Checks API and demo a tool we use to explore microservice graphs in real time.

Links

  • Health Checks API – A cross language standard for checking health in a distributed application
  • Go Health Checks Framework – A Golang implementation of the Health Checks API used for microservice exploration, documentation and monitoring.
About the Author
Adam Arsenault is a senior specialist in full stack and mobile development. He leads the mobile platform at Hootsuite. Get in touch via Twitter @adam_arsenault.
