Making Build Metrics Collection Smarter

Our Build Metrics

Hootsuite makes extensive use of continuous integration (CI) technology like Jenkins to build and deploy its codebases; several different Jenkins servers cover our software products. CI ensures that updates to our software are integrated smoothly, allowing us to release code at a fast rate: deploys to production happen several times a day.

At the end of June, our build metrics service collected build data for just one of our Jenkins servers: the one responsible for building and deploying updates to the main Hootsuite dashboard. This service scraped data from the Jenkins API and stored it in our metrics database. To get any relevant information about the health of the build pipeline, an engineer would need to learn the database schema and write queries by hand. This was cumbersome, so we built an internal web interface that visualizes key statistics to give a snapshot of the health of the process. Using this, any engineer can quickly check the health of the build pipeline and spot when something is going wrong. Our Build and Deploy engineers are also able to quickly assess whether or not the build pipeline needs improvement.

A clear picture of the health of this build and deploy process, and whether or not it needs improvement.

The Problem

Our internal build metrics service worked as follows: every 15 minutes, a Ruby script scraped data from the Jenkins server and stored it in our metrics database. While this was a reasonably reliable way to obtain data, it left a lot of room for improvement:

  1. Scalability: our service was configured to scrape data for a fixed set of jobs on a single Jenkins server, so engineers would have to reconfigure the script by hand to collect data from more Jenkins jobs and servers.
  2. Freshness: the data was only collected every 15 minutes, so it could easily be out of date.
  3. Efficiency: the script ran every 15 minutes no matter what, even when no jobs were running, so it often ran without collecting any new data, wasting CPU time and money.

The Solution

During one sprint planning meeting, it was suggested that I build a Scala application to address the problems listed above. This service would be one of the first build and deploy software projects written in Scala, which made it a great challenge.

An additional suggestion was to send each build event to our internal Event Bus. The Event Bus is a feature of Hootsuite’s platform that allows applications to publish events as well as consume them. Using the Event Bus would decouple the process of receiving the job information from processing the job information, helping with reliability and scalability. Processing servers may be added seamlessly with minimal configuration, and a processing server failing would not affect the rest of the application.
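In caricature, the decoupling looks like the sketch below; the trait, topic name, and payload type are hypothetical stand-ins, since the real Event Bus client is internal to Hootsuite:

```scala
// Hypothetical interface; Hootsuite's real Event Bus client is internal.
trait EventBus {
  def publish(topic: String, payload: Array[Byte]): Unit
  def subscribe(topic: String)(handler: Array[Byte] => Unit): Unit
}

object Decoupling {
  // The producer just publishes and returns; it never knows who consumes.
  def reportBuildFinished(bus: EventBus, event: Array[Byte]): Unit =
    bus.publish("builds.finished", event)

  // Any number of processing servers can attach independently, so one
  // consumer failing does not affect the producer or the other consumers.
  def startProcessor(bus: EventBus): Unit =
    bus.subscribe("builds.finished") { bytes =>
      () // scrape and persist the build info here
    }
}
```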

Whenever a Jenkins job finishes, the Jenkins server should trigger an event that saves the build data to the metrics database. This way, data is collected in real time, and since the server only triggers events when a build finishes, every event yields new data, eliminating the old script's wasted runs.

Jenkins plugins are well suited to this type of work. Plugins are pieces of software that attach to a Jenkins server and extend its functionality. Since plugins can be installed on any Jenkins server, writing one would also make it simple to collect metrics from any number of our Jenkins servers, solving the scalability problem.

There’s a catch: as good as Jenkins plugins may seem, plugins alone would not completely scale. If the address of the build metrics database changed, for example, I’d need to update or reconfigure every single instance of the plugin. To avoid this, we planned a second application that accepts build data from all of the Jenkins servers running the plugin and inserts it into the database as it arrives.

At this point, all of the design details were finished and I began to build it.

Implementation

Jenkins plugins are generally written in Java; as far as I know, no Jenkins plugin had ever been written in Scala up to this point. Since Scala is a JVM language, and Jenkins plugins have been written in Groovy, another non-Java JVM language, I figured that I could easily write a Jenkins plugin in Scala, and that it hadn’t been done so far simply because nobody had really tried.

Put simply, I was wrong. Jenkins plugins use a special packaging format called “hpi”, which is essentially a specialized type of .jar. After translating an example Jenkins plugin to Scala, with all the conventions left intact, I was able to compile the code into an .hpi. However, when I installed the .hpi on a test Jenkins server, it would not run as intended. Even after trying different modifications to my Maven file, my code, and the .hpi file itself, nothing worked. At that point, I decided to just write the Jenkins plugin in Java instead ¯\_(ツ)_/¯

Java

Writing the plugin in Java led to another change of plans. Since the Hootsuite Event Bus library does not yet support Java, the plugin would have to hand the build event data off to a Scala application, which would then publish it to the Event Bus. This actually made the plugin’s job simpler: all it has to do is send the build information to the Scala application via TCP. I originally planned for that Scala application to be separate from the one consuming events from the Event Bus, but I realized they could be part of the same application with some concurrency.

Event Bus

The first application I made did not use the Event Bus; instead, it received information directly from the Jenkins servers via TCP messages and saved it to a database. It primarily used Akka actors, constructs from the Akka toolkit that perform tasks asynchronously, along with the Slick library for connecting to and querying the database. One actor, the “listener”, listened for TCP messages from the Jenkins servers. Another, the “scraper”, received messages from the listener and scraped the information for the particular Jenkins job. The scraper then sent the information to the final actor, the “querier”, which saved the data to the database.

Diagram detailing a simplified version of the new Metrics Collection system
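A minimal sketch of that pipeline, assuming Akka classic actors; the message types and field names are my own stand-ins, and the actual scraping and Slick persistence are stubbed out:

```scala
import akka.actor.{Actor, ActorRef, ActorSystem, Props}

// Hypothetical message shapes; the real service's protocol is internal.
case class BuildFinished(jobUrl: String, buildNumber: Int)
case class BuildInfo(jobUrl: String, buildNumber: Int, result: String, durationMs: Long)

// "Listener": receives build notifications and hands them off.
class Listener(scraper: ActorRef) extends Actor {
  def receive = {
    case msg: BuildFinished => scraper ! msg
  }
}

// "Scraper": would fetch the full build details from the Jenkins JSON API.
class Scraper(querier: ActorRef) extends Actor {
  def receive = {
    case BuildFinished(jobUrl, n) =>
      // A real implementation would GET s"$jobUrl/$n/api/json" here.
      querier ! BuildInfo(jobUrl, n, result = "SUCCESS", durationMs = 0L)
  }
}

// "Querier": persists the scraped info (with Slick in the real service).
class Querier extends Actor {
  def receive = {
    case info: BuildInfo => println(s"saving $info") // db.run(...) in real code
  }
}

object MetricsPipeline extends App {
  val system   = ActorSystem("build-metrics")
  val querier  = system.actorOf(Props[Querier], "querier")
  val scraper  = system.actorOf(Props(new Scraper(querier)), "scraper")
  val listener = system.actorOf(Props(new Listener(scraper)), "listener")
  listener ! BuildFinished("https://jenkins.example.com/job/dashboard", 42)
}
```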

The most interesting aspect of the functionality is the branch detection algorithm I built into the info scraper. Hootsuite’s Jenkins servers use several third-party Jenkins plugins, two of which matter here: the Git plugin, which allows Jenkins jobs to be triggered immediately after a developer pushes to a Git repository, and the Parameterized Trigger plugin, which allows builds to trigger other builds with custom parameters after they complete. The Git plugin puts a field in the Jenkins job’s metadata containing the branch that was last pushed to. This field may or may not be passed down to the jobs that the job triggers. If a ‘child’ job does not have a Git branch associated with it, but its ‘parent’ does, the info scraper should be able to detect the parent’s branch and associate it with the child.

In the old build metrics Ruby script, branch detection worked only for the particular Jenkins server being scraped: it depended on the specific set of parameters that the jobs were using and passing to each other, and it covered only a small subset of jobs. To make branch detection generic, I changed the algorithm so that, when scraping the info of a child job, it recursively scrapes info from the job’s parents until it finds one with branch information, and then associates that branch with the child.

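In outline, the recursion looks something like the following sketch, where the Build shape and the fetch function are hypothetical stand-ins for the service’s real scraping code:

```scala
// Hypothetical shape for scraped build metadata; field names are illustrative.
case class Build(jobName: String,
                 number: Int,
                 gitBranch: Option[String],     // set by the Git plugin, if present
                 parent: Option[(String, Int)]) // upstream (job, build #) that triggered this one

object BranchDetection {
  // Walk up the chain of upstream builds until one carries Git branch metadata.
  def detectBranch(build: Build, fetch: (String, Int) => Option[Build]): Option[String] =
    build.gitBranch.orElse {
      build.parent.flatMap { case (job, n) =>
        fetch(job, n).flatMap(parentBuild => detectBranch(parentBuild, fetch))
      }
    }
}
```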

When I moved on to integrating the Event Bus code, I also decided to port the application to our service skeleton template, which comes with Vagrant configuration and uses the Akka microkernel. Porting my actors was as simple as copying the actor code into a package in the generated template, initializing the actors in the startup code, and adding shutdown hooks. I added the Event Bus functionality so that the listener sends the information for new builds to a dedicated Event Bus topic, and a special Event Handler construct consumes the events from the Event Bus and passes them to the scraper.
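The hand-off from the Event Bus into the actor pipeline is roughly as follows; this is only a sketch, since the Event Handler API is internal, and it reuses the BuildFinished message from the pipeline sketch above:

```scala
import akka.actor.ActorRef

// Hypothetical glue; the real Event Handler construct is internal to Hootsuite.
class NewBuildHandler(scraper: ActorRef) {
  // Called for each event consumed from the new-builds topic.
  def handle(payload: Array[Byte]): Unit = {
    // Real code would JSON-decode the payload into a BuildFinished message.
    val event = BuildFinished(jobUrl = new String(payload, "UTF-8"), buildNumber = 0)
    scraper ! event
  }
}
```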

Demo

At that point, I was ready to demo the project in front of the rest of the Platform team as part of our team’s weekly demo series. After the demo, I received feedback that the app should accept HTTP requests rather than raw TCP messages from the Jenkins plugins: HTTP follows convention more closely, which would make it easier for other developers to work on the app after I left for university.

For a Scala application, the best way to deal with HTTP may be a Play app. The Play web framework for Java and Scala provides an effective model-view-controller pattern as well as built-in utilities for making external requests, parsing content, and unit testing parts of the application, among other things. I ported some of the app’s functionality into a Play controller and changed it to accept forms from HTTP POST requests rather than raw TCP JSON strings. The transition was fairly seamless, and the Play framework made it easier to write unit tests and configure the app for deployment to production.
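A sketch of what that could look like in (pre-dependency-injection) Play 2; the route, form fields, and model are assumptions rather than the production code:

```scala
package controllers

import play.api.data.Forms._
import play.api.data._
import play.api.mvc._

// Hypothetical shape of a submitted build report.
case class BuildReport(jobUrl: String, buildNumber: Int)

object Builds extends Controller {
  private val buildForm = Form(
    mapping(
      "jobUrl"      -> nonEmptyText,
      "buildNumber" -> number
    )(BuildReport.apply)(BuildReport.unapply)
  )

  // POST /builds -- the Jenkins plugin submits a form instead of a raw TCP message.
  def create = Action { implicit request =>
    buildForm.bindFromRequest.fold(
      badForm => BadRequest("invalid build report"),
      report  => {
        // hand the report off to the actor pipeline / Event Bus here
        Ok("accepted")
      }
    )
  }
}
```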

Future Direction

Metrics collection is just the first thing the new Scala application could be used for; there are other possible uses. For example, “dash4deploy”, the internal web interface our engineers use to deploy their software to production, is currently a node.js application that could use similar work and refactoring in terms of scalability and overall cleanliness, and integrating its functionality into a Play app may do it some good.

Conclusion

This was actually my very first Scala application, and through this work I learned so much about the language, functional programming patterns, and developing applications within a complex ecosystem. I’d like to personally thank all of the Platform team for helping me whenever I had questions about developing my application. A special shoutout goes to Jim Riecken, who personally helped me with Play, Scala, and straight up debugging on several occasions.

About the Author

Emmanuel Sales is a recent graduate of Eric Hamber Secondary School and is starting a Computer Science degree at UBC this fall. He loves back-end and network programming. You can find him on GitHub (https://github.com/AbsoluteZero2A03) and on Twitter (@es_azff).