Continuous integration and delivery for mobile

Continuous integration, for software, is the practice of merging code changes into a shared repository many times a day. For many teams, it includes automated building and testing. Continuous delivery is continuous integration plus the additional processes that allow the software to be released to customers after each change or update. Mobile development follows the same pattern, with added complications around releasing the software through third-party ecosystems.

For us at Hootsuite, continuous delivery for mobile means:

  1. Releasing to developers after every integration.
  2. Releasing internally across the business at least once per day.
  3. Releasing to external beta testers more often than releasing to the general public.
  4. Releasing to the public as soon as we decide it is good enough.

Yesteryear

Two years ago, Hootsuite had a simple mobile build pipeline. It would constantly poll our source code control system and subsequently kick off a build after every commit to our develop branch. First, it would run a set of smoke tests to validate that the build was ready for more extensive testing. Then overnight, it would execute a larger set of regression tests. Our automation engineers would monitor the results and send back any changes that caused failures.

We had continuous integration working fairly well, but we hadn’t yet automated the delivery of the native apps to customers, either internally or externally. Our Web product had a more complete pipeline, with continuous delivery included, and our mobile apps needed to catch up.

To easily review the summary for each build, we had dashboards on TVs in our area, like the one shown above. Although it was easy to see the high level summary, we didn’t have easily accessible detailed reports. An engineer would have to dig through our Jenkins server filesystem to piece together what went wrong and why. We had integration testing, but unit tests were not being run as part of the automated build process. As they weren’t being regularly executed, our unit tests often stayed in a broken state and overall code coverage stagnated.

Whenever it came time to submit a release to Apple or Google, you would hear someone on the team cry out “Who is going to make a build for the App Store?” Someone would grumble, someone would flee, and whoever drew the short straw would sit down to make the build.

Often the one building the release wouldn’t have been the person who made the last submission, so typically they would trip over expired certificates, missing provisioning profiles, and mismatched or missing dependencies. An hour later, often after having to poll the team for advice, a build would be in. Heaven forbid the build was rejected or a critical issue surfaced during a staged rollout, because then the whole process would begin again.

Getting deeper into our initial setup: it involved a couple of local Jenkins servers that would poll the single git repo every few minutes. If a new commit was detected, Jenkins would run our Calabash tests and update our dashboard to red or green based on the results. Each server would create its own build from source (with no guarantee that the builds were exactly the same) and then test against a unique set of physical devices. Results were stored across the servers, and there was no communication or aggregation happening across the different instances. If a failure happened, you had to first identify which server to look at, then dig through the Jenkins admin interface and file system to troubleshoot the problem.

From the start, we selected Jenkins over some of the other alternatives due to its combination of being open source, having a large community, being proven as scalable, and offering the ability to self-host for security reasons.

A few years ago, tools like Buddybuild didn’t exist, with most companies focused on continuous integration and delivery for Web products. Tools like Fastlane were brand new and only supported iOS. As a small team, we needed something to support both of our native app teams, and we needed the processes to be as similar as possible to reduce overhead on training and maintenance.

For anyone other than the developers of a particular platform, there was a lot of friction around seeing the progress made on a given day: there were no daily builds that were easily installable. If you wanted to see the current state, you would need to pull the latest source from git, fire up Xcode or Android Studio, and build to your tethered device. In the case of Apple, you might even need to generate an updated provisioning profile. It wasn’t easy to share updates with non-mobile developers, non-technical teammates, or the CEO.

Our next generation pipeline 🖖

One goal we set for the first version of our updated pipeline was the ability to scale up, allowing us to add new slave machines easily. Additional slaves let us grow our pool of connected physical test devices if we want to increase coverage in the future. Together with a backup script running as a scheduled job, we created an Ansible project that helps us rebuild or fix the whole pipeline easily and repeatably. Ansible is popular open source software for IT automation, and it allowed us to script our pipeline setup and configuration. High visibility is another essential characteristic of a robust pipeline: we wanted developers to have instant notification of test results. Reliability was another focal point: we wanted all pull requests to pass a series of unit tests before merging into our develop branch. By implementing these status checks, we aimed to lower the failure rate and make our developers’ jobs more efficient. We knew that one small problem in a branch could block the entire pipeline, but that blocking is exactly what protects our customers. Lastly, the pipeline should automate the previously manual internal distribution of test builds, and it should facilitate app releases to the Apple App Store or Google Play Store.

Tools and integrations

Scaling with Jenkins

Our old pipeline, if one could really call it that, was not built with scalability in mind. When we wanted to speed things up by parallelizing tests with another machine, we had to install Jenkins, configured exactly as it was on our other Jenkins instances, and divvy up the UI tests to spread the load evenly. Results were spread across multiple machines, and because each machine built the app independently for testing, if our apps’ repositories saw two or more commits in quick succession, we could very well have been running our tests on different versions of the app: not an ideal way to accurately monitor test failures and trends. In order to solve this and quickly scale up our new pipeline in the future, we decided to use a Jenkins master-slave architecture.

We moved the Jenkins master from one of the Mac minis in our office to an AWS EC2 instance. To improve security, only the master machine had access to our company GitHub host; all of the automation machines connected to the master as slaves. The Jenkins Swarm plugin was the best choice because it enabled slaves to auto-discover our nearby Jenkins master and join it automatically, which made life easier when we automated rebuilding and patching the pipeline with Ansible.

Here are the steps to set up and use the Swarm plugin:

  1. Install the swarm plugin from the update center on the Master
  2. Download the Swarm CLI client (the swarm-client JAR) to each slave
  3. Run java -jar swarm-client-jar-with-dependencies.jar with the appropriate parameters (labels, IP, hostname, etc.), as in the sketch after this list
  4. Swarm will now reconnect the slave machine even after a reboot.
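
To make step 3 concrete, here is roughly how a slave joins the master; the master URL, credentials, labels, and paths are placeholder values, and the exact JAR file name depends on the Swarm client version you download.

    # Connect this machine to the Jenkins master as a swarm slave.
    # All values below are placeholders for illustration.
    java -jar swarm-client-jar-with-dependencies.jar \
      -master https://jenkins-master.example.com:8080 \
      -username build-bot \
      -password "$SWARM_PASSWORD" \
      -name ios-ui-slave-01 \
      -labels "ios xcode calabash" \
      -executors 2 \
      -fsroot /var/jenkins

In a setup like ours, Ansible would typically install this command as a startup job on each slave so the machine rejoins the master after a reboot.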

Reliability with GitHub

We use two GitHub status checks to help protect the health of the pipeline. First, the pipeline detects that there is a new pull request and triggers unit testing on that branch. If any of the tests fail, it sends a “Failure” status to the GitHub API, which stops that specific pull request from being merged into our master branch. The second status is a master branch status check, which prevents all pull requests from being merged when the master branch itself is broken.

GitHub API documentation URL: https://developer.github.com/v3/repos/statuses/
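
For illustration, the status update itself boils down to a call like the one below against the Statuses API linked above. The host, repository, context, and token are placeholders (on GitHub Enterprise the API lives under /api/v3/), and $GIT_COMMIT is the SHA of the commit under test.

    # Mark the commit as failed so the pull request cannot be merged.
    # State can be one of: pending, success, failure, error.
    curl -s -X POST \
      -H "Authorization: token $GITHUB_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
            "state": "failure",
            "context": "jenkins/unit-tests",
            "description": "Unit tests failed",
            "target_url": "https://jenkins-master.example.com/job/unit-tests/1234/"
          }' \
      "https://github.example.com/api/v3/repos/example-org/example-app/statuses/$GIT_COMMIT"

Posting the same payload with a state of success clears the check once the tests pass.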

 

Pull Request Status Check

Distribution using Fastlane

Fastlane is a set of tools that provide “the easiest way to automate building and releasing of your iOS and Android apps”. Fastlane also makes it easy to extend the built-in tools with proprietary plugins specific to your project. We use Fastlane extensively inside our pipeline – it has become as important as Jenkins as a basic building block.
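
The individual Fastlane tools can be driven straight from a Jenkins shell step. The sketch below shows the general shape; the scheme, file names, and track are placeholders rather than our actual configuration.

    # Build the iOS app and push the result to TestFlight (placeholder names).
    fastlane gym --scheme "ExampleApp"
    fastlane pilot upload --ipa "ExampleApp.ipa"

    # Upload an Android build to a Google Play track.
    fastlane supply --apk "app-release.apk" --track "beta"

In practice, steps like these live in lanes in a Fastfile, so a whole build-test-distribute flow runs as a single fastlane command and stays in version control with the app.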

 

Badge, one of the Fastlane family of tools, modifies the app icon to visually distinguish between different builds, overlaying a build number, version number, and channel name. This reduces confusion for internal and beta testers who might be flipping between daily, beta, and production versions of our app.
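
As a rough sketch of how a nightly job might stamp the icons before building (this assumes the badge gem’s command line interface; the shield text is a placeholder and the flags should be checked against the badge README):

    # Overlay version, build number, and channel on the app icons.
    # Shield format is "label-message-color", rendered via shields.io.
    gem install badge
    badge --alpha --shield "3.2.0-1234-orange"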

Left to right: Alpha build, Release build, and Debug build:

 

Visibility using Slack

One way to improve developer productivity is to use Slack to notify a developer immediately after each round of testing. After unit testing completes, the result is sent to a Slack channel. If a test fails, the branch creator or owner is mentioned in the message. Any build failure or failed unit, integration, or UI test during or after a merged pull request triggers a mention of the QA developers and application developers in the Slack channel. The Jenkins Slack plugin is the simplest and most efficient way to communicate from the pipeline to the development team. One limitation we found, however, was that the plugin was unable to mention a specific Slack user. To solve this, we used curl to post the message through the Slack API directly instead of using the plugin.
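
Here is roughly what that direct call looks like; the token, channel, user ID, branch name, and URL are placeholders, and the <@...> syntax is what produces the mention.

    # Post a test-failure notification that mentions the branch owner.
    curl -s -X POST https://slack.com/api/chat.postMessage \
      -H "Authorization: Bearer $SLACK_BOT_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
            "channel": "#mobile-pipeline",
            "text": "<@U024BE7LH> Unit tests failed on feature/example-branch: https://jenkins-master.example.com/job/unit-tests/1234/"
          }'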

 

Slack mention and notification for unit test result

The process of building it

On a personal level, it has been a singularly fun and rewarding project to work on. As mobile QA developers, we typically work on embedded subteams, testing and automating features for a specific platform. For the development of our pipeline, however, we worked together daily, drawing on each other’s platform-specific knowledge, and collaborated with our mobile application developers, Web QA automation developers, and Web operations teams. We became more familiar with each other’s platforms and are now comfortable adding to and customizing our pipeline as processes change and new ideas percolate. It has been, and continues to be, a great source of learning, creativity, and collaboration.

Lessons learned

On a more tangible level, we learned some pretty cool stuff along the way. For example, due to networking constraints, our GitHub Enterprise service wasn’t able to communicate through webhooks with our Jenkins server, so we used the nifty jenkins-build-per-branch repo. With some modification, we were able to poll our repositories and generate Jenkins jobs to test feature branches whenever a new pull request was made. Using the GitHub and Jenkins APIs, we maintain a list in Jenkins of all the open pull requests across all our frameworks in our GitHub Enterprise organization as well as our open source frameworks on GitHub. We also use this to automatically generate Jenkins jobs to test a release branch as soon as it’s created. These generated jobs further test any new commits to these feature/release branches and update pull request statuses.
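
For example, the polling side boils down to calls like these (the Enterprise host, organization, repository, and job names are placeholders); the responses drive which per-branch Jenkins jobs get created or updated.

    # List open pull requests for a repo on our GitHub Enterprise host.
    curl -s -H "Authorization: token $GITHUB_TOKEN" \
      "https://github.example.com/api/v3/repos/example-org/example-app/pulls?state=open"

    # The same endpoint on public GitHub covers our open source frameworks.
    curl -s -H "Authorization: token $GITHUB_TOKEN" \
      "https://api.github.com/repos/example-org/example-framework/pulls?state=open"

    # A per-branch Jenkins job can then be created from a template config.xml.
    curl -s -X POST -u "$JENKINS_USER:$JENKINS_API_TOKEN" \
      -H "Content-Type: application/xml" --data-binary @job-template.xml \
      "https://jenkins-master.example.com/createItem?name=test-feature-example-branch"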

As our pipeline grew in features, its importance in our development process grew as well. When it is fully operational, it’s great! It helps maintain the stability of our app and automates menial tasks. On the other hand, should something go wrong with the pipeline itself, it can slow things down and become a barrier to developers. Stability has become increasingly important, and while we continue to add new features, we must also be able to restore our pipeline from scratch, or to a previous version in source control, with access only to assets under our control. We keep a backup of all third-party dependencies, such as the current versions of our Jenkins plugins, in an AWS S3 bucket. We can restore our Jenkins master, configure a new master or slave, or add an existing slave to our pipeline with Ansible scripts. In this way, we can take risks when developing our pipeline and still be confident that it can be quickly and easily restored.
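
As a sketch of what that looks like in practice (the bucket, inventory, and playbook names are placeholders): a scheduled job syncs third-party assets to S3, and the Ansible project replays the machine setup.

    # Scheduled backup job: copy Jenkins plugins and other third-party assets to S3.
    aws s3 sync "$JENKINS_HOME/plugins" "s3://example-pipeline-backup/jenkins/plugins"

    # Rebuild or patch pipeline machines from the Ansible project.
    ansible-playbook -i inventory/production jenkins-master.yml
    ansible-playbook -i inventory/production jenkins-slaves.yml --limit new-ios-slave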

Once we had the foundation laid down, our iOS and Android pipelines diverged due to inherent differences in the way we build and test our apps. For iOS, Apple limits us to one active Instruments process. It’s possible to run multiple VMs on a machine with one Instruments process per OS instance; however, rather than take the performance hit, we opted to parallelize with more machines. Fortunately, Android can easily run multiple instances to parallelize Calabash tests, so we were able to shift some resources. In both the Android and iOS pipelines, our UI tests run on Jenkins slaves. Our master Jenkins server is an AWS EC2 instance that runs Linux, so for our iOS pipeline, all UI and unit tests have to run on slaves that have OS X and Xcode installed. With Android, on the other hand, we run Espresso tests and Calabash tests on different machines: Espresso grabs any available devices and sets a device keyboard, which interferes with our Calabash tests. The solution is to run Espresso tests on the master Jenkins machine and Calabash tests on the slaves to avoid any conflict.
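
To make the split concrete, here is roughly what each machine runs; the device identifiers, profile name, and APK path are placeholders, and the environment variables are the ones Calabash uses to target a specific device (an assumption worth checking against the Calabash docs for your version).

    # iOS slave: Calabash UI tests against one tethered device.
    DEVICE_TARGET="<device UDID>" \
    DEVICE_ENDPOINT="http://192.168.1.50:37265" \
    bundle exec cucumber -p ios

    # Android slave: Calabash tests against a specific device by serial number.
    ADB_DEVICE_ARG="<device serial>" bundle exec calabash-android run app-debug.apk

    # Android master: Espresso tests via Gradle on its own devices.
    ./gradlew connectedAndroidTest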

Help needed

We have had plenty of help along the way from blogs, forums, open source projects, and communities, as well as a number of teams and individuals at Hootsuite. Our IT and DevOps teams were instrumental in getting a stable pipeline up and running. In true startup fashion, our previous pipeline’s GitHub SSH credentials were set up using a QA developer’s individual GitHub account. While that was the fastest approach when under the gun, it’s certainly not best practice. To increase our bus factor🚌, our IT and DevOps teams hooked us up with team accounts and credentials. We now use a read-only GitHub SSH deploy key that isn’t tied to any employee’s GitHub account. We also have a team email that’s used for the iTunes Connect and Google Play Store accounts, with all the login credentials stored in LastPass and shared across the team. This way, app submission will never be jeopardized by something as silly as updating a password on an individual’s account.

Of course, we would be nowhere without the open source projects we use. Jenkins, Fastlane, GitHub, Calabash, Ansible, and Jazzy were paramount to building our pipeline. Our learning and discovery process would have been vastly stunted without the great documentation, communities, and forums available to us. Fastlane, recently acquired by Google, has proven to have a particularly helpful community. When we first started building our pipeline, Fastlane was in its early stages and lagging behind on Android, but we have been consistently impressed with the rate at which new features are added, the speed at which bugs are addressed and fixed, and the quality of responses to questions and feature requests. Though Calabash isn’t everyone’s mobile automated testing framework of choice, the Calabash Google Groups (iOS and Android) have been a great help as well, even as the conversation has been shifting towards Stack Overflow of late (along with handy dandy Twitter updates).

The result

A year has passed since we started building our next generation pipeline. To get an idea of how we’re doing, we gathered some feedback from the people who use it most. Our developers have continuously offered new ideas to streamline our build and testing process, so we asked them which aspects of our pipeline grind their gears or float their boat. Does our pipeline, in its current state, elicit feelings of joy, contempt, or indifference? The results were resounding: it makes a positive impact.

The one feature unanimously voted as something we just couldn’t do without is app submission with Fastlane. Looking back, we were living in the stone ages before Fastlane. Managers and other teams love getting daily Crashlytics builds and being part of the day-to-day progress. Developers love being able to build and test on their feature branches without tying up their IDE or having to maintain provisioning profiles. QA starts the day with the latest release version of the app, pre-built and ready for testing. Just add water, shake, and presto, the latest developments (and bugs) appear. Fabric beta groups allow us to easily send a build to a single device, a specific team, or a larger subset of the company for all stages of testing. We all sleep better at night knowing the app can be submitted to Apple or Google with the push of a button, without having to wade through expired certificates, invalid profiles, or the Google Play Developer Console. It is also a great way to ensure the right build is being submitted and to maintain a history of previous submissions.

Next up on our list of features helping developer happiness is blocking merge requests pending test results. The downside is that merging is blocked: building the app and running the tests takes time, and try as we might to speed it up, the app isn’t getting any smaller. This delay, however, is a small price to pay for having confidence in the state of our working branch. Before blocking, there were days-long periods where build errors hid other build errors, which hid failing unit tests, and the problem compounded. Now, we have confidence in every merge and near-instant feedback if a feature branch has failing unit tests or build-breaking changes.

Among our less loved features is documentation. We use Jazzy to generate documentation for our iOS frameworks. On every commit to our working branch, a job kicks off to update documentation. While some members of our team get jazzed about Jazzy, most were indifferent, had never used it, or had forgotten it exists.

Other features in our pipeline received mixed reviews. For some folks, Slack integration is a time saver: getting a notification as soon as unit tests fail on the latest commit to a pull request keeps things moving quickly. For others, it’s noisy and breaks concentration after they have already moved on to another task. Slack notifications were an early addition to our pipeline, and much of the same value is now provided by GitHub status updates, save for the ping and red badge that some find a great reminder and others find an annoyance. Our takeaway here is that notifying developers of failing tests is best done in an individual’s pull request, whereas a Slack channel with a summary of build and test results every 30 minutes or so is a good place for QA and management to monitor automated tests and progress throughout the day.

Lastly (in more ways than one) is our dashboard. We show a pass or fail status on a TV in the office for our build, unit test (main app and frameworks), and UI test Jenkins jobs for the branch each was most recently run on. The general sentiment is that our dashboard could use some work. Our developers say that it either doesn’t display information that’s useful to them, or that they don’t know what it displays. While UI test results are useful, it’s primarily the QA developers on our team who monitor those tests. We hope to revamp our dashboard soon, making it more developer-centric with a greater emphasis on build and unit test results for multiple branches, and on stability over time.

Time and effort

Two QA automation developers from our core mobile team started during our 2015 Hootsuite Hackathon and, in two days, created a bare-bones master-and-slave pipeline configuration that built our iOS app and ran our existing UI tests. We had managed to port the existing functionality to a new pipeline with an infrastructure capable of expansion, parallelization, and configurability. Over the next three months, we built out the features of our pipeline alongside other daily work.

Early on, the pipeline required more maintenance as the kinks were worked out. If we were to plot the effort, it would look a lot like a hat, or a boa constrictor digesting an elephant. At the tail end, it required minimal maintenance, primarily when significant changes were made to a dependency, such as Apple releasing a new version of Xcode.

What’s next

We view our pipeline as a dynamic system that is constantly being optimized to increase company, team, and developer productivity along with product quality. As such, we allocate a certain portion of our automation engineering bandwidth to maintaining and extending our pipeline.

One of our top goals is to increase the stability of our pipeline. Stability is impacted by change. Additions and updates to our product platform cause downstream changes in the pipeline or automated testing configuration. When the pipeline isn’t functioning, and builds are not being created, tested and distributed, our overall development pace slows.

Another goal we have is to get more beta users testing daily app builds. Now that we have delivery automated, we want to get more people onboard with installing and trying the latest build. We think this will help us identify bugs and usability issues earlier in our process.

Currently our pipeline uses a set of physical devices, each attached to a slave, to cover testing against the most popular devices in use by our customers. Next, we want to take some of our early work with AWS Device Farm and integrate it with our pipeline so that for our regression testing of nightly builds, we test against a greater number of devices in the cloud.

We prioritize our continuous integration and delivery backlog by looking at how each potential change impacts developer happiness, software quality and automation efficiency.

What should you use?

In the two years since we first started down the path of expanding our own pipeline, a number of new products have emerged and existing tools have evolved to better enable continuous integration and delivery on mobile.

If you scan Quora, Reddit, Stack Overflow, Medium, and other community services, you’ll find teams talking about a variety of tools. The conversation typically gravitates towards whether you want to host and customize your pipeline yourself or prefer a hosted, more turnkey solution.

Many companies, like us, prioritize self-hosting, low or no ongoing cost commitment, open source, testing on real hardware, and the ability to customize and extend. For these requirements, a Jenkins + Fastlane based pipeline is hard to beat.

Others prefer ease of setup, cloud based hosting and access, developer support, elegant user interfaces, and a hardware free solution. If you care more about these requirements, you might end up using CircleCI, GreenhouseCI, Bitrise, or Buddybuild. In terms of momentum and buzz, Buddybuild is certainly pulling away from the pack today.

One way to approach your own decision is to consider what you want to use as the backbone of your pipeline (Jenkins or Buddybuild) and then what you want to use to build on top of, or extend, your pipeline (Fastlane). Contemplate your needs around integration with internal systems, security and compliance, risk tolerance, and the time people have available to contribute to your pipeline.

Regardless of which technologies you choose to enable continuous integration and delivery for your team, it is important to consider that many top developer candidates now expect continuous integration and delivery to be part of your workflow and development process. Once you’ve become accustomed to having it, it’s really hard to go back to living without it.

About the Authors

Roy Tang and Kirsten Dohmeier are QA software developers for Hootsuite’s Android and iOS teams, respectively. Paul Cowles is the director of software development for the Hootsuite mobile group. Roy, Kirsten and Paul know the secret to scaling is automating all the things and are busy doing just that for mobile at Hootsuite.