Hootsuite’s recent transition from a monolith to microservices, like any large-scale change, has come with many challenges. As we move towards an SOA (Service Oriented Architecture), we need to build new infrastructure, tools, and pipelines. This new architecture requires our web applications to be made up of many small components, which we package using containers. Those containers then need to be orchestrated in order to truly run as a distributed system. An issue arose when our initial choice of container orchestrator was gradually outclassed by the alternatives. Looking at where we wanted to be in the near future, we decided to migrate to a new platform. This blog post describes the technical decisions that drove this change.

What is container orchestration/scheduling? Why should you care?

Consider a web developer who just finished building a simple PHP application. He has made sure every component functions properly on his localhost server. He uploads his code and assets to a host on the internet for the public, but is he guaranteed that everything will function the same as it did locally? The answer is ‘no’. The system environment could be very different between his local server and the external internet server. There might be missing dependencies, or the operating system could be entirely different. This is where a containerization tool called Docker comes in. Docker solves the inconsistency issue by packaging the application’s entire environment, along with the web app itself, into a “Docker image”. Any server with Docker installed can then run this image, regardless of what else is installed on that server.

Running a single Docker image on one web server is pretty straightforward. That said, in a highly available SOA, there would ideally be many services, each packaged in its own Docker image, running on many servers scattered across large geographical areas. Important services expecting high traffic would have multiple copies of the same image running behind a load balancer. How do we allocate server resources like CPU and memory to all these Docker containers? How do we update and scale our services without downtime? These are the problems container orchestration platforms and schedulers try to solve. And like all computer science problems, the solution is adding a level of indirection.

Container orchestrator acting as the layer of indirection between containerized applications and distributed servers
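
To make the resource allocation question concrete, here is a minimal sketch of the kind of decision a scheduler makes: given containers that each request some CPU and memory, pick a server with enough free capacity for each one. It uses a naive first-fit strategy, and every name in it (`Server`, `Container`, `schedule`, the host and service names, the capacities) is hypothetical and chosen for illustration. Production schedulers such as Marathon, Kubernetes, and Nomad use far more sophisticated placement logic and also handle health checks, rolling updates, and rescheduling.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """A host with a fixed amount of unallocated CPU and memory (hypothetical model)."""
    name: str
    cpu_free: float
    mem_free_mb: int
    containers: list = field(default_factory=list)


@dataclass(frozen=True)
class Container:
    """A containerized service and the resources it requests (hypothetical model)."""
    name: str
    cpu: float
    mem_mb: int


def schedule(containers, servers):
    """Place each container on the first server that can fit it.

    Returns a dict mapping container name to server name, or to None
    when no server has enough free CPU and memory.
    """
    placement = {}
    for c in containers:
        placement[c.name] = None
        for s in servers:
            if s.cpu_free >= c.cpu and s.mem_free_mb >= c.mem_mb:
                # Reserve the resources so later containers see the reduced capacity.
                s.cpu_free -= c.cpu
                s.mem_free_mb -= c.mem_mb
                s.containers.append(c.name)
                placement[c.name] = s.name
                break
    return placement


# Illustrative cluster and workload; all values are made up.
servers = [Server("host-a", 2.0, 4096), Server("host-b", 4.0, 8192)]
containers = [
    Container("web-1", 1.0, 2048),
    Container("web-2", 1.0, 2048),
    Container("db", 2.0, 4096),
    Container("batch", 4.0, 8192),
]
placement = schedule(containers, servers)
for name, host in placement.items():
    print(f"{name} -> {host or 'unschedulable'}")
```

In this toy run the two web containers fill `host-a`, the database lands on `host-b`, and the batch job cannot be placed anywhere. Handling that last case gracefully (queueing, scaling the cluster, or rebalancing) is exactly the kind of problem an orchestrator takes off the application developer's hands.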

One Platform to Rule them All

Hootsuite initially chose Marathon as our scheduler in March 2016. Marathon is a framework developed by Mesosphere that runs on a Mesos cluster. It was a very appealing option at the time because, as far as we could tell, it was the only production-grade scheduler, and it was used at many other big companies including Twitter, Apple, and Airbnb. Marathon came with a friendly web interface for viewing and managing deployments. It also provided a reasonably full-featured REST API. Nine months after we started building on Marathon, however, alternative schedulers had grown in popularity and caught our attention. One of these was Kubernetes, a tool built by Google and the very container orchestrator used to schedule Pokémon Go. Another was Nomad by HashiCorp, which offered tremendous speed and excellent integration with the other HashiCorp products we use, like Consul and Vault. We considered switching to one of these alternatives, and it came down to a three-way battle between Mesos, Kubernetes, and Nomad. A decision had to be made.
