Migrating Container Orchestrators – Mesos, Kubernetes, or Nomad?
Hootsuite’s recent transition from a monolith to microservices, like any large-scale change, has come with many challenges. As we move towards a SOA (Service Oriented Architecture), we need to build new infrastructure, tools, and pipelines. This new architecture requires our web applications to be made up of many small components, which we package and isolate using containers. Containers then need to be orchestrated in order to truly run as a distributed system. An issue arose when our initial choice of container orchestrator was gradually outclassed by the alternatives. Looking at where we wanted to be in the near future, we decided to migrate to a new platform. This post describes the technical decisions that drove this change.
What is container orchestration/scheduling? Why should you care?
Consider a web developer who just finished building a simple PHP application. He has made sure every component functions properly on his localhost server. He uploads his code and assets to a host on the internet for the public, but is he guaranteed that everything will function the same as it did locally? The answer is ‘no’. The system environment could be very different between his local server and the external internet server. There might be missing dependencies, or the operating system could be entirely different. This is where a containerization tool called Docker comes in. Docker solves the inconsistency issue by stuffing the entire environment, along with the web app, into a “Docker image”. Any server with Docker installed can then run this image, regardless of the environment.
Running a single Docker image on one web server is pretty straightforward. That said, in a highly available SOA, there would ideally be many services each contained in their own Docker images, running on many servers scattered across large geographical areas. Important services expecting high traffic would have multiple copies of the same image running underneath a load balancer. How do we allocate server resources like CPU and memory to all these Docker containers? How do we update and scale our services without downtime? These are the problems container orchestration platforms and schedulers try to solve. And like all computer science problems, the solution is adding a level of indirection.
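To make the resource allocation question concrete, here is a toy sketch of the simplest placement strategy one could imagine: put each container on the first node that still has enough free CPU and memory. This is not how Marathon, Kubernetes, or Nomad actually schedule; the node names, sizes, and container requests are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cpu_free: float   # CPU cores still unallocated
    mem_free: int     # memory in MiB still unallocated
    containers: list = field(default_factory=list)


def place(nodes, container, cpu, mem):
    """Put the container on the first node with enough free CPU and memory.

    Returns the chosen node's name, or None if nothing fits.
    """
    for node in nodes:
        if node.cpu_free >= cpu and node.mem_free >= mem:
            node.cpu_free -= cpu
            node.mem_free -= mem
            node.containers.append(container)
            return node.name
    return None


# Hypothetical two-node cluster and three container requests.
cluster = [Node("node-a", cpu_free=2.0, mem_free=4096),
           Node("node-b", cpu_free=4.0, mem_free=8192)]
placements = [
    place(cluster, "web-1", cpu=1.5, mem=2048),
    place(cluster, "web-2", cpu=1.5, mem=2048),
    place(cluster, "batch-1", cpu=8.0, mem=1024),
]
print(placements)
```

The second copy of the web container spills over to the second node, and the oversized batch job cannot be placed at all. A real scheduler adds everything this sketch leaves out: health checks, rescheduling after node failure, rolling updates, and smarter packing.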
One Platform to Rule them All
Hootsuite initially chose Marathon as our scheduler in March 2016. Marathon is a framework developed by Mesosphere that runs on a Mesos cluster. It was a very appealing option at the time because it was the only production-grade scheduler, and it was used at many other big companies including Twitter, Apple, and Airbnb. Marathon came with a friendly web interface for viewing and managing deployments. It also provided a reasonably featured REST API. Nine months after we started building on Marathon, however, alternative schedulers had grown in popularity and caught our attention. One of these was Kubernetes, a tool built by Google and the very container orchestrator used to run Pokémon Go. Another was Nomad by HashiCorp, which offered tremendous speed and excellent integration with the other HashiCorp products we use, like Consul and Vault. We considered switching to one of these alternatives, and it came down to a three-way battle between Mesos, Kubernetes, and Nomad; a decision had to be made.
Mesos + Marathon?
Let’s first explore the Mesos + Marathon architecture and continuous integration system that we spent nine months building. By observing what worked and what didn’t, we can better understand our motivation for migrating to another scheduler.
From the beginning, our team aimed for a seamless continuous integration pipeline for Hootsuite developers. We wanted developers to be able to deploy their service by simply pushing code to their repositories. Any complicated cluster infrastructure and networking concerns should be abstracted away. Meanwhile, developers should still be able to interact with their deployment by specifying the resources required to run their app, and by viewing the logs the app generates for debugging purposes. All this was accomplished through a GitHub, Jenkins, and Docker pipeline, with Marathon-specific deployment information abstracted from developers via our in-house tool Skytrain.
When a service has started deploying on Marathon, developers can interact with it via the Marathon UI. From here, they are able to perform actions such as scaling the application, viewing logs, and editing deployment configurations. Some actions are also executable through Marathon’s REST API. These are the primary ways for developers to monitor and control their deployments.
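As an illustration of the REST API route, the sketch below builds the kind of request that would change an application’s instance count. It assumes Marathon’s v2 API, where an app is updated with a PUT to /v2/apps/{appId}; the host name and app id are hypothetical, and the request is only constructed, never sent.

```python
import json
import urllib.request


def build_scale_request(marathon_url, app_id, instances):
    """Build (but do not send) a request that sets an app's instance count."""
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url=f"{marathon_url}/v2/apps/{app_id.strip('/')}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical Marathon endpoint and app id.
req = build_scale_request("http://marathon.example.com:8080", "/my-service", 4)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# Sending it would be: urllib.request.urlopen(req)
```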
One important background operation abstracted away from developers is service discovery. Service discovery is a mechanism that enables communication among microservices in a SOA. It allows Service A to look up the dynamically allocated instances of Service B before sending requests to them. For this, we used a HashiCorp tool called Consul.
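A minimal sketch of what that lookup can look like from Service A’s side, assuming it queries Consul’s HTTP health endpoint (GET /v1/health/service/<name>?passing) and picks one healthy instance. The response below is a canned, trimmed-down sample rather than real output, and the service name, addresses, and ports are invented.

```python
import json
import random


def healthy_endpoints(health_response):
    """Extract (address, port) pairs from a Consul health-endpoint response."""
    endpoints = []
    for entry in health_response:
        service = entry["Service"]
        # Service.Address can be empty, in which case the node's address applies.
        address = service.get("Address") or entry["Node"]["Address"]
        endpoints.append((address, service["Port"]))
    return endpoints


# Canned sample of what GET /v1/health/service/service-b?passing might return.
sample = json.loads("""
[
  {"Node": {"Node": "agent-1", "Address": "10.0.1.15"},
   "Service": {"Service": "service-b", "Address": "", "Port": 31002}},
  {"Node": {"Node": "agent-2", "Address": "10.0.2.27"},
   "Service": {"Service": "service-b", "Address": "10.0.2.99", "Port": 31847}}
]
""")

endpoints = healthy_endpoints(sample)
print(endpoints)
# Service A would then send its request to one of the instances.
address, port = random.choice(endpoints)
print(f"http://{address}:{port}/")
```

Because the scheduler assigns hosts and ports dynamically, Service A never hard-codes where Service B lives; it asks Consul each time (or caches the answer briefly).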
Kubernetes?
Google’s container scheduler has gained immense momentum since we made our first scheduler choice. It has many smart, well-documented features that set it apart from other schedulers. I personally had the chance to experiment with Kubernetes, first using Minikube, a tool for quickly setting up a single-node instance on my local machine, and then with an experimental cluster provisioned on AWS. Let’s explore some key features that make Google’s scheduler stand out.
Instead of scheduling Docker containers directly, Kubernetes schedules its own structures called Pods. A Pod models a group of containers running on the same host with a shared port range and storage. This gives developers the option of scheduling each service in its own Pod or several tightly coupled services in one Pod, depending on the level of coupling desired. This structure also fits into Kubernetes’ well-designed IP-per-Pod networking model, allowing pods to be “treated much like VMs or physical hosts from the perspectives of port allocation, networking, naming, service discovery, load balancing, application configuration, and migration.”
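To give a feel for the Pod structure, here is a sketch of a Pod manifest with two containers that share a storage volume, built as a Python dictionary and printed as JSON (kubectl accepts JSON as well as YAML). The Pod name, labels, images, and mount paths are placeholders, not anything we run.

```python
import json

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "web-with-log-shipper", "labels": {"app": "web"}},
    "spec": {
        # A scratch volume that lives as long as the Pod does.
        "volumes": [{"name": "logs", "emptyDir": {}}],
        "containers": [
            {
                "name": "web",
                "image": "example/web:1.0",
                "ports": [{"containerPort": 8080}],
                "volumeMounts": [{"name": "logs", "mountPath": "/var/log/web"}],
            },
            {
                # Second container in the same Pod, reading the shared volume.
                "name": "log-shipper",
                "image": "example/log-shipper:1.0",
                "volumeMounts": [{"name": "logs", "mountPath": "/logs"}],
            },
        ],
    },
}

manifest = json.dumps(pod, indent=2)
print(manifest)
```

Saved to a file, a manifest like this could be submitted with kubectl create -f. Both containers are always scheduled onto the same node, which is what makes the shared volume and shared port range possible.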
Kubernetes also has two tools that allow developers to interact with their deployments: a web dashboard and a CLI (command line interface). The dashboard, much like the one provided by Marathon, allows developers to view, scale, and edit their deployments. The CLI, on the other hand, is a much more powerful tool than anything offered by Marathon, with a wide array of commands that allow developers to monitor their deployments in great detail. For example, this command will retrieve application logs generated by containers running in the specified pod:
kubectl logs <pod-name>
Furthermore, unique to Kubernetes is a set of service exposure tools. These allow developers to expose services to the internet without creating custom bridge and edge routers. One way to do this is by creating a Service of type LoadBalancer and pointing it at a deployment’s pods. This creates a cloud network load balancer with an externally accessible address that sends traffic to the correct port on our cluster nodes. For example, if Kubernetes is configured to run on AWS, it would provision an ELB (Elastic Load Balancer), which is reached through a public DNS name.
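A sketch of such a Service definition, again built as a Python dictionary and printed as JSON. The service name, label selector, and port numbers are hypothetical; the selector is what ties the Service to the pods it should send traffic to.

```python
import json


def load_balancer_service(name, app_label, port, target_port):
    """Describe a Service that exposes pods labelled app=<app_label> externally."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "LoadBalancer",
            # Traffic goes to every pod whose labels match this selector.
            "selector": {"app": app_label},
            # External port 'port' is forwarded to 'targetPort' on the pods.
            "ports": [{"port": port, "targetPort": target_port}],
        },
    }


service = load_balancer_service("web", app_label="web", port=80, target_port=8080)
print(json.dumps(service, indent=2))
```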
Other useful Kubernetes features include Secrets, for providing sensitive information like passwords and certificates to pods, and DaemonSets, for ensuring a service is running on every node. All this indicates that Google has prepared its scheduler for a multitude of situations, and that it outclasses Marathon in many areas.
Nomad?
As the possibility of migrating away from Mesos grew stronger, we expanded our pool of alternatives to include HashiCorp’s Nomad. I didn’t have a chance to experiment with Nomad myself. Fortunately, HashiCorp’s CTO, Armon Dadgar, held a product talk in mid-October this year at Hootsuite HQ, which I was able to attend. Here are some of Nomad’s selling points that appeal to Hootsuite.
Nomad comes with built-in integration with the other HashiCorp tools that we use: Consul and Vault. Consul, as discussed previously, is the tool we use to enable service discovery in our Mesos setup. Since Nomad is designed to rely on Consul for service discovery, we could keep most of our existing setup during a migration. We use Vault to store keys and certificates, so again we would be able to reuse much of our existing infrastructure.
Nomad’s C1M, a challenge to schedule 1,000,000 containers on 5,000 nodes in under 5 minutes, yielded impressive results. It shows that Nomad scales very well, and that it is an excellent choice for enterprises conducting large-scale deployments. If Hootsuite plans to grow to that scale in the future, Nomad is a safe choice.
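To put those C1M numbers in perspective, a quick back-of-the-envelope calculation using only the figures quoted above:

```python
containers = 1_000_000
nodes = 5_000
seconds = 5 * 60  # the challenge finished in under this many seconds

per_node = containers / nodes
per_second = containers / seconds  # a lower bound, since it took less than 5 minutes

print(f"{per_node:.0f} containers per node")
print(f"at least {per_second:.0f} containers scheduled per second")
```

That works out to 200 containers per node and a sustained rate of more than 3,000 placements per second.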
Criteria Comparison – A Checklist
At this point, we had done enough research to make a decision between the three container orchestration platforms. There were many criteria by which to judge our options. The two main ones were developer experience and ease of maintenance for operators. Here is how Marathon, Kubernetes, and Nomad compared to each other under these criteria.
Developer Experience (DX)
The first criterion in our decision is DX. The goal of our team is to enable Hootsuite developers to build and deploy their applications as quickly and as smoothly as possible. There are many layers of abstraction between a service and its developer (it runs in a Docker container, on a node running in AWS, scheduled by an orchestration platform), and we want these layers to be as transparent as possible. Marathon, Kubernetes, and Nomad all provide a dashboard UI (Nomad through a third party) for developers to control their services at a high level. For more detailed commands, however, Kubernetes’ CLI, kubectl, is an excellent secondary tool, providing Docker-like commands not found in Marathon or Nomad. Thus, for DX considerations, Kubernetes was the most favorable option.
Ease of Maintenance
Our second criterion is ease of maintenance for operators. This comes down to how easy the scheduler is to set up and manage from an operator’s perspective. To determine this, we look at the complexity of each scheduler’s architecture. Mesos/Marathon is made up of three key components: the Mesos kernel sitting atop a cluster for resource management, the Marathon framework running on Mesos for scheduling logic, and ZooKeeper as the persistent state store. Kubernetes is similar in that it is made up of several components: master nodes to run the Kubernetes API, a cluster of worker nodes to run the containers, and a cluster of etcd nodes as the persistent key-value store. All of these come as a package of Go binaries, so Kubernetes is easier to manage than the Mesos stack. Finally, Nomad comes as a single Go binary, making it extremely simple to manage and use, and the best of the three for ease of maintenance.
What Did We Decide?
All things considered, we decided that Kubernetes was the best option for Hootsuite. A migration away from our Mesos setup is justified by Kubernetes’ flexibility and by how well it satisfies our criteria for DX and ease of maintenance. Kubernetes’ flexibility comes from its many distinctive features, like service exposure and pods; it was the reason we decided to reconsider container schedulers in the first place. Its dashboard and CLI give developers many levels of control over their apps, which meets our DX standards. And although its package of Go binaries is more difficult to manage than Nomad’s single binary, Kubernetes’ many useful features are worth the compromise.
The decision to choose Kubernetes over Nomad was also justified. The cost of losing out on Nomad’s Consul and Vault integration is made up for by Kubernetes’ native service discovery and secrets handling, albeit with slight operational overhead. Nomad’s enterprise-scale scheduling speed also doesn’t matter for Hootsuite’s use cases; we don’t plan on scheduling thousands of services per second, yet.
The next step for us is to provision new infrastructure to run the three Kubernetes components. All of our services running in production on Marathon, like Jenkins, must eventually be ported over. It will be a lot of work, but we believe we made the correct decision, and we look forward to fully leveraging Google’s Kubernetes container orchestrator on our continuing microservices journey.