Posts from December 2017


As many companies tend towards a service-oriented architecture, developers often wonder whether more and more parts of their service could be moved into the cloud. Databases, file storage, and even servers are slowly transitioning to the cloud, with servers being run in virtual containers rather than hosted on dedicated machines. Recently, FaaS (functions as a service) offerings were introduced to allow developers to upload their "application logic" to the cloud without requiring a server, essentially abstracting the servers away. Despite not having to worry about servers, developers find that they now have to deal with the cloud: complexity around uploading, deploying, and versioning becomes cloud-related. This, along with several current limitations of the FaaS model, has often positioned serverless technology as best suited to complementing a dedicated server.


Recently, the Serverless Framework was introduced, allowing us to abstract even the cloud part of the development process away. Now, with just about everything server- or cloud-related hidden away, developers can directly write code for the actual application. The serverless model also offers several other advantages over traditional servers, such as lower costs and ease of scaling. So would it be possible to completely replace the server with a serverless model? Taking into account the limitations of the FaaS model, we set out to build a fully functional, cloud-based, serverless app.

Why Migrate? – The Non-Technical Parts

Earlier this year, our Product Operations and Delivery team decided to migrate services from our Mesos cluster to Kubernetes. George Gao wrote a post detailing the technical reasons for the move. On the non-technical side, the tooling around Kubernetes was more developer-friendly than what Mesos offered, which boded well for our dev teams. Additionally, only the core operations team that originally implemented Mesos understood it; when problems arose, they were the only ones capable of troubleshooting.

Following an evaluation of alternatives, the team made a bet on Kubernetes and started migrating services on Mesos with the goal of moving all fifteen to Kubernetes. The team gave themselves three months to complete the migration, but thanks to our service mesh, the project only took two!

This was because our service mesh decoupled microservice networking from the application code. As a result, the migration process was limited to simple routing changes. To fully appreciate this, we need to understand how our service mesh works.

What is a Service Mesh?

Imagine you’re writing a new service. Let’s say that your service has a bunch of microservices it needs to talk to. Do you hardcode the URLs to these dependencies into your service? What if there are multiple instances of each service so requests are load balanced? How will your service continuously discover these new instances if they could go down and be brought back up anytime with different URLs?

Adding logic for these considerations would bloat your application code and you’d have to do the same work for every service in your architecture. This work is only compounded as the number of services and languages grows.

One solution is to move this responsibility from the clients to the networking layer. By doing so, we have a ‘thin client’ and ‘fat middleware’ model. This is a service mesh. Practically speaking, this means setting up lightweight proxies between the origin and destination services to take care of service discovery and routing. Service discovery is the mechanism that allows services to dynamically track other services and route to them.

Once the service mesh has been set up, adding a new service to it makes the service automatically routable from existing services. You can then focus on writing application logic and trust that the network will route as you expect it to. At Hootsuite, this helped lower the barrier to writing new microservices. Mark Eijsermans gave a talk that goes into more detail.

Hootsuite’s Service Mesh

Our in-house service mesh is called Skyline. It uses Consul for service discovery and NGINX for routing.

Mesos to Mesos

On each node in the Mesos cluster, we run a Consul agent and an NGINX server. The NGINX config is kept up-to-date by fsconsul and consul-template.

Each container that runs on a Mesos node makes application requests to a Skyline URL: http://localhost:5040/service/foo/endpoint. This request first goes to the local NGINX proxy at port 5040. The local NGINX then proxies that request to the destination NGINX proxy at port 5041, which routes the request to the correct application on the node. So, a Mesos service only needs to know the Skyline URL of its downstream Mesos service.

Mesos to Kubernetes

If the local NGINX proxy can’t figure out where to send the request, it just gets proxied to the Kubernetes cluster. All Kubernetes worker nodes are listed in Consul, so any calls to a service that isn’t found in the Mesos cluster will route to Kubernetes via a catch-all.

When a request comes in from outside the Kubernetes cluster, it will reach any Kubernetes worker node at random. On our Kubernetes nodes, we run a skyline-bridge Pod in a Daemon Set on all worker nodes. These Pods just run NGINX and listen on their container port 5041, which is mapped to the host port 5041. When a request comes into a Kubernetes node, the skyline-bridge Pod transforms the request URL into a kubedns name: http://foo.default.svc.cluster.local:8080/endpoint. After that, kubedns takes care of routing the request to the correct destination Service.

For example, say a Mesos service wants to reach the Giphy service sitting in Kubernetes. The origin service calls http://localhost:5040/service/giphy/media. The request gets proxied from the local NGINX to a dedicated set of gateway servers called the ‘service discovery bridge’ (SDB) and into a Kubernetes worker node.

The skyline-bridge Pod on that node receives the request at {NODE IP}:5041/service/giphy/media and transforms it into http://giphy.default.svc.cluster.local:8080/media. That request is then passed to kubedns to be routed to the Giphy Service.

Kubernetes to Mesos

If a request from within Kubernetes is destined for a Mesos service, the requesting service calls the kubedns name of a Service that represents the Mesos service. This Service targets skyline-bridge Pods.

For example, the Organization service lives in Mesos, but there is a Service named organization.default.svc.cluster.local to represent it in Kubernetes. The skyline-bridge Pods will then transform the kubedns name of the destination into a Skyline URL before proxying it to the Mesos cluster.

These requests pass through the 'service discovery bridge' (SDB), which acts as a gateway into both the Kubernetes and Mesos clusters.

Life of a Request through the Service Mesh

Let’s look at a more detailed example. Say that we have a request from a service outside the cluster to our Giphy service running in Kubernetes. This request will first be made to the local NGINX proxy at http://localhost:5040/service/giphy/media on the origin server. This request then gets proxied to the SDB.

Once the request hits the SDB, it will proxy the request to its known backends based on the request URL. If the destination service sits in our Mesos cluster, it would appear in the SDB’s NGINX config like this:
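The generated config isn't reproduced here; below is a hypothetical sketch of what a healthy Mesos service's entry might look like (the service name, upstream name, and IPs are assumptions, with port 5041 taken from the routing description above):

```nginx
# Rendered by consul-template while the Mesos service is healthy
upstream giphy_mesos {
    server 10.10.1.12:5041;  # Mesos slave running the service
    server 10.10.1.47:5041;
}

location /service/giphy/ {
    proxy_pass http://giphy_mesos;
}
```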

From this, we know that our request will be routed to one of these Mesos slaves. If the Mesos service goes down, its location block in the SDB will be removed by consul-template. In that case, the SDB will fall back to the Kubernetes upstream, as shown below.
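Again as a hypothetical sketch (worker IPs and names are assumptions), the catch-all that sends unmatched /service/ requests on to the Kubernetes workers might look like:

```nginx
# Catch-all: any /service/ path with no Mesos location block
# falls through to the Kubernetes worker nodes
upstream kubernetes_workers {
    server 10.10.2.10:5041;  # skyline-bridge Pods listen on host port 5041
    server 10.10.2.11:5041;
}

location /service/ {
    proxy_pass http://kubernetes_workers;
}
```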

On a side note, since the Mesos service location blocks come before the Kubernetes one, the SDB will always prioritize routing to Mesos over Kubernetes as long as a Mesos version of the service is healthy. Once the request reaches the Kubernetes cluster, the skyline-bridge Pods will take care of routing it to the correct Service.

How did this help with the migration?

Easy cutover

Consider a service’s dependency that was originally in Mesos and sitting on Skyline. If it got scaled down, the SDB would automatically route to the catch-all and into the Kubernetes cluster. Assuming that there was already a Kubernetes version running, the traffic cutover would happen seamlessly.

This is the power of a service mesh. Instead of having to SSH into each upstream service and make manual routing changes, we can just tell Mesos to scale down the service and our service mesh will correct the routing for us.

Easy fallback

If something went wrong with the Kubernetes version of the service while it was serving requests, falling back would be as simple as scaling the Mesos service back up. Since the SDB favours the Mesos cluster, it would automatically route traffic back into the Mesos cluster. This also sidesteps manual configuration in emergencies.

Minimal code changes

With this, the only code changes our migration team had to make were to the dependency URLs. These are usually defined in a service’s deploy configurations (.yml files).

If a service was in Mesos, it would reach a downstream service at http://localhost:5040/service/foo/endpoint. Once it gets deployed to Kubernetes, the downstream service URL would need to be http://foo.default.svc.cluster.local:8080/endpoint. Following our Giphy example, the code changes we made to its deploy configuration looked something like this:
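The actual diff isn't reproduced here; a hypothetical sketch of such a deploy-configuration change (the key name is an assumption):

```yaml
# Before: calling the Giphy service through Skyline from Mesos
GIPHY_SERVICE_URL: "http://localhost:5040/service/giphy"

# After: calling the same service by its kubedns name from Kubernetes
GIPHY_SERVICE_URL: "http://giphy.default.svc.cluster.local:8080"
```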

Thus, pull requests were relatively small and simple to review, which made the migration smoother for all parties involved.

Towards the future

Migrating from Mesos to Kubernetes reduced a huge amount of technical debt across all teams. On the operations side, it didn’t make sense for us to maintain two separate container schedulers and deploy environments. It also saved us $67k a year. On the development side, service owners have a more robust and developer-friendly tool in kubectl to debug and maintain deployments, enabling them to troubleshoot problems instead of relying on the core operations team.

For our next step, we are looking at second generation service meshes. Compared to Skyline, these service meshes promise out-of-the-box features like smart retries, circuit breaking, better metrics and logging, authentication and authorization, etc. Currently, Istio and Envoy are our strongest contenders. As our microservices grow, this will give us even more operational control over our service mesh to empower fast and safe deployment of services at Hootsuite.

Big thanks to Luke Kysow and Nafisa Shazia for helping me with this post.

About the Author

Jordan Siaw is a co-op on the Product Operations and Delivery (POD) team. He is a Computing Science major at Simon Fraser University (SFU). When he’s not busy with code, he enjoys reading, podcasts, playing guitar and programming jokes. Find him on Github, Twitter, Instagram or LinkedIn.

The Hootsuite App Directory is a collection of extensions and applications which Hootsuite users can add to their Hootsuite dashboard to create a customized experience. Since its launch in 2011, it has been used by millions of Hootsuite customers. In the past 6 years, it has accumulated gigabytes of data, from information about the apps that people can install to information about which apps are installed for each user. As more apps are released into the app directory and more customers install and use apps, it becomes all the more necessary to have a database which is easy to maintain and scale. The requirements for our App Directory database are:

  • The ability to handle relational data.
  • The ability to easily ensure data integrity.
  • The ability to handle many simultaneous read requests.

When the Hootsuite App Directory was introduced in 2011, we did not know how it would develop over the next six years. MongoDB, which was Hootsuite's primary database at the time, was chosen to hold the App Directory data. MongoDB provided us with much-needed flexibility during the early stages of the App Directory. We were able to store data in Mongo without focusing strictly on its structure, allowing us to quickly handle rapidly changing requirements as we experimented with various apps and integrations.


We have previously written about our move from a PHP monolith to a microservice architecture. As a part of the larger-scale migration project, we are moving the App Directory logic from the monolith into a Scala microservice. Along with the logic, we are moving the data from the MongoDB Database connected to the PHP monolith, into a new database connected to our microservice. This has given us an opportunity to revisit our database choice and the data’s schema.

Relational Data

Our App Directory has matured since its launch and we have a more stable data model than we had when the project began. This stability allows the model to be represented well by a schema. The model consists of various relationships which can be used to reduce the complexity of the business layer.

Among others, we currently have these collections in our MongoDB:

  • An App collection which contains data related to Apps.
  • An InstalledApp collection which contains data related to installed Apps.
  • An InstalledAppComponent collection which contains information related to external developers.

By taking advantage of features built into Mongo like document embedding and document references, we could possibly have used MongoDB much more efficiently. These techniques can be used with one-to-many relationships (for more details, see MongoDB's data modeling documentation). An example of this relationship in our database is between App and InstalledApp:
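The original example document isn't shown here; a hypothetical sketch of the denormalized shape, with InstalledAppComponent embedded in InstalledApp and a reference back to the App collection (field names are assumptions):

```json
{
  "_id": "installedApp-456",
  "appId": "app-123",
  "memberId": 42,
  "installedAppComponents": [
    { "type": "plugin", "developerId": "dev-7" }
  ]
}
```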

Here we have denormalized InstalledApp and InstalledAppComponent into a single document, and referenced InstalledApp to the App collection.

The complexity of the queries increases dramatically with more involved relationships such as multi-layer hierarchies or many-to-many relationships, as you can see in the example above. When a document starts growing, performing update or search operations on the data becomes more difficult. Document referencing makes documents easier to update, but it requires multiple queries to retrieve the related data. In the end this leaves us with both ugly documents and highly convoluted queries.

Newer versions of MongoDB introduced $lookup, a feature analogous to SQL left outer joins. There are two main reasons why we are reluctant to use it. Firstly, $lookup is a relatively new feature and would require us to upgrade our Mongo version. Secondly, it only performs left outer joins, so performing inner joins and full joins would still result in messy, hard-to-maintain code.

The complex Mongo query above is expressed relatively simply in SQL. Here is the same query, retrieving the total number of app installs for each app developed by a certain app provider:
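The query itself isn't reproduced here; a hypothetical SQL version (table and column names are assumptions):

```sql
-- Total installs per app for a given app provider
SELECT a.id, a.name, COUNT(ia.id) AS total_installs
FROM App a
JOIN InstalledApp ia ON ia.app_id = a.id
WHERE a.provider_id = :providerId
GROUP BY a.id, a.name;
```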

The above query shows how easily relations can be handled in MySQL. That is the benefit of a robust query language like SQL: it allows operations such as joining, filtering, and grouping. MySQL is a relational database and our data model consists of complex relations, so we feel MySQL is the more suitable database for our use case in the Hootsuite App Directory.

Data Integrity

Data integrity is defined as the overall completeness, accuracy, and consistency of the data. This is highly valuable to us, as it increases the maintainability, reusability, stability, and performance of the service.

MongoDB follows the BASE approach, which sacrifices consistency in favor of making the database more partition-tolerant. As a result, performing operations on more than one document at once may lead to corrupted data in the database. MongoDB provides two-phase commits, which allow transactions to be performed similarly to transactions in SQL; if there is an error during the transaction, a rollback is performed. One important difference from SQL is that a user can still access the intermediate state while the operation is in progress. The Mongo documentation warns:

It is possible for applications to return intermediate data at intermediate points during the two-phase commit or rollback. [1]

This is not the case with SQL as it adheres to ACID, having the properties of Atomicity, Consistency, Isolation, and Durability. This ensures that the data always remains in a valid state, both consistent and accurate.

Being schema-less and lacking referential integrity, MongoDB shifts the burden of maintaining data consistency onto developers. A side effect is that bugs in code can result in inconsistencies in the database, and these inconsistencies may not surface as errors until something breaks. It is certainly possible to prevent data inconsistencies by thoroughly designing and testing the software that reads from and writes to the database. However, as we are moving from PHP to Scala, we would not only have to rewrite all the models, but also write extra code to ensure consistency. We reasoned that this would slow down the migration process as well as add to the difficulty of maintaining the code. Given the relations in our data, we would like referential integrity so that we don't create orphaned data. Implementing referential integrity in MongoDB would require steps like the following:

Inserting an InstalledApp

  • Insert the InstalledApp.
  • Search Mongo for the correct App using the appId.
  • Insert the installedApp into the installedApp array in the App collection.

Deleting an App

  • Fetch the right App.
  • Get all the InstalledApp ids.
  • Remove all of the InstalledApps.
  • Delete the App from the App collection.

There are many other scenarios that would have to be covered for our use cases, and that's with only 3 collections! To make things worse, if anything fails partway through, we end up with faulty data.

MySQL requires us to define a schema, declare data types, nullable fields, etc. We can declare foreign keys (referential integrity) while creating the schema itself. The schema does not reside in the business layer, it is part of the database itself. If the data does not agree with the defined schema, it will not be added to the database. This lessens the burden on the developer to implement logic ensuring the consistency and correctness of the data.
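As a sketch of how that looks in practice (table and column names are assumptions), foreign keys with ON DELETE CASCADE make the database itself enforce the App/InstalledApp relationship that the Mongo steps above maintain by hand:

```sql
CREATE TABLE App (
    id   INT PRIMARY KEY,
    name VARCHAR(255) NOT NULL
);

CREATE TABLE InstalledApp (
    id        INT PRIMARY KEY,
    app_id    INT NOT NULL,
    member_id INT NOT NULL,
    -- Deleting an App automatically removes its InstalledApps,
    -- and inserting an InstalledApp with an unknown app_id is rejected
    FOREIGN KEY (app_id) REFERENCES App(id) ON DELETE CASCADE
);
```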

Read Requests

Our service receives many simultaneous read requests and also relies heavily on relational data, so we need a database that performs well under such conditions. Generally speaking, Mongo outperforms MySQL when a service is exposed to a high volume of write requests; things like referential integrity and ACID compliance have a cost. Being horizontally scalable, MongoDB can deal with an even higher volume of requests by taking advantage of the benefits of distributed systems.

When it comes to read requests, especially when dealing with relations, MySQL often outperforms MongoDB. Moreover, with proper indexing, the performance of operations, such as joins, can be improved drastically in MySQL. One reason why MongoDB is slower in these cases is that some of the logic handling the relations resides in the business layer, which is not the case with MySQL. Because we experience a high volume of read requests, allowing for slower write requests in favor of faster read requests is a reasonable trade off.

Although MySQL is generally vertically scalable, there are still ways to make it horizontally scalable: features like replication, and products like dbShards, can be used if needed. Given our requirements, replication is a good option, as we can balance the high volume of read requests across multiple MySQL replicas.

Migration Progress

Anyone who has done a data migration knows that it is not an easy task. We are dealing with customer data and we want to ensure that the integrity of that data is maintained throughout the entire process. Our strategy is to write to both our MongoDB and MySQL databases, and then compare if the data matches. For historical data, we use a migration script which exports the data from mongo and then imports it into SQL using the new schema. Any mismatches are fixed by the team, either by adding more validation checks in the business layer or by updating the migration script.

This migration project has given us some good insights into our legacy code, enabling us to write more efficient and maintainable code for our microservice. It is a win-win situation for us: we are storing clean data in the database and we have higher-quality code.

Remarks and conclusion

In the end, both MongoDB and MySQL have their strengths and weaknesses, and the differences between them are lessening as new features continue to be released: newer versions of MongoDB can use join-like operations, and MySQL can now store JSON data. There are also many integrations available to improve the performance of MySQL, or to handle transactions effectively in MongoDB. Which database is the right choice ultimately depends on your data and what you want to do with it.

For our App Directory service we have a well defined relational data model. We want to ensure that the principles of data integrity are offered by the database itself, and that it can also handle many simultaneous read requests. These requirements led us to choose MySQL for our new App Directory service database.

Shoutout to Neil, Sim, Jody, Steve and Isha for helping me with the blog post.

Preetkaran Rawal is a Co-op Software Developer on the Developer Products team. He currently attends University of Waterloo for Computer Engineering.

Traditionally, native Android development was done in Java. Kotlin is a relatively recent JVM language developed by JetBrains and first unveiled in 2011. The Android team at Hootsuite was one of the earliest adopters of Kotlin: in April 2016, the first Kotlin commit was added by converting a Java class into a Kotlin data class. It was shipped to production, and Kotlin was officially introduced into the Hootsuite app. Kotlin turned out to be a boon for the codebase and is quickly becoming the go-to programming language for Android development at Hootsuite. I've spent the majority of my time at Hootsuite working with Kotlin and I've absolutely enjoyed using it. The amount of thought that has gone into the language is impressive, and it has some neat features that come in very handy when coding. Here are some interesting features of Kotlin that I came across during my co-op term:



  1. Data Classes
Data classes only contain state and don't perform any operations. The advantage of using a data class instead of a regular class is that Kotlin generates a ton of code for us: from the properties declared in the primary constructor it derives the accessors, equals(), hashCode(), toString(), and the copy() method, avoiding all that boilerplate. Let's see an example:

A Java Class used for storing data about a car:

The equivalent single line class in Kotlin:
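The original code fragments aren't reproduced here, so here is a minimal sketch of what the Kotlin side might look like (the Car properties are assumptions):

```kotlin
// A data class declaring two properties in its primary constructor.
// Kotlin generates equals(), hashCode(), toString(), copy() and
// componentN() functions; the equivalent Java class needs dozens of lines.
data class Car(val make: String, val year: Int)

val car = Car("Tesla", 2017)
val newer = car.copy(year = 2018)  // copy with one property changed
println(car)    // Car(make=Tesla, year=2017)
println(newer)  // Car(make=Tesla, year=2018)
```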


  2. Null Safety
Null Safety is one of the most useful features of Kotlin. In Java, any object can be null, which means runtime checks must be added throughout a codebase to prevent NullPointerException crashes; the null reference has famously been called a "billion-dollar mistake" by its inventor.

The code fragment below shows an example of a null check in Java. They are cumbersome and easy to miss.

Kotlin’s type system is designed to eliminate the danger of null references from code. There are two distinct types of references in Kotlin – nullable references and non-nullable references.
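Since the original fragment isn't shown here, a small illustrative sketch of nullable versus non-nullable references (names are assumptions):

```kotlin
var name: String = "Hootsuite"  // non-nullable reference
var nickname: String? = null    // nullable reference

// name = null                  // won't compile: String can't hold null
val length = nickname?.length ?: 0  // safe call with a default value

fun greet(who: String) = "Hello, $who"
// greet(nickname)              // won't compile: String? is not String
println(greet(name))
```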

As we can see from the code fragment above, if a developer attempts to pass a nullable reference where a non-nullable one is expected, the result is a compile-time error.


  3. Kotlin Standard Library
The Kotlin Standard Library contains some very powerful functions. They are concise, intuitive and help consolidate the code. Some of the most commonly used ones are:


  • let()

let() is a scoping function used whenever we want to define a variable for a specific scope of the code but not beyond. It can also be used to test for null. In the code fragment below, the variable is no longer visible outside the scope of the block.
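A minimal sketch of let() in action (the variable names are assumptions):

```kotlin
val input: String? = "42"

// The ?. makes the block run only when 'input' is non-null;
// 'raw' is visible only inside the block.
val parsed = input?.let { raw ->
    raw.toInt()
}
// 'raw' is not visible out here
println(parsed)  // 42
```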


  • apply()
The apply() function calls the closure passed as a parameter and then returns the receiver object the closure ran on. In the code fragment below, we create a new 'Person' object with the 'name' argument, call the method foo() on it in the closure, and return the resulting object instance afterwards, all in a single line!
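A sketch of apply(), using a hypothetical Person class standing in for the original example:

```kotlin
class Person(val name: String) {
    var greeted = false
    fun foo() { greeted = true }
}

// Construct, configure, and get back the same instance in one expression:
val person = Person("Ada").apply { foo() }
```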


  • with()
The with() function is used to factor out the variable when multiple methods are called on the same variable. In the code fragment below, we call 3 methods on the object w without having to repeat 'w' for each method call.
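A sketch of with(), using a stub Window class standing in for the original object w:

```kotlin
class Window {
    val log = mutableListOf<String>()
    fun open() { log.add("open") }
    fun resize() { log.add("resize") }
    fun close() { log.add("close") }
}

val w = Window()
with(w) {   // 'w' becomes the implicit receiver inside the block
    open()
    resize()
    close()
}
```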


  4. Extension functions
Extension functions enable you to add functions to a class without actually writing them inside the class. They are used like regular functions and are statically resolved. They come in especially handy when working with library code that you cannot change. For instance, here’s how you can extend the View class to add a method that makes a view object visible:
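Since android.view.View isn't available outside an Android project, here is a sketch using a stub View class; the extension-function pattern is the same:

```kotlin
// Stub standing in for android.view.View
class View {
    companion object {
        const val VISIBLE = 0
        const val GONE = 8
    }
    var visibility: Int = GONE
}

// Declared outside the class, but called like a member function:
fun View.show() {
    visibility = View.VISIBLE
}

val v = View()
v.show()
```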


  5. Kotlin Android Extensions
When using Java for Android Development, in order to reference a view from its layout file, you have to call the findViewById() method.

The code fragment below shows how to set the text of a TextView (with id ‘text_foo’). It needs to be referenced using ‘findViewById’ before calling any method on it.

It gets very repetitive when you have to reference a large number of views this way. One common alternative is to use ButterKnife, a view binding library that uses annotation to generate boilerplate code. However, that has the drawback of introducing an additional library into the project.

Kotlin Android Extensions is a Kotlin plugin that enables you to recover views from Activities, Fragments, and Views in a seamless way. It allows you to access views in the layout XML as if they were properties with the name of the id you used in the layout definition. This removes the need for lengthy findViewById calls or additional third-party libraries. For instance, in the code fragment below, we simply address a TextView by its id to set its text.


  6. Java Interoperability in Kotlin
Kotlin boasts 100% interoperability with existing Java code. All Kotlin classes can inherit from Java classes, implement Java interfaces, call Java methods, and so on. Conversely, Java code can inherit from Kotlin classes, implement Kotlin interfaces and call Kotlin methods. This means that you could write code in Kotlin, without having to jump through hoops to make it compatible with pre-existing Java code or frameworks. It is one of the key reasons why experimenting with Kotlin in large codebases is possible.



In this blog post, I’ve presented some of the neat features of Kotlin that I enjoyed working with during my co-op term. Kotlin focuses on many important topics like interoperability, safety and clarity. I’d definitely recommend trying out Kotlin! Here are a few resources to get started:


About the Author

Shruti Basil is a Co-op Software Developer on the Core Android team. She currently attends University of Waterloo for Software Engineering.



During my co-op term, I got the opportunity to work with the creators of Atlantis to contribute a Slack integration feature, providing value to both Hootsuite and the open source community. I learned a great deal and enjoyed developing the feature all the way from design through implementation and testing.


At Hootsuite, we write our infrastructure-as-code using Terraform. To effectively collaborate on Terraform, we created and open sourced Atlantis. Here’s a brief introduction for those unfamiliar with Terraform or Atlantis.

What is Terraform?

Terraform is a tool for writing infrastructure-as-code, controlled via a Command Line Interface (CLI). This allows users to review, version, reproduce and automate changes in infrastructure.

Using Terraform, we describe infrastructure resources such as compute instances, storage, security groups, DNS entries and more, using HashiCorp Configuration Language (HCL).

For example, we can describe an AWS security group with:
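The original snippet isn't shown here; a hypothetical example of describing a security group in HCL (the names and rules are assumptions):

```hcl
resource "aws_security_group" "web" {
  name        = "web-sg"
  description = "Allow inbound HTTPS"

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```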

Then run terraform plan to create a plan for modifying infrastructure – creating a security group in this example. Finally run terraform apply to apply the plan.

What is Atlantis?

Atlantis is a tool for teams to effectively collaborate on Terraform. It enables Terraform plans and applies to happen directly on Version Control System (VCS) platforms such as GitHub via pull request comments. As a result, both operations engineers and developers can collaborate, discuss, and review Terraform outputs right on the pull requests. Other cool features include a unified location for credentials, so there's no need to worry about distributing them, and workspace locking to prevent concurrent modifications.

Atlantis and Terraform in action

We’ll create a pull request on Github describing a security group:

Then comment atlantis plan to make a Terraform plan for creating the resource:

Finally comment atlantis apply to actually create it. After doing so, we can see the newly created security group on AWS:

Working in the Dark is Bad

It’s awesome that our workflow for modifying infrastructure happens directly on pull requests, but changes such as modifying security group rules, database rules, and DNS entries inherently come with the risk of causing outages. Ideally, all recent infrastructure changes should be highly visible and easily trackable, so we can minimize the time spent searching for potential breaking changes when a problem arises.

To address this issue, I first asked “where do we currently look for information about recent production deployments?” – Slack! Leveraging this, it makes sense for all infrastructure changes to be available there too. This will increase visibility on recent changes in infrastructure and reduce the time spent tracking them down.

Atlantis + Slack Integration to the Rescue

An Honourable Attempt

My first approach was to configure a Slack Incoming Webhook, which provides a URL endpoint for sending a message to a Slack channel, and then use Atlantis's project-specific configuration, which allows a shell command to run after an apply executes for a specific Terraform project. With these two pieces of functionality, I could configure a Terraform project to curl the Slack URL after an apply is executed. This was an appropriate proof of concept, but it only works for a single project, so the project-specific configuration would need to be duplicated across all 100+ Terraform projects at Hootsuite to catch every infrastructure change. Following the Don't Repeat Yourself (DRY) software engineering principle, I knew this was unacceptable.

Second Time’s the Charm

For my next approach, I reached out to Luke and Mishra, the maintainers of Atlantis, about adding Slack integration to Atlantis natively. Luckily, they indicated this was a feature requested by other Atlantis users and endorsed the idea!

With the new approach, we can:

  1. configure the Slack webhooks for an Atlantis server rather than for each Terraform project
  2. support sending messages to different channels based on the workspace (dev, staging, production, etc)
  3. contribute to the open source community and provide an easy way for Atlantis users to set up slack notifications

To enable sending Slack messages from Atlantis at the server level, I implemented the logic to execute arbitrary webhooks, plus a Slack webhook that sends a request to Slack's chat.postMessage API to post a message to a channel. Finally, I exposed a configuration option that allows the Slack webhook to be triggered by apply events.

After implementing this feature, it’s simple to configure Atlantis to use it. First, we create a Slack Workspace Token for accessing Slack APIs, then specify to use a Slack webhook in Atlantis’s configuration. It’ll look like:
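As a sketch, the server-side configuration amounts to something like the following (exact keys can vary between Atlantis versions, and the channel name here is made up):

```yaml
# Sketch of an Atlantis server-side webhook configuration.
# The Slack Workspace Token itself is supplied to the server separately
# (e.g. via a flag or environment variable), not in this file.
webhooks:
- event: apply
  kind: slack
  channel: infrastructure-changes
```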

Now we’re ready to go! Whenever Atlantis executes an apply, it’ll trigger the Slack webhook and send a beautiful message to the channel specified:

With this in place, all infrastructure changes done through Atlantis will now be visible in a Slack channel – success!

For bonus points, we can configure certain webhooks to trigger based on workspaces by specifying a workspace-regex in an event. This is useful for teams with multiple workspace environments.


Developing this feature and contributing it to the open source community was super fun. Big thanks to Luke Kysow for walking me through the codebase, reviewing all my code, and helping me throughout the whole process from design to implementation and testing!

About the Author

Nicholas Wu is a Software Developer Co-op on the Production Operations and Delivery team. He is currently studying Computer Engineering at the University of British Columbia. Connect with him on LinkedIn or find him on Github, Instagram, YouTube.

Here at Hootsuite we’re always looking for better ways to utilize our time and new technology. That’s why we’re looking at building an environment that helps developers leverage machine learning (ML) in production and minimize the overhead (i.e. amount of technical debt) required.

The Problem

Currently, ML is often done on a very ad-hoc basis. That’s because there is no standardized workflow—as of yet—to deal with many of the unique difficulties associated with ML. At Hootsuite, we identified four key components that are needed:

  • Data: All training and verification data needs to be validated and versioned so that if something goes wrong, we can effectively track down whether the issue is related to our data.
  • Model Training: We don’t want to be reinventing the wheel every time we need to train a new model (even if the APIs for TensorFlow and scikit-learn do make that process easier).
  • Model Validation: Processes need to be put in place to make sure that when we update models in a production environment they actually perform as expected and bring measurable benefits.
  • Infrastructure: Broadly, this is everything from being able to easily switch out models in production to making sure that they can be accessed in a uniform format so we’re not writing new code when we want to add ML to an existing product (i.e. it should be as easy as making an API call with the data that we want analyzed).
When we lack a standardized process for those four key components, we run into two problems: teams replicating code unnecessarily, and fragile infrastructure built around one-off constraints rather than being robust and extendable across multiple problems. This is the technical debt we want to minimize from the outset.

Technical Debt

Photo source: Hidden Technical Debt in Machine Learning Systems by D. Sculley et al.

Given the complexity of ML systems, it’s unsurprising that they can contain some areas of technical debt. What was surprising to me, however, was the sheer number of ways that technical debt can arise. This comes from the actual ML model being only a tiny fraction of the code required to put it into production. D. Sculley et al. from Google give a detailed overview of the numerous pitfalls that one has to be aware of when designing a ML system here. In short, they identify seven key areas with numerous subcategories:

  • Complex Models Erode Boundaries
    • Entanglement: Since ML systems mix all of the information they receive together in order to understand it, no input is actually independent. This means that  Changing Anything Changes Everything.
    • Correction Cascades: Sometimes models that solve slightly different problems are layered on top of each other in order to reduce training time by taking the original model as an input. This creates dependencies that can prevent a deeply buried model from being upgraded because it would reduce system-wide accuracy.
    • Undeclared Consumers: Similar to visibility debt in more classical software engineering, if a consumer uses outputs from a ML model, this creates tight coupling (with all of the potential issues arising from that) and potential hidden feedback loops.
  • Data Dependencies
    • Unstable: If ownership of the service producing the input data and ownership of the model consuming it are separate, then potential changes to the input data could break the model’s predictive abilities.
    • Underutilized: Some inputs provide limited modelling benefit, and changing them can have consequences. There is a tradeoff to be made between complexity and accuracy (as in most software systems).
  • Feedback Loops
    • Direct: Occasionally, such as with bandit algorithms, a model may be able to influence what training data it will use in the future.
    • Hidden: More difficult to recognize than direct loops, these occur when two or more systems interact out in the real world. Imagine if the decisions made by one system affect the inputs of another, changing its output. This output then affects the inputs of the first system, leading each system to optimize behaviour for each other.
  • ML-System Anti-Patterns
    • Glue Code: Any code that is needed to transform data so that it can be plugged into generic packages or existing infrastructure.
    • Pipeline Jungles: These appear when data is transformed from multiple sources at various points through scraping, table joins and other methods without there being a clear, holistic view of what is going on.
    • Dead Experimental Codepaths: The remnants of alternative methods that have not been pruned from the codebase. The expectation is that these will not be hit but they could be used in certain real world situations and create unexpected results.
  • Configuration Debt
    • This is oftentimes viewed as an afterthought even though the number of lines of configuration required to make a model work in the real world can sometimes exceed the number of lines of code.
  • Changes in the External World
    • Fixed Thresholds in Dynamic Systems: If a decision threshold is manually set and then the model is updated using new training data, the previous threshold may be invalid.
    • Monitoring and Testing: Since ML models operate in the real-world, they need real-world monitoring to make sure they work. Three areas where it may be useful to focus on monitoring are:
      • Prediction Bias: the distribution of predicted labels should be equal to the distribution of observed labels
      • Action Limit: systems that take actions in the real-world should have a broad limit on how often that can (or cannot) happen
      • Up-Stream Producers: data pipelines feeding the model need to be tested and maintained
  • Other
    • Data Testing: Data should be tested on a continuous basis using standardized tests rather than one-off analysis when initially set up.
    • Reproducibility: In an ideal world, the results of a ML model should be reproducible by anyone with the same data and the same source code.
    • Process Management: As many system-level processes, such as updating models and data, assigning computing resources, etc. should be as automated as possible.
    • Cultural: Teams need to reward the cleaning up of models and technical debt as well as improving accuracy when it comes to allocating development time and resources.
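The prediction-bias check above lends itself to a small illustration: compare the distribution of predicted labels against the distribution of observed labels and alert when the gap grows. A hypothetical sketch, not production monitoring code:

```python
from collections import Counter

def label_distribution(labels):
    """Fraction of each label in a list of labels."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}

def prediction_bias(predicted, observed):
    """Per-label gap between predicted and observed label distributions.

    A healthy model keeps these gaps near zero; a growing gap suggests
    the model -- or the world it models -- has drifted.
    """
    pred_dist = label_distribution(predicted)
    obs_dist = label_distribution(observed)
    labels = set(pred_dist) | set(obs_dist)
    return {l: pred_dist.get(l, 0.0) - obs_dist.get(l, 0.0) for l in labels}
```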
As D. Sculley et al. have shown, there’s a lot more than just tweaking the model to be concerned with when we want to use ML in production environments. The more work we can standardize, the less technical debt we will need to deal with on an ongoing basis.

Current Offerings

Currently, there are limited options for all of the infrastructure and versioning needs around trying to deploy a model in production. One of the options out there is Amazon Machine Learning (AML). The nice thing about AML is that it plays nice with data imports from S3 and Redshift, both services currently in use at Hootsuite. The not-so-nice thing about AML is that you can only build one kind of model (a logistic regression), and any tuning you might want to do needs to be done through their platform using their built-in ‘recipe’ language. This is less than ideal for the range and complexity of models that we want to be deploying across Hootsuite.

(Side note: AML has been supplanted by AWS SageMaker which was announced at AWS re:Invent this year. It looks like it could have some useful applications and does seem to handle many of the issues raised here.)

So, there’s no commercial off-the-shelf solution that would solve our problems. What to do? Build one ourselves! That means evaluating open-source alternatives that we could cobble together into a cohesive package fitting all of our needs. That search is what led us to TensorFlow Extended.

TensorFlow Extended

Back at KDD 2017 in August, a team of Google developers presented their solution to the problem we are having: building a production-scale platform for effective ML. While, unfortunately for us, not all of their work was released as open source, their analysis and framework gave us a starting place and guidance on how to do this at Hootsuite.

Denis Baylor et al. presented a framework similar to (and more detailed than) what we had already developed internally for what exactly is needed to build a ML platform. They also had a number of constraints for what they wanted at Google:

  • Building one ML platform for many different learning tasks
  • Continuous training and serving
  • Human-in-the-loop
  • Production-level reliability and scalability
Their eventual system design ended up looking like this:

Photo source: TFX: A TensorFlow-Based Production-Scale Machine Learning Platform by Denis Baylor et al.

While this is an ideal Google-scale solution to the ML problem, it’s not necessarily a Hootsuite-scale solution. Still, we now have an idea of what components we want, validated against the criteria Google used in designing their own system. Next steps: figure out what open-source tooling is out there for each of those components so we don’t reinvent the wheel.

Hootsuite Requirements

Since we’re operating at significantly different scales, there are a number of components Google uses that we can drop from consideration for now (though who knows where we’ll be in 5 years). We don’t need to build an integrated front-end since there simply is not enough demand for it as of yet. Garbage collection is not important given the size of data we are working with, and data access controls are already being implemented. And finally, for the moment, each team will need to be responsible for maintaining and documenting their own ML pipelines. While we’re ignoring pipelines for the moment, there is interesting work being done on trying to simplify construction on a distributed system and even automating pipeline construction entirely.

So that leaves us with three categories to focus on: All Things Data; Cradle-to-Grave Models; Serving It Right.

All Things Data

Data is what drives ML so obviously, there are some pretty extensive infrastructure needs when it comes to handling it.

  • Data Versioning: The short story is that versioning data is hard, really hard. The long story is slightly more complicated. There’s a good overview of why it’s hard and some principles about how to tackle it here. What it boils down to is that because there are so many different formats data can be stored in, there is no easy-to-use, efficient option yet – no git for data. Git works by tracking line-by-line changes in text files, and it’s just not feasible to store all of the data we may want to use as a CSV file. GitHub does offer Git Large File Storage, though it has one big issue: it stores each version of the data as another file rather than just tracking differences. That means having 10 versions of a 10GB dataset would take up 100GB of space! That gets very expensive very quickly. As a result, we’re currently doing the same thing in principle but in S3, because that’s where our data lives anyway. The process is by no means ideal, though, because it opens up the opportunity for more human error and takes up lots of space.
  • Data Ingestion: Since we’re storing everything in S3 at the moment and not dealing with datasets that are too big (yet), right now we’re just pulling in the data directly from there. This works given our needs and workflow but could definitely use some optimization as the data we handle grows. Connecting compute resources in Redshift directly to storage in S3 through Redshift Spectrum seems like one promising avenue to pursue here.
  • Data Cleaning: Cleaning data is normally a very manual process: clean some data, test it, clean some more, do a bit of analysis, and on and on. Depending on the nature of the project, this can also be beneficial, as you get to understand the data better. However, there is tooling out there that could help with the process (as long as the ML model being built uses a convex loss). That tooling is ActiveClean. For those specific use cases, it could help clean the data quicker and more efficiently than manual processing.
  • Data Analysis: This is where things start to get a bit interesting. Every dataset is unique and every business need it is being used to solve is different. However, there are a number of descriptive statistics that could be calculated on included features and over their values to provide insight into the shape of each dataset. This would give us a good overview of how the data looks which means anomalies could be spotted sooner. A number of existing libraries in various languages could be used to do this efficiently.
  • Data Transformation: This is going to be very specific to each ML model developed so limited optimizations can be done. However, making sure that any assets that need to be reused for both training and prediction (such as vectorizers) are automatically exported would remove one more place for human error.
  • Data Validation: To do data validation, we (and Google) would rely on having a schema that defines what kind of data should be in the dataset. Some examples that Google uses to define its schemas are:
    • Features present in the data
    • The expected type of each feature
    • The expected presence of each feature
    • The expected valency of the feature in each example
    • The expected domain of a feature
As part of a separate project to organize all of the data in use at Hootsuite, we’ll have schemas for existing datasets and new ones could be produced automatically as data is fed into a ML model. Not all of the definitions are necessary but at a minimum, the features present and the expected type would be a good starting point to confirm.
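As a sketch of that kind of schema-driven validation (the schema and feature names here are hypothetical, not Hootsuite datasets), checking feature presence and expected type might look like:

```python
# Hypothetical schema: feature names, expected types, and whether required.
SCHEMA = {
    "follower_count": {"type": int, "required": True},
    "message_text":   {"type": str, "required": True},
    "locale":         {"type": str, "required": False},
}

def validate_example(example, schema=SCHEMA):
    """Return a list of human-readable schema violations for one example."""
    errors = []
    for feature, spec in schema.items():
        if feature not in example:
            if spec["required"]:
                errors.append("missing required feature: %s" % feature)
            continue
        if not isinstance(example[feature], spec["type"]):
            errors.append("feature %s has type %s, expected %s" % (
                feature,
                type(example[feature]).__name__,
                spec["type"].__name__))
    return errors
```

Running every incoming example through a check like this before training or prediction is what lets data problems surface as explicit errors instead of silently degraded models.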

Cradle-to-Grave Models

  • Model Versioning: Just like with software, we need a way to be able to keep track of how models evolve over time so that if a new version is less effective than before, it’s easy to roll back. Otherwise, it’s just a free-for-all of updates with no governance. This is still a relatively new area of focus; however, there are some interesting open-source solutions, such as Data Version Control, out there that at least attempt to solve this problem.
  • Model Training: When training a model, a process that could potentially take days, we want to streamline things as much as possible. Google has a great idea on how to do that: warm-starting the models. Basically, this means relying on transfer learning to generalize a set of parameters that represent the base state in order to help initialize the new training state. Rather than starting a new version of a model from scratch, it is instead given a fuzzy view of the world, so training time is significantly shorter. Optimizing this time is definitely something to keep in mind as we build production infrastructure.
  • Model Evaluation: Using Google’s definition of evaluation as “human-facing metrics of model quality”, we want to be able to easily help teams prove their models work. This would basically be applying the model to offline data (so we’re not having to deal with real-time user traffic) and figuring out a way to translate training loss to whatever business metric is most relevant to the experiment. While this can be—and is—done on an ad-hoc basis, it could also be automated in order to save developer time and reduce glue code.
  • Model Validation: Again, from Google, validation can be seen as “machine-facing judgment of model goodness”. These metrics basically boil down to tests that can be run on models in production against a baseline (i.e. the previous version of the model). If the new version is not beating the previous one, it gets taken offline and the previous version is rolled out. Automating this would help make sure that models are actually providing value to the business when used.
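To make the warm-starting idea concrete, here is a framework-agnostic sketch (parameter names and shapes are invented): parameters shared with the previous model are copied over, and anything new is randomly initialized, so training starts from a fuzzy view of the world rather than from scratch.

```python
import random

def warm_start(new_shapes, old_params):
    """Initialize a new model's parameters from a previous model's.

    new_shapes: {param_name: size} for the new architecture.
    old_params: {param_name: [floats]} from the previously trained model.
    Parameters present in both (with matching size) are copied over;
    anything new gets a small random initialization.
    """
    params = {}
    for name, size in new_shapes.items():
        old = old_params.get(name)
        if old is not None and len(old) == size:
            params[name] = list(old)  # transfer the learned weights
        else:
            params[name] = [random.gauss(0.0, 0.01) for _ in range(size)]
    return params
```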
Serving It Right
  • Development Workflow: We want to make sure that the development workflow is as seamless as possible. This means making sure that devs can hook Jupyter Notebooks up to GPUs on the cloud and that they have all the libraries they need pre-installed and ready to run. We also need to put some guidelines in place about how to develop models at Hootsuite so we don’t end up with a different standard for each team.
  • Serving: Once the model has been developed, we need to be able to actually serve it in production. This is where Google—again—comes in. With TensorFlow Serving, we’re able to create a docker container that can be deployed on Kubernetes and an API endpoint we can hit with any of our microservices in order to get predictions. This means that any ML that happens can be easily folded into our existing microservice architecture.
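For illustration, later versions of TensorFlow Serving expose a REST predict endpoint alongside gRPC; a microservice might build its request like this (host, port, and model name are placeholders, not a real deployment):

```python
import json

def build_predict_request(instances):
    """Body for TensorFlow Serving's REST predict call.

    The payload is POSTed to
    http://<serving-host>:8501/v1/models/<model>:predict
    (host, port, and model name here are placeholders).
    """
    return json.dumps({"instances": instances})

# A microservice would then do something like (not executed here):
# resp = requests.post(
#     "http://tf-serving:8501/v1/models/sentiment:predict",
#     data=build_predict_request([{"text": "love this product"}]),
#     headers={"Content-Type": "application/json"},
# )
```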
Where we’re at

Right now, we’re working on developing the base framework that will underlie the eventual goals outlined here. We need to make sure the foundation is solid in order to build on it. It’s great to have the example of companies like Google that have put their infrastructure details out there so that we can take inspiration from them when we’re building our own. Machine learning definitely has a place at Hootsuite; we just need to make it as painless as possible.

Further reading

If you’re interested in a much deeper dive on this topic, check out Machine Learning Logistics: Model Management in the Real World by Ted Dunning and Ellen Friedman from O’Reilly Media. They examine some really interesting nuances of serving models in the wild as well as introduce a new design pattern: the rendezvous architecture. It’s a worthwhile read.

About the Author

Rob Willoughby is a software developer co-op on the DataTech team at Hootsuite. He is a Bachelor of Computer Science student at the University of British Columbia with an interest in all things data. Connect with him on LinkedIn.

The Inbox Project

For my co-op term, my team and I tackled the Inbox project, a platform that allows users to look at private messages across multiple social networks – think a Gmail or Facebook Messenger type experience, for all of your social networks. On top of basic messaging features, we wanted to integrate Inbox with existing Hootsuite functionality, including Hootsuite’s tagging feature. This would let users add tags to and remove tags from private messages, and filter over them. This was an interesting problem to tackle, one that involved some high level architecture decisions. Here is a general overview of all the relevant pieces before we made any changes.


Our requirements for tagging in Inbox were:

  • We must be able to see tags on messages even if they were applied in another part of the Hootsuite system
  • We must be able to apply tags and they should show up in other parts of Hootsuite
  • We want to be able to filter and search on tags relatively quickly
  • This should ideally all be event based and any changes to tags elsewhere should reflect in the inbox without refreshing the browser


The dashboard is our original monolith and services a good portion of Hootsuite. It is already very complex, and we do not want to add any functionality to it – as a company we’re trying to reduce our PHP dashboard to a thin layer, with most work being done by smaller services written in statically typed languages. However, part of the tagging infrastructure currently lives in the dashboard and is exposed through a set of endpoints called the Service Oriented Monolith, or SOM. The Tag Service, by contrast, only contains the tags themselves (i.e. tag 123 has label “customer complaint”), not the associations between tags and messages (i.e. message 789 has tags 123 and 456). It’s a relatively new service with quick response times. Finally, all tagging related events are published on the message queue and are consumed by other areas of Hootsuite.

Solutions We Considered

We considered three approaches to adding tagging functionality to Inbox, weighing time needed, code quality, and performance.

The first solution was to simply ingest tagged messages from the message queue and store them with our own messages. This is relatively simple to implement because any actions on tags are published over the message queue; we just need to listen for CRUD operations on tags and store the results in our own message store. However, this plan falls apart when we have to write our own tags: we would have to make the dashboard listen for events performed in Inbox, which means adding functionality to the dashboard. That is largely frowned upon, because Hootsuite as a whole is trying to break features off the dashboard, not add to them. Furthermore, we would be trying to keep two separate databases synchronized, which is redundant and a waste of money.

The second solution was to go through SOM, the set of dashboard endpoints exposed to internal services. This solution was appealing because it involves minimal code: we simply call the API every time we want to apply or remove a tag. Every time a message passes through Inbox, we would enrich it with tags from SOM, and every time someone tags a message, we would send the write request through SOM. However, developers are discouraged from using SOM endpoints, as SOM is considered legacy and Hootsuite as a collective is moving towards a microservice architecture. Furthermore, search and filter by tags would require adding more functionality to the dashboard, which is not ideal since we want all tagging logic to live inside the Tag Service. We would also be hitting this endpoint every time anyone looks at a message, which is bad since the dashboard is already under heavy load, and as a large PHP monolith, each request it services is relatively expensive.

The third solution was to migrate message tags into the Tag Service. This is intuitive because the Tag Service should be responsible for anything tag related, making it a one-stop destination for tags. However, this solution involves a database migration, which is non-trivial to execute. It would also make the work a little more complex, since we would have to port all our logic from PHP to the new service. Finally, it could affect other consumers of tagging related events if we accidentally (or purposely) changed any logic during the migration. This solution would take the longest to implement by far, but it cleanly separates logical components of the codebase.

The Solution We Chose

We went with a mixture of the first and third solutions. We decided to go forward with the DB migration because message tags should not live in the dashboard’s database. This cleans up a lot of tech debt and ensures that any new integration with tagging is straightforward and clean, since the Tag Service will be completely decoupled from the dashboard. However, we would still consume tagging events to keep Inbox event-driven and to allow for complex search queries on tags.

Our plan of action would be to:

  • migrate all the logic to the Tag Service
  • migrate the tagging data to the tag service db
  • point existing clients to the Tag Service
  • deprecate the dashboard endpoints
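To sketch the event-driven side of the plan (the event shapes here are hypothetical, not our actual queue payloads), consuming tag events to keep Inbox’s store in sync might look like:

```python
# Hypothetical queue events; the real payloads differ.
#   {"type": "TAG_APPLIED", "message_id": "789", "tag_id": "123"}
#   {"type": "TAG_REMOVED", "message_id": "789", "tag_id": "123"}

class InboxTagStore:
    """In-memory stand-in for Inbox's message store."""

    def __init__(self):
        self.tags_by_message = {}

    def handle_event(self, event):
        """Apply one tag event from the message queue."""
        tags = self.tags_by_message.setdefault(event["message_id"], set())
        if event["type"] == "TAG_APPLIED":
            tags.add(event["tag_id"])
        elif event["type"] == "TAG_REMOVED":
            tags.discard(event["tag_id"])
        # Unknown event types are ignored so new producers don't break us.

    def messages_with_tag(self, tag_id):
        """Support filter/search by tag."""
        return {m for m, tags in self.tags_by_message.items() if tag_id in tags}
```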

Current Progress

We have already begun to consume tag events and have started migrating logic to the Tag Service. Once we are done, our tagging infrastructure will look like this:

We can see that centralizing all tagging functionality greatly reduces the dependencies and complexity of the system. Now any feature that requires tagging data can simply talk to the Tag Service, and the dashboard will have nothing to do with tagging – in line with the microservice principle that services mark a distinct separation of responsibility.

About the Author

Henry is a co-op :p. He enjoys powering the human connection through social media. I’m on the right.



At Hootsuite, our value proposition lies in the ease of managing multiple social media accounts in one central place. This means we have to build a platform that integrates the numerous social networks we support. As part of my team’s effort in social channel optimization, we found 38+ hard coded places across various large codebases that we had to modify to support a brand new social network.


To improve the development process as we seek to add the next N social networks, we want an efficient and elegant way to support new channel integration. Those 38+ hard coded places greatly hinder our efficiency and are tech debt we need to tackle in order to effectively manage the supported social networks. Adding extra hard coded checks like ‘case Twitter, case Facebook’ is a painful, cumbersome process, so our team decided to solve these problems with a generic networks configuration accessible by all the components that need to handle a social network, together with a few techniques for breaking down a monolithic code base.

Overview of the problems

Monolithic Code

To give a brief sense of the monolith’s size without showing the actual code, consider this: there are 514,697 lines of code (as of Dec 2017) in the platform and 120,449 LoC in the social network management service.

Hard Coding

As of 2017, the main channels Hootsuite supports include Twitter, Facebook, LinkedIn, Google, and so on. To give an example of how we currently determine whether code handles a given social network: the codebase is riddled with checks on the network’s type.
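As an illustrative sketch (the real code is PHP and JavaScript; the network types shown are examples), such a check looks something like this in Python terms:

```python
def handles_network(network):
    """The kind of hard-coded check that appears in 38+ places."""
    social_type = network["socialProfile"]["type"]
    if social_type == "TWITTER":
        return True
    elif social_type in ("FACEBOOK", "FACEBOOKPAGE", "FACEBOOKGROUP"):
        return True
    elif social_type == "LINKEDIN":
        return True
    # ...one more branch for every network and sub-product we support
    return False
```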

Code like this is intended to ensure that the correct social network is selected. There’s a caveat, though: giant social networks often have smaller offerings like Facebook Group, Facebook Page, and other products. The current solution is a plethora of if statements to make sure the target social network product is properly caught. Extensive defensive coding like this becomes a concern when the data structure is edited in the future, or when it misses a subtle edge case that has never occurred before, which is a hard problem on its own.

To illustrate a potential problem when the data structure changes: if, in the future, the networks prop (or object) is modified to no longer have a socialProfile or type key, code that reaches directly into those fields will break – and if you’re lucky, you get an undefined error.

Our Solutions

On Decomposing the Monolith
My team is working on adding a brand new social network. We firmly believe in letting the microservices/components decide whether they can handle a specific piece of data. This way we avoid extensive ‘if’ checks, and in the case that a component cannot deal with the input, it simply does nothing.

To achieve that, we relied heavily on the delegation pattern: an object composition technique in which one object hands a request off to a second object to handle.

We employed this pattern to let a parent class or object delegate the responsibility of handling a request to a child object. This hides complexity and allows the subclasses to decide whether they understand the request. This matters because Hootsuite has a large number of interconnected services, and being able to delegate requests to the child classes or services that can handle them lets us achieve code reuse. With this pattern, a monolithic code base can effectively delegate its requests to other microservices, reducing the size of the monolith.
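A minimal Python sketch of that delegation pattern, with hypothetical handler names (the real services and checks are more involved):

```python
class TwitterHandler:
    def can_handle(self, network):
        return network.get("type") == "TWITTER"

    def handle(self, network):
        return "handled by Twitter component"

class FacebookHandler:
    def can_handle(self, network):
        # One handler covers Facebook and its sub-products (Page, Group, ...)
        return network.get("type", "").startswith("FACEBOOK")

    def handle(self, network):
        return "handled by Facebook component"

class NetworkDispatcher:
    """Parent object: delegates each request to the first child that
    declares it can handle it; otherwise it does nothing."""

    def __init__(self, children):
        self.children = children

    def handle(self, network):
        for child in self.children:
            if child.can_handle(network):
                return child.handle(network)
        return None  # no component understands this network: do nothing
```

Adding a new network means registering one new child handler rather than editing a central if chain.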

On Hard Coding
As mentioned above, one major pain point for our team is knowing where to insert an additional ‘if’ statement when we add a new social network. If we were to add TheNextCoolSN, for instance, we would have to add yet another case to every one of those hard coded checks.

To avoid this, we created a networks configuration file. This file holds the network specific details that differ for every social network – for example, Twitter’s character limit per tweet is different from Facebook’s. We categorize this effort under our goal of data driven development, where we allow the data to decide the flow of our program.

The networks configuration file passes its data down to a parent class that determines which child classes to call based on the data it is given.

In the same file, we can define a network specific structure for each channel.

Using the same technique, we can define a network specific structure as properties to be passed into a component, allowing each child class to pick up its correct properties or parameters. The data that is passed in determines which child classes get rendered, effectively eliminating the need for if statements.
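As a small sketch of this data driven approach (the networks, fields, and limits here are made up for illustration), the configuration and one of its consumers might look like:

```python
# Hypothetical networks configuration; real entries carry many more fields.
NETWORKS_CONFIG = {
    "TWITTER":       {"char_limit": 280,   "supports_albums": False},
    "FACEBOOK":      {"char_limit": 63206, "supports_albums": True},
    # Supporting TheNextCoolSN is one new entry, not 38+ edited call sites:
    "THENEXTCOOLSN": {"char_limit": 500,   "supports_albums": True},
}

def message_too_long(network_type, message):
    """Program flow is decided by the data, not per-network if statements."""
    config = NETWORKS_CONFIG.get(network_type)
    if config is None:
        return None  # unknown network: the component does nothing
    return len(message) > config["char_limit"]
```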

There are tradeoffs to be made with this programming approach, though. The code is less readable due to the abstraction created by using data, and it is more prone to run time errors.

On General Optimization
There is another problem my team wants to solve. Because of the many social networks we currently support, there are many network specific message models that cannot be reused across networks. Therefore, we decided to build our own generic model that will support all future social networks.

There are drawbacks to this model, though, including the lack of type safety and incompatibility with Google’s protocol buffers. Protocol buffers are used extensively at Hootsuite for their light weight and versatility; however, they do not support generic data structures such as the one we developed for our generic social network object.

We decided that the benefits of ‘genericity’ and code reuse outweigh the drawbacks.


We pushed towards Data Driven Development, and since ‘data driven’ is unfortunately vague, for our purposes we define it as ‘network driven’ development. We allow a networks configuration file to dictate the flow of our program, which in turn enables a smooth development experience that is virtually free of hard coded values. With Data Driven Development, our team’s effort will be critical in enabling external developers to contribute to Hootsuite without needing us to change our codebase.

About the Author

Jonas is a Full-Stack co-op on the Strategic Integrations team at Hootsuite. He studied at the University of Toronto. Connect with him on LinkedIn.

What is Protocol Buffer?

Protocol buffers (protobuf) is an IDL (Interface Definition Language) by Google that describes an API. The IDL sets up the interface for clients and servers so they can communicate using RPC (remote procedure call), which makes calling remote services as simple as calling local functions. gRPC is Google’s implementation of RPC that uses protobuf by default and HTTP/2 for transport. The reason we use protobuf as our serialization framework for SDK generation, instead of another framework like Apache Thrift or FlatBuffers, is protobuf’s support for gRPC. gRPC is much faster and safer than REST while requiring less boilerplate, allowing the code to be less verbose.

At Hootsuite, we use protobuf because of its support for gRPC, along with its efficiency and flexibility. Unlike some other data serialization formats (such as XML or JSON), protobuf uses binary serialization, which keeps messages small and fast to generate and parse. Additionally, protobuf is both language and platform independent, which suits Hootsuite’s diverse technology stack.

Why does Hootsuite use Protobuf to Generate SDK?

Using protobuf to generate an SDK produces code that is faster to create, less error-prone, and more maintainable and flexible. By generating the SDK instead of hand-writing it, we save a lot of time. In theory, generated code should also contain fewer errors, although that is not always true in practice. Protobuf helps document and describe the API specification, so additional documentation like swagger would not be necessary. As well, using protobuf to describe the API makes it possible to refactor existing services into new services, as long as they conform to the proto-defined API. Another advantage of protobuf is its ability to generate SDKs in multiple programming languages. Protobuf’s easy-to-refactor nature and multi-language support allow Hootsuite to continue to embrace new technologies.

How do we use Protobuf to Generate an SDK?

Now that we understand how useful generating an SDK from protobuf can be, the question becomes: how do we actually do it? First we create the .proto IDL file. Afterwards, we use the IDL to generate the SDK with help from Hootsuite’s internal IDL-to-SDK repository.
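As a sketch, a minimal .proto file for a hypothetical link-shortening service might look like the following. The package, service, and message names here are illustrative, not Hootsuite’s actual IDL:

```proto
syntax = "proto3";

package example.links.v1;

// Hypothetical service definition; protoc generates gRPC stubs
// (and, via plugins, swagger for the REST path) from this file.
service LinkService {
  rpc ShortenUrl (ShortenUrlRequest) returns (ShortenUrlResponse);
}

message ShortenUrlRequest {
  string url = 1;
}

message ShortenUrlResponse {
  string short_url = 1;
}
```

Everything downstream — swagger, REST SDKs, gRPC SDKs — is derived from a file like this one.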

Diagram 1. IDL to RESTful SDK generation

The IDL-to-SDK repository contains a Makefile that compiles the IDL repository using protoc in order to generate swagger. Swagger is a set of rules for documenting APIs so that every API is standardized and nicely visualized for easy understanding; at Hootsuite, we use swagger to describe our RESTful APIs. Protoc is the Protocol Buffer compiler that generates stub gRPC classes in various languages, including Go, Scala, PHP, and more. We first need to use the IDL to generate swagger because protobuf does not support REST/HTTP 1.1 SDK generation directly. It can, however, generate a gRPC SDK directly.
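Conceptually, the Makefile performs a two-step pipeline along these lines (a sketch only — it assumes grpc-gateway’s swagger plugin and swagger-codegen are installed, and the file names are made up; the actual Makefile differs):

```sh
# Step 1: compile the IDL and emit a swagger (OpenAPI) description.
protoc -I . --swagger_out=logtostderr=true:. linkservice.proto

# Step 2: feed the swagger file to swagger-codegen to produce a REST SDK.
swagger-codegen generate -i linkservice.swagger.json -l go -o ./sdk-go
```

Swapping the `-l` flag is what lets the same IDL yield SDKs in Go, Scala, PHP, and the rest.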

While REST and gRPC both allow clients to communicate with servers, they have distinct differences. REST uses HTTP 1.1 whereas gRPC uses HTTP/2. gRPC lets you define many kinds of calls (synchronous, asynchronous, unidirectional, bidirectional) whereas REST, being a design pattern, is constrained to a limited set of verbs (e.g. GET, DELETE). Also, since gRPC treats remote service calls like local function calls, it is faster than a REST implementation.
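Those call kinds map directly onto proto syntax: gRPC supports four shapes of RPC, distinguished by where the `stream` keyword appears. A sketch with illustrative names (`Req` and `Resp` are placeholder messages, not from Hootsuite’s IDL):

```proto
syntax = "proto3";

message Req  { string body = 1; }
message Resp { string body = 1; }

service ExampleStreams {
  rpc Unary         (Req)        returns (Resp);         // one request, one response
  rpc ServerStream  (Req)        returns (stream Resp);  // one request, many responses
  rpc ClientStream  (stream Req) returns (Resp);         // many requests, one response
  rpc Bidirectional (stream Req) returns (stream Resp);  // both sides stream
}
```

REST has no native equivalent of the three streaming shapes, which is part of why a gRPC SDK can express more than a generated REST SDK.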

After the swagger gets generated, it is passed to swagger-codegen, which generates the REST SDK in a new repository. Swagger-codegen is a tool that parses swagger to generate documentation, server stubs, and client SDKs. Afterwards, the SDK repository is ready to be used in production.

Despite the advantages gRPC offers, and the additional steps it takes to generate a REST SDK from protobuf, at Hootsuite we continue to use REST SDKs instead of making a complete switch to gRPC. This is because many other external and internal services still rely on REST. Thus, to maintain flexibility, Hootsuite generates REST SDKs today and will transition to gRPC by generating gRPC SDKs in the future.

Diagram 2. Hootsuite’s current IDL-to-SDK generation, which supports only RESTful SDKs, compared to Hootsuite’s future IDL-to-SDK generation, which will support both RESTful and gRPC SDKs

What are the Disadvantages to Using Protobuf to Generate SDK?

Protobuf-to-SDK generation has its advantages, but also disadvantages. The downsides involve its generated nature and its limited existing support and usage within the Hootsuite ecosystem. While generating an SDK is time efficient, the result can be harder to understand, debug, and test because it is not written by a human. Introducing newer technology into an older codebase can also require extra steps: in this case, because there is no direct protobuf-to-REST generation, we need to generate swagger as an intermediate step. Eventually, Hootsuite’s goal is to transition towards using gRPC. Until that transition is complete, however, both the gRPC SDK and the REST SDK have to be maintained, which requires more resources than maintaining a single SDK.

Overall, protobuf allows for a flexible and fast way to generate SDKs with ease. Despite Hootsuite’s current lack of support for gRPC services and SDKs, this is the direction we are heading.


Sam Reh, Brandon McRae, Adam Arsenault

About the Author

Joy Zhang is a Software Developer Co-op on the Strategic Integration team at Hootsuite. During her work term, she worked on a major LinkedIn V2 migration and saw it to completion. She studies Computer Science at the University of British Columbia. When she’s not at school or at work you can find her attending hackathons, seeking out delicious food and travelling the world. Connect with her on LinkedIn, Github, Twitter, Medium and Devpost.

React drives front-end development at Hootsuite.  It is a modern, powerful JavaScript framework that allows the view to be easily rendered as a function of the application’s state.  Components can pull information about state from the props their parent passes them, from the global application state (typically enabled with a flux framework), or remain stateless.  We’ll take a look at how we solved a limitation with Google Analytics tracking in a React + flux context.

The Problem

The message composition flow allows link settings to be applied to URLs for both scheduled and instant posts.  URLs can be shortened and tracked with Google Analytics, Adobe Analytics, or any other custom web analytics tracker.  This is facilitated by passing utm parameters on the URL, and allows the user to track traffic sources and other data to analyze the effectiveness of advertising and marketing campaigns.

However, Google Analytics only recognizes five parameters:

  • utm_source
  • utm_medium
  • utm_campaign
  • utm_term
  • utm_content
and only officially supports one value per parameter.  This limits the flexibility and customizability of tracking – for example, when requiring multiple values for a campaign or content.

The solution is to concatenate additional values onto each parameter in the query string, separated by a user-specified character delimiter (defaulted to an underscore).  While simple, it enables filtering and other functions using these additional values within the Google Analytics dashboard.  To implement this change, the model representing link settings needs to be updated to map multiple values to a parameter, but existing link settings will necessitate that the old model be supported alongside the new one.
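The concatenation step itself is small. A sketch of the idea (the function name and shapes are illustrative, not Hootsuite’s actual code): join each parameter’s values with the user-chosen delimiter, then assemble the query string.

```javascript
// Build a utm query string from a map of parameter name -> array of values,
// joining multiple values with the user-specified delimiter (default "_").
function buildUtmQuery(params, delimiter = '_') {
  return Object.entries(params)
    .map(([key, values]) => `${key}=${encodeURIComponent(values.join(delimiter))}`)
    .join('&');
}

// e.g. buildUtmQuery({ utm_campaign: ['spring', 'shoes'] })
//   → "utm_campaign=spring_shoes"
```

Because the extra values live inside the standard five parameters, Google Analytics accepts the URL unchanged; the delimiter is what makes the values separable again in the dashboard.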

Updating the Model

Link tracking parameters are stored as an array of objects, each representing a parameter.  In the legacy model, each parameter object is associated with a single value.  (If you’re curious about what those fields refer to, the type field indicates whether it is a manual or dynamic value, while the typeValue field distinguishes between different dynamic values.)

With the new model, each parameter object is mapped to an array of objects representing parameter values.  These parameter values retain their type and typeValue fields to accommodate dynamic and manual values.  For backwards compatibility with existing link settings, the parameter object is enumerated as a new “Compound” type.
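Side by side, the two shapes look roughly like this. Only `type`, `typeValue`, the “Compound” type, and the `compoundTrackers` field come from the post; the remaining field names and values are assumed for illustration:

```javascript
// Legacy model: each parameter object carries a single value.
const legacyParam = {
  parameter: 'utm_campaign',
  type: 'MANUAL',        // manual vs dynamic value
  typeValue: 'spring',
};

// New model: a "Compound" parameter maps to an array of value objects,
// each retaining its own type and typeValue.
const compoundParam = {
  parameter: 'utm_campaign',
  type: 'COMPOUND',
  compoundTrackers: [
    { type: 'MANUAL', typeValue: 'spring' },
    { type: 'DYNAMIC', typeValue: 'POST_ID' },
  ],
};
```

The presence or absence of `compoundTrackers` is what lets one component serve both models, as discussed next.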

Updating the code

To see why React is well-suited to our multiple value aspirations, we first have to get a sense of how React works.

React is designed with flux in mind, which entails uni-directional data flow.  The converse is true too: while you could theoretically use a flux pattern with any other front-end framework, it is usually coupled with React.  The flux library we use at Hootsuite is flummox, but since its deprecation we have been transitioning to redux.

A user interaction with the view will trigger an action, which modifies the data in the store.  React will handle the store changes and automatically re-render the view with the modified data.  Data from any API calls are also typically saved to the store via actions.
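Stripped of any particular flux library, that cycle can be sketched in a few lines (a framework-free toy, not Hootsuite’s flummox/redux code — the action type and state shape are made up):

```javascript
// Minimal uni-directional flow: action → store update → re-render.
const store = { listeners: [], state: { delimiter: '_' } };

function dispatch(action) {
  // Actions are the only way the store's data changes.
  if (action.type === 'SET_DELIMITER') {
    store.state = { ...store.state, delimiter: action.delimiter };
  }
  // The store notifies subscribers, triggering a re-render.
  store.listeners.forEach((listener) => listener(store.state));
}

function render(state) {
  // The view is purely a function of the state.
  return `delimiter: ${state.delimiter}`;
}

store.listeners.push(render);
dispatch({ type: 'SET_DELIMITER', delimiter: '-' });
```

In a real app, React’s reconciliation replaces the explicit `render` call, but the direction of data flow is the same.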

Since the view follows directly from the state (held in the store), we can easily support the legacy model as well as the new one.  Instead of constructing two components that handle each model separately, we can refactor the existing component to render different views depending on whether or not the state possesses certain properties – in this case, whether it contains compoundTrackers.  In addition, all the existing structures we have set up for service calls and stores can be reused.  These are the pros of React, and they reinforce the concept of the view being a function of the state.

Here, we can see in the code that the rendering component has been updated to detect the new model.  If multiple values exist for a parameter, their input fields are rendered below each other.  If, instead, we are dealing with parameter(s) stored with the legacy model, it simply defaults to the else branch and renders as expected.
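The branch itself reduces to something like the following (a sketch — the helper name and the descriptor objects are assumed; the real component returns React elements rather than plain objects):

```javascript
// Decide which input fields to render for a link-tracking parameter.
// New model: one input per compound value; legacy model: a single input.
function trackerInputs(param) {
  if (param.compoundTrackers) {
    return param.compoundTrackers.map((tracker) => ({ value: tracker.typeValue }));
  }
  // Legacy parameters fall through to the else branch and render as before.
  return [{ value: param.typeValue }];
}
```

Because the check is on the data rather than on some mode flag, legacy settings need no migration to keep working.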

With this change, the creation UI is modified.  On the left is the view for customizing/creating a link setting without multiple values per parameter, and on the right is the same view modified to support parameter concatenation.

We also update the generated sample URL to reflect the additional parameters values concatenated.  This is an example of what a URL with link settings applied will ultimately look like.

Challenges and improvements

Because the view is designed to be driven by the model, a problem exists with legacy link settings.  Although they will maintain their existing functionality, they will not be able to support multiple values per parameter.  This is because they will still be represented by the old model, lacking the compoundTracker field.

While it would be easy to add a check for this in the code, the series of seemingly arbitrary if-else branches would pollute the readability of the code as well as affect the scalability of the component in the future.  An alternative would be to migrate existing link settings to the new model, but that would be slightly overkill for the task.

Final Thoughts

For this project, we elected to have the state exist in a store to make it globally accessible.  This is necessary since the link settings component exists in multiple contexts.  However, React gives you the flexibility to store state in different areas depending on the scale of the application and your needs.  For example, very specific or niche data can usually be stored directly in the state of the component itself.  A component could even be stateless and solely responsible for “dumb” rendering.

With React, we were able to easily extend link tracking functionality by updating the model and refactoring the link settings component.  In general, React is flexible enough to scale up or down – partly due to its coupling with flux, and partly due to modular component design being one of its central tenets.  Hopefully, this has given you a little more insight into how and why we use it at Hootsuite.

About the Author

Patrick is a full-stack developer co-op on the Plan and Create team at Hootsuite.  He is a 3rd year student at the University of Waterloo studying Computer Science.  His favourite superhero is Jean Grey.  Connect with him on LinkedIn.
