Lessons Learned in Building Hootsuite’s API
One memorable moment from my time on the APIs and Integrations team at Hootsuite was debugging a mysterious HTTP 500 response from one of our endpoints. Normally, one would check the API service’s logs, or perhaps look at its error-handling code to see under which circumstances a 500 is returned. The problem was that errors were generated at several different levels and by several different services. Even worse, they were modified repeatedly as they propagated to the end user.
As you can imagine, debugging this was a nightmare and involved some creative use of print statements scattered throughout the code base. I did eventually find the problem (converting dates before 1970 didn’t play well with UNIX timestamps), but boy, was it a lot more difficult than it should have been. It got me thinking, and it led to some conversations with the team about how our past decisions brought us to where we are today. Those conversations inspired this blog post, which explains the rationale behind our design choices and shares our experiences for anyone in the same position we were in.
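We won’t reproduce the original code here, but the underlying pitfall is easy to show. Dates before 1 January 1970 map to negative UNIX timestamps, so any conversion that assumes timestamps are non-negative breaks on them. The sketch below uses Python’s standard `datetime` module. `naive_from_unix` is a hypothetical helper that illustrates the kind of assumption that bites; it is not our actual code.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_SECOND = timedelta(seconds=1)

def to_unix(dt: datetime) -> int:
    """Whole seconds since the UNIX epoch; negative for dates before 1970."""
    # Floor division of two timedeltas returns an int, rounding toward -infinity.
    return (dt - EPOCH) // ONE_SECOND

def naive_from_unix(ts: int) -> datetime:
    """Hypothetical buggy helper: wrongly treats negative timestamps as invalid."""
    if ts < 0:
        raise ValueError(f"invalid timestamp: {ts}")
    return EPOCH + timedelta(seconds=ts)

def from_unix(ts: int) -> datetime:
    """Correct inverse of to_unix: negative values simply land before the epoch."""
    # Plain timedelta arithmetic avoids relying on the platform's C time
    # functions, which on some systems reject negative timestamps.
    return EPOCH + timedelta(seconds=ts)
```

A date such as 15 June 1950 round-trips cleanly through `to_unix` and `from_unix`, whereas `naive_from_unix` rejects its (perfectly valid) negative timestamp.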
The first version of the API was built for a single customer and offered basic user management functionality across five endpoints. It met the customer’s needs and worked well enough. However, due to inexperience and time pressure, the endpoints contained several inconsistencies in data types, error models, and field names. In hindsight, some endpoints also lacked RESTful patterns. For example, we often passed resource IDs as parameters instead of embedding them in URIs.
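To make the difference concrete, here is a small routing sketch that contrasts the two styles. The paths (`/users`, `/users/{user_id}`) and handler names are hypothetical, not our real endpoints. The underlying convention is that a RESTful URI identifies the resource itself, so the resource’s ID belongs in the path, and query parameters are better reserved for filtering and options.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Hypothetical route table: each URI pattern maps to a handler name.
ROUTES = {
    r"^/users$": "list_users",                       # the collection
    r"^/users/(?P<user_id>[^/]+)$": "get_user",      # one resource, ID in the path
}

def resolve(url: str):
    """Return (handler name, path params, query params) for a request URL."""
    parts = urlsplit(url)
    # Keep only the first value of each query parameter, for simplicity.
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    for pattern, handler in ROUTES.items():
        match = re.match(pattern, parts.path)
        if match:
            return handler, match.groupdict(), query
    raise LookupError(f"no route for {parts.path}")
```

With this table, `resolve("/users?id=123")` lands on the collection handler with `id` as a mere filter, while `resolve("/users/123?fields=email")` addresses the user directly and leaves the query string for options.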
At the time of writing, our API consists of 30 endpoints. We have expanded our original user management API and also added support for messages, social profiles, and media. Along the way, we’ve also significantly improved our API’s consistency, RESTfulness, and developer experience. Here are four lessons we’ve learned as a result.
Lesson 1: Focus on your errors and focus on them early
Error handling is not usually a major talking point during preliminary design, but hammering out an error schema early can prevent a lot of headaches down the line. We fell into the trap of not establishing clear guidelines for how our errors should look. Without an overarching plan, our errors grew organically as we added integrations with our various microservices.
This eventually left our error cases in a tangle. Below is an example of the confusion developers faced when deciding how to return an error to the end user. Some of these errors were throwable exceptions, some were used to map internal dependency errors, and some were user-facing errors. On top of that, duplicated codes and inconsistent error messages would have made it difficult for end users to handle the errors.
This is still an area of active improvement for us, but our key realization was the importance of establishing a protocol early and sticking to it. Doing so doesn’t necessarily require a large investment, either. There are several good open-source community best practices for handling errors that can be adopted as a strategy. We’ve adopted a simple scheme of incrementing codes with generic error messages. Combined with reusing the same error object in exceptions, mappings, and responses, this scheme has significantly cleaned up our errors.
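Here is a minimal Python sketch of that idea. A single `ApiError` type wraps one registered error definition (numeric code, generic message, HTTP status), and that same object is raised as an exception, produced when mapping dependency failures, and serialized into the response. Every name, code, and status below is hypothetical, including `DependencyError`, the registry entries, and the mapping rule; this is not our actual scheme. Mapping happens exactly once, at the boundary where the failure occurs, and `raise ... from` keeps the original cause available, which avoids the "modified repeatedly as it propagates" problem described earlier.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass(frozen=True)
class ErrorDef:
    code: int      # unique, incrementing code that clients can switch on
    message: str   # generic, user-facing message
    status: int    # HTTP status returned to the client

# Hypothetical registry: every error the API can return is defined once, here.
NOT_FOUND = ErrorDef(1000, "Resource not found.", 404)
INVALID_REQUEST = ErrorDef(1001, "The request is invalid.", 400)
UPSTREAM_FAILURE = ErrorDef(1002, "An internal service failed.", 502)
INTERNAL = ErrorDef(1003, "An unexpected error occurred.", 500)

class ApiError(Exception):
    """The single error object used when raising, mapping, and responding."""

    def __init__(self, definition: ErrorDef):
        super().__init__(definition.message)
        self.definition = definition

    def to_response(self) -> Tuple[int, dict]:
        """Serialize to (status, body) without rewriting the code or message."""
        d = self.definition
        return d.status, {"code": d.code, "message": d.message}

class DependencyError(Exception):
    """Stand-in for an error raised by an internal microservice client."""

    def __init__(self, service: str, status: int):
        super().__init__(f"{service} returned HTTP {status}")
        self.service = service
        self.status = status

def map_dependency_error(err: DependencyError) -> ApiError:
    """Translate an internal failure into a registered error, exactly once."""
    return ApiError(NOT_FOUND if err.status == 404 else UPSTREAM_FAILURE)

def fetch_profile(profile_id: str, client: Callable[[str], dict]) -> dict:
    """Service-layer call: map dependency errors where they occur."""
    try:
        return client(profile_id)
    except DependencyError as err:
        # `from err` keeps the original error as __cause__ for logs and debugging.
        raise map_dependency_error(err) from err

def handle(operation: Callable[[], dict]) -> Tuple[int, dict]:
    """Outermost layer: registered errors pass through unchanged; anything else is INTERNAL."""
    try:
        return 200, operation()
    except ApiError as err:
        return err.to_response()
    except Exception:
        return ApiError(INTERNAL).to_response()
```

Because the response body is built only from the registry entry, a given failure always reaches the client with the same code and message, however many layers it passed through.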
Lesson 2: Give your documentation some love
The APIs team is in a unique position: we are both consumers of internal APIs and producers of external APIs. While our experience reading internal documentation has been mostly positive, we have run into difficulty integrating services because of roadblocks such as incomplete or missing documentation. To make sure our end users don’t have a similar experience, we have endeavored to always provide current and accurate documentation.
The benefits of good documentation extend beyond a smooth developer experience. Documentation can be used to rapidly prototype for customers and, with the right tools, to verify the correctness of the APIs it describes. It’s even possible to generate SDKs automatically from documentation alone.
We tried several different API documentation generators but ultimately settled on Swagger. It lets us define endpoints in plain YAML and works with renderers such as ReDoc to easily generate attractive documentation pages. Most importantly, Swagger also supports code generation and can create SDKs automatically. We have also incorporated Dredd, an automated API testing tool that reads our documentation, makes API calls, and checks that each response matches our documented sample.
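The core idea of checking a live response against the documented one can be sketched in a few lines. This is not how Dredd is implemented. It only illustrates the principle. The spec fragment is a hand-written Python dict that stands in for a parsed Swagger 2.0 file, with the example response placed under `responses → "200" → examples → "application/json"`. The `/v1/me` path and its fields are hypothetical. The check confirms that every documented field is present with the same Python type. It does not compare values.

```python
# Hand-written stand-in for a parsed Swagger document (hypothetical endpoint).
SPEC = {
    "paths": {
        "/v1/me": {
            "get": {
                "responses": {
                    "200": {
                        "examples": {
                            "application/json": {
                                "data": {"id": "123", "email": "user@example.com", "isActive": True}
                            }
                        }
                    }
                }
            }
        }
    }
}

def documented_example(spec, path, method, status="200", mime="application/json"):
    """Pull the documented example response for one operation."""
    return spec["paths"][path][method]["responses"][status]["examples"][mime]

def shape_mismatches(expected, actual, where="$"):
    """List places where `actual` lacks a documented field or has a different type."""
    problems = []
    if isinstance(expected, dict):
        if not isinstance(actual, dict):
            return [f"{where}: expected object, got {type(actual).__name__}"]
        for key, value in expected.items():
            if key not in actual:
                problems.append(f"{where}.{key}: missing")
            else:
                problems.extend(shape_mismatches(value, actual[key], f"{where}.{key}"))
    elif isinstance(expected, list):
        if not isinstance(actual, list):
            return [f"{where}: expected array, got {type(actual).__name__}"]
        if expected:
            # Compare every returned item against the first documented item.
            for i, item in enumerate(actual):
                problems.extend(shape_mismatches(expected[0], item, f"{where}[{i}]"))
    elif type(expected) is not type(actual):
        # Exact type match: True vs 1, or 1 vs 1.0, count as mismatches.
        problems.append(f"{where}: expected {type(expected).__name__}, got {type(actual).__name__}")
    return problems
```

In a real test run, `actual` would be the decoded body of a live API call, and an empty list would mean the response still matches the documentation.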
Lesson 3: Avoid fragmentation
We wanted to improve on the first version of our API, but we couldn’t modify it directly without potentially breaking its existing integrations. Instead, we created a new API version with our improvements and supported both versions concurrently. The second version grew quickly, but our changes introduced more and more technical debt in the form of duplicated logic and legacy dependencies. When the debt grew beyond our comfort level, we took the first opportunity to merge the two versions and start paying it down.
On the technical side, collapsing the two versions together turned out to be an ongoing task. We updated the endpoint URIs and corresponding tests, but we uncovered various discrepancies in message schemas and dependencies that also needed to be addressed. While we are still working toward complete unification, we’ve already significantly reduced the cognitive load for developers and opened the door to further improvements. Most importantly, users benefit from a cleaner, harmonized API.
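A common way to keep two concurrently supported versions from drifting apart is to implement each operation once and make the older version a thin adapter. The adapter only translates request and response shapes. The sketch below is a hypothetical illustration of that pattern, not our actual code. The in-memory store and the legacy names `member_id`, `memberId`, and `emailAddress` are all invented for the example.

```python
# Hypothetical data store shared by both API versions.
USERS = {"123": {"id": "123", "email": "user@example.com"}}

def get_user(user_id: str) -> dict:
    """Single source of truth for the lookup logic, used by every version."""
    user = USERS.get(user_id)
    if user is None:
        raise KeyError(user_id)
    return dict(user)  # return a copy so callers can't mutate the store

def v2_get_user(path_params: dict) -> dict:
    """Current version: the ID is embedded in the URI, e.g. /v2/users/123."""
    return get_user(path_params["user_id"])

def v1_get_user(query_params: dict) -> dict:
    """Legacy version: the ID arrives as a parameter and fields use old names.

    Only translation happens here; no business logic is duplicated.
    """
    user = get_user(query_params["member_id"])
    return {"memberId": user["id"], "emailAddress": user["email"]}
```

Any schema discrepancy between versions is then confined to the adapter, so the shared logic can be fixed once for both.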
Maintaining two parallel versions did preserve existing integrations of our API, but it also left us with significant technical debt. Rather than choosing between the two extremes of abandoning the first version and revamping it, a balanced approach would have been more effective. Working closely with users and iterating quickly with prototypes would have let us make all the improvements we wanted while still meeting customer needs.
Lesson 4: Remember to test in production
It’s rare to see a modern software development team neglect tests. Everyone knows how critical they are, and how their absence lets all sorts of bugs slip through. However, even excellent development teams sometimes skimp on tests in production. There are all sorts of reasons for this, such as test flakiness or developer time crunch, but the importance of production tests cannot be overstated. Ours have helped catch several critical errors, including one occasion when our OAuth token exchange process broke.
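A production check doesn’t need to be elaborate to be useful. The sketch below uses only Python’s standard library. The base URL, paths, and required fields are hypothetical placeholders. The `fetch` function is injected so the checks can be exercised without a live service. In practice, a scheduler would run something like this against the real deployment and alert on any non-empty result.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, List, Sequence, Tuple

# A fetcher takes a URL and returns (HTTP status, raw body).
Fetch = Callable[[str], Tuple[int, bytes]]

# Each check: (path, expected status, top-level JSON fields that must be present).
Check = Tuple[str, int, Sequence[str]]

def http_fetch(url: str) -> Tuple[int, bytes]:
    """Default fetcher: GET the URL, returning error statuses instead of raising."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        return err.code, err.read()

def run_smoke_checks(base_url: str, checks: Sequence[Check],
                     fetch: Fetch = http_fetch) -> List[str]:
    """Run happy-path checks and return human-readable failures (empty means all passed)."""
    failures = []
    for path, expected_status, required_fields in checks:
        url = base_url.rstrip("/") + path
        try:
            status, body = fetch(url)
        except OSError as exc:  # DNS failures, refused connections, timeouts
            failures.append(f"{path}: request failed ({exc})")
            continue
        if status != expected_status:
            failures.append(f"{path}: expected {expected_status}, got {status}")
            continue
        try:
            payload = json.loads(body)
        except ValueError:
            failures.append(f"{path}: response is not valid JSON")
            continue
        if not isinstance(payload, dict):
            failures.append(f"{path}: expected a JSON object")
            continue
        missing = [field for field in required_fields if field not in payload]
        if missing:
            failures.append(f"{path}: missing fields {missing}")
    return failures
```

Keeping the checks as plain data makes it cheap to add another happy path whenever a production incident reveals a gap.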
If writing a full integration test suite for production is too time-consuming, several tools let you write happy-path tests without much work. One example is Runscope, which lets you build test cases with a few dropdown menus. When webhook testing is also required, we have used RequestBin to receive webhooks.
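RequestBin is a hosted service. When a quick local stand-in is enough, a few lines of standard-library Python can capture webhook deliveries for inspection. This is a hypothetical sketch, not how RequestBin works internally. It records each POST body in memory and always answers 200. `send_test_webhook` simulates a delivery so the recorder can be exercised end to end.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class WebhookRecorder(BaseHTTPRequestHandler):
    """Records every POST it receives on the server object, then replies 200."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        # Record before responding, so the delivery is visible once the sender gets its reply.
        self.server.received.append({"path": self.path, "body": body.decode("utf-8")})
        self.send_response(200)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request logging to stderr

def start_recorder(host: str = "127.0.0.1", port: int = 0) -> ThreadingHTTPServer:
    """Serve in a background thread; port 0 asks the OS for any free port."""
    server = ThreadingHTTPServer((host, port), WebhookRecorder)
    server.received = []  # captured deliveries, in arrival order
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def send_test_webhook(url: str, payload: dict) -> int:
    """Simulate a delivery by POSTing JSON; returns the HTTP status."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # An empty ProxyHandler makes local requests bypass any configured proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return response.status
```

Pointing a webhook subscription at the recorder’s URL (for example `http://127.0.0.1:<port>/hooks`) and then inspecting `server.received` gives roughly the same feedback loop as a hosted bin, as long as the sender can reach the machine.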
Moving forward
We’re proud of the progress we’ve made since our initial API release, but we still have our work cut out for us in eliminating the inconsistencies mentioned above. We’re constantly balancing work on reducing technical debt against adding new features, and the lessons we’ve learned have helped tremendously. Check out our current API documentation at https://developer.hootsuite.com/docs/api_v1.html.
If you have any questions or thoughts about the lessons we’ve learned or about our API itself, feel free to leave a comment below. We would love to hear them!