Posts from August 2017

Background:

On the datalab team at Hootsuite, we help the rest of the company make data-driven decisions. A big part of this is the collection of data. We collect data from external vendors that we work with, such as Salesforce and Marketo. We also collect data on the Hootsuite product itself. We load all of this data into a single data warehouse, where our data scientists, business analysts, and product teams can easily use it.

The engineering side of the datalab has 3 main tasks:

  • Collecting data
  • Enriching the data
  • Storing data in different formats
In order to do all this, datalab develops specialized apps that do one thing very well. These apps get input from one or many internal and external data sources. These sources are a mix of internal APIs, databases, and data streams, as well as external APIs. This first step is known as the extract step. The apps then process the data. This could range from enriching one data source with data from other sources to doing some calculations and enriching the data with the results. This second step is known as the transform step. The app finally loads the data into another data source. This last step is known as the load step. In data engineering parlance these apps are called ETLs: Extract, Transform, Load.
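
To make the extract, transform, and load steps concrete, here is a minimal sketch of the shape such an app can take; the API URL, field names, and the warehouse step are hypothetical placeholders rather than datalab's actual code.

    # A minimal ETL sketch; the API URL, field names, and warehouse step are
    # illustrative placeholders, not datalab's actual sources or schema.
    import requests


    def extract(api_url):
        """Extract: pull raw records from an internal or external API."""
        response = requests.get(api_url)
        response.raise_for_status()
        return response.json()


    def transform(records, region_lookup):
        """Transform: enrich each record with data from another source."""
        return [dict(record, region=region_lookup.get(record.get("account_id"), "unknown"))
                for record in records]


    def load(rows):
        """Load: write the enriched rows to the data warehouse (stubbed as a print)."""
        for row in rows:
            print("would insert into warehouse:", row)


    if __name__ == "__main__":
        raw = extract("https://api.example.com/accounts")  # extract
        enriched = transform(raw, {"42": "EMEA"})          # transform
        load(enriched)                                     # load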

Problem:

We deal with large volumes of data, and all of our data operations are held to a high standard of data quality. The result of one data operation is often an input to multiple other data operations, so a single problem can easily cascade and degrade the data quality of other parts of the system. A large number of stakeholders consume this data at every moment and use the insights to make critical decisions. Therefore, it is imperative that the data is always correct and complete.

These are some of the technical difficulties that the datalab had to solve to create a reliable data pipeline:

  • How can we quickly spot any anomalies in the system?
  • How can we easily troubleshoot and fix the problems?
  • Given that the output of some jobs are the input to others, how do we make sure that the jobs run in the correct order every time? Additionally, if one of the components in the pipeline fails, how do we prevent it from affecting other parts of the system?
  • How should we manage our infrastructure? Some apps run continuously and some run periodically. How should we schedule and deploy these apps on our servers? We want to make sure that all the apps have enough computing and memory resources when running. But we also want to make sure that our servers are not sitting there idly (cloud bills can be expensive!).

Technologies:

1- Docker:

Datalab packages all of its apps as Docker containers. In practice, you can think of a Docker container as a Linux machine that runs on your development machine or on your server. Your app will run inside of this container, which is completely isolated from the host machine it is running on.

Docker has a much lower overhead than a virtual machine, as it does not require running an entire kernel for each container. In addition, while you can put limits on the resources for each Docker container, the container only uses those resources if it needs to. The resources dedicated to a VM, on the other hand, are no longer usable by the rest of the system.

This lightweight environment isolation enabled by Docker provides many advantages:

  • Environment setup: For an app to run properly, the server must have the right versions of all of its dependencies installed. Furthermore, the file system has to be set up as the app expects it to be. Finally, environment variables have to be set correctly. All of this configuration can lead to an app working perfectly fine on my computer, but not on yours, or on the server. Docker can help with these issues. A developer writes a Dockerfile, which is very similar to a bash script. The Dockerfile has instructions on what dependencies need to be installed, and sets up the file system and environment variables. When a developer wants to ship a product, s/he ships a Docker image instead of a jar file or some other build artifact. This guarantees that the program will behave exactly the same irrespective of what machine it is running on. As an added bonus, Docker has developed tooling so you can run Docker containers on Windows and Macs. So you can develop on a Mac or a Windows machine, but rest assured that your program will work on a server running Linux.
  • Easy Deployment: By building the Dockerfile, the developer creates a Docker image. You can think of a Docker image as the hard drive of a Linux computer that is completely set up to run our app. The developer can then push this Docker image to an image repository. The servers that actually run the app pull the image from the image repository and start running it. Depending on the size of the image, the pull can take a bit of time. However, one nice thing about Docker images is their layered architecture: once the server pulls the image for the first time, subsequent pulls only fetch the layers that have changed. If the developer makes a change to the main code, it only affects the outermost layer of the image, so the server only needs to pull a very small amount of data to have the most up-to-date version. If an app needs an update, the server pulls the recent version and starts it in parallel with the old running version. When the new version is up and running, the old container is destroyed. This allows for zero-downtime deployments.
  • Running multiple apps on the same server: Docker allows us to run multiple containers on the same server, helping us maximize the value that we get out of our resources. This is possible thanks to the low overhead of these containers. Moreover, if each app needs a different version of Python, or a different version of SQLAlchemy, it does not matter, as each container has its own independent environment.
Now that we have come to resource allocation, one might wonder: how do we make sure that each container has enough resources, and how do we manage all the servers required to run our apps?

2- ECS:

Companies that adopt Docker, and the microservice architecture that often comes with it, end up with many small, specialized containers. This calls for technologies to manage and orchestrate these containers. An array of technologies has emerged to make managing and deploying containers easier. Some of the more well known names are Kubernetes, Amazon ECS, and Docker Swarm. Here at datalab we have picked Amazon ECS as our primary container orchestration solution.

Amazon ECS provides a cluster of servers (nodes) to the user and takes care of distributing the containers over the cluster. If servers are running out of memory or computing resources, the cluster will automatically create a new server.

We have some apps that are constantly running on this cluster. But there are also apps that only run periodically (once a day or once a week). In order to save on costs, we destroy the containers that have completed their job and only create them again when we want to run them. So you can imagine that the number and type of containers running on the cluster are very dynamic. ECS automatically decides which server to schedule new containers on. It will add a new server if more resources are required. Finally, it will phase out a server if there is not enough work for it.
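
As an illustration of how one of these periodic containers can be launched as a one-off task on the cluster, here is a small sketch using boto3; the region, cluster, and task definition names are made up for the example.

    # Launch a one-off container on an ECS cluster with boto3 (names are illustrative).
    import boto3

    ecs = boto3.client("ecs", region_name="us-east-1")

    response = ecs.run_task(
        cluster="datalab-cluster",          # hypothetical cluster name
        taskDefinition="salesforce-etl:3",  # hypothetical task definition and revision
        count=1,                            # ECS picks a server with enough free resources
    )

    for task in response["tasks"]:
        print("started task:", task["taskArn"])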

In short ECS takes care of distributing containers over the cluster, and helps us pack the most containers onto the fewest number of servers possible. But how do we actually schedule a container to run?

3- Airflow:

Airflow is a tool developed by Airbnb that we use to help us with a few tasks. First, we needed a way to schedule an app to run at a certain time of the day or week. Second, and more importantly, we needed a way to make sure that a job only runs when all the jobs it depends on have completed successfully. Many of our apps’ inputs are the outputs of our other apps. So if app A’s input is the output of app B, we have to make sure that app A only runs if app B has run successfully. Airflow allows the developer to create DAGs (Directed Acyclic Graphs), where each node is a job. Airflow will only schedule a job if all of its parent nodes have run successfully. It can be configured to rerun a job in case it fails. If it still cannot run the job, it will send alerts to the team, so that the problem can be fixed as soon as possible and the pipeline can continue its operation where it left off. Airflow has a nice UI that shows all the dependencies and allows the developers to rerun the failed jobs.
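
For a flavour of what this looks like in practice, below is a minimal sketch of an Airflow DAG in which app A only runs after app B succeeds; the DAG name, schedule, commands, and alert address are illustrative rather than taken from datalab's pipeline.

    # A minimal Airflow DAG sketch: run app B daily, then app A only if B succeeded.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    default_args = {
        "owner": "datalab",
        "retries": 2,                              # rerun a failed job before alerting
        "retry_delay": timedelta(minutes=10),
        "email_on_failure": True,                  # alert the team if retries are exhausted
        "email": ["datalab-alerts@example.com"],   # illustrative address
    }

    dag = DAG(
        "example_pipeline",
        default_args=default_args,
        start_date=datetime(2017, 8, 1),
        schedule_interval="@daily",                # run once a day
    )

    run_app_b = BashOperator(task_id="run_app_b", bash_command="python /opt/etl/app_b.py", dag=dag)
    run_app_a = BashOperator(task_id="run_app_a", bash_command="python /opt/etl/app_a.py", dag=dag)

    # App A's input is app B's output, so A is only scheduled once B has succeeded.
    run_app_b >> run_app_a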

So Airflow will schedule our periodic jobs and notify us if things go wrong. But how can we monitor the apps that run constantly?

4 and 5- Sumo Logic and Sensu:

As a developer, the most useful thing I use to debug my apps is the logs. However, accessing the logs on servers is usually hard. It is even harder when using a distributed system such as Amazon ECS. The dynamic nature of ECS means that an app could be running on a different server on any given day. In addition, if a container is destroyed for any reason, all of its logs are lost with it.

To solve the complexity of capturing and storing logs on a distributed system, we have made the system even more distributed! Sumo Logic is a service that accepts logs from apps over the network and will store them on the cloud. The logs can easily be searched using the name of the app. They can be further narrowed down with additional filters. So if a developer needs access to the logs for a specific app, s/he can get them with only a few clicks.

This still means that in order to quickly identify a broken app, someone has to be constantly looking at the logs. That sounds super boring, so we have automated the process by using Sumo Logic’s API and another technology called Sensu. Sensu is an open source project that allows companies to check the health of their apps, servers, and more. Sensu regularly runs the defined checks and alerts the team if something is wrong. At a high level, Sensu has two components: Sensu server and Sensu client. The Sensu server regularly asks for updates from the clients. The clients run some tests and return a pass/fail message back to the server. The server can then notify the team if the check has failed.

One of the use cases of Sensu in datalab is monitoring the health of our apps. One thing that I found particularly interesting is that the system is designed in a way that no extra code needs to be added to the apps. This is done by monitoring the logs of an app. This is how it all works: our Sensu client is always running on the ECS cluster. Let’s say that the Sensu server wants to check on the status of app X. It will send a request to the Sensu client asking for an update on the status of app X, along with a regular expression describing what a success message looks like. The Sensu client will then send a request to the Sumo Logic API asking for all the recent logs for app X. Next, the Sensu client will search through the logs to see if they include the success expression. The client will send a success message back to the server if it can find the success message, and will send a failure message otherwise. The server will send an email to the team when a check fails or if the client is unresponsive, and the engineer on call can take measures to resolve the issue quickly.
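
The check itself boils down to a small script that exits with a Sensu-friendly status code. The sketch below stubs out the Sumo Logic query with a hypothetical fetch_recent_logs helper, and the app name and success pattern are illustrative.

    #!/usr/bin/env python
    """A Sensu-style log check sketch: exit 0 (OK) if a success message appears in
    the app's recent logs, exit 2 (critical) otherwise. fetch_recent_logs is a
    hypothetical stand-in for querying the Sumo Logic API."""
    import re
    import sys

    OK, CRITICAL = 0, 2  # Sensu checks use Nagios-style exit codes


    def fetch_recent_logs(app_name):
        # Placeholder: in the real check this queries Sumo Logic for recent log lines.
        return ["2017-08-01 02:00:01 app-x job completed successfully"]


    def check_app(app_name, success_pattern):
        logs = fetch_recent_logs(app_name)
        if any(re.search(success_pattern, line) for line in logs):
            print("OK: found success message for {}".format(app_name))
            return OK
        print("CRITICAL: no success message for {}".format(app_name))
        return CRITICAL


    if __name__ == "__main__":
        sys.exit(check_app("app-x", r"job completed successfully"))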

That was a long post, so I’ll stop here. Hopefully you have gotten an idea of how Hootsuite manages its big data. To summarize: we use Docker containers for the ease of development, deployment, and modularity they provide; ECS to orchestrate these containers; Airflow to manage scheduling and enforce the order in which apps should run; and Sensu and Sumo Logic for monitoring and troubleshooting our apps.

About the Author

Ali is a co-op software developer on the data lab. He studies computer science at UBC. Connect with him on LinkedIn.

Overview:

On the Plan and Create team we were using CommonJS as our module system until very recently, when we ran into a circular dependency issue with our React components. For one of our new features we had a component that needed to reference a modal, and that modal in turn needed to reference a new instance of the original component. This circular relationship caused a problem, and webpack could not resolve the dependencies correctly. Our solution was to upgrade our modules to use ES6 import/exports. This enabled us to reuse the React components and avoid circular dependencies while moving us closer to ES standards. We upgraded as much as we could without affecting other teams.

What is a module?

A module is a reusable block of code that encapsulates the data and implementation details for a specific piece of functionality and exposes a public API to be loaded and used by other modules. The concept of a module stems from the modular programming paradigm, which says that software should be composed of separate components, each responsible for a specific function, that are linked together to form a complete program.

Why are modules useful?

Modules allow programmers to:

  • Abstract code: hide implementation details from the user, so they only have knowledge of what the object does, not how it’s done
  • Encapsulate code: hide attributes so they can only be accessed via methods of their class
  • Reuse code: avoid repetition by abstracting out methods and classes
  • Manage dependencies

ES5 Module System – CommonJS

ES5 was not designed with modules, so developers introduced patterns to simulate modular design. CommonJS modules were designed with server-side development in mind, so the API is synchronous: modules are loaded at the moment they are needed, in the order they are required inside the file. Each file is a unique module and uses two constructs, require and module.exports, to define its dependencies and what it exposes.

Exports

exports, or module.exports, is used to export a module’s contents as public elements; each module is identified by its location path.

Require

require is used by modules to import the exports of other modules. Every time you use require('example-module') you get the same instance of that module, ensuring each module is a singleton and its state is synchronized throughout the application.

ES6 Module System

ES6 introduces a standard module system influenced by CommonJS, but it operates differently from the mechanism above. CommonJS assumes that you will either use an entire module or not use it at all, whereas an ES6 module exports one or more entities and another module can import any number of those exported entities.

The two core concepts of the ES6 module system are exporting and importing. Each file represents a single module which can export any number of entities as well as import any other entities.

Exporting

Variables and functions that are declared in a module are scoped to that module, so only entities that are exported from the module are public to other modules; the rest remain private to the module. This can be leveraged for abstraction and to explicitly make elements publicly available.

Importing

The import directive is used to bring other modules into the current file. A module can import any number of other modules and refer to none, some, or all of their exported objects. Any object that is referred to must be specified in the import statement.

Other Benefits

  • In ES6 modules you get strict mode for free, so you do not have to explicitly say ‘use strict’ in every file. Strict mode is a set of rules for JavaScript semantics that eliminates silent errors and fixes mistakes that make it hard for engines to perform optimizations.
  • Because import and export are static, static analyzers can build a tree of dependencies.
  • Modules can be loaded synchronously or asynchronously.

Limitations

At this time not all browsers implement ES6 module loading. The workaround is to use transpilers such as Babel to convert code to an ES5 module format.

Difficulties/Changes

Updating our code base to use ES6 modules led to some problems in our test suite, causing the majority of the tests to fail. We soon realized that this was because our test suite used the JS Rewire library to mock modules in tests, and Rewire does not support the new ES6 module syntax, causing everything to explode.

Rewire is important because it provides an easy way to perform dependency injection: it adds getter and setter methods to modules, allowing us to modify the behavior of imported modules in order to test the component that imports those modules in isolation.

Luckily there is an alternative to the JS Rewire library, babel-plugin-rewire, that works with Babel’s ES6 modules by adding methods to modules, allowing us to mock data and test components in isolation. To make this change you must include babel-plugin-rewire in the dependencies section of your package.json file, among other changes.

Example using the JS Rewire Library: *Note the CommonJS require syntax

Example using babel-plugin-rewire: * Note the ES6 import syntax – you can import named exports which only import the methods, not the entire library.

About the Author

Sonalee is a Co-op Front-End Developer on the Plan and Create team at Hootsuite. She is pursuing a Bachelor of Science in Computer Science with a minor in Commerce at the University of British Columbia. In her spare time, Sonalee enjoys hiking, snowboarding, and foosball.

Follow her on Instagram or connect with her on LinkedIn!

In the summer of 2017, I had the pleasure of joining Hootsuite’s Product Design team as a UX designer for all things mobile. I first came to Hootsuite looking to get a taste of the tech industry and am so thankful to have spent 4 months in such an inclusive, collaborative, and welcoming environment. Even though I was just a co-op student, there were endless projects and opportunities that I was able to have full ownership over. This creative freedom and level of responsibility was a rewarding learning experience, mainly thanks to the amount of mentorship I received along the way. By working alongside UX designers, user researchers, developers, and product managers, I got to see my designs being developed, learned the process and what it takes to ship a product.

 

One of the most valuable lessons I learned at Hootsuite was the importance of evaluating your design before breaking it down into phases for release.

 
Often as design students, we’re taught to create meaningful, delightful end-products. We develop a vision for our designs and create mockups of this ‘ideal state’. But what we fail to consider is how to follow through with our ideas. How will your designs be implemented? How will they be built? How will they be adopted by users? How will they scale over time?

All this requires careful consideration between multiple stakeholders, including you, the designer. A fellow UX designer at Hootsuite, Guillaume D’Arabian, describes it with a pyramid-shaped model that illustrates the relationship between business, design, and technology. By finding the right balance, you’re able to meet the needs of your customers. The real challenge, however, is maintaining this balance over time and adapting to unexpected circumstances: customer feedback, unaccounted-for use cases, and edge cases.

 

So what do we mean by phases?

It’s great to run developers through mockups and the vision of your design, but when it comes to putting a product out into the real world, it takes time and multiple iterations. In order to achieve your vision, you need to break your design down into phases to account for how it will be released.

Phases represent the scope of a particular sprint or release. They’re helpful for everyone because they define the priority of what needs to be built, show how each element or feature will be implemented, and create a shared understanding of how your product will evolve. In order to determine the priority of what needs to be built, it takes a level of strategic, technical, and customer understanding. This is where the pyramid comes into play.

A lot of factors emerge when defining the scope for each phase: the business strategy of your product, the complexity of each feature, the bandwidth and resources available within your team, the known behaviors among customers, and the need to adapt to customer feedback.

Here are 3 key considerations to keep in mind when designing in phases:

Create An Easy Transition

  • If you’re proposing a redesign or a new feature, how will you transition core users to your design?
  • How will you introduce new users to your design?
  • What will you introduce first and how?
  • Are you replacing a behaviour or creating one?
  • Have a general understanding of how your design will be built. What are the more complex pieces? What are the easiest pieces to develop?
  • Are there elements within the existing product that can be reused in your designs?
  • What pieces are crucial in solving the problem you’re designing for?
Ideally, this should help you visualize the steps you need to take in order to jump from point A (the existing product) to point B (your new vision).

Consider How it’ll Scale

Design is never finished. It’s important to consider how your designs might adapt or scale over time, while keeping your vision in mind. In general, this is a good exercise to test the longevity of your designs. And even though you might not be able to predict it all, this might just help you define potential phases down the road and identify future areas of opportunity.

  • How will your users’ behaviours change as they transition from being a new user to an everyday user?
  • How will your design scale to a high volume of users?
  • How will your design scale as it adopts new features?

Creating your MVP

By now, you should have a general idea of how to create your MVP (minimum viable product) for the first phase/release. Coined by Frank Robinson and popularized by Steve Blank and Eric Ries, the term MVP refers to “the smallest thing that you can build that delivers customer value.” Your MVP should be able to address the core problem and provide basic functionality, but most importantly, create a lovable experience. With every release, your customer feedback will either validate or refine your ideas, helping you plan for the next phase of release.


As UX designers, we’re trained to be considerate of our audience and our users—but this also applies to our own teams. The most successful teams are built out of strong relationships. In order to build a successful product, it’s important that we stay mindful of everyone’s needs.

 

About the Author

Amanda is a Co-op UX Designer on the Product Design team at Hootsuite. Working closely on Hootsuite Mobile, she’s worked across multiple teams, from Publisher and Plan & Create to the iOS and Android core team, as well as the first team for Hatch, an incubator program at Hootsuite. During her off time, you can find her visiting local events, thrift stores, and art exhibitions, or taking street photography.

Follow her on Instagram, Twitter, or connect with her on LinkedIn!

 

 

While working on Hootsuite’s Facebook Real-Time service for the past few months, I have had an extremely mind-opening experience dealing with back-end development and the underlying architecture that makes all of the internal services work together. I will highlight the architecture of a Hootsuite service with reference to Facebook Real-Time.

Overview

The Facebook Real-Time service consists of two microservices: Facebook Streaming and Facebook Processing. A microservice architecture is an approach to service-oriented architecture that divides a service into smaller, more flexible, and specialized pieces of deployable software. This design was chosen to ensure each service does one thing very well. It allows the modules of a service to be scaled according to the resources they need, and gives us greater control over them. One key reason is that Facebook Processing does a lot of JSON parsing, which consumes more CPU. The way we utilize microservices is by implementing each microservice as a specialized piece of software. In the case of Facebook Real-Time, Facebook Streaming specializes in collecting event payload data from Facebook, while Facebook Processing parses this data into specific event types.

Facebook Streaming

Facebook Streaming collects large payloads of information from Facebook when certain real-time events occur, making it a streaming service. Data streaming, in this case, means that dynamic data is generated by Facebook on a continual basis and sent to the Facebook Streaming microservice. These events can include actions such as a post on a Facebook page, or the simple press of the ‘like’ button on a post. Facebook Processing parses these large payloads of data into specific event types.

Webhooks are used in the collection of events from a Facebook page: we register callbacks with Facebook to receive real-time events. When data is updated on Facebook, Facebook’s webhook sends an HTTP POST request to a callback URL belonging to the Facebook Streaming microservice, delivering the event payloads.
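
As a rough sketch of what such a callback endpoint can look like (assuming Flask; the route, verify token, and publish_raw_event helper are illustrative and not the service's actual code):

    # Sketch of a webhook callback endpoint; route, token, and helper are illustrative.
    from flask import Flask, request

    app = Flask(__name__)
    VERIFY_TOKEN = "example-verify-token"  # shared secret configured with Facebook


    @app.route("/facebook/callback", methods=["GET"])
    def verify():
        # Facebook verifies the callback URL once by echoing back hub.challenge.
        if request.args.get("hub.verify_token") == VERIFY_TOKEN:
            return request.args.get("hub.challenge", "")
        return "verification failed", 403


    @app.route("/facebook/callback", methods=["POST"])
    def receive_events():
        payload = request.get_json()  # batch of real-time events from Facebook
        publish_raw_event(payload)    # hypothetical: hand the raw payload to the Event Bus
        return "", 200


    def publish_raw_event(payload):
        # Placeholder for producing the raw event to Kafka (see the Event Bus section).
        pass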

 

Facebook Processing

The Facebook Processing microservice parses the event payload from Facebook Streaming into more specific events. So a published post will be one kind of event, and a ‘like’ will be another type of event. Here, the number of events that are assigned event types can be controlled. This is important because event payloads are large and require a lot of CPU to parse. Limiting the set of event payloads being parsed at once reduces CPU usage. So instead of parsing the event payloads at the rate they are received from Facebook Streaming, the service can consume a set of these event payloads at a time, while the rest wait in a queue until the consumed set has been parsed.

We have also built a registration endpoint into the Facebook Processing service. Instead of manually adding Facebook pages to the database of pages registered for streaming, the endpoint can be called by a service and will register the specified Facebook page.

 

Event Bus

A payload consists of batches of events from different Facebook pages; we call this a ‘Raw Event’. These raw events are published from Facebook Streaming to the Event Bus. The Event Bus is a message bus that allows different microservices to communicate; it is built on Apache Kafka technology and consists of a set of Kafka clusters with topics. A topic corresponds to a specific event type, and consumers can subscribe to these topics. The event data corresponding to these topics will be collected by the consumer. A service can consume events from the event bus or produce events to the event bus, or both! Each service is configured to know which topics to consume from or produce to.
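
To make the produce/consume flow concrete, here is a small sketch using the kafka-python client; the broker address, topic name, and consumer group are illustrative, and handle_raw_event stands in for the parsing done by Facebook Processing.

    # Sketch of producing to and consuming from an Event Bus topic with kafka-python.
    from kafka import KafkaConsumer, KafkaProducer

    BROKERS = ["localhost:9092"]  # illustrative broker address


    def handle_raw_event(payload_bytes):
        # Hypothetical stand-in for Facebook Processing's parsing step.
        print("parsing raw event of {} bytes".format(len(payload_bytes)))


    # Producer side (Facebook Streaming): publish a raw event to its topic.
    producer = KafkaProducer(bootstrap_servers=BROKERS)
    producer.send("facebook.raw-events", b'{"entry": []}')  # in practice, protobuf-encoded bytes
    producer.flush()

    # Consumer side (Facebook Processing): subscribe and consume in bounded batches,
    # so only a limited set of payloads is parsed at once.
    consumer = KafkaConsumer(
        "facebook.raw-events",
        bootstrap_servers=BROKERS,
        group_id="facebook-processing",
        auto_offset_reset="earliest",
    )
    while True:
        batch = consumer.poll(timeout_ms=1000, max_records=10)
        for records in batch.values():
            for record in records:
                handle_raw_event(record.value)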

Event messages are formatted using protocol buffers. Protocol buffers (Protobuf) are a mechanism for serializing structured data: the structure of a type of event only needs to be defined once, and it can then be read or written easily from a variety of data streams and languages. We decided to use protocol buffers because they have an efficient binary format and can be compiled into implementations in all our supported languages, which include Scala, PHP, and Python. Event payloads can also easily be made backwards compatible, which avoids a lot of ‘ugly code’ compared to using JSON. These are just some key examples, but there are many other benefits to using Protobuf. With the growing number of services at Hootsuite, we had the problem that the occurrence of an event needed to be communicated to other services asynchronously. The value the Event Bus provides is that the service where an event happens does not need to know about all the other services the event might affect.
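
As a sketch of why the compiled messages are convenient, the snippet below assumes a hypothetical raw_event.proto compiled with protoc into a Python module raw_event_pb2; the RawEvent message and its fields are made up for the example.

    # Round-tripping a protobuf message in Python; RawEvent and its fields come from
    # a hypothetical raw_event.proto compiled with protoc, not Hootsuite's real schema.
    from raw_event_pb2 import RawEvent

    event = RawEvent()
    event.page_id = "1234567890"
    event.payload = '{"entry": []}'

    data = event.SerializeToString()  # compact binary format, suitable for the Event Bus

    decoded = RawEvent()
    decoded.ParseFromString(data)     # Scala, PHP, or Python consumers can parse the same bytes
    print(decoded.page_id)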

To give an overview of how the Facebook Real-Time service interacts with the Event Bus: raw events are first published to the Event Bus from the Facebook Streaming microservice and consumed by Facebook Processing, where the events are assigned specific event topics and sent back to the Event Bus. This allows the specific events to be consumed by other services. Additionally, the events stay on the event bus for a period of time until they expire. Because we wanted to publish raw events and then parse them at a different rate than they are sent to us, this let us use the Event Bus as a temporary storage tool and split Facebook Real-Time into two separate microservices. It also allows us to decouple the producer load from the consumer load used for processing events in the Facebook Processing service.

 

Service Discovery

Skyline routing is used to send HTTP requests to the Facebook Real-Time service. Skyline is a service discovery mechanism that ‘glues’ services together: it allows services to communicate with each other and indicate whether they are available and in a healthy state. This makes our services more reliable and lets us build features faster. Skyline routing allows a request to be sent to a service without knowing the specific server the service is hosted on. The request is sent and redirected to the appropriate server corresponding to the service. Because Skyline routes requests from clients to instances of a service, it can reroute a request to another server if a service worker fails. The same applies when a service is being restarted: if there are several instances of the service running, requests will go to another instance while one instance is restarting. This also allows new instances of a service to be set up if it is being overloaded and there are too many events in the queue, improving response time.

In addition, the client can access all the services routed by Skyline via a single URL-based namespace (localhost:5040) by specifying the service name and associated path. The request will be routed to the corresponding address where the service is hosted.
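
For example, a client-side call through this namespace might look like the following sketch, where the service name and path are illustrative rather than real Hootsuite endpoints.

    # Request a service through the single routing namespace; names are illustrative.
    import requests

    response = requests.get("http://localhost:5040/facebook-streaming/status")
    print(response.status_code, response.text)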

In conclusion, a microservice publishes events to the event bus, which contains a large pool of events. These events can be consumed by another microservice, which will effectively use the event data. And the services can communicate with each other via HTTP using Skyline routing, a service discovery mechanism.

 

About the Author

Xintong is a Co-op Software Developer on Hootsuite Platform’s Promise team. She works closely with the Facebook Streaming microservice, which is built on Hootsuite’s deployment pipeline and uses the event bus to communicate event information to the Inbound Automation Service. Xintong also enjoys the arts and likes to invest her creativity in fine arts and music.