By Daniel Zhang on August 30, 2017
Over the past two months, I’ve had the chance to work on Hootsuite’s Amplify team to implement a system for detecting buying intent in tweets, in order to help Amplify users track prospective customers. While traditionally Amplify has relied on user-defined keyword matching to filter tweets, that user would still have to sift through these potential buying signals in order to find leads. By integrating a system that intelligently scores tweets and ranks contacts, we make Amplify more effective at automating the social selling process. This post details the decisions and challenges I’ve come across in implementing these changes.
Initial ConsiderationsOne of the defining factors of any scoring system is how it chooses to interpret its data. In our case, this was deciding on how to evaluate the stages of the buyer’s lifecycle. Kulkarni, Lodha, & Yeh (2012) described three main discrete steps in online buying behavior, where a customer:
- Declares intention to buy (express intent, or EI)
- Searches and evaluates options (purchase intent, or PI)
- Makes post-purchase statements (post-evaluation, or PE)
We decided to evaluate tweets which fell into those categories differently. For example, a post such as “Should I get a Mac or a PC?” expresses a much higher intent to buy than a post where somebody expresses their thoughts on a product they just bought. There was also the problem of ambiguity – for example, in the case that somebody states “I am going to buy X type of product”, it would be difficult to know for sure whether they were simply expressing an intent to search for X type of product, or if they were literally steps away from purchasing that product. For these reasons, we decided on ‘base scores’ for posts such that PI > EI > PE.
Choosing a Service
|IBM Watson||✔ Natural Language Understanding Service||✔ Implied within Tone Analyzer||✔ Tone Analyzer or
Natural Language Understanding Service
|✔ Natural Language Classifier|
|Microsoft||✔ Text Analytics API (in Preview as of June 2017)||✕||✕||✔ Language Understanding Intelligent Service (in Preview as of June 2017)|
|Converseon||✔ ConveyAI||✔ ConveyAI||✔ ConveyAI||? Waiting for their response|
Another initial consideration was which machine learning service to use. While looking at available services, we searched primarily for services which provided not only text classification services, but also insight into sentiment and intensity. We decided on using IBM’s Watson Developer Cloud for their large ecosystem of services.
ImplementationAs Hootsuite continues to transition from a monolithic architecture to microservices, our scoring system, the Intelligent Contact Ranking Engine (ICRE), was implemented as a Flask web service within Kubernetes. This provides a layer of abstraction between our existing Ruby back-end and the handling of asynchronous requests and scoring of posts done by the ICRE.
While the ICRE acts as an adapter between our back-end and IBM’s Natural Language Classifier (NLC), it handles some functionalities, as well as technical hurdles that we encounter along the way. Here are a few:
Batch RequestsOne challenge we came across was a lack of support for processing batch requests. Potential buying signals (tweets) are collected in batches by a job in Amplify’s back-end. This would be fine, except it takes a total of, on average, 0.8 seconds for a request to be sent to Watson’s NLC, and for a response to be returned. Given the nearly 2.6 million keyword-matched tweets stored in our database, it’s clear that sending these tweets one by one would be a major bottleneck. The ICRE optimizes this process by using a thread pool. Though parallelism in Python is limited by the Global Interpreter Lock, most of the processing is done on IBM’s side, so any inefficiency is minimized.
Re-training & ScoringAnother minor challenge came with training our ML model. An interesting aspect of Watson’s NLC is that once a model (termed ‘classifier’) is trained, it cannot be retrained. This means that if we ever wanted to retrain our model, we would have to initialize the training process on IBM’s side, wait for that classifier to finish training, and then switch the classifier_id in our code to use that new classifier. ICRE reduces this complexity for developers in two ways:
- Allowing devs to call a training event with a simple command when running the ICRE
- Automatically detecting the latest available classifier each time it’s called
All of these implementation details allow the ICRE to reliably calculate signal scores given the information returned by the Natural Language Classifier.
Training & MethodologySo how did we train the model? As we didn’t have any previously available annotated dataset, I had to create the dataset myself. In essence, the idea was to collect data for as many feasible cases as possible. I collected training data by manually classifying information from:
- Keyword-matched tweets from our production databases
- Tweets grabbed through keyword searches
- Product reviews online
I found that there were a few patterns to tweets which fell in the same category – here are just a few examples of soft rules/guidelines I outlined while classifying training data:
E.g., “I’m looking to buy a new 6-10 seater dining table. Any recommendations?”
Express Intent is simply the declaration of the desire to purchase. This often includes:
- Keywords such as “want”, “wanna”, “desire”, “wish”, etc.
- Expression of anticipation
E.g., “I’ve decided my new PC is going to be #Ryzen based, unless someone can convince me to buy #Intel?”
Purchase Intent includes both the search and evaluation of options. This often includes:
- Asking for details on how to obtain something
- Asking about the ‘goodness’ of a product/service
- Asking for opinions of one purchase option versus another
E.g., “Solid purchase, no regrets.”
Post-Evaluation can be thought of as a review or statement after having purchased a service/product.
On ConsistencyWe decided that we would create a more effectively trained model by manually classifying data against a set of soft rules rather than crowdsourcing for labelled data. I found that in cases where context or images were required to fully understand a tweet, data classification could be ambiguous, especially when spread across a wide variety of people with different understandings of what “purchase intent” may mean. Consistency is key to training a good model. With an initial training set of over 1700 classified strings, we found that we were already getting good results.
While the original purpose of integrating intent analysis into Amplify was to score posts so that we could intelligently rank contacts on the front-end, we came across new possible use cases while implementing support in Amplify’s back-end. This led to some important decisions about how we should handle contact scoring.
Syncing Post Scoring and Contact ScoringIn an ideal situation, we would update contact scores whenever post scores are updated. However, this introduces unnecessarily high server load. Recalculating contact scores every time post scores are calculated (a job run on a fairly small interval) would mean running many postgreSQL queries involving both a join and a summation at that same interval. This is computationally inefficient.
Incrementing contact scores each time a post score is updated might have been ideal, but old post removal is automatically handled by our database – therefore making it difficult to track when we would need to decrement contact scores.
The most efficient way, then, of going about contact scoring would probably be to do a user-specific calculation every time they look at their contacts list, with a time limit between re-calculations. It would ensure that we don’t calculate contact scores for users who don’t need them… but we didn’t do that. Why?
We found that contact scores were valuable for features other than sorting the contacts list. Some features actively required contact scores in the background. For that reason, we needed to find a solution which would efficiently calculate contact scores for all users.
The Middle GroundUltimately, we decided to integrate contact score calculation into an existing daily job that already processes all contacts in our database. This allowed us to offload much of the calculation work to the database while adding only a few more calculations to an already existing and tested job. Now, incremental updates are done upon post scoring, allowing for immediate contact ranking throughout the day. This can be done because the daily job recalculates contact scores from scratch, therefore ignoring any scored posts which may have been removed due to age.
Integration into Amplify’s Front-endAfter integrating support for contact scoring in the backend, updating our app to support contact ranking simply consisted of:
- Updating the retrieval of contact profiles to include their contact score
- Sorting the contacts list by score and name, rather than by name alone.
ConclusionThe integration of contact ranking in Hootsuite Amplify demonstrates just one use case of machine learning for businesses. While purchase intent scoring was originally implemented solely for this use case, it has already proved useful in other features (e.g., alerting sellers about definite buyers with push notifications). The value of machine learning, in this case, isn’t a flashy new feature, but rather a subtle change which provides greater value to Amplify’s users. In this way, leveraging cutting edge technology in even subtle features serves as an indication of the potential for machine learning to drive intelligence in so many industries in the present, and in the future.