My professional career requires me to dabble a lot in support suites, and soon enough chatbots became part of the equation. I've experimented with many approaches in my quest to build a "magical" bot: building them in-house complete with our own NLP library, exploring local and international vendors, and taking a hybrid approach. All of it in pursuit of a chatbot we dream will one day be smart enough to tackle virtually all incoming customer inquiries, and smart enough to sound like a human (you know the dream). Many shattered dreams later, feet firmly back on the ground, these are my notes on the lessons I've learned the hard way: from building a handful of bad bots, to bots decent enough to earn a stable stream of good CSAT ratings even on their bad days (e.g. during production issues).
How to build a chat bot?
First things first: it's important to understand that it would be extremely hard to build a chatbot with the goal of answering ALL of your customers' inquiries, at least with current technology. At best, bots can answer general questions or handle fixed-flow requests (e.g. checking a payment status, simple onboarding). Hence, it's always a good idea to build a chatbot with a fallback mechanism in place (e.g. a live-agent fallback), and to set clear expectations with your users from the very beginning about what the chatbot can and cannot do. Think of it as a "chat support service" system: the bot is just one half, and what completes it are the measures that let you respond flexibly to unprecedented situations. Only then is the system close to complete.
First and foremost, verify which capabilities your vendor can support.
Obviously, unless your business has invested a lot in NLP technology, your best bet is to look for a vendor to build the base bot for you, before you fine-tune it on your own. Here are a few rules of thumb I use when scouting for a vendor, aside from standard service-quality checks (e.g. how fast they respond to your inquiries):
- Flexibility in designing the conversation flow and UI/UX elements, plus training your bot with new topics, questions, and utterances. If they say you can customize on your own via the dashboard, drill into it during the demo/trial period. If they say they will provide a dedicated account manager or product team, lock down their commitment on how fast they can deliver your requests. This is incredibly important because there is no prevalent formula for building a good bot; it all depends on the nature of your users and your business, which means you will spend A LOT of time A/B testing UI/UX elements, scripts, conversation flows, and so on. Minor details especially matter when building a chatbot. For example, we struggled for months to boost the number of CSAT responses. We tried everything: an input-based CSAT where the user types a number, a button-based CSAT, changing the number of rating options, and more. During our nth test we accidentally found that changing the CSAT format to clickable star ratings boosted our CSAT feedback by over 500%. This formula might not work for your chatbot, which is exactly why you need to verify your vendor's flexibility in supporting your A/B tests. Imagine if a simple UI change to the CSAT format took you two months to get implemented (we've been there!).
- The bot vendor's NLP proficiency in your target language. This may not matter much if you only support an English-language bot, but as you may have discovered while searching for a great NLP library, options outside the English community are usually few and far between, depending on how active and prolific the NLP community for your target language is. You can also opt for a strictly input/button-based bot, but I've found that both new and repeat users are less willing to engage with this type of bot except for very simple queries, because they already perceive that the bot will not be able to help with their questions. Supporting free-form, NLP-based input from your customers will therefore help with your bot's engagement rate, and make your bot sound natural, at least during the happy paths. My recommendation is to actually test your vendor's live bot in their customers' web/app, or at the very least ask them to provide an interactive demo. This is incredibly important because improving NLP is a long-term investment; it can't be done overnight (or even within a few weeks or months!). Once you've locked in a vendor, the switching cost is high, especially given the resources you'll have poured into training and improving the bot, so make sure you know what you'll get before you sign the deal.
- Explore the vendor dashboard's analytical capabilities. A few things I look for when analyzing a potential vendor's dashboard include bot performance statistics (e.g. handover rate, CSAT rating, unidentified utterances) and the ability to see individual messages or download raw data. Either they provide this by default, or the vendor can build custom analytics for you. This matters enormously, because good analytical capabilities make a world of difference in helping you train the bot after it goes live. Why? With most products, it's easy to get feedback, be it bug reports or feature requests, via app reviews, support channels, and so on. With chatbots, the only way to get direct feedback is by observing your analytics. You'll be surprised how many vendors don't provide CSAT analytics by default.
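To make the raw-data point concrete, here is a minimal sketch of the kind of analysis a vendor's export should enable: computing handover rate and CSAT figures yourself. The record format and field names here are assumptions for illustration; a real export will differ.

```python
# Hypothetical rows from a vendor's raw-data export (fields are assumed).
conversations = [
    {"id": 1, "handed_over": False, "csat": 5},
    {"id": 2, "handed_over": True,  "csat": 2},
    {"id": 3, "handed_over": False, "csat": None},  # user skipped the survey
    {"id": 4, "handed_over": False, "csat": 4},
]

def handover_rate(rows):
    """Share of conversations escalated to a human agent."""
    return sum(r["handed_over"] for r in rows) / len(rows)

def csat_stats(rows):
    """Return (survey response rate, average rating)."""
    rated = [r["csat"] for r in rows if r["csat"] is not None]
    return len(rated) / len(rows), sum(rated) / len(rated)

rate, avg = csat_stats(conversations)  # works even when some users skip
```

If the dashboard cannot tell you these numbers directly and the export cannot support a script like this, that is a red flag worth raising before signing.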
Your work doesn’t end when your bot finally goes live. In fact, it has only just begun.
Expect to put a lot of resources into fine-tuning your bot after it goes live; in fact, be prepared to pour in even more time and work than before launch. Your bot vendor most of the time does not have your domain's or industry's specific knowledge, so the task of fine-tuning your bot to cater to your customers' needs falls upon you. Your vendor might help you implement your plan, but the analysis, research, scripting, conversation flow design, training, fallback design, and many other inputs will need to come from you.
Design your conversation flow with the intention to help your customer
If a frustrated user needs to talk to a human agent, don't make it difficult for them to do so. Understand that these user segments are not meant to be handled by your bot; forcing them to talk to the bot helps neither them nor your business. This doesn't necessarily mean you should offer the option to connect to an agent from the get-go, but provide an easy-to-access fallback option in every scenario where the bot fails to answer.
For example, let's say your bot is unable to identify your customer's utterance. In cases like this, offer your frustrated customer a few options: a. "The bot can't understand your question; is this what you mean?" b. "If not, do you want to talk to our agent?"
You might think that your users will choose to talk to an agent by default, but think of it this way:
- Your bot being unable to answer (frustrating your customer) AND your customer being frustrated at not being able to reach an agent are two different problem statements. Solve them separately instead of looking for a single solution to both. Your customer might want to talk to an agent because your bot can't answer their questions; the solution there is not to prevent them from reaching your agent, because the root cause is your bot's capability itself. On the other hand, a customer frustrated by a technical issue or fraud is not your bot's target user anyway. Don't frustrate them even more by making it difficult to get the help they need. Remember why you're building the bot in the first place: to cater to your customers' needs better.
- The percentage may vary, but typically around 40% of inquiries that come to businesses are frequently asked questions that do not need a human touch. If you optimize your bot to handle these commonly asked questions, there is no reason for those users to talk to an agent anyway.
- We sometimes underestimate the number of users who prefer not to talk to a human unless they really have to. They make up a larger share of your users than you might imagine.
- If the majority of your users ask to talk to an agent, you might want to analyze further what went wrong. They might be repeat customers who have built up distrust in your bot's ability to solve their problems, perhaps due to a bad first experience; or your bot may have gained notoriety for being incapable of solving customers' issues, and is now perceived as a hindrance instead of a helping hand.
- Making it easier for users to talk to a human actually buys you time to improve your bot. Your bot will not be smart from day one. You will undoubtedly need to pour a lot of effort, resources, and time into tweaking it; with a human as a fallback option, you gain the time to think, research, and experiment with what will actually improve your bot's accuracy and coverage.
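The fallback logic described above can be sketched as a small decision function. The confidence thresholds, keyword list, and return values here are illustrative assumptions, not fixed rules; your NLP layer would supply the intent and confidence.

```python
AGENT_KEYWORDS = {"agent", "human", "representative"}

def fallback_reply(intent, confidence, text):
    """Decide the bot's next move when understanding is shaky.
    `intent` and `confidence` would come from your NLP layer."""
    if any(word in text.lower() for word in AGENT_KEYWORDS):
        return "handover"            # never block the path to a human
    if confidence >= 0.8:
        return f"answer:{intent}"    # confident enough to answer directly
    if confidence >= 0.4:
        return f"confirm:{intent}"   # "Is this what you mean?"
    return "offer_agent"             # "Do you want to talk to our agent?"
```

The key design choice is that an explicit request for a human short-circuits everything else, which is exactly the point of not frustrating users who were never the bot's target in the first place.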
Opt for interactive, visual call-to-action (CTA) inputs for flows like CSAT to improve your survey response rate
I mentioned this above, but it's something I learned from building a few bots: your users have zero incentive (unless they're disgruntled) to give you feedback, so make it as easy and painless as possible for them. Opt for a method that requires almost no effort. I found that visual CTA inputs that are commonly used and intuitively easy to understand work much better than typed input, which requires actual effort from users: a) comprehend how the rating scale works (is 1 the best, or is 5?); b) figure out how to provide the input (numbers only? numbers with a description?); and c) finally, type the number.
The nuances in your copy are important. A/B test them.
I've handled multiple products over my career, from an onboarding tool to a fraud detection system, but I found that a chatbot is the one product that relies most heavily on frequent A/B testing of little details and nuances, far more than any other product. You'll be surprised how small changes to your script (e.g. bullet points instead of paragraphs, tweaked word choices) can make a major impact on your users' engagement metrics.
I usually do this in several ways, such as going through the list of unidentified utterances or the negative feedback analytics. But context is also incredibly important for understanding the nuances of user feedback, so I often read users' actual conversations with the bot. For example, a particular script we used to prompt users for input ("click the button to connect to an agent, or type your question to access the FAQ") often led to bad CSAT ratings. Going through the conversations, we found that users often misinterpreted the instruction, thinking that if they typed their question they would be connected to an agent. A simple formatting change to that one question reduced the occurrence of bad CSAT ratings for that flow.
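A simple way to run such copy tests is to bucket users deterministically into variants and compare the bad-CSAT rate per variant afterwards. This sketch uses the standard library only; the experiment name, bucketing scheme, and the "2 or below is bad" cutoff are all assumptions you would adapt.

```python
import hashlib

def variant_for(user_id: str, experiment: str = "csat_copy") -> str:
    """Deterministically bucket a user into variant A or B, so the
    same user always sees the same copy throughout the test."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

def bad_csat_rate(ratings):
    """Share of ratings of 2 or below (our 'bad CSAT' definition)."""
    return sum(r <= 2 for r in ratings) / len(ratings)

# After the test window, compare the two copy variants:
rate_a = bad_csat_rate([5, 4, 1, 2, 5, 3])
rate_b = bad_csat_rate([5, 4, 5, 4, 3, 5])
```

Hashing on a user ID (rather than random assignment per session) matters here: a user who sees variant A today and variant B tomorrow would contaminate both buckets.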
Finally, design a fallback mechanism for every possible turn of events that your bot can’t handle
This does not just mean a human fallback; it includes the various scenarios your bot might not be able to handle. Some example scenarios:
- Your user wants to connect to a human, but your agents are only available during business hours. This is a tough scenario to crack, so you'll need to experiment with what works for your business: managing users' expectations upfront that agents are only available during business hours, providing a contact form they can use outside business hours so they can still ask their questions, and so on.
- Your bot can't confidently identify your user's utterance. You might want to explore a few flows: prompting the user with the closest topic the bot thinks they might be looking for, asking whether they want to connect to an agent, or having the bot ask the user to paraphrase their question.
- Your bot receives unexpected input at a particular conversation junction. Some kind of input validation can be useful in this scenario.
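For the last scenario, the validation itself is simple; the important part is pairing it with a graceful fallback reply rather than a silent failure. A minimal sketch (option labels and messages are made up for illustration):

```python
def validate_choice(user_input, allowed):
    """Return the normalized choice, or None when the input
    doesn't match what this conversation junction expects."""
    cleaned = user_input.strip().lower()
    return cleaned if cleaned in allowed else None

def handle_junction(user_input):
    """A junction that expects '1', '2', or 'agent' as input."""
    choice = validate_choice(user_input, {"1", "2", "agent"})
    if choice is None:
        # Fallback: restate the options instead of failing silently.
        return "Sorry, please reply 1, 2, or 'agent'."
    return f"selected:{choice}"
```

Normalizing whitespace and case before matching is a cheap win: " Agent " and "agent" should not be treated as different answers.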
Chatbot best practices
This is not a post about Google Dialogflow, Rasa or any specific chatbot framework. It’s about the application of technology, the development process and measuring success. As such it’s most suitable for product owners, architects and project managers who are tasked with implementing a chatbot.
I’m stating the obvious here, but it’s really important to know what you want to achieve, and how this can be measured. Many of your KPIs will be sector or domain specific, but I will give you some chatbot specific KPIs to think about. Listed in order of importance:
User feedback — Ask your users if they are satisfied. This can be done during the dialog flow or at the end. Asking for feedback will give you two KPIs:
- Feedback rate — the percentage of users who provided feedback.
- Satisfaction rate — the percentage of those users who were satisfied. I generally recommend offering a binary choice “satisfied” vs “not satisfied” instead of a rating from 1 to 5, but the choice is yours.
Bounce rate — The percentage of users who abandon conversations midway.
Self service rate — The percentage of users who were able to achieve their goal without needing to chat, phone or email a human agent.
Return rate — The percentage of users who return to use the chatbot again.
Hourly usage distribution — When are people using your bot? A bot that serves customers 24/7 is especially valuable as it’s usually prohibitively expensive to provide human cover 24/7.
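The KPIs above can all be derived from basic session logs. This sketch shows bounce rate, self-service rate, return rate, and the hourly distribution from a hypothetical log format; the field names are assumptions, not a real schema.

```python
from collections import Counter

# Hypothetical session records; real fields depend on your logging.
sessions = [
    {"user": "a", "completed": True,  "escalated": False, "hour": 9},
    {"user": "b", "completed": False, "escalated": False, "hour": 14},  # bounced
    {"user": "a", "completed": True,  "escalated": True,  "hour": 22},
    {"user": "c", "completed": True,  "escalated": False, "hour": 9},
]

# Share of users who abandoned the conversation midway.
bounce_rate = sum(not s["completed"] for s in sessions) / len(sessions)

# Share who achieved their goal without a human agent.
self_service_rate = sum(
    s["completed"] and not s["escalated"] for s in sessions
) / len(sessions)

# Share of unique users who came back for more than one session.
users = [s["user"] for s in sessions]
return_rate = sum(c > 1 for c in Counter(users).values()) / len(set(users))

# When is the bot being used?
hourly = Counter(s["hour"] for s in sessions)
```

Even if your vendor dashboard reports some of these, being able to recompute them from raw logs keeps the definitions under your control.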
There are also a couple of technical KPIs to think about:
Model accuracy — If you are using natural language understanding to understand messages you will need to measure the accuracy of your models. This is a complex topic and is best left to data scientists and experienced developers. One word of caution — don’t get too hung up on the technical measures of accuracy. Ultimately it’s the business KPIs that matter. A model that is 98% accurate is no use if users are dissatisfied with the service.
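At its simplest, measuring model accuracy means comparing predicted intents against human labels on a held-out test set. The sketch below is the bare-bones version of that idea (the intent names are made up); real evaluation would also look at per-intent precision and recall, which is where the data scientists come in.

```python
def intent_accuracy(gold, predicted):
    """Fraction of test utterances whose predicted intent matches
    the human-labelled one. `gold` and `predicted` are parallel lists."""
    assert len(gold) == len(predicted)
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

# A held-out test set labelled by humans, and the model's guesses:
gold = ["refund", "refund", "order_status", "greeting"]
pred = ["refund", "order_status", "order_status", "greeting"]
```

Note how this number says nothing about user satisfaction: a model can score well on a test set built from unrepresentative data and still fail users, which is the caution above.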
Response times — Speed is not so important for a chatbot. You can work around slow performance by adding a “bot is typing” type message to dialogs. Sometimes we even fake a delay to make the bot seem more human. Nevertheless, you want to keep an eye on performance to maintain acceptable response times.
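The "fake delay" trick can be as simple as pausing proportionally to the reply length before sending. This sketch assumes a generic `channel_send` callable and a Messenger-style `typing_on` event shape, both of which are illustrative; the actual payload depends on your channel's API.

```python
import time

def send_with_typing(channel_send, reply, chars_per_second=40, max_delay=2.0):
    """Show a 'typing' indicator, then pause roughly as long as a human
    would take to type the reply, capped so slow never means frustrating."""
    channel_send({"type": "typing_on"})  # event shape is an assumption
    time.sleep(min(len(reply) / chars_per_second, max_delay))
    channel_send({"type": "message", "text": reply})
```

The cap matters: the delay should make the bot feel human for short replies, not add seconds of latency to long ones.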
Finally, it’s important to know which channel your users favour if you deploy an omni-channel chatbot. Being dependent on one third party channel, e.g. Facebook Messenger puts you in a vulnerable position.
Like most technology, a bot is designed to automate tasks that would otherwise be done by a human operator. Before embarking on a chatbot, it's essential that you know exactly what you are trying to automate. The best way to do this is to first employ human agents to respond to your users' messages. Why do this?
To validate assumptions — you (or your product owner) may have established your use cases or user stories, but how valid are they? How many users actually choose live chat to check their order status?
The 80/20 rule — users will make all sorts of requests. Even within a particular use case, some requests will be so esoteric that it makes little sense to automate them. For example, you will most likely want to handle postage and delivery queries, but does your bot really need to handle queries about VAT on BFPO shipments? Probably not.
To understand the tone of conversations — How do your users interact over live chat? Are they friendly or professional? Do they use colloquialisms? Do they use emojis and text speak? Understanding the tone of conversations will allow you to develop a chatbot that your users are comfortable using.
To get training data — This is probably the most important reason why you should use humans first. Even if you’re not planning to use natural language understanding you will still need data for keyword matching. If you are using NLU/NLP you will certainly need good training data. From what we’ve seen, 80% of chatbots fail to meet expectations because they used synthetic training data, or a limited sample of real world data.
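Turning human-agent transcripts into training data can start as simply as grouping real customer utterances by the intent tag your agents (or annotators) assigned. The log format below is a hypothetical example of that structure.

```python
from collections import defaultdict

# Hypothetical log of human-agent chats, tagged after the fact.
agent_logs = [
    {"utterance": "wheres my parcel", "tag": "order_status"},
    {"utterance": "has my order shipped yet?", "tag": "order_status"},
    {"utterance": "i want my money back", "tag": "refund"},
]

def to_training_data(logs):
    """Group real customer utterances by intent label: the raw
    material for NLU training (or keyword matching) later."""
    data = defaultdict(list)
    for row in logs:
        data[row["tag"]].append(row["utterance"])
    return dict(data)
```

Notice that the utterances keep their real-world messiness (typos, missing apostrophes); that messiness is precisely what synthetic training data lacks.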
You may already employ human agents to serve your customers. If so, you probably need to tweak the data you log, and the way it’s structured (see below). If you don’t yet employ human agents you can actually do this on a (relatively) small scale.
Conclusion
Finally, one important takeaway from all these lessons: QC often by going through the actual messages. Some issues or nuances are hard to capture through analytics alone. Make it a habit to read your users' end-to-end conversations; you'll be surprised how many more issues you can catch (issues that might not show up in your analytics), or how straightforward the solution to a problem you're wrestling with turns out to be.