Building an offline chatbot
Sun, May 20, 2018
I have a borderline unhealthy obsession with making stuff run offline (my “day job” is partly to blame here), and one idea started keeping me up at night - why not a chatbot? The idea turned into a talk (not in English, though) that I gave at a local meetup, but it still feels like it deserves a more in-depth treatment of the why and how of building an offline chatbot.
I mean, why not? 😎 With PWAs becoming more mainstream, we can expect to see more “edgy” stuff in the browser, working offline and acting like a desktop app. So if we can have various games, tomato timers, file sharing and even Google Drive / Docs running offline, why not a measly chatbot? For whenever you feel like talking to a not-particularly-smart, glorified if / else statement? No? Doesn’t matter, read on 😆.
Take one part create-react-app, one part compromise.js, with a sprinkle of React for the UI and “brains” on top, et voilà - a chatbot basis. The create-react-app bit is not particularly interesting here, so we’ll skip that and look at compromise.js a bit before we get into the actually interesting parts.
How it achieves that takes a bit of reading and theory to understand fully, but the gist of it is:
- 80% of the English language as actually used consists of the top 1,000 words.
- Statistically, the most common word type is a noun, so it makes sense to assume that any word unknown to the library is a noun.
- With some word stemming / lemmatization, we can reduce the size of the dictionary needed for the library to run (word suffixes, for example).
- Some sentence-level postprocessing on top gets us to the numbers mentioned above.
A thousand-word dictionary is perfectly acceptable for in-browser use, considering modern JS library sizes. A more in-depth explanation of how compromise works can be found here and here. Oh, did I mention it also does plugins and custom lexicons?
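To make the gist above a bit more concrete, here is a toy tagger built on the same three ideas: a tiny lexicon, a few suffix rules, and “noun” as the default guess. This is not compromise’s actual code. The lexicon entries, the suffix rules and the names tagWord and tagSentence are all made up for illustration.

```javascript
// Toy part-of-speech tagger. NOT compromise's implementation,
// only a sketch of the idea; every entry below is an assumption.
const lexicon = {
  the: 'Determiner',
  a: 'Determiner',
  is: 'Verb',
  walk: 'Verb',
  quick: 'Adjective',
  in: 'Preposition',
};

// Suffix rules stand in for a much larger dictionary.
const suffixRules = [
  [/ly$/, 'Adverb'],
  [/ing$/, 'Verb'],
  [/ed$/, 'Verb'],
  [/ous$/, 'Adjective'],
];

function tagWord(word) {
  const w = word.toLowerCase();
  // 1. Known word: trust the lexicon.
  if (Object.prototype.hasOwnProperty.call(lexicon, w)) return lexicon[w];
  // 2. Unknown word: try to guess from its suffix.
  const rule = suffixRules.find(([pattern]) => pattern.test(w));
  if (rule) return rule[1];
  // 3. Still unknown: nouns are the most common word type, so guess that.
  return 'Noun';
}

function tagSentence(sentence) {
  return sentence
    .split(/\s+/)
    .filter(Boolean)
    .map((word) => [word, tagWord(word)]);
}
```

Calling tagSentence('the chatbot walked quickly') tags “the” from the lexicon, “walked” and “quickly” from their suffixes, and “chatbot” as a noun by default. The real library layers sentence-level corrections on top of this kind of per-word guess.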
2. The “brains” part
Compromise.js sounds great and all, but how does that help us build a chatbot? Well, it doesn’t directly, but we’ll get to that. Fundamentally, a chatbot can be described as a program that responds to natural language via request / response cycles. So, a very dumb and basic chatbot would simply be a “reactive”[1] program that can respond to given strings. Ergo, we need a function that responds to a given user input:
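A minimal sketch of such a function could look like this. The name reply and the hard-coded answer are assumptions made for illustration, not something taken from the original listing.

```javascript
// Hypothetical name and answer, chosen for illustration only.
function reply(input) {
  // The input is ignored for now: every message gets the same answer.
  return 'Hello there!';
}
```

Whatever the user types, reply('anything at all') comes back with the same string.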
Now, this gives us a one-trick pony of a chatbot that only knows how to return the response above. To make it smarter we’ll “pirate” a page out of Amazon Alexa’s book - we’ll organize everything the bot knows into skills:
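One possible shape for a skill, sketched under the assumption that a skill is a plain object with a list of match rules and a reply function. The property names name, match and reply, and the exampleSkill itself, are illustrative guesses.

```javascript
// A skill bundles "when do I apply?" (match) with "what do I say?" (reply).
// The property names are assumptions for this sketch.
const exampleSkill = {
  name: 'example',
  match: [], // match rules go here, filled in per skill
  reply: (input) => `You said: ${input}`,
};

// The index of everything the bot knows is simply a list of skills.
const skills = [exampleSkill];
```

Keeping skills as plain data means adding a new one is just another entry in the list, with no changes to the bot itself.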
The theory here is that we’ll have an index of all of the skills our bot knows, and we’ll have a way of looking up the most appropriate one by using the magic of compromise.js. Now, before we get any further into making the above-mentioned lookup work, we need a bit more info on what compromise can do to simplify that otherwise tedious task.
When we give an input to compromise.js, it’s nice enough to tag all Parts of Speech and give us a tool to match against them, together with regex, plain words and our own custom tags if we happen to need them. I mean, if you decide “glue” is a preposition, you can tell compromise to treat it as such. But just so you know, “glue” is not a preposition. So, with that out of the way, we can put the lookup part together:
So, we improved our decision-maker to check whether an input matches a skill’s match config. The whole process can be described as:
- We import all of our skills into the “brain” (line 1)
- We look through each of a skill’s match rules to find what works for the given input (lines 4-12)
- If we happen to find a skill, we use its reply function to return a reply (lines 16-19)
- Since compromise comes with a nice debugger, we use it to give us more info (line 14)
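The steps above can be sketched roughly as follows. This is not the original listing, so the line numbers in the list do not apply to it. To keep it runnable without dependencies, the matcher is passed in as a parameter with a simple regex stand-in as the default. With compromise loaded, the matcher would be something like (text, rule) => nlp(text).has(rule). The names findSkill, respond, wordMatcher and knownSkills are assumptions.

```javascript
// Stand-in matcher (an assumption) so the sketch runs on its own.
// With compromise available, pass this instead:
//   (text, rule) => nlp(text).has(rule)
const wordMatcher = (text, rule) =>
  new RegExp(`\\b${rule}\\b`, 'i').test(text);

function findSkill(input, skills, matches = wordMatcher) {
  for (const skill of skills) {
    // A skill wins as soon as one of its match rules fits the input.
    if (skill.match.some((rule) => matches(input, rule))) return skill;
  }
  return null;
}

function respond(input, skills, matches = wordMatcher) {
  // With compromise, nlp(input).debug() could be called here
  // to print how the input was tagged.
  const skill = findSkill(input, skills, matches);
  // No fallback answer yet: unmatched input yields null.
  return skill ? skill.reply(input) : null;
}

// Hypothetical skill index used to try the lookup out.
const knownSkills = [
  { name: 'greeting', match: ['(hi|hello)'], reply: () => 'Hello!' },
];
```

The first skill with a matching rule wins, so the order of the index matters once skills start to overlap. Injecting the matcher also makes it easy to test the lookup without loading an NLP library.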
To make the puzzle complete, let’s take a look at a full skill:
What the above does, following the bare structure we defined previously, is:
- Define match rules (lines 7-9), giving our “brain” something to match against. So, whenever the brain gets a “hi”, “hello”, “ahoy” or “greetings” as input, that is going to trigger this skill, because compromise’s .match() matches it here. As a last-ditch effort to make it work, whenever compromise recognizes something as an “#Expression” we trigger on that too (not ideal, but works surprisingly well).
- In order not to get too boring with repetition, we randomize stuff a little bit with the “replies” array, and pick a random one on each trigger (lines 11-18).
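A greeting skill along the lines described above might look like this. It is a sketch and not the original listing: the reply texts, the pickRandom helper and its injectable random parameter are assumptions, while the match rules follow the words and the #Expression tag mentioned in the description.

```javascript
// Possible replies; the exact texts are made up for this sketch.
const replies = ['Hi!', 'Hello there!', 'Ahoy!', 'Greetings!'];

// Picks one item at random. The random source can be swapped out,
// which is an assumption added here to make the pick testable.
function pickRandom(items, random = Math.random) {
  return items[Math.floor(random() * items.length)];
}

const greetingSkill = {
  name: 'greeting',
  match: [
    '(hi|hello|ahoy|greetings)', // literal greetings
    '#Expression',               // catch-all: anything tagged as an expression
  ],
  // A different reply on each trigger keeps the bot from sounding repetitive.
  reply: () => pickRandom(replies),
};
```

Each call to greetingSkill.reply() returns one of the four replies, so two identical greetings in a row will usually get different answers.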
With that done, we have a basic bot that’s not as dumb as its first iteration. This one can reply to greetings with a greeting, making it at least somewhat context-sensitive. It’s still too dumb for anything more sophisticated, but the basics are there.
There are some glaringly obvious shortcomings with our bot right now - it doesn’t do much, it’s not aware of historical data or context, and it has no fallback answer when the input doesn’t match any of the given skills. I plan on making it better in part 2 of this thing, very soon 😎. For the impatient, have a look at my example bot from the talk I mentioned above here or see it live here. That version has a few more tricks up its sleeve, but by the time we’re done with this version of the bot, it will be a lot smarter than the deployed “beerbot” there.
Thanks for reading, and join me in the next installment when we improve the bot’s “smarts” significantly.
[1] Not reactive as used in programming, but reactive as in “readily responsive to a stimulus” - although the two feel very similar here.