Building a Helperbot

Part 3: Improving Search Accuracy

In part 1, I built a simple front-end concept for a chatbot named Helperbot.

In part 2, I built out my idea for a fully functional backend that Helperbot can use. Then I created a very basic concept for making the bot actually respond to user input with somewhat-relevant answers.

Next, I’m going to try to improve the accuracy of Helperbot’s search results.

There are a couple of big issues with how the search was implemented at the end of the last post. To recap what search looks like right now (a rough sketch follows the list):

1) I break the user’s message apart into an array of the words it contains.
2) I run each of those words through a query using `like` and wildcards to try to find a pattern match within the current project’s FAQs.
3) If there’s a match, it gets appended to a response array that gets returned back to Helperbot.
4) Somewhere in the mix, I’m also running the search query through a filter that tries to strip out generic words.
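Here’s roughly what that v1 logic looks like in one place. This is an illustrative sketch rather than my exact code; the `Faq` model, the `question` column, and the `GENERIC_WORDS` list are all stand-ins:

```ruby
# Illustrative sketch of the v1 keyword search (names are placeholders).
GENERIC_WORDS = %w[a an the is are to of and].freeze

def v1_search(project, message)
  # Break the message into words, stripping the generic ones.
  keywords = message.downcase.split(/\W+/) - GENERIC_WORDS

  # Run each word through a wildcard query against the project's FAQs,
  # appending any match to the response array.
  responses = []
  keywords.each do |word|
    match = project.faqs.where("question ILIKE ?", "%#{word}%").first
    responses << match if match
  end
  responses
end
```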

Problems

It only takes a few tests to see that this approach is pretty error-prone.

First of all, because the current search passes each keyword individually, and because the bot responds with the first matched item in the array, a single word shared between two different FAQ answers can throw off the accuracy of the search results.

Second, going back to the filter I set up for generic terms: if a single word slips past my generic keyword bank, and that same word happens to exist in a less relevant answer, it can easily throw off Helperbot’s response accuracy.

Example

So now, a quick example to demonstrate why this sucks.

Let’s say a project has these three FAQs:

-“How do I get started?”
-“How do I create an FAQ?”
-“How do I launch a project?”

And let’s say a user searches for the following:

“How do I create a project?”

Based on the current setup, if the search runs through “How do I create an FAQ?” before “How do I launch a project?” and matches on the word “create”, the bot’s first response will be the FAQ-creation answer, even though the user wanted to know about projects.

Or if any of the words ‘how’, ‘do’, or ‘i’ aren’t in my generic word filter, the bot might respond by telling the user how to get started, which is irrelevant to their actual question.

So that’s not great…

The Fix

I could modify the current approach by adding a series of ANDs and ORs to the query so it matches multiple parts of the string at once, where the logic might look a little more like this (a hypothetical sketch, since I never actually built it):
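```ruby
# Hypothetical AND/OR version of the wildcard query: require an "action"
# word plus at least one "subject" word to match in the same row.
project.faqs.where(
  "question ILIKE :action AND (question ILIKE :s1 OR question ILIKE :s2)",
  action: "%create%", s1: "%project%", s2: "%faq%"
)
```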

But instead of that, I think I want to scrap the whole generic-filter thing, scrap the keyword search, and do some sort of full-text search instead.


pg_search

After a little research, I’m going to try pg_search for this next test, using what they call ‘search scopes’.

I’m kind of learning as I go here, but from what I can tell this will be better for natural language search and should sort the resulting matches by relevance to the query.

That means that until/unless I figure out a more natural way for the bot to display multiple responses (if there’s more than one match), the first result he returns should be the most relevant.

So Let’s Go

I’m already using PostgreSQL, so setting up pg_search in my project should be pretty simple.

First, I’ll just add the gem:
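```ruby
# Gemfile
gem 'pg_search'
```

Then run `bundle install`.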

Then I’ll include pg_search in the model I want to search and add a search method with `pg_search_scope`:
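A minimal sketch of that, assuming an `Faq` model with `question` and `answer` columns (the scope name `search_content` is also just my placeholder):

```ruby
class Faq < ApplicationRecord
  include PgSearch::Model

  # Full-text search across the question and answer columns.
  # prefix: true lets a partial word at the end of the query still match.
  pg_search_scope :search_content,
                  against: [:question, :answer],
                  using: { tsearch: { prefix: true } }
end
```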

That should be all I need to set up this new test. Now I should be able to use my new search method on the project’s FAQs.
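For example, still using those placeholder names, a search scoped to one project might look like this:

```ruby
# Returns the project's FAQs ordered by relevance to the query,
# most relevant first.
project.faqs.search_content("how do I create a project")
```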

Putting it to the test

Finally, it’s time to test the new search accuracy. To test more complex matching, I’m going to create three different FAQs with very similar language but focused on three completely different topics:

1) How do I get started?
2) How do I create a project?
3) How do I add a new FAQ?

Now remember, I removed the generic-words filter, so if I were still testing the v1 approach, the “how do I” in each of these questions would completely destroy any search accuracy.

And even if that weren’t the case, the search could still easily be thrown off.

If, for example, you asked Helperbot “how do I create an FAQ?”, the keyword match on ‘create’ could trigger Helperbot’s first response to be about creating projects instead of creating FAQs.

Now, however, several of these problems should theoretically be resolved.

To test this hypothesis out, I’m going to feed Helperbot some mixed questions, like:

-“How do I add my first project to my account?”
-“How do I create an FAQ?”

That’s pretty cool!

This approach is definitely a big step up from the first test.

But after running through a few more complex test questions, it looks like there’s still some room for improvement.

My first thought was to add pg_search to the topics class and run it before searching the FAQs class (recall that FAQs are grouped under topics/sections within a project), to try to narrow the pool of searchable FAQs down to a more targeted segment:
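What I tried looked roughly like this; the `Topic` model, its `name` column, and the two-stage lookup are my reconstruction of the idea rather than verbatim code:

```ruby
class Topic < ApplicationRecord
  include PgSearch::Model
  has_many :faqs

  # Match the user's message against topic names first.
  pg_search_scope :search_topics, against: :name
end

# Two-stage lookup: narrow to a topic, then search only its FAQs,
# falling back to the whole project when no topic matches.
topic = project.topics.search_topics(message).first
faqs =
  if topic
    topic.faqs.search_content(message)
  else
    project.faqs.search_content(message)
  end
```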

But this actually had a negative impact on the quality of Helperbot’s responses. So I’m scrapping that approach for now.

My next idea was to go back to the pg_search documentation to see what else I might do.

Weighting & Keywords

Looking back at the documentation, there’s actually a way to weight how important each field is to the search.

What that means is that I can add an extra, optional “keyword” field to the form used to create a new Q/A.

Then, in my FAQ model, I can modify my pg_search scope to make sure that if a keyword exists, it outranks the other fields being searched in the table:
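A sketch of the weighted scope (the column names are again placeholders; tsearch weights run from 'A', the highest, down to 'D'):

```ruby
class Faq < ApplicationRecord
  include PgSearch::Model

  # Weight the optional keyword column above question and answer,
  # so an explicit keyword match outranks everything else.
  pg_search_scope :search_content,
                  against: {
                    keyword:  'A',
                    question: 'B',
                    answer:   'C'
                  },
                  using: { tsearch: { prefix: true } }
end
```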

With these tweaks, Helperbot is now able to respond to questions with impressive accuracy.

You can check out Helperbot live and create your own projects here.