Like all machine-learning-based models, Smartly.AI algorithms require good data to deliver good results. As we often say, if the algorithm is the rocket, the data is the fuel.
Before reading this, make sure you have read the Intent Detection article.
Here are 6 questions you should ask yourself while designing your intents:
In our experience, you should have at least 20 samples per intent to reach roughly 80% accuracy.
After your beta test session, you should have around 50 samples per intent to reach roughly 90% accuracy.
Then, once your bot is in production, keep adding new samples via the Inbox module.
Some figures to check before launching your bot:
- Minimum: 5 sentences per intent
- Maximum: 200 sentences per intent
- Optimal: it really depends 🤓
To find this optimal number, check the dataset balance section.
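If you keep a copy of your training data outside the platform, a small script can check these figures before launch. The sketch below assumes a hypothetical layout (a Python dict mapping each intent name to its list of sentences); this is not Smartly.AI's export format. The thresholds are the ones quoted above, with 20 used as the "~80% accuracy" guideline.

```python
from typing import Dict, List

# Thresholds from the figures above; TARGET_SAMPLES is the ~80% accuracy guideline.
MIN_SAMPLES = 5
MAX_SAMPLES = 200
TARGET_SAMPLES = 20


def check_sample_counts(intents: Dict[str, List[str]]) -> Dict[str, str]:
    """Return a warning for every intent whose number of samples needs attention."""
    report = {}
    for name, samples in intents.items():
        n = len(samples)
        if n < MIN_SAMPLES:
            report[name] = f"too few samples ({n} < {MIN_SAMPLES})"
        elif n > MAX_SAMPLES:
            report[name] = f"too many samples ({n} > {MAX_SAMPLES})"
        elif n < TARGET_SAMPLES:
            report[name] = f"below the recommended {TARGET_SAMPLES} samples ({n})"
    return report


# Hypothetical dataset; the repeated strings are placeholders for illustration only.
dataset = {
    "Search Something": ["placeholder sentence"] * 25,
    "Print Documents": ["placeholder sentence"] * 3,
}
print(check_sample_counts(dataset))  # {'Print Documents': 'too few samples (3 < 5)'}
```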
Your utterances should be diverse: use paraphrases and different grammatical variations of the same sentence. Use real-life data if available. Ask yourself how you would express this intent to someone else. Try crowdsourcing methods (Amazon Mechanical Turk, ...). The Wizard of Oz approach can also help.
Here is a good example.
- Where is the printer
- Where can I find the printer
- How can I get the printer
- I need the printer
- Do you have a printer in this building?
- When can i print documents
- I have a few docs to print
- I want to make copies
- Is there a printer in here
- How do you print stuff here
- where is the damn printer
- i want to print something
- i was looking for the printer
- can you help me found the printer
- could you tell me where is the printer located?
...
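To spot samples that add little diversity, you can compare their word sets. The sketch below uses Jaccard similarity (shared words divided by all distinct words), a technique chosen here for illustration; Smartly.AI is not documented to use it, and the 0.8 threshold is an arbitrary assumption.

```python
import itertools
import re
from typing import List, Set, Tuple


def word_set(sentence: str) -> Set[str]:
    """Lowercased set of words in a sentence, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9@']+", sentence.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of words two sets have in common (1.0 means the same words)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(samples: List[str], threshold: float = 0.8) -> List[Tuple[str, str, float]]:
    """Return every pair of samples whose word overlap reaches the threshold."""
    pairs = []
    for first, second in itertools.combinations(samples, 2):
        score = jaccard(word_set(first), word_set(second))
        if score >= threshold:
            pairs.append((first, second, round(score, 2)))
    return pairs


samples = [
    "Where is the printer",
    "where is the printer?",
    "Where can I find the printer",
    "I want to make copies",
]
print(near_duplicates(samples))  # [('Where is the printer', 'where is the printer?', 1.0)]
```

Flagged pairs are candidates for rewording into a genuinely different paraphrase, not necessarily for deletion.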
The idea here is to avoid two extremes:
- Don't be too short (keywords). Use natural sentences.
- Don't be too long and don't use multiple sentences per sample. Be concise and keep only the core of your message. For instance, if your users tend to open with greetings and thanks before asking their actual question, keep the question and remove the rest.
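Both extremes can be caught with a rough lint. In this sketch, MIN_WORDS = 3 and MAX_WORDS = 20 are our own assumptions rather than Smartly.AI limits, and splitting sentences on `.`, `!` and `?` is a crude heuristic.

```python
import re
from typing import List, Tuple

MIN_WORDS = 3   # assumption: shorter samples look like bare keywords
MAX_WORDS = 20  # assumption: longer samples likely carry more than the core request


def length_issues(samples: List[str]) -> List[Tuple[str, str]]:
    """Flag samples that are keyword-like, multi-sentence, or overly long."""
    issues = []
    for sample in samples:
        n_words = len(sample.split())
        # Count non-empty chunks between sentence-ending punctuation.
        n_sentences = len([part for part in re.split(r"[.!?]+", sample) if part.strip()])
        if n_words < MIN_WORDS:
            issues.append((sample, "too short: use a natural sentence"))
        elif n_sentences > 1:
            issues.append((sample, "multiple sentences: keep only the core question"))
        elif n_words > MAX_WORDS:
            issues.append((sample, "too long: trim to the core message"))
    return issues


for sample, problem in length_issues(
    ["printer", "Where is the printer", "Hello! Thanks a lot. Where is the printer?"]
):
    print(f"{sample!r}: {problem}")
```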
Try to maintain a rough balance in the number of examples per intent. Because they are more common or easier to collect, some intents can end up with far more training examples than others. While more data generally helps achieve better results, a strong imbalance can lead to a biased classifier, which in turn degrades the results.
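One simple way to watch the balance is to compare each intent's sample count with the median count. This is a minimal sketch over the same hypothetical dict layout (intent name to list of sentences); the `max_ratio=3.0` tolerance is an assumption, not a Smartly.AI rule.

```python
from statistics import median
from typing import Dict, List


def imbalance_report(intents: Dict[str, List[str]], max_ratio: float = 3.0) -> Dict[str, float]:
    """Return intents whose sample count is more than max_ratio times above or below the median."""
    counts = {name: len(samples) for name, samples in intents.items()}
    if not counts:
        return {}
    # max(..., 1) avoids dividing by zero when most intents are empty.
    mid = max(median(counts.values()), 1)
    flagged = {}
    for name, n in counts.items():
        ratio = n / mid
        if ratio > max_ratio or ratio < 1 / max_ratio:
            flagged[name] = round(ratio, 2)
    return flagged


# Hypothetical counts; placeholder strings stand in for real sentences.
dataset = {name: ["s"] * n for name, n in
           {"a": 20, "b": 25, "c": 30, "d": 150, "e": 5}.items()}
print(imbalance_report(dataset))  # {'d': 6.0, 'e': 0.2}
```

Using the median rather than the mean keeps a single oversized intent from hiding the imbalance it creates.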
Conflicts or ambiguities occur when one or more identical sentences appear in two different intents.
Don't share important words between two or more intents
If some of your intents have very similar utterances, consider merging them into a single intent and managing the difference with an entity.
For example, don't create a Search Printer intent and a Search Coffee Machine intent. Instead, create one intent called Search Something, where @Something is an entity whose values include a printer, a coffee machine, or whatever else your users may want to look for.
- Where is @Something
- Where can I find @Something
- How can I get @Something
- I need @Something
- Do you have @Something in this building?
- i was looking for @Something
- can you help me found @Something
- could you tell me where is @Something located?
This will be your generic intent for finding something (printer included).
You can still put the printer-specific sentences in a dedicated intent:
- When can i print documents
- I have a few docs to print
- I want to make copies
- Is there a printer in here
- How do you print stuff here
- where is the damn printer
- i want to print something
You now have a good intent architecture that prevents conflicts between intents!
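To check a dataset for the two kinds of conflict described above, you can look for sentences that appear in several intents after normalization and for content words shared across intents. This sketch assumes the same hypothetical dict layout and that entities appear as `@Name` in the text, as in this article's examples; the stopword list is a tiny illustrative assumption.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Illustrative stopword list (an assumption); a real check would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "i", "you", "me", "to", "do", "can", "in", "where", "how"}


def normalize(sentence: str) -> str:
    """Lowercase, drop punctuation and collapse spaces so near-identical strings compare equal."""
    return " ".join(re.findall(r"[a-z0-9@']+", sentence.lower()))


def find_conflicts(
    intents: Dict[str, List[str]],
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Return (sentences found in several intents, content words found in several intents)."""
    sentence_owners: Dict[str, Set[str]] = defaultdict(set)
    word_owners: Dict[str, Set[str]] = defaultdict(set)
    for name, samples in intents.items():
        for sample in samples:
            norm = normalize(sample)
            sentence_owners[norm].add(name)
            for word in norm.split():
                # Skip filler words and entity references such as @something.
                if word not in STOPWORDS and not word.startswith("@"):
                    word_owners[word].add(name)
    duplicates = {s: owners for s, owners in sentence_owners.items() if len(owners) > 1}
    shared_words = {w: owners for w, owners in word_owners.items() if len(owners) > 1}
    return duplicates, shared_words


dataset = {
    "Search Something": ["Where is @Something", "Where is the printer?"],
    "Print Documents": ["where is the printer", "I want to print something"],
}
duplicates, shared = find_conflicts(dataset)
print(duplicates)  # the printer question sits in both intents
print(shared)      # "printer" is an important word shared by both intents
```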
Never leave an entity alone!
Never use an entity on its own in an intent sample; make sure there are words before or after it to provide context. In particular, beware of the @Any entity: used alone, it will always be prioritized and will break your bot.
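A quick lint can catch samples that contain nothing but entity references. This sketch assumes entities are written as `@Name` inside sample text, as in this article's examples; how Smartly.AI stores them internally may differ.

```python
import re
from typing import List

ENTITY = re.compile(r"@\w+")  # assumption: entity references look like @Name


def entity_only_samples(samples: List[str]) -> List[str]:
    """Return samples whose only content is entity references (no context words)."""
    return [
        s for s in samples
        if ENTITY.search(s) and not re.search(r"\w", ENTITY.sub("", s))
    ]


print(entity_only_samples(["@Any", " @Something ?", "Where is @Something", "hello"]))
```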
Entities are not a shortcut
We have seen some customers use combinations of entities to quickly "generate" many phrase combinations for their intents. The problem is that the classifier is not trained on the content of the entities, so this tactic will always fail.
There is no shortcut: if you want good results, you need to feed your intents with words.
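If, as described above, the classifier does not learn from entity content, samples that differ only in their entity references probably teach it nothing new. This sketch groups samples by a "skeleton" in which every `@Name` reference (an assumed notation, taken from this article's examples) is replaced by a placeholder; large groups suggest entity-generated variants rather than genuinely new wording.

```python
import re
from collections import defaultdict
from typing import Dict, List

ENTITY = re.compile(r"@\w+")  # assumption: entity references look like @Name


def skeleton(sample: str) -> str:
    """Replace every entity reference with '@' and normalize case and spacing."""
    return " ".join(ENTITY.sub("@", sample).lower().split())


def redundant_variants(samples: List[str]) -> Dict[str, List[str]]:
    """Group samples that share the same words once entity references are ignored."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for sample in samples:
        groups[skeleton(sample)].append(sample)
    return {sk: members for sk, members in groups.items() if len(members) > 1}


print(redundant_variants([
    "Where is @Printer",
    "Where is @CoffeeMachine",
    "where is @Room",
    "Can you help me find @Something",
]))
```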
Get the right mix between text and entities
We recommend using at most 4 entities per phrase in your intents. If you need to capture more information, consider doing so through conversation design instead.
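The 4-entity recommendation is easy to check automatically. As in the other sketches, this assumes entity references are written as `@Name` in the sample text; only MAX_ENTITIES comes from the paragraph above.

```python
import re
from typing import List, Tuple

ENTITY = re.compile(r"@\w+")  # assumption: entity references look like @Name
MAX_ENTITIES = 4              # recommended maximum per phrase


def too_many_entities(samples: List[str]) -> List[Tuple[str, int]]:
    """Return (sample, entity count) for every sample above the recommended maximum."""
    flagged = []
    for sample in samples:
        n = len(ENTITY.findall(sample))
        if n > MAX_ENTITIES:
            flagged.append((sample, n))
    return flagged


print(too_many_entities([
    "Book @Room for @Date at @Time",
    "Book @Room for @People people on @Date at @Time with @Equipment",
]))
```

Samples that get flagged are good candidates for splitting the information across several conversation turns.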