
Data augmentation

Here we look at how data augmentation can improve the robustness of your models to typos.

What are typos?

Typographical errors (typos) are unintentional errors made while typing a text, such as using one character instead of another, duplicating a character, or inverting two characters.
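The three kinds of typos listed above can be simulated with a few string operations. The following is a minimal Python sketch for illustration only. The function names (`substitute_char`, `duplicate_char`, `swap_chars`) and the choice of lowercase ASCII letters as replacement characters are assumptions, not part of the platform.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # assumed replacement characters


def substitute_char(word, i, rng):
    """Replace the character at index i with a different lowercase letter."""
    choices = [c for c in ALPHABET if c != word[i]]
    return word[:i] + rng.choice(choices) + word[i + 1:]


def duplicate_char(word, i):
    """Repeat the character at index i."""
    return word[:i] + word[i] + word[i:]


def swap_chars(word, i):
    """Invert the characters at indexes i and i + 1."""
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


rng = random.Random(0)  # seeded so that runs are reproducible
print(substitute_char("hello", 1, rng))  # "hello" with its second letter replaced
print(duplicate_char("hello", 1))        # heello
print(swap_chars("hello", 0))            # ehllo
```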

What is an efficient way to deal with typos?

We can use data augmentation to simulate typos and then add the generated data to the training data set so that the bot can "learn" the misspelled words too.

What is data augmentation?

Data augmentation is a strategy that enables practitioners to significantly increase the diversity of data available for training models, without actually collecting new data.
In the illustration below, we can see how a single document can be slightly changed to generate many new documents that are useful for training a machine learning model.


Is data augmentation possible for chatbots?

We have seen above how new images can be generated programmatically. The same is possible with text!
Using different techniques, typos can be introduced in your intents, making your bot more robust to those typos.



What type of text augmentation will we use?

The generated text is currently based only on typo simulation, as that is the problem we are trying to solve here.

Leveraging data augmentation to improve your bots

Our bot platform offers a unique data augmentation component that can improve the robustness of the machine learning models of your chatbots.
To use it, go to your bot settings, pick a model featuring data augmentation, save the settings, and train your bot.


By default, the augmentation factor is set to 3, but you can change this value in the bot's settings.
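To give an intuition for what an augmentation factor does to a training set, here is a minimal Python sketch. It assumes that a factor of 3 means three typo variants are generated for each original utterance. This guide does not define the factor precisely, so that interpretation, the function names `add_typo` and `augment`, and the use of a single typo type (inverting two adjacent characters) are all assumptions made for illustration, not the platform's implementation.

```python
import random


def add_typo(text, rng):
    """Return text with two adjacent characters inverted at a random position."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    # Inverting two identical characters (e.g. "ll") leaves the text unchanged.
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]


def augment(utterances, factor=3, seed=0):
    """Keep each original utterance and add `factor` typo variants of it.

    Assumption: `factor` is the number of variants generated per utterance.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    augmented = []
    for text in utterances:
        augmented.append(text)
        for _ in range(factor):
            augmented.append(add_typo(text, rng))
    return augmented


training_data = ["hello", "book a flight"]
for utterance in augment(training_data, factor=3):
    print(utterance)
```

Under this assumption, two original utterances with a factor of 3 produce eight training examples: the two originals plus six variants.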