A regular expression (regex) is a way to search through a stream of text. It lets you define a pattern that matches and locates the text you are looking for.
As an example, if the virtual assistant asks the end-user for their Social Security Number (SSN), a regex can be added to restrict the input to a certain format. In the USA, this would be a 9-digit value, while for France this would be a 13-digit value and a two-digit key.
In this case, if an American citizen is asked for their Social Security Number, a 9-digit value is expected. If the end-user enters anything but a 9-digit value, the assistant can warn the end-user thanks to the regex. This warning can be given either by adding a fallback action or by adding an error message and fallback (see How to Use Error Messages).
Regular expressions are added in the input action.
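The 9-digit check described above can be sketched as follows. This is a minimal illustration, not the assistant's actual configuration: the pattern `^\d{9}$` and the function name `isValidUsSsn` are assumptions, and the sketch assumes the number is typed without dashes or spaces.

```typescript
// Hypothetical pattern for a US SSN typed as nine digits with no separators.
// ^ and $ anchor the pattern so the whole input must be exactly nine digits.
const usSsnPattern = /^\d{9}$/;

// Returns true when the input is valid. A false result is the point where
// the assistant would trigger its fallback action or error message.
function isValidUsSsn(input: string): boolean {
  return usSsnPattern.test(input.trim());
}

console.log(isValidUsSsn("123456789"));   // true
console.log(isValidUsSsn("12345"));       // false (too short)
console.log(isValidUsSsn("123-45-6789")); // false (dashes are not digits)
```

If dashes should be accepted as well, the pattern would need to be extended; that choice is left open here.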
A regular expression consists of two main parts: the pattern and the flags, written as /pattern/flags.
The i flag makes matching case-insensitive. Let's say you want to check if the text contains the word cat, but you are okay with any of the following: Cat, cAt, caT, CAt, cAT, CaT, and CAT. In other words, you just want to know if the word is present regardless of the case.
The i flag specifies that case should not play a role in matching. In other words, /cat/i will match any of the above.
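As a small illustration of the case-insensitive behavior, the sketch below runs the same pattern with and without the i flag over the variants listed above. The variable names are made up for the example.

```typescript
// All spellings of "cat" mentioned in the text, plus the lowercase form.
const catVariants = ["cat", "Cat", "cAt", "caT", "CAt", "cAT", "CaT", "CAT"];

const caseSensitive = /cat/;    // no flags: only the exact lowercase form matches
const caseInsensitive = /cat/i; // i flag: letter case is ignored

const sensitiveHits = catVariants.filter((word) => caseSensitive.test(word));
const insensitiveHits = catVariants.filter((word) => caseInsensitive.test(word));

console.log(sensitiveHits);          // [ 'cat' ]
console.log(insensitiveHits.length); // 8
```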
The g flag, referred to as the global flag, specifies whether or not the regex search should stop after the first match.
To continue with the cat example, let's look at:
The cat looked at the other cat and said: "Meow".
By default, once the regex matches the first "cat" in the text, it stops. But say you want to get all the matches in the text. The g flag tells the search to continue even after the first match, i.e., to search globally throughout the text.
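The difference can be seen on the example sentence with a short sketch. It uses the standard `String.prototype.match` method; the variable names are illustrative.

```typescript
const sentence = 'The cat looked at the other cat and said: "Meow".';

// Without the g flag, the search stops at the first match.
const firstOnly = sentence.match(/cat/);
// With the g flag, every match in the text is returned.
const allMatches = sentence.match(/cat/g);

console.log(firstOnly ? firstOnly.length : 0);   // 1
console.log(firstOnly ? firstOnly.index : -1);   // 4 (position of the first "cat")
console.log(allMatches ? allMatches.length : 0); // 2
```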
The m flag (multiline) specifies how ^ and $ behave. Let's dig into what they mean:
In regular expressions, ^ matches only the beginning of the string and $ matches only the end of the string; by default, newlines are treated as part of the string.
| Regex | What it checks |
| --- | --- |
| cat | Does the text have the word cat? |
| ^cat | Does the beginning of the text have the word cat? |
| cat$ | Does the end of the text have the word cat? |
Although it seems like the regex should match the word cat at the end of the first sentence, it does not. This is because the pattern cat$ only checks the end of the string, and the whole text, from this to old, counts as one string, so the word cat is not at the end. If you specify the m flag, each line is treated as a separate string, and your pattern will have a match.
Without the m flag, the newlines are treated as part of one string, i.e., the text above is counted as a single string.
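A minimal sketch of this behavior follows. The two-line text is a hypothetical stand-in chosen for the illustration (its first line ends with cat), not necessarily the text the article originally used.

```typescript
// Hypothetical two-line text: the first line ends with "cat",
// but the string as a whole ends with "old".
const twoLines = "this is a cat\nthe cat is old";

// Without m, $ only matches at the very end of the whole string.
console.log(/cat$/.test(twoLines));  // false
// With m, $ also matches at the end of each line.
console.log(/cat$/m.test(twoLines)); // true

// The same applies to ^ and the beginning of each line.
console.log(/^the/.test(twoLines));  // false
console.log(/^the/m.test(twoLines)); // true
```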
By default, the dot character matches any character except the newline. For example, the pattern \d.*cats would match everything that has the following shape:
A digit, followed by any character any number of times, followed by the word cats. For example, it would match 9 cats, 9 fluffy cats, and so on. However, the newline character, denoted as \n, is an exception: the dot only matches it when the s flag is set.
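The dot behavior can be checked with a short sketch. The sample strings are illustrative, and the patterns are built with the `RegExp` constructor, which is equivalent to writing the literals `/\d.*cats/` and `/\d.*cats/s`.

```typescript
// Same pattern, without and with the s flag.
const dotDefault = new RegExp("\\d.*cats");     // dot does not match a newline
const dotAll = new RegExp("\\d.*cats", "s");    // dot matches a newline too

console.log(dotDefault.test("9 cats"));         // true
console.log(dotDefault.test("9 fluffy cats"));  // true

// Text typed with the Enter key between "fluffy" and "cats".
const typedWithEnter = `9 fluffy
cats`;

// The line break is stored as the \n character.
console.log(typedWithEnter === "9 fluffy\ncats"); // true

console.log(dotDefault.test(typedWithEnter));   // false (the dot stops at the newline)
console.log(dotAll.test(typedWithEnter));       // true
```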
The following strings will not match:
You might wonder why a person would type \n out of nowhere. But when you press Enter and start a new line, that newline is implicitly encoded as \n. So the above statement is the same as the one below:
The u flag is an advanced flag that specifies how some characters like ბ,ㄱ should be treated. It is rarely needed.
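One case where the u flag makes a visible difference is the Unicode property escape \p{L} ("any letter"), which is only recognized when the flag is set. The sketch below is an assumption about the kind of use the article has in mind, shown with the `RegExp` constructor.

```typescript
// \p{L} means "any Unicode letter", but only when the u flag is set.
const lettersWithU = new RegExp("^\\p{L}+$", "u");
// Without u, the same text is read as the literal characters p{L}.
const lettersWithoutU = new RegExp("^\\p{L}+$");

console.log(lettersWithU.test("ბㄱ"));    // true
console.log(lettersWithoutU.test("ბㄱ")); // false
```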
Strings are ordered sequences of characters, and the characters are indexed starting from zero.
So, for the string cat, c has the index of 0, a has the index of 1, and t has the index of 2. In a nutshell, all of the letters in the string get an index from left to right, starting from zero and incrementing by 1.
When you search with a regular expression, the search starts from index zero. The y flag lets you start from the index you specify instead of the beginning of the string, i.e., index zero; the match must then begin exactly at that index.
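The indexing and the y flag can be sketched as follows. In JavaScript and TypeScript the starting index is set through the regex's `lastIndex` property; the sample text "dog cat" is made up for the illustration.

```typescript
// Characters are indexed from zero, left to right.
const word = "cat";
console.log(word.charAt(0), word.charAt(1), word.charAt(2)); // c a t
console.log(word.indexOf("t")); // 2

// Equivalent to the literal /cat/y.
const stickyCat = new RegExp("cat", "y");
const sampleText = "dog cat"; // "cat" starts at index 4

// Starting at index 0, the text begins with "dog", so there is no match.
stickyCat.lastIndex = 0;
const matchAtZero = stickyCat.test(sampleText);
console.log(matchAtZero); // false

// Starting at index 4, the pattern matches exactly there.
stickyCat.lastIndex = 4;
const matchAtFour = stickyCat.test(sampleText);
console.log(matchAtFour); // true
```

Note that with the y flag the pattern is not searched for further along the text: it either matches at the given index or not at all.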
Most of the time, a regular expression is used to validate the user input. In other words, the virtual assistant does not need to know how many times the pattern occurs in the user response, or where exactly in the sentence it occurs. Since the validation result is binary, i.e., the input is either valid or not, most of the above flags are not needed when validating the user input.
The most useful flag is the i flag, which makes the match case-insensitive.
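A binary validation of this kind might look like the sketch below. The yes/no question, the pattern, and the function name `isValidAnswer` are all hypothetical and only serve to show a valid-or-not check that needs no flag other than i.

```typescript
// Hypothetical validation: the reply must be "yes" or "no" in any letter case.
const yesNoPattern = /^(yes|no)$/i;

// The result is binary: the reply is either valid or it is not.
function isValidAnswer(reply: string): boolean {
  return yesNoPattern.test(reply.trim());
}

console.log(isValidAnswer("YES"));   // true
console.log(isValidAnswer("No"));    // true
console.log(isValidAnswer("maybe")); // false
```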
| Flag | Meaning |
| --- | --- |
| i | Ignores case sensitivity |
| g | Specifies whether or not the regex search should stop after the first match |
| m | Specifies how ^ and $ behave |
| s | Chooses whether the special dot character matches the newline character |
| u | An advanced flag to specify characters like ბ,ㄱ |
| y | Helps you start from the index you specify instead of the beginning of the string, i.e., index zero |