What is Regular Expression (Regex)?

A regular expression (regex) is a way to search through text. It lets you define a pattern that matches and locates the text you are looking for.

As an example, if the virtual assistant asks the end-user for their Social Security Number (SSN), a regex can be added to restrict the input to a certain format. In the USA, this would be a 9-digit value, while in France it would be a 13-digit value followed by a two-digit key.

In this case, if the assistant asks an American citizen for their Social Security Number, it expects a 9-digit value. If the end-user enters anything else, the assistant can warn the end-user thanks to the regex. This warning can be given either by adding a fallback action or by adding an error message and fallback (see How to Use Error Messages).



In MindBehind, you can add regular expressions in the input action.

A regular expression consists of two main parts:
  1. Pattern: This is the actual pattern that you want to search for in the text. It defines what to search for. Example: the regex \d{9}\s? searches the end-user input for a 9-digit value, such as the SSN of an American citizen.
  2. Flags: The flags modify the search behavior of the given pattern. They define how to search. All of the flags are optional. They can be used together and in any order.
MindBehind allows you to change the flags through the "Validation regular expression" in the Input Module.
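
The two parts can be sketched in JavaScript, whose regex flags (i, g, m, s, u, y) are the ones described in this article. This is only an illustration, not MindBehind's implementation: the variable names are hypothetical, and the anchored variant ^\d{9}$ is an assumption added to show how longer inputs could be rejected.

```javascript
// Part 1: the pattern, taken from the SSN example above.
const pattern = "\\d{9}\\s?";
// Part 2: the flags (none are needed here, so the string is empty).
const flags = "";

const ssnRegex = new RegExp(pattern, flags);

console.log(ssnRegex.test("123456789")); // true: nine digits in a row
console.log(ssnRegex.test("12345"));     // false: only five digits

// Assumption for illustration: anchoring with ^ and $ forces the whole
// input to match, so an input with extra digits is rejected as well.
const strictSsnRegex = /^\d{9}$/;
console.log(ssnRegex.test("1234567890"));       // true: it contains nine digits
console.log(strictSsnRegex.test("1234567890")); // false: ten digits in total
```

Whether an unanchored pattern is enough depends on whether the input is matched as a whole or in part, which this article does not specify.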


Flags

i

This flag is used to ignore case sensitivity. Let's say you want to check if the text contains the word cat, but you are okay with any of the following: Cat, cAt, caT, CAt, cAT, CaT, and CAT. In other words, you just want to know if the word is present regardless of the case.

The i flag specifies that case should not play a role in matching. In other words, /cat/i will match any of the above.
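
As a small sketch of this behavior in JavaScript (the variable names are hypothetical), the same pattern can be tried with and without the i flag on every spelling listed above:

```javascript
// All spellings of "cat" mentioned above, plus the lowercase form.
const words = ["cat", "Cat", "cAt", "caT", "CAt", "cAT", "CaT", "CAT"];

const caseSensitive = /cat/;    // no flag: only the exact lowercase form matches
const caseInsensitive = /cat/i; // i flag: case is ignored

const matchedWithoutFlag = words.filter((word) => caseSensitive.test(word));
const matchedWithFlag = words.filter((word) => caseInsensitive.test(word));

console.log(matchedWithoutFlag);     // [ 'cat' ]
console.log(matchedWithFlag.length); // 8
```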

g

This flag, referred to as the global flag, specifies whether the search should stop after the first match.

To continue with the cat example, let's look at:

The cat looked at the other cat and said: "Meow".

By default, once the regex matches the first "cat" in the text, it stops. But say you want to get all the matches in the text. With this flag, the search continues even after the first match, i.e., it searches globally throughout the text.
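
The difference can be seen in a short JavaScript sketch that uses the sentence above (the variable names are hypothetical):

```javascript
const text = 'The cat looked at the other cat and said: "Meow".';

// Without the g flag, the search stops at the first match.
const firstOnly = text.match(/cat/);
console.log(firstOnly[0], firstOnly.index); // cat 4

// With the g flag, every match in the text is returned.
const allMatches = text.match(/cat/g);
console.log(allMatches); // [ 'cat', 'cat' ]
```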

m

This flag specifies how ^ and $ behave. Let's dig into what they mean:

In regular expressions, ^ is used to match only the beginning of the string, and $ is used to match only the end of the string. By default, newlines are treated as part of the string.


Regex    What it checks
cat      Does the text have the word cat?
^cat     Does the beginning of the text have the word cat?
cat$     Does the end of the text have the word cat?
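
The three patterns in the table can be tried out with a short JavaScript sketch. The two sample sentences are made up for illustration:

```javascript
const first = "cat food is for my cat"; // starts and ends with "cat"
const second = "my cat sleeps";         // "cat" only in the middle

// cat: is the word anywhere? ^cat: at the beginning? cat$: at the end?
console.log(/cat/.test(first), /^cat/.test(first), /cat$/.test(first));
// true true true
console.log(/cat/.test(second), /^cat/.test(second), /cat$/.test(second));
// true false false
```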

Suppose that you want to match the word cat at the end of the string. Hence, this is your pattern: cat$ And this is your string, which is written on two lines:

This is my cat
and it is 4 years old.

Although it seems like the regex should match the word cat at the end of the first line, it does not. This is because the pattern cat$ only checks the end of the string, and the whole text from "This" to "old." counts as one string, so the word cat is not at the end. If you specify the m flag, each line is treated as a separate string, and your pattern will have a match.

If you do not specify the m flag, the newlines are treated as part of one string, i.e., the two lines above are counted as a single string.
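
A minimal JavaScript sketch of this example follows. It assumes that the string contains a line break (written as \n) right after the word cat:

```javascript
// Assumption: the line break sits directly after "cat".
const text = "This is my cat\nand it is 4 years old.";

// Without the m flag, $ only matches at the very end of the whole string.
console.log(/cat$/.test(text));  // false

// With the m flag, $ also matches at the end of each line.
console.log(/cat$/m.test(text)); // true

// The same applies to ^ and the beginning of each line.
console.log(/^and/.test(text));  // false
console.log(/^and/m.test(text)); // true
```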


s

By default, the dot character matches any character except the newline. So, for example, the pattern \d.*cats matches everything that has the following shape: a digit, followed by any character any number of times, followed by the word cats. For example, it would match 9 cats, 9 fluffy cats, and so on. However, the newline character, which is denoted as \n, is an exception.

The following string will not match:

I have 9 fluffy \n cats

You might wonder why a person would type \n out of nowhere. But when you hit enter and start a new line, that newline is implicitly encoded as \n. So the above statement is the same as the one below:

I have 9 fluffy
cats

When the flag s is specified, the special dot character will match the newline character as well.
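
The effect of the s flag on the examples above can be sketched in JavaScript as follows (the variable names are hypothetical):

```javascript
const withoutFlag = /\d.*cats/; // the dot stops at a newline
const withFlag = /\d.*cats/s;   // s flag: the dot matches a newline too

console.log(withoutFlag.test("9 cats"));        // true
console.log(withoutFlag.test("9 fluffy cats")); // true

const twoLines = "I have 9 fluffy \n cats";
console.log(withoutFlag.test(twoLines)); // false: the newline blocks the dot
console.log(withFlag.test(twoLines));    // true
```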

Difference between the flags m & s: The s flag changes what the dot character matches, so that it also matches the newline. Since a dot can appear anywhere in a pattern, this flag can take effect anywhere in the string. The m flag, on the other hand, defines how the ^ and $ operators behave, so it only concerns the beginning and the end of the string or, when the flag is set, of each line.

u


This is an advanced flag that specifies how Unicode characters such as ბ and ㄱ should be treated. It is rarely needed.
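
As a hedged sketch of what the u flag enables in JavaScript: the \p{L} class ("any letter in any script") and the emoji example below are illustrations added here, not cases taken from MindBehind.

```javascript
// \p{L} is only understood as "any letter" when the u flag is set.
const lettersOnly = /^\p{L}+$/u;
console.log(lettersOnly.test("ბㄱ"));   // true: both characters are letters
console.log(lettersOnly.test("cat9")); // false: 9 is not a letter

// An emoji is stored as two code units. With the u flag, the dot
// treats it as one character instead of two.
console.log(/^.$/.test("😀"));  // false
console.log(/^.$/u.test("😀")); // true
```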


y


Strings are ordered sequences of characters, and each character has an index, starting from zero.

So, for the string cat

  • The first character, at index 0 (zero), is the letter c,
  • the second character, at index 1, is the letter a,
  • and the last character, the letter t, has the index 2.

In a nutshell, every character in the string gets an index, assigned from left to right, starting from zero and incrementing by 1.


When you search with a regular expression, the search starts from index zero by default. This flag lets you start from the index you specify instead of the beginning of the string, i.e., index zero.
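
In JavaScript, the starting index for the y flag is set through the regex's lastIndex property, and the match has to begin exactly at that index. The helper matchesAt below is a hypothetical name used only for this sketch:

```javascript
// Hypothetical helper: does the pattern match starting exactly at `index`?
function matchesAt(pattern, text, index) {
  const sticky = new RegExp(pattern, "y"); // y flag: match only at lastIndex
  sticky.lastIndex = index;                // the index to start from
  return sticky.test(text);
}

const text = "The cat sat"; // "cat" occupies indexes 4, 5 and 6

console.log(matchesAt("cat", text, 0)); // false: index 0 holds "T"
console.log(matchesAt("cat", text, 4)); // true: "cat" starts at index 4
console.log(matchesAt("cat", text, 5)); // false: index 5 holds "a"
```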

When to use what


Most of the time the regular expression is used to validate the user input. In other words, the virtual assistant does not need to know how many times the pattern occurs in the user response, or where exactly in the sentence the pattern occurs. Since the validation result is binary, i.e., it is either valid or not, most of the above flags are not needed when validating the user input.


The most useful flag is the i flag, which makes the validation case-insensitive.
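
The binary nature of validation can be sketched as follows. The function isValidInput and the yes/no pattern are hypothetical and serve only as an illustration. They do not show how MindBehind validates input internally.

```javascript
// Hypothetical helper: validation only answers "valid" or "not valid".
function isValidInput(input, pattern, flags = "") {
  return new RegExp(pattern, flags).test(input);
}

// Illustrative pattern: the answer must be exactly "yes" or "no".
const yesOrNo = "^(yes|no)$";

console.log(isValidInput("YES", yesOrNo));        // false: case matters
console.log(isValidInput("YES", yesOrNo, "i"));   // true: case is ignored
console.log(isValidInput("maybe", yesOrNo, "i")); // false
```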

To sum this up:

Flag    Meaning
i       Ignores case sensitivity
g       Specifies whether the search should stop after the first match
m       Specifies how ^ and $ behave
s       Specifies whether the dot character also matches the newline character
u       An advanced flag that specifies how characters like ბ, ㄱ are treated
y       Starts the search from the index you specify instead of the beginning of the string, i.e., index zero




