Searching for a particular piece of text is one of the most important tasks in text processing, and it becomes harder as the amount of text grows. Regular expressions reduce the time and effort needed for searching: instead of listing every string to look for, you describe all of them with a single pattern.
A regular expression (also called a regex) is a string of characters that describes a pattern, that is, the set of strings to be searched for. Every part of the text that fits the pattern is returned as a search result (a match).
Text: this sentence has more than 10 letters.
Regular expression: \d
Result: 1, 0
In this example, the regular expression \d stands for a single digit. The text contains two digits, so the search returns two matches: 1 and 0.
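The same search can be reproduced with Python's standard re module. This is a minimal sketch assuming Python 3; the variable names are illustrative. re.findall returns every non-overlapping match as a list of strings.

```python
import re

text = "this sentence has more than 10 letters."

# \d matches exactly one digit, so each digit is reported as a separate match
digits = re.findall(r"\d", text)
print(digits)  # ['1', '0']
```

The r prefix makes a raw string, so Python passes the backslash in \d to the regex engine unchanged instead of treating it as a string escape.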
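| Regular expression | Meaning | Example text | Matches |
|---|---|---|---|
| \d | One digit | Hi, my roll number is 53. | 5, 3 |
| \D | One non-digit character | 15 cars | space, c, a, r, s |
| \w | One word character (a letter, digit or underscore) | 15 cars | 1, 5, c, a, r, s |
| \W | One non-word character (for example a space or punctuation) | 15 cars | space |
| . | Any single character except a line break | 15 cars | 1, 5, space, c, a, r, s |
| \n | A newline character | This is line 1 (line break) This is line 2 | the newline between the two lines |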
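The rows of the table can be checked one by one with Python's re module. A minimal sketch, assuming Python 3; the dictionary is just a convenient way to print each pattern next to its matches. Note that \D and . also match the space in "15 cars", because a space is neither a digit nor a line break.

```python
import re

sample = "15 cars"

# Each pattern matches exactly one character; findall returns the matches in order
results = {
    r"\d": re.findall(r"\d", "Hi, my roll number is 53."),        # ['5', '3']
    r"\D": re.findall(r"\D", sample),                              # [' ', 'c', 'a', 'r', 's']
    r"\w": re.findall(r"\w", sample),                              # ['1', '5', 'c', 'a', 'r', 's']
    r"\W": re.findall(r"\W", sample),                              # [' ']
    r".": re.findall(r".", sample),                                # ['1', '5', ' ', 'c', 'a', 'r', 's']
    r"\n": re.findall(r"\n", "This is line 1\nThis is line 2"),   # ['\n']
}

# Print each pattern right-aligned next to the list of characters it matched
for pattern, matches in results.items():
    print(f"{pattern:>3} -> {matches!r}")
```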
Quantifiers specify how many times the character (or group) just before them must occur, which lets a single pattern match more than one character. The common quantifiers are:
| Quantifier | Meaning |
|---|---|
| + | One or more occurrences |
| * | Zero or more occurrences |
| ? | Zero or one occurrence |
| {n} | Exactly n occurrences |
| {n,m} | Between n and m occurrences |
| {0,n} | Between zero and n occurrences |
| {n,} | n or more occurrences |
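Besides quantifiers, the examples below use \b, which matches a word boundary: the position at the beginning or end of a word. Unlike the other expressions, \b does not match a character; it only checks a position.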
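A minimal sketch of each quantifier, plus \b, using Python's standard re module (Python 3 assumed; the sample strings are made up for illustration). re.findall lists every non-overlapping match, and re.fullmatch checks whether the whole string fits the pattern. Quantifiers are greedy by default: they take as many characters as they can, which is why \d{2,3} reports 444 rather than 44 from 4444.

```python
import re

# + : one or more digits, so "15" is reported as one match, not two
one_or_more = re.findall(r"\d+", "15 cars and 3 bikes")          # ['15', '3']

# * : an "a" followed by zero or more "b" characters
zero_or_more = re.findall(r"ab*", "a ab abbb")                   # ['a', 'ab', 'abbb']

# ? : the "u" is optional, so both spellings match the whole word
zero_or_one = [bool(re.fullmatch(r"colou?r", w)) for w in ("color", "colour")]  # [True, True]

numbers = "1 22 333 4444"
# {n} : exactly two digits; longer numbers are split into several matches
exactly_n = re.findall(r"\d{2}", numbers)                        # ['22', '33', '44', '44']
# {n,m} : two or three digits, taking as many as possible
n_to_m = re.findall(r"\d{2,3}", numbers)                         # ['22', '333', '444']
# {n,} : three or more digits
n_or_more = re.findall(r"\d{3,}", numbers)                       # ['333', '4444']
# {0,n} : at most two "a" characters (the empty string also qualifies)
zero_to_n = [bool(re.fullmatch(r"a{0,2}", s)) for s in ("", "a", "aa", "aaa")]  # [True, True, True, False]

# \b : "cat" only as a whole word, not inside "concat" or "cats"
whole_word = re.findall(r"\bcat\b", "cat concat cats")           # ['cat']
anywhere = re.findall(r"cat", "cat concat cats")                 # ['cat', 'cat', 'cat']
```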
A few examples of regular expressions:
1) Find all runs of 3 consecutive digits in the string “this sentence has numbers 1, 45,103, 53, 2456, 23”
Regular expression: \d{3}
Result: 103, 245
Explanation: \d matches one digit, and the {3} quantifier requires exactly three digits in a row, so the expression matches any run of three consecutive digits. The first number, 1, is followed by a comma (a non-digit), so it is rejected. The next number, 45, has only two consecutive digits before the comma, so it is also rejected. The number 103 has three consecutive digits, so it is accepted. The fourth number, 53, and the sixth, 23, are rejected for the same reason as 45. In 2456, the first three digits, 245, form a match; the remaining 6 is left over and cannot start another three-digit match.
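A quick check with Python's re module (a minimal sketch, Python 3 assumed) shows what \d{3} returns and how adding the word-boundary anchor \b on both sides restricts the search to numbers that are exactly three digits long, so 2456 is skipped entirely.

```python
import re

text = "this sentence has numbers 1, 45,103, 53, 2456, 23"

# \d{3}: any run of three consecutive digits, even inside a longer number
runs = re.findall(r"\d{3}", text)                 # ['103', '245']

# \b\d{3}\b: word boundaries on both sides, so only numbers exactly three digits long
whole_numbers = re.findall(r"\b\d{3}\b", text)    # ['103']
```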
2) Find all words that end with “at” in the input string “cat hat wet sit fat”
Regular expression: \w*at\b
Result: cat, hat, fat
Explanation: \w matches one word character (a letter, digit or underscore), and the * quantifier means zero or more, so \w* matches the beginning of the word, however long it is. The \b placed after at requires a word boundary right after at, so at must be the last two letters of the word. The expression therefore matches every word that ends with at.
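The same search in Python's re module, as a minimal sketch assuming Python 3. The second pair of calls uses a made-up string, "cat attic", to show what the trailing \b contributes: without it, the "at" at the start of "attic" is also reported. The pattern is written without parentheses around \w; with re.findall, a capturing group would change the result to the group's contents instead of the whole word.

```python
import re

words = "cat hat wet sit fat"

# \w* takes the leading word characters, "at" must follow, and \b requires the word to end there
ends_with_at = re.findall(r"\w*at\b", words)           # ['cat', 'hat', 'fat']

# Without the trailing \b, the "at" at the start of "attic" also counts as a match
with_boundary = re.findall(r"\w*at\b", "cat attic")    # ['cat']
without_boundary = re.findall(r"\w*at", "cat attic")   # ['cat', 'at']
```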
3) Find all words that begin with “st” in the input string “horse stable, stars in sky, working staffs”
Regular expression: \bst\w*
Result: stable, stars, staffs
Explanation: \b before st requires st to appear at the start of a word, so an st in the middle or at the end of a word is not matched. The * quantifier applied to \w then matches the remaining word characters, so each match extends to the end of the word.
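A minimal sketch of this example with Python's re module (Python 3 assumed). The extra string "first stop" is a made-up input chosen to show why the leading \b matters: without it, the "st" at the end of "first" is matched as well.

```python
import re

text = "horse stable, stars in sky, working staffs"

# \b forces "st" to sit at the start of a word; \w* then takes the rest of that word
starts_with_st = re.findall(r"\bst\w*", text)           # ['stable', 'stars', 'staffs']

# Without \b, an "st" at the end of "first" also counts as a match
with_boundary = re.findall(r"\bst\w*", "first stop")    # ['stop']
without_boundary = re.findall(r"st\w*", "first stop")   # ['st', 'stop']
```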
