- Pranav Kulkarni
An Introduction to Regular Expressions
This post gives readers with no prior experience of regular expressions in Python a simple, intuitive introduction. Regular expressions, or regexes, are character sequences that describe a pattern to look for in a string or a series of strings. The examples below show what regex is and how it works.
Accessing the Regex Module in Python
Python's regex module is called re. The module can be imported and its search function called as:
import re
re.search(<regex>, <string>)
Or we can directly import the search function:
from re import search
search(<regex>, <string>)
In both of the blocks of code, <regex> refers to the pattern that needs to be
searched and <string> refers to the string in which the search is to be conducted.

Using re.search()
Let's look at an example of how regex is used to search for a specific pattern:
import re
s='xYzff123'
print(re.search('123', s))
Output - <re.Match object; span=(5, 8), match='123'>
This output tells us that the pattern was found in the given string. The span
value gives the start index and the end index (exclusive) of the match inside
the string.
The search function is useful because it tells us whether one string occurs
inside a larger string and, if it does, where the match is located. If the
pattern is not found, search returns None.
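To make the match object more concrete, here is a minimal sketch that reads the position and the matched text from the object instead of printing it whole. The variable names are only illustrative; span(), start(), end() and group() are standard methods of Python's match objects.

```python
import re

s = 'xYzff123'

# A successful search returns a match object.
m = re.search('123', s)
found_span = m.span()    # (start, end), where end is exclusive
found_text = m.group()   # the text that matched
print(found_span, m.start(), m.end(), found_text)

# An unsuccessful search returns None instead of a match object.
missing = re.search('999', s)
print(missing)
```

Slicing the original string with s[m.start():m.end()] gives back the matched text, which is a handy way to confirm that the end index is exclusive.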
Combining Boolean Statements and Regex
The regex search function can also be used directly in conditional statements, because a match object counts as true and None counts as false:
from re import search

# The variable is called text so that it does not shadow the built-in str
text = "Twitter is a platform that runs on the concept of #s"
if search('#', text):
    print("# found in the string")
else:
    print("No # found in the string")
Output - # found in the string
Complex Regex Queries with Metacharacters
Metacharacters are the building blocks of regular expressions. Regex treats
each character either as a metacharacter with a special meaning or as a regular
character with a literal meaning.
The following table shows what each metacharacter is used for:
Metacharacter | Description | Examples |
\d | Any single digit, 0 to 9 | \d\d\d = 444; \d\d = 21; \d = 8 |
\w | Any single word character: a letter, a digit, or an underscore | \w\w\w = dog; \w\w\w = 467 |
\W | Any single character that is not a word character, such as a symbol or a space | \W = %; \W = #; \W\W\W = @#$ |
[a-z] | Character set; matches exactly one character from the set | pand[ora] = panda, pando & pandr (the set matches any 1 of its characters) |
[0-9] | Numeric Set with the exact same logic | 012[12] = 0121 & 0122 |
(abc) | Character Group matching in the exact order | pand(ora) = pandora |
(123) | Numeric Group matching in the exact order | 0123(456) = 0123456 |
| | Fulfills the Boolean OR condition | pand(ora|123) = pandora OR pand123 |
? | Matches when the preceding character occurs 0 or 1 time, making it optional | colou?r = colour (u found once); colou?r = color (u found 0 times) |
* | Matches when the preceding character occurs 0 or more times | tre* = tree (e found twice); tre* = tre (e found once); tre* = tr (e found 0 times); tre* != trees (s is not part of the pattern) |
+ | Matches when the preceding character occurs 1 or more times, so at least one occurrence is mandatory | tre+ = tree (e found twice); tre+ = tre (e found once); tre+ != tr (e found 0 times, hence no match) |
. | Matches any single character except a newline | ton. = tone, ton4, ton@ but ton. != tones (only a single character is matched) |
.* | Combines . and *: any run of 0 or more characters | tr.* = tr, tre, tree, trees, trough, treadmill |
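The examples in the table read most naturally as whole-string matches, so one way to try them out is with re.fullmatch, which succeeds only when the pattern covers the entire string. The sketch below is a small illustration under that assumption; the checks list and results dictionary are names invented for this example.

```python
import re

# (pattern, candidate string) pairs taken from the table above
checks = [
    (r'\d\d\d', '444'),
    (r'\w\w\w', 'dog'),
    (r'pand[ora]', 'panda'),
    (r'pand(ora|123)', 'pand123'),
    (r'colou?r', 'color'),
    (r'tre*', 'trees'),
    (r'tre+', 'tr'),
    (r'ton.', 'ton@'),
    (r'tr.*', 'treadmill'),
]

results = {}
for pattern, text in checks:
    # fullmatch returns a match object only if the whole string fits the pattern
    results[(pattern, text)] = re.fullmatch(pattern, text) is not None
    print(f'{pattern!r:20} {text!r:14} {results[(pattern, text)]}')
```

Note that re.search behaves differently: it would find tree inside trees, because search only needs the pattern to match somewhere in the string.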
Regex Quantifiers
Quantifiers specify how many instances of a character, group, or character class
must be present in the input for a match to be found. The following table
describes the quantifiers used in regex and their usage.
Quantifier | Description | Example |
{n} | Matches when the preceding character (or group) occurs exactly n times | \d{3} = 123, 456 & 789; pand(ora){2} = pandoraora |
{n,m} | Matches when the preceding character (or group) occurs at least n times and at most m times | \d{2,5} = 97430, 9743 & 97 |
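The quantifier examples can be checked in the same spirit. This is a minimal sketch that again uses re.fullmatch so that the whole string has to fit the pattern; the variable names are made up for the illustration.

```python
import re

# {n}: exactly n repetitions of the preceding character or group
exactly_three = re.fullmatch(r'\d{3}', '123') is not None                  # True
four_digits = re.fullmatch(r'\d{3}', '1234') is not None                   # False
repeated_group = re.fullmatch(r'pand(ora){2}', 'pandoraora') is not None   # True

# {n,m}: between n and m repetitions, both ends included
in_range = [re.fullmatch(r'\d{2,5}', t) is not None
            for t in ('97', '9743', '97430')]
too_short = re.fullmatch(r'\d{2,5}', '9') is not None        # False
too_long = re.fullmatch(r'\d{2,5}', '974301') is not None    # False

print(exactly_three, four_digits, repeated_group)
print(in_range, too_short, too_long)
```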

Pattern Usage In Python
Suppose we want to find specific snippets in some input, or enforce rules on
what may be typed into a text box, such as checking whether an entered email
address is valid. In such cases we combine metacharacters and quantifiers into
a single pattern that describes the text we expect.
Take email validation as an example. Say you create a form with a text box that
asks users to enter their email address. How do you check whether the entered
email is valid?
Let's see!
The Python code for checking the correct email pattern is:
from re import search

s = "anyemail@regex.com"
# Look for <word characters or periods>@<word characters or periods>
match = search(r'[\w.]+@[\w.]+', s)
if match:
    print(match.group())
else:
    print("Match not found")
In the above code, r'[\w.]+@[\w.]+' is the pattern used to look for an email
address inside the string s (for convenience, s is hard-coded here as a
string).
Now let's break down the pattern that has been used for email detection. It is
written as a raw string (the r prefix before the quotes), so Python passes
backslashes such as the one in \w to the regex engine unchanged. Inside the
quotes there is a square-bracket set, then a +, then the @ symbol, and then
another square-bracket set followed by another +. Let's look at the square
brackets first and then move on to their contents.
From the metacharacters table, we can see that square brackets such as [a-z]
define a "character set", so the first part of the email must be made up of
characters from that set. The + means the set must match 1 or more times, so
the first part can be any length of at least one character. The @ is a regular
character, so a literal @ must come next and no other character is allowed in
its place. After that there is another character set followed by another +,
which works in the same way as the first one.
Now let's look inside the square brackets. The first set contains \w and . .
From the metacharacters table, \w matches alphanumeric characters (and the
underscore), so both letters and numbers are allowed in the pattern (as they
should be, it's an email after all!). The . needs more care: inside square
brackets it loses its special meaning and matches only a literal period, which
is what allows addresses such as first.last to match.
The second [] works in much the same way, except that it has to come directly
after the @ character; its literal . allows the dots in a domain such as
regex.com.
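The behaviour of . inside and outside square brackets is easy to mix up, so here is a short sketch that contrasts the two. The sample strings, including the first.last+tag address, are made up for the illustration.

```python
import re

# Outside a character set, . matches any single character except a newline
dot_outside = re.fullmatch(r'a.c', 'a#c') is not None          # True

# Inside a character set, . matches only a literal period
dot_inside_hash = re.fullmatch(r'a[.]c', 'a#c') is not None    # False
dot_inside_period = re.fullmatch(r'a[.]c', 'a.c') is not None  # True

# So [\w.]+ accepts letters, digits, underscores and periods,
# and stops at the first character that is none of those (here the +)
local_part = re.match(r'[\w.]+', 'first.last+tag@regex.com').group()

print(dot_outside, dot_inside_hash, dot_inside_period, local_part)
```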
Running the above program gives: Output - anyemail@regex.com
The .group() method returns the part of the string that matched the regex
pattern. If nothing matches, search returns None, so the else branch runs and
prints the "Match not found" message instead.
Here we get the desired email by using the pattern r'[\w.]+@[\w.]+'
A useful property of the above program is that s can also be a whole sentence
that contains an email ID. Because search scans the entire string and the
pattern only accepts word characters and periods immediately before and after
the @ character, the program picks out only the email ID and gives that as the
output.
So, if we consider s = "Something Something Something anyemail@regex.com", then
the output of the program is only anyemail@regex.com
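The sentence case can be sketched as follows, together with two related functions from the re module: findall, which collects every match, and fullmatch, which is closer to what a form field needs because the whole input has to fit the pattern. EMAIL_PATTERN and the second sample sentence are names and data invented for this illustration.

```python
import re

EMAIL_PATTERN = r'[\w.]+@[\w.]+'

s = "Something Something Something anyemail@regex.com"

# search() scans the whole sentence and returns the first match
match = re.search(EMAIL_PATTERN, s)
extracted = match.group() if match else None
print(extracted)

# findall() returns every non-overlapping match as a list of strings
all_found = re.findall(EMAIL_PATTERN,
                       "Write to a.b@regex.com or c_d@example.org today")
print(all_found)

# fullmatch() is stricter: the entire input must fit the pattern
is_whole_input_email = re.fullmatch(EMAIL_PATTERN, s) is not None
print(is_whole_input_email)
```

This pattern is a loose check rather than a full email validator: it would accept something like a@b, and because the period is part of the set, a full stop that directly follows an address at the end of a sentence would be captured along with it.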