A regular expression (regex) is a sequence of characters that defines a search pattern, that is, a rule describing which characters may appear in a string and in what order.
The following are some of the use cases for regular expressions:
- To identify data that follows a known format, such as credit or debit card numbers, email addresses, or telephone numbers.
- To find a specific text pattern or apply a filter to the text, numeric, or special character data.
- To parse the data in ETL by creating rules for inbound and outbound traffic and finding patterns in the code.
In SQL databases, selecting the values based on regular expressions defined in the WHERE condition is very useful.
How to use Regex in SQL Server?
Unlike MySQL and Oracle, SQL Server databases don’t support built-in RegEx functions. However, SQL Server offers the following built-in functions to tackle such complex issues:
- LIKE
- PATINDEX
- CHARINDEX
- SUBSTRING
- REPLACE
We can combine these functions with others to build more complex queries. However, such queries take more time and effort to develop and are harder to maintain. On a large table, they can also noticeably degrade performance.
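As a minimal sketch of what combining these functions looks like, the query below uses PATINDEX to locate the first digit in a string and SUBSTRING to cut four characters from that position. The variable name and its value are hypothetical and are not part of the demo table used later.

```sql
-- Hypothetical sample value, used only for illustration.
DECLARE @Address NVARCHAR(100) = N'HOJAVALIGALI 1278 GOMATIPUR';

SELECT
    -- Position of the first character that is a digit (0 if there is none).
    PATINDEX(N'%[0-9]%', @Address) AS FirstDigitPosition,
    -- Four characters starting at that position: '1278' for this value.
    SUBSTRING(@Address, PATINDEX(N'%[0-9]%', @Address), 4) AS FourCharsFromFirstDigit;
```

Even this small task needs the same PATINDEX call twice, which hints at why longer combinations become hard to maintain.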
Here, we are going to deal with the LIKE operator that can be used to match the patterns (the next article relates to the SUBSTRING, PATINDEX, and CHARINDEX functions).
We’ll clarify the essence of the LIKE operator and illustrate some use cases concerning searching for the data from a table based on a specific pattern.
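LIKE operator
The LIKE operator compares a matching expression against a pattern. The pattern can contain the following wildcard characters:
- % matches any string of zero or more characters.
- _ (underscore) matches any single character.
- [ ] matches any single character within the specified range or set, for example [a-f] or [abcdef].
- [^ ] matches any single character not within the specified range or set, for example [^a-f] or [^abcdef].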
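As a quick sketch of the four wildcards that LIKE supports in SQL Server (%, _, [ ], and [^ ]), the query below tests a few literal values against one pattern per wildcard. The values are inlined with a VALUES list, so the query does not depend on any table; the column aliases are made up for this illustration.

```sql
SELECT
    v.Sample,
    -- % : starts with GH, followed by anything (or nothing)
    CASE WHEN v.Sample LIKE N'GH%'    THEN 1 ELSE 0 END AS StartsWithGH,
    -- _ : the letter G followed by exactly one more character
    CASE WHEN v.Sample LIKE N'G_'     THEN 1 ELSE 0 END AS GPlusOneChar,
    -- [ ] : the first character is either G or M
    CASE WHEN v.Sample LIKE N'[GM]%'  THEN 1 ELSE 0 END AS StartsWithGOrM,
    -- [^ ] : the first character is anything except G or M
    CASE WHEN v.Sample LIKE N'[^GM]%' THEN 1 ELSE 0 END AS StartsWithOther
FROM (VALUES (N'GHEE KANTA'), (N'GJ'), (N'MP'), (N'DHOLKA')) AS v(Sample);
```

Each column holds 1 where the value matches the pattern and 0 where it does not, which makes it easy to compare the wildcards side by side.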
Now, let’s demonstrate the LIKE operator’s use cases.
Prepare Demo Setup
First, we create a demo table named “Patient_Addresses.” Execute the following query:
USE DEMODATABASE
GO
CREATE TABLE Patient_Addresses
(
    ID INT IDENTITY(1, 1),
    -- Free-text patient address; the queries below filter on this column
    Address NVARCHAR(MAX)
)
GO
Now, we need to insert the data into the “Patient_Addresses” table:
USE [demodatabase]
GO
INSERT [dbo].[Patient_Addresses] ([Address]) VALUES (N'KALOLI GAM TA-KHEDA DIST- KHEDA')
INSERT [dbo].[Patient_Addresses] ([Address]) VALUES (N'PATHAR KUVA RELIEF ROADA''BAD')
INSERT [dbo].[Patient_Addresses] ([Address]) VALUES (N'TARA APPTS, GURUKUL ROAD AHMEDABAD')
INSERT [dbo].[Patient_Addresses] ([Address]) VALUES (N'1278, HOJAVALIGALI GOMATIPUR A`BD')
INSERT [dbo].[Patient_Addresses] ([Address]) VALUES (N'DHOLKA')
INSERT [dbo].[Patient_Addresses] ([Address]) VALUES (N'KHODIYAR NAGAR BEHRAMPURA A,BAD')
INSERT [dbo].[Patient_Addresses] ([Address]) VALUES (N'2/27 ASHPURI SOC. GHODASAR A`BD')
INSERT [dbo].[Patient_Addresses] ([Address]) VALUES (N'GHEE KANTA')
INSERT [dbo].[Patient_Addresses] ([Address]) VALUES (N'GAM; BODIYA TALUKO; LIMADI DIST; SURENDRANAGR')
INSERT [dbo].[Patient_Addresses] ([Address]) VALUES (N'ELISE BRIDGE')
INSERT [dbo].[Patient_Addresses] ([Address]) VALUES (N'GJ')
INSERT [dbo].[Patient_Addresses] ([Address]) VALUES (N'MP')
INSERT [dbo].[Patient_Addresses] ([Address]) VALUES (N'Q')
GO
Once the data is in place, we need to review it. Execute the following query:
USE DEMODATABASE
GO
SELECT *
FROM [PATIENT_ADDRESSES]
Data should look like the following.
Now, let me explain the use cases.
Different use cases of the LIKE operator
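Example 1: Retrieve rows that start with one of the specified characters
I want to retrieve only those rows whose value starts with either P or A. To do that, we can use the [XY]% pattern: the brackets list the characters allowed in the first position, and % matches the rest of the string. (To match the literal prefix "PA" instead, the pattern would be 'PA%'.)
Execute the following query:
SELECT * FROM PATIENT_ADDRESSES WHERE ADDRESS LIKE '[PA]%'
The output:
As you can see, the query retrieved only the record whose “Address” column value starts with “P”; no address in the demo data starts with “A”.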
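Note that a bracketed set such as [PA] stands for a single character, either P or A, and not for the two-character prefix "PA". The sketch below contrasts the two patterns on a few inlined values; PALDI and AHMEDABAD are hypothetical values added for the comparison and are not rows of the demo table.

```sql
-- '[PA]%' : the first character is P or A.
-- Matches PATHAR KUVA, PALDI, and AHMEDABAD.
SELECT v.Sample
FROM (VALUES (N'PATHAR KUVA'), (N'PALDI'), (N'AHMEDABAD'), (N'DHOLKA')) AS v(Sample)
WHERE v.Sample LIKE N'[PA]%';

-- 'PA%' : the value starts with the two characters P and A, in that order.
-- Matches PATHAR KUVA and PALDI only.
SELECT v.Sample
FROM (VALUES (N'PATHAR KUVA'), (N'PALDI'), (N'AHMEDABAD'), (N'DHOLKA')) AS v(Sample)
WHERE v.Sample LIKE N'PA%';
```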
Example 2: Retrieve records that start with two specific characters
I want to retrieve the records that start with a specific combination of two characters. In our case, the first character must be “E,” and the second character must be “L.” Each bracketed set here contains a single character, so the pattern behaves the same as 'EL%'.
Execute the query:
SELECT * FROM PATIENT_ADDRESSES WHERE ADDRESS LIKE '[E][L]%'
The output:
As you can see from the image above, the query retrieved only the record where the value of the address column has “E” as the first character and “L” as the second character.
Example 3: Retrieve rows consisting of two characters within a definite range
We want to retrieve only those rows whose value consists of exactly two characters, each in the A to Z range.
We need the following query structure:
USE demodatabase
GO
SELECT *
FROM [patient_addresses]
WHERE address LIKE '[A-Z][A-Z]'
The output:
This query returned only the values that consist of precisely two characters, both between A and Z. The pattern contains no % wildcard, so longer and shorter values do not match.
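Keep in mind that a range such as [A-Z] is evaluated according to the collation of the compared values. Assuming the column uses a case-insensitive collation, which is a common default, lowercase letters fall inside [A-Z] as well. The sketch below applies a binary collation to restrict the match to uppercase letters; the inlined values and the choice of Latin1_General_BIN2 are assumptions made for this illustration.

```sql
-- With a binary collation, [A-Z] covers only the uppercase letters A through Z,
-- so of the four values below only 'GJ' matches.
SELECT v.Sample
FROM (VALUES (N'GJ'), (N'gj'), (N'Q'), (N'M1')) AS v(Sample)
WHERE v.Sample COLLATE Latin1_General_BIN2 LIKE N'[A-Z][A-Z]';
```

Without the COLLATE clause, the result depends on the database or column collation, so it is worth checking which one is in effect before relying on a letter range.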
Example 4: Get the data with the first character within a definite range
We want to retrieve the data where the first character is between K and P; the rest of the string can be anything.
Use the following structure:
USE DEMODATABASE
GO
SELECT *
FROM [PATIENT_ADDRESSES]
WHERE ADDRESS LIKE '[K-P]%'
The output:
Similarly, we can retrieve the rows whose last three characters are “BAD,” whatever precedes them.
To do that, execute the following query:
USE demodatabase
GO
SELECT *
FROM [patient_addresses]
WHERE address LIKE '%BAD'
Example 5: Retrieve the data with the first or last character within a range
We want to retrieve the list of addresses where the first character of the string is between E and H; the rest of the string can be anything. Here, we use the [X-Y]% pattern.
Execute the following query:
SELECT * FROM [PATIENT_ADDRESSES] WHERE ADDRESS LIKE '[E-H]%'
The output:
Similarly, we can retrieve the list of addresses where the last character of the address column is between A and C, whatever comes before it. For that, we use the %[X-Y] pattern.
Execute the following query:
SELECT * FROM PATIENT_ADDRESSES WHERE ADDRESS LIKE '%[A-C]'
The output:
Now, let's turn to more complex examples.
Example 6: Retrieve the data, excluding certain character ranges
We want to retrieve the records from the table where the last character of the address is not between A and D. To do that, we use the %[^X-Y] pattern.
Execute the query:
SELECT * FROM [PATIENT_ADDRESSES] WHERE ADDRESS LIKE '%[^A-D]'
The output:
Similarly, we want to retrieve those records where the first character of the address does not belong to the A to Z range, for example, addresses that start with a digit. To do that, we use the [^X-Y]% pattern.
Execute the following query:
SELECT * FROM [PATIENT_ADDRESSES] WHERE ADDRESS LIKE '[^A-Z]%'
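A negated set still requires a character to be present, so LIKE '[^A-Z]%' is not quite the same as NOT LIKE '[A-Z]%'. The sketch below shows the difference on a few inlined values, including an empty string that is not part of the demo data; the column aliases are made up for this illustration.

```sql
SELECT
    v.Sample,
    -- 1 only when a first character exists and it is outside A-Z
    CASE WHEN v.Sample LIKE N'[^A-Z]%'    THEN 1 ELSE 0 END AS StartsWithNonLetter,
    -- 1 whenever the value does not start with A-Z, including the empty string
    CASE WHEN v.Sample NOT LIKE N'[A-Z]%' THEN 1 ELSE 0 END AS DoesNotStartWithLetter
FROM (VALUES (N'1278, HOJAVALIGALI'), (N'2/27 ASHPURI SOC.'), (N'DHOLKA'), (N'')) AS v(Sample);
```

The two columns agree for the first three values and differ only for the empty string, where the first is 0 and the second is 1.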
Find the specific string pattern
Using the LIKE operator, we can also find a specific text pattern inside a string. For example, we want to retrieve the records that match the following pattern:
- Any characters, or none, are allowed at the start (first %).
- The next character must be either I or S.
- It is followed by the two characters SE; this combination is static.
- Then comes a space.
- Any characters are allowed after that (last %).
Because the pattern starts with %, this sequence can appear anywhere in the string. In the demo data, it happens to occupy the third through sixth characters of the matching address.
To retrieve the record, execute the following query:
SELECT * FROM [patient_addresses] WHERE address LIKE '%[IS]SE[ ]%'
The output:
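If the sequence must sit at a fixed position, the leading % can be replaced with one underscore per character to skip. The sketch below compares the "anywhere" pattern with a variant that pins the I or S to the third character; the value NEAR ELISE BRIDGE is hypothetical and was added only to show the difference.

```sql
SELECT
    v.Sample,
    -- The sequence may appear anywhere in the string
    CASE WHEN v.Sample LIKE N'%[IS]SE[ ]%'  THEN 1 ELSE 0 END AS MatchesAnywhere,
    -- Two underscores skip exactly two characters, so I or S must be the third one
    CASE WHEN v.Sample LIKE N'__[IS]SE[ ]%' THEN 1 ELSE 0 END AS MatchesAtThirdChar
FROM (VALUES (N'ELISE BRIDGE'), (N'NEAR ELISE BRIDGE'), (N'GHEE KANTA')) AS v(Sample);
```

ELISE BRIDGE matches both patterns, NEAR ELISE BRIDGE matches only the first one, and GHEE KANTA matches neither.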
Summary
This article covered the definition of a regular expression and its typical applications. Its main goal was to present an overview of the LIKE operator and illustrate how its wildcard patterns can stand in for regular expressions in different use cases.
Last modified: August 08, 2022