T-SQL Regular expression: LIKE Operator and its use-cases

Total: 13 Average: 3.8

A Regular Expression (Regex) is a rule defining how characters can appear in an expression. In essence, it is a sequence of characters or text, which determines the search pattern.

The following are some of the use cases for regular expressions:

  1. To identify the data using combinations, such as credit or debit card numbers, email addresses, or telephone numbers.
  2. To find a specific text pattern or apply a filter to the text, numeric, or special character data.
  3. To parse the data in ETL by creating rules for inbound and outbound traffic and finding patterns in the code.

In SQL databases, selecting the values based on regular expressions defined in the WHERE condition is very useful.

How to use Regex in SQL Server?

Unlike MySQL and Oracle, SQL Server databases don’t support built-in RegEx functions. However, SQL Server offers the following built-in functions to tackle such complex issues:

  • LIKE
  • PATINDEX
  • CHARINDEX
  • SUBSTRING
  • REPLACE

We can combine these functions with others and create more complex queries. However, such queries are difficult to maintain and require more time and effort to develop. If we are querying a large table, they can produce a huge impact on performance.

Here, we are going to deal with the LIKE operator that can be used to match the patterns (the next article relates to the SUBSTRING, PATINDEX, and CHARINDEX functions).

We’ll clarify the essence of the LIKE operator and illustrate some use cases concerning searching for the data from a table based on a specific pattern.

LIKE operators

The LIKE operator uses a combination of a matching expression and a pattern and supports the following valid wildcard characters’ pattern.

Wildcard CharacterDescription
%A string of zero or more characters. For example, Employee_Name %Nisarg% will populate those records having the 'Nisarg' word anywhere in the string.
[ ]Any single character within a specified range. The characters within the square brackets ( [ ) and ( ] ) are used for the matching process.
[^]None of the single characters within the specified range. For instance, Employee_Name LIKE 'Ni[^A]%' will populate the records starting with Ni, provided that the following letter isn't A.

Now, let’s demonstrate the LIKE operator’s use cases.

Prepare Demo Setup

First, we create a demo table named “Patient_Addresses.” Execute the following query:

USE DEMODATABASE 
GO 

CREATE TABLE Patient_Addresses
  ( 
     ID       INT IDENTITY(1, 1), 
     TEXTDATA NVARCHAR(MAX) 
  )

Now, we need to insert the data into the “Patient_Addresses” table:

USE [demodatabase]
GO
INSERT [dbo].[Patient_Addresses] ( [Address]) VALUES ( N'KALOLI GAM TA-KHEDA DIST- KHEDA')
GO
INSERT [dbo].[Patient_Addresses] ( [Address]) VALUES ( N'PATHAR KUVA RELIEF ROADA''BAD')
GO
INSERT [dbo].[Patient_Addresses] ( [Address]) VALUES ( N'TARA APPTS, GURUKUL ROAD AHMEDABAD')
GO
INSERT [dbo].[Patient_Addresses] ( [Address]) VALUES ( N'1278, HOJAVALIGALI GOMATIPUR A`BD')
GO
INSERT [dbo].[Patient_Addresses] ( [Address]) VALUES ( N'DHOLKA')
GO
INSERT [dbo].[Patient_Addresses] ( [Address]) VALUES ( N'KHODIYAR NAGAR BEHRAMPURA A,BAD')
GO
INSERT [dbo].[Patient_Addresses] ( [Address]) VALUES ( N'2/27 ASHPURI SOC. GHODASAR A`BD')
GO
INSERT [dbo].[Patient_Addresses] ( [Address]) VALUES ( N'GHEE KANTA')
GO
INSERT [dbo].[Patient_Addresses] ( [Address]) VALUES 
( N'GAM; BODIYA TALUKO; LIMADI DIST; SURENDRANAGR')
GO
INSERT [dbo].[Patient_Addresses] ( [Address]) VALUES ( N'ELISE BRIDGE')
GO
INSERT [dbo].[Patient_Addresses] ( [Address]) VALUES ( N'GJ')
GO
INSERT [dbo].[Patient_Addresses] ( [Address]) VALUES ( N'MP')
GO
INSERT [dbo].[Patient_Addresses] ( [Address]) VALUES ( N'Q')
GO

Once the data is in place, we need to review it. Execute the following query:

USE DEMODATABASE 
GO 

SELECT * 
FROM   [PATIENT_ADDRESSES]

Data should look like the following.

SQL Query to insert data in “Patient_Addresses”

Now, let me explain the use cases.

Different use cases of the LIKE operator

Example 1: Populate rows with one specific identifier

I want to populate only those rows which start with PA. To populate the data, we can use the [XY]% regex.

Execute the following query:

SELECT * 
FROM   PATIENT_ADDRESSES 
WHERE  ADDRESS LIKE '[PA]%'

The output:

Query to populate only those rows which start with PA

As you can see, our query retrieved only that record where the “Address” column value starts with “PA”

Example 2: Populate records which start with two specific characters

I want to populate the records, which start with the specific combination of two characters. In our case, the first character must be “E,” and the second character must be “L.”

Execute the query:

SELECT * 
FROM   PATIENT_ADDRESSES 
WHERE ADDRESS LIKE '[E][L]% '

The output:

Query to populate the only records which start with only two characters

As you can see from the image above, the query retrieved only record where the value of the address column has “E” as the first character and “L” as the second character.

Example 3: Retrieve rows consisting of two characters within a definite range

We want to retrieve those rows only, which have two characters in the A to Z range.

We need the following query structure:

USE demodatabase 
go 

SELECT * 
FROM   [patient_addresses] 
WHERE  address LIKE '[A-Z][A-Z]'

The output:

The output of the query to retrieve only those rows which have two characters, and those characters should be between A to Z

This query returned the results, each consisting of precisely two characters. The values of both characters are between A and Z.

Example 4: Get the data with the first character within a definite range

We want to retrieve the data where the first character is between K to P, and the rest of the string is the same.

Use the following structure:

USE DEMODATABASE 
GO 

SELECT * 
FROM   [PATIENT_ADDRESSES] 
WHERE  ADDRESS LIKE '[K-P]%'

The output:

The output of the script to retrieve data, where the first character will be between K to P and the rest of the string, will be the same

Similarly, we can retrieve data where the last three characters will be “BAD,” and, except those characters, the string will remain the same.

To do that, execute the following query:

USE demodatabase 
go 

SELECT * 
FROM   [patient_addresses] 
WHERE  address LIKE '%BAD'

The output ща the query retrieve data where the last three characters will be “BAD”, and except those characters, the string will remain the same

Example 5: Retrieve the data with the concrete first character of the string

We want to retrieve the list of addresses where the first character of the string must be between E and H. The rest of the string must remain the same. Here, we need to use regex [X-Y]%

Execute the following query:

SELECT * 
FROM   [PATIENT_ADDRESSES] 
WHERE  ADDRESS LIKE '[E-H]%'

The output:

The output of the script to retrieve the list of addresses where the first character of the string must be between E and H, and the remaining string must remain the same

Similarly, we can retrieve the list of addresses where the last character of the address column is between A and C, and the rest of the string remains the same. There, we must use regex %[X-Y]

Execute the following query:

SELECT * 
FROM   PATIENT_ADDRESSES 
WHERE  ADDRESS LIKE '%[A-C]'

The output:

The output of the script to retrieve the list of addresses where the last character of the address column must be between A and C, and the remaining string must remain the same

Now, refer to more complex examples.

Example 6: Populate the data with the identifier, excluding certain ranges

We want to populate the records from the address table where the last character must not be between B and D. To do that, we use regex %[^X-Y]

Execute the query:

SELECT * 
FROM   [PATIENT_ADDRESSES] 
WHERE  ADDRESS LIKE '%[^A-D]'

The output:

The output of the query to populate the records from the address table where the last character must not be between B and D

Similarly, we want to retrieve those records with the first character of the address not belonging to the A to Z range. To do that, we should use the regex [^X-Y]% pattern.

Execute the following query:

SELECT * 
FROM   [PATIENT_ADDRESSES] 
WHERE  ADDRESS LIKE '[^A-Z]%'

Find the specific string pattern

Using Regular expression, we can find a specific text pattern. For example, we want to populate records that match the following patterns:

  1. Any characters are allowed at first (first %).
  2. The third character should be either I or S.
  3. The fourth and fifth characters will be SE – this combination is static.
  4. The sixth character will be a space.
  5. Any character is allowed after that (last %).

To populate the record, execute the following query:

SELECT * 
FROM   [patient_addresses] 
WHERE  address LIKE '%[IS]SE[ ]%'

The output:

The output of the query to populate the record

Summary

This article covered the definition of a regular expression and its application. Its main goal was to present an overview of the LIKE operator and illustrate its usage as a regular expression in different use cases.

Nisarg Upadhyay

Nisarg Upadhyay

Nisarg is a SQL Server Database Administrator and Microsoft certified professional who has more than 5 years of experience with SQL Server administration and 2 years with Oracle 10g database administration. He has expertise in database design, performance tuning, backup and recovery, HA and DR setup, database migrations and upgrades. He has completed the Bachelors in Information Technology from Ganpat University.