Table partitioning is a technique used in SQL Server to physically organize the data stored in tables in different storage structures. In essence, it is splitting one large logical structure into smaller parts physically. The result is, we can improve performance for certain kinds of queries. More importantly, we can move data about using certain techniques.
We also need to clarify the filegroups concept because filegroups are that layer of abstraction used to separate data physically.
A filegroup is a logical construct that allows the SQL Server to see a collection of physical data files as a single logical unit. When an SQL Server writes the data into the filegroup, the data is spread across the files belonging to that filegroup. It is a File Group – a group of data files.
Tables Sit on File Groups
Whenever we create a table, it is created on a filegroup – the PRIMARY filegroup by default (if you don’t specify the filegroup at once). However, you can choose to place a table on a different filegroup if you wish.
It is different, though, when you create a partitioned table. Then it must sit on a Partition Scheme.
Partition Schemes and Functions
A Partition Scheme maps a table to a set of filegroups. A Partition Function defines the criteria by which data is distributed across the filegroups belonging to the Partition Scheme. Thus, when we create a partitioned table, we must first create a Partition Function, and then a Partition Scheme.
Why Partitioned Tables?
Partitioned tables are helpful for several reasons. First, it is performance benefits when filegroups are sitting on separate disks. Then, we work with the appropriate degree of parallelism and queries that span one partition. Another reason is the maintenance benefit, specifically data archiving achieved by switching out partitions. In the following article you can get the details on the entire process: Switching Out Table Partitions in SQL Server: A Walkthrough. Other benefits include online index rebuilds, parallel operations, piecemeal restores of filegroups, etc.
We start by creating a regular table as shown in the following code:
-- Listing 1: CREATE TABLE Statement use DB2 GO CREATE TABLE memmanofarms ( fname VARCHAR(50) ,lname VARCHAR(50) ,city VARCHAR(50) ,PhoneNo bigint ,email VARCHAR(100) check (email like '%@%') ,gender char(1) )
Since we did not specify any filegroup or partition function, the table sits in the PRIMARY filegroup:
Using the code in Listing 2, we populate the table with six unique records. These records are replicated a different number of times per row.
-- Listing 2: Populate Table USE DB2 GO INSERT INTO memmanofarms VALUES ('Kenneth','Igiri','Accra','23320055444','firstname.lastname@example.org','M'); GO 1100 INSERT INTO memmanofarms VALUES ('Vivian','Akeredolu','Lagos','2348020055444','email@example.com','F'); GO 720 INSERT INTO memmanofarms VALUES ('Emelia','Okoro','Port Harcourt','2348030057324','firstname.lastname@example.org','F'); GO 400 INSERT INTO memmanofarms VALUES ('Uche','Igiri','Enugu','2348030057324','email@example.com','M'); GO 1000 INSERT INTO memmanofarms VALUES ('Kweku','Annan','Kumasi','23354055884','firstname.lastname@example.org','M'); GO 150 INSERT INTO memmanofarms VALUES ('Aisha','Bello','Kano','2347088057324','email@example.com','F'); GO 890
We can confirm the row count for each city using queries in Listing 3. It also helps us to get a baseline of the way the query is executed when our table is not partitioned.
-- Listing 3: Count Rows USE DB2 GO SET STATISTICS IO ON; SET STATISTICS TIME ON; SELECT COUNT(*) FROM memmanofarms; SELECT COUNT(*) FROM memmanofarms WHERE city='Accra'; SELECT COUNT(*) FROM memmanofarms WHERE city='Lagos'; SELECT COUNT(*) FROM memmanofarms WHERE city='Port Harcourt'; SELECT COUNT(*) FROM memmanofarms WHERE city='Enugu'; SELECT COUNT(*) FROM memmanofarms WHERE city='Kumasi'; SELECT COUNT(*) FROM memmanofarms WHERE city='Kano';
Taking this further, we use the code in Listing 4 to set up objects required for Table Partitioning on DB2 databases. Notice that for N partitions (and N filegroups), there will always be N-1 boundaries.
-- Listing 4: Set Up Partitioning -- Create a Partition Function USE [DB2] GO CREATE PARTITION FUNCTION PartFunc (VARCHAR(50)) AS RANGE RIGHT FOR VALUES ('Accra' ,'Enugu' ,'Kano' ,'Kumasi' ,'Lagos' ,'Port Harcourt' ) GO -- Create File Groups USE [master] GO ALTER DATABASE [DB2] ADD FILEGROUP [AC] ALTER DATABASE [DB2] ADD FILEGROUP [EN] ALTER DATABASE [DB2] ADD FILEGROUP [KA] ALTER DATABASE [DB2] ADD FILEGROUP [KU] ALTER DATABASE [DB2] ADD FILEGROUP [LA] ALTER DATABASE [DB2] ADD FILEGROUP [PH] ALTER DATABASE [DB2] ADD FILEGROUP [OT] GO -- Add Files to the File Groups USE [master] GO ALTER DATABASE [DB2] ADD FILE ( NAME = N'AC01', FILENAME = N'C:\MSSQL\Data\AC01.ndf' , SIZE = 102400KB , FILEGROWTH = 131072KB ) TO FILEGROUP [AC]; ALTER DATABASE [DB2] ADD FILE ( NAME = N'EN01', FILENAME = N'C:\MSSQL\Data\EN01.ndf' , SIZE = 102400KB , FILEGROWTH = 131072KB ) TO FILEGROUP [EN]; ALTER DATABASE [DB2] ADD FILE ( NAME = N'KA01', FILENAME = N'C:\MSSQL\Data\KA01.ndf' , SIZE = 102400KB , FILEGROWTH = 131072KB ) TO FILEGROUP [KA]; ALTER DATABASE [DB2] ADD FILE ( NAME = N'KU01', FILENAME = N'C:\MSSQL\Data\KU01.ndf' , SIZE = 102400KB , FILEGROWTH = 131072KB ) TO FILEGROUP [KU]; ALTER DATABASE [DB2] ADD FILE ( NAME = N'LA01', FILENAME = N'C:\MSSQL\Data\LA01.ndf' , SIZE = 102400KB , FILEGROWTH = 131072KB ) TO FILEGROUP [LA]; ALTER DATABASE [DB2] ADD FILE ( NAME = N'PH01', FILENAME = N'C:\MSSQL\Data\PH01.ndf' , SIZE = 102400KB , FILEGROWTH = 131072KB ) TO FILEGROUP [PH]; ALTER DATABASE [DB2] ADD FILE ( NAME = N'OT01', FILENAME = N'C:\MSSQL\Data\OT01.ndf' , SIZE = 102400KB , FILEGROWTH = 131072KB ) TO FILEGROUP [OT]; GO -- Create a Partition Scheme USE [DB2] GO CREATE PARTITION SCHEME PartSch AS PARTITION PartFunc TO ( AC, EN, KA, KU, LA, PH, OT ) GO
Once we have the foundation, we can move our regular table from the PRIMARY filegroup to the Partition Function we created. We do this by rebuilding the clustered index as shown in Listing 5.
Notice that the column we used to create the partition column must be a part of the clustered index column listing. Also, when we specify the Partition Scheme, we must indicate this column – city.
-- Listing 5: Move table to New Partition CREATE CLUSTERED INDEX [ClusteredIndexCity] ON [dbo].[memmanofarms] ( [city] ASC, [PhoneNo] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = ON, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PartSch](city); GO ‘
After running this code, the table sits on the Partition Scheme:
A Little on Performance
When we run both questions with the partitioned and non-partitioned tables, we see little difference in performance without the index. In both cases, SQL Server uses an index seek when the clustered index is in place and a full table scan otherwise. However, we have more reads when running these queries on the partitioned table. This is expected since we have in effect forced distribution of the data across “distant” pages.
We are using the following code in Listing 6 as a benchmark of sorts.
-- Listing 6: Querying Partitioned and Non-Partitioned Tables SELECT COUNT(*) FROM memmanofarms WHERE city='Accra'; SELECT * FROM memmanofarms WHERE city='Accra';
If you are interested in obtaining more information from the Query Plan Diagram, you may refer to its more advanced version available in dbForge Studio for SQL Server – it also includes the Query Profiler functionality for your needs.
We have explored the process of creating a partitioned table. The references section lists more resources related to the use of partitioning for archiving old data. We have also shown that partitioning does not necessarily introduce a significant improvement in performance without other enhancements such as the right CPU and proper MAXDOP configuration.