Written by 16:20 SQL Server

Primary Keys in SQL Server

Codingsight - Primary Keys in SQL Server

Across all major RDBMS products, Primary Key in SQL constraints has a vital role. They identify the records present in a table uniquely. Hence, we should choose Primary Keys server carefully during the table design to improve the performance.

In this article, we’ll learn what a Primary Key constraint is. Also, we’ll see how to create, modify, or drop Primary Keys constraints.

SQL Server Constraints

In SQL Server, Constraints are rules that regulate entering the data into the necessary columns.  Constraints enforce the accuracy of data and how that data match business requirements. Also, they make the data reliable to end-users. That’s why it is critical to identify correct Constraints during the design phase of the database or table schema.

SQL Server supports the following Constraint types to enforce data integrity:

Primary Key Constraints are created on a single column or combination of columns to enforce the uniqueness of records and identify the records faster. The involved columns should not hold NULL values. Hence, the NOT NULL property should be defined on the columns.

Foreign Key Constraints are created on a single column or combination of columns to create a relationship between two tables and enforce the data present in one table to another. Ideally, table columns where we need to enforce the data with Foreign Key constraints refer to the Source table with a Primary Key in SQL or Unique Key constraint. In other words, only the records available in the Source table’s Primary or Unique Key constraints can be inserted or updated to the Destination table.

Unique Key Constraints are created on a single column or combination of columns to enforce uniqueness across the column data. They are similar to the Primary Key constraints with a single change. The difference between Primary Key and Unique Key Constraints is that the latter can be created on Nullable columns and allow one NULL value record in its column.

Check Constraints are created on a single column or combination of columns by restricting the accepted data values for the columns involved via a logical expression. There is a difference between Foreign Key and Check Constraints. The Foreign key enforces data integrity by checking data from another table’s Primary or Unique Key. However, the Check Constraint does this by using a logical expression.

Now, let’s go through the Primary Key Constraints.

Primary Key Constraint

Primary Key Constraint enforces uniqueness on a single column or combination of columns without any NULL values inside the involved columns.

To enforce uniqueness, SQL Server creates a unique clustered index on the columns where Primary Keys were created out. If there are any existing clustered indexes, SQL Server creates a Unique Non-clustered index on the table for the Primary Key.

Let’s see how we Create, Modify, Drop, Disable, or Enable Primary Keys on a table using T-SQL scripts.

Create a Primary Key

We can create Primary keys on a table either during the table creation or after that. The syntax varies slightly for these scenarios.

Creation of the Primary Key During the Table Creation

The syntax is below:

CREATE TABLE SCHEMA_NAME.TABLE_NAME
( 
  COLUMN1 datatype [ NULL | NOT NULL ] PRIMARY KEY,
  COLUMN2 datatype [ NULL | NOT NULL ],
  ...
);

Let’s create a table named Employees in the HumanResources schema for testing purposes with the below script:

CREATE TABLE HumanResources.Employees
( Employee_Id INT IDENTITY NOT NULL PRIMARY KEY,
  First_Name VARCHAR(100) NOT NULL,
  Last_Name VARCHAR(100) NOT NULL,
  DOB DATETIME,
  Dept varchar(100),
  Salary Money
);

We have successfully created the HumanResources.Employees table on the AdventureWorks database:

HumanResources.Employees table on the AdventureWorks database

We can see that the clustered index was created on the table matching the Primary Key name as highlighted above.

Let’s drop the table using the below script and try again with the new syntax.

DROP TABLE HumanResources.Employees

To create the Primary Key in SQL on a table with the user-defined Primary Key name PK_Employees, use the below syntax:

CREATE TABLE HumanResources.Employees
( Employee_Id INT IDENTITY NOT NULL,
  First_Name VARCHAR(100) NOT NULL,
  Last_Name VARCHAR(100) NOT NULL,
  DOB DATETIME,
  Dept varchar(100),
  Salary Money,
  CONSTRAINT PK_Employees PRIMARY KEY (Employee_Id)
);

We have created the HumanResources.Employees table with the Primary Key name PK_Employees:

HumanResources.Employees table with the Primary Key name

Creation of the Primary Key After the Table Creation

Sometimes, developers or DBAs forget about the Primary Keys and create tables without them. But it is possible to create a Primary key on existing tables.

Let’s drop the HumanResources.Employees table and recreate it using the below script:

DROP TABLE HumanResources.Employees
GO
CREATE TABLE HumanResources.Employees
( Employee_Id INT IDENTITY NOT NULL,
  First_Name VARCHAR(100) NOT NULL,
  Last_Name VARCHAR(100) NOT NULL,
  DOB DATETIME,
  Dept varchar(100),
  Salary Money
);
GO

When you execute this script successfully, we can see the HumanResources.Employees table created without any Primary keys or indexes:

HumanResources.Employees table without any Primary keys or indexes

To create a Primary Key named PK_Employees on this table, use the below syntax:

ALTER TABLE <schema_name>.<table_name>
ADD CONSTRAINT <constraint_name> PRIMARY KEY ( <column_name> );

Execution of this script creates the Primary key on our table:

ALTER TABLE HumanResources.Employees
ADD CONSTRAINT PK_Employees PRIMARY KEY (Employee_ID);
HumanResources.Employees table - create Primary keys

Primary Key Creation on Multiple Columns

In our examples, we created Primary Keys on single columns. If we want to create Primary keys on multiple columns, we need a different syntax.

To add multiple columns as part of the Primary Key, we need to simply add comma-separated values of the column names that should be part of the Primary Key.

Primary Key During the Table Creation

CREATE TABLE HumanResources.Employees
( First_Name VARCHAR(100) NOT NULL,
  Last_Name VARCHAR(100) NOT NULL,
  DOB DATETIME,
  Dept varchar(100),
  Salary Money,
  CONSTRAINT PK_Employees PRIMARY KEY (First_Name, Last_Name)
);
GO

Primary Key After the Table Creation

CREATE TABLE HumanResources.Employees
( First_Name VARCHAR(100) NOT NULL,
  Last_Name VARCHAR(100) NOT NULL,
  DOB DATETIME,
  Dept varchar(100),
  Salary Money
);
GO
ALTER TABLE HumanResources.Employees
ADD CONSTRAINT PK_Employees PRIMARY KEY (First_Name, Last_Name);
GO

Drop the Primary Key

To Drop the Primary Key, we use the below syntax. It does not matter if the Primary Key was on single or multiple columns.

ALTER TABLE <schema_name>.<table_name>
DROP CONSTRAINT <constraint_name> ;

To drop the Primary Key Constraint on HumanResources.Employees table, use the below script:

ALTER TABLE HumanResources.Employees
DROP CONSTRAINT PK_Employees;

Dropping the Primary Key removes both the Primary Keys and Clustered or Non-clustered indexes created along with the Primary key creation:

HumanResources.Employees table - drop the primary key

Modify the Primary Key

In SQL Server, there are no direct commands to modify Primary keys. We need to Drop an existing Primary Key and recreate it with necessary modifications. Hence, the steps to modify the Primary key are:

  1. Drop an existing Primary Key.
  2. Create new Primary Keys with necessary changes.

Disable/Enable Primary Key

While performing bulk load on a table with many records, disable the Primary Key and enable it back for better performance. The steps are below:

Disable the existing Primary Key with the below syntax:

ALTER INDEX <index_name> ON <schema_name>.<table_name> DISABLE;

To disable the Primary Key on the HumanResources.Employees table, the script is: 

ALTER INDEX PK_Employees ON HumanResources.Employees
DISABLE;

Enable existing Primary Keys that are in the disabled state. We need to REBUILD the index using the below syntax:

ALTER INDEX <index_name> ON <schema_name>.<table_name> REBUILD;

To enable the Primary Key on the HumanResources.Employees table, use the following script:

ALTER INDEX PK_Employees ON HumanResources.Employees
REBUILD;

The Myths about the Primary Key

A lot of people get confused about the below myths related to Primary Keys in SQL Server.

  • Table with Primary Key is not a Heap Table
  • Primary Keys have the Clustered Index and data sorted in Physical Order

Let’s clarify them.

Table with Primary Key is not a Heap Table

Before diving deeper, let us revise the definition of the Primary Key and the Heap Table.

The Primary Key creates a Clustered Index on a table if there are no other clustered indexes available there. A Table without a Clustered Index will be a Heap Table.

Based upon these definitions, we can understand that the Primary key creates a Clustered Index only if there are no other Clustered Indexes on the table. If there are any existing clustered indexes, the Primary Key creation will create a Non-clustered index on the table matching the Primary Key.

Let’s verify this by dropping the HumanResources.Employees Table and recreating it:

DROP TABLE HumanResources.Employees
GO
CREATE TABLE HumanResources.Employees
( Employee_Id INT IDENTITY NOT NULL,
  First_Name VARCHAR(100) NOT NULL,
  Last_Name VARCHAR(100) NOT NULL,
  DOB DATETIME,
  Dept varchar(100),
  Salary Money,
  CONSTRAINT PK_Employees PRIMARY KEY NONCLUSTERED (Employee_Id)
);
GO

We can specify the NONCLUSTERED index option for the Primary Key (see above). A table was created with a Unique, non-clustered index for the Primary Key PK_Employees.

Unique, non-clustered index for the Primary Key

Hence, this table is a Heap Table even though it has a Primary Key.

Let’s see whether SQL Server can create a non-clustered index for the Primary Key if we don’t specify the keyword Non-clustered during the Primary key creation. Use the below script:

DROP TABLE HumanResources.Employees
GO
CREATE TABLE HumanResources.Employees
( Employee_Id INT IDENTITY NOT NULL,
  First_Name VARCHAR(100) NOT NULL,
  Last_Name VARCHAR(100) NOT NULL,
  DOB DATETIME,
  Dept varchar(100),
  Salary Money
);
GO

-- Create Clustered Index on Employee_Id column before creating Primary Key
CREATE CLUSTERED INDEX IX_Employee_ID ON HumanResources.Employees(First_Name, Last_Name);
GO

-- Create Primary Key on Employee_Id column
ALTER TABLE HumanResources.Employees
ADD CONSTRAINT PK_Employees PRIMARY KEY (Employee_ID);
GO
Create a clustered index before the Primary Key creation

Here we have created a clustered index separately before the Primary Key creation. And a table can have only one clustered index. Therefore, SQL Server has created the Primary Key as a Unique, Non-clustered index. Right now, the table is not a Heap table because it has a clustered index.

If I changed my mind and dropped the clustered index on the First_Name and Last_Name columns using the below script:

DROP INDEX IX_Employee_ID ON HumanResources.Employees;
GO

We have dropped the clustered index successfully. The HumanResources.Employees table is a Heap table even though we have a Primary Key available in the table:

HumanResources.Employees table - Heap table

This clears the myth that a table with a Primary Key can be a Heap table if there are no clustered indexes available on the table.

Primary Key will have a Clustered Index and Data Sorted in Physical Order

As we learned from the previous example, a Primary key in SQL can have a non-clustered index. In that case, the records wouldn’t be sorted in Physical order.

Let’s verify the table with the clustered index on a Primary key. We are going to check whether it sorts the records in Physical order.

Recreate the HumanResources.Employees table with minimal columns and the IDENTITY property removed for the Employee_ID column:

DROP TABLE HumanResources.Employees
GO
CREATE TABLE HumanResources.Employees
( Employee_Id INT NOT NULL,
  First_Name VARCHAR(100) NOT NULL,
  Last_Name VARCHAR(100) NOT NULL
);
GO

Now that we have created the table without the Primary Key or a clustered index, we can INSERT 3 records in a non-sorted order for the Employee_Id column:

INSERT INTO HumanResources.Employees ( Employee_Id, First_Name, Last_Name)
VALUES
(3, 'Antony', 'Mark'),
(1, 'James', 'Cameroon'),
(2, 'Jackie', 'Chan')
the table without the Primary Key

Let’s select from the HumanResources.Employees table:

SELECT * 
FROM HumanResources.Employees
table without the Primary Key

We can see the records fetched in the same order as the records inserted from the Heap table at this moment.

Let’s create a Primary Key on this Heap table and see whether it has any impact on the SELECT statement:

ALTER TABLE HumanResources.Employees
ADD CONSTRAINT PK_Employees PRIMARY KEY (Employee_ID);
GO
SELECT * 
FROM HumanResources.Employees
Primary Key on the Heap table

After the creation of the Primary Key, we can see that the SELECT statement fetched records in Ascending order of the Employee_Id (Primary Key column). It is due to the clustered index on Employee_Id.

If a Primary Key is created with the non-clustered option, the table data won’t be sorted based upon the Primary Key column.

If the length of a single record in a table exceeds 4030 bytes, only one record can fit a page. The clustered index ensures that the pages are in Physical order.

A page is a fundamental unit of storage in SQL Server data files with a size of 8 KB (8192 bytes). Only 8060 bytes of that unit are usable for data storage. The remaining amount is for page headers and other internals.

Tips to Choose Primary Key Columns

  • Integer data type columns are best suited for Primary Key Columns as they occupy smaller storage sizes and can help retrieve the data faster.
  • Since Primary Key Columns have a clustered index by default, use the IDENTITY option on integer data type columns to have new values generated in incremental order.
  • Instead of creating a Primary key on Multiple columns, create a new integer column with the IDENTITY property defined. Also, create a unique index on multiple columns that were originally identified for better performance.
  • Try to avoid columns with string datatype like varchar, nvarchar, etc. We can’t guarantee the sequential increment of data on these datatypes. It can impact the INSERT performance on these columns.
  • Choose columns where values won’t get updated as Primary Keys. For example, if the Primary Key value can change from 5 to 1000, the B-tree associated with the clustered index needs to be updated resulting in slight performance degrades.
  • If string datatype columns need to be chosen as Primary Key columns, ensure that the length of varchar or nvarchar datatype columns remains small for better performance.

Conclusion

We have gone through the basics of Constraints available in SQL Server. We examined the Primary Key constraints in detail and learned how to create, drop, modify, disable, and rebuild Primary keys. In addition, we have clarified some popular myths around Primary keys with examples.

Stay tuned for the next article!

Tags: , , Last modified: February 22, 2022
Close