SQL Server – Dissect the Internals of sp_spaceused

This article is an effort to dissect the output of the sp_spaceused stored procedure.

Introduction

Understanding the database usage internals and the growth trends play a vital role in the defining the right sizing of the database. sp_spaceused is probably an administrator’s most widely-executed system stored procedure to find the disk space used by a database. This helps get a quick glimpse of the database usage. statistics. sp_spaceused is used to display the number of rows, the data size, index size, amount of used space, unused space by each object, and the unallocated size of the database. Although looking at the values given by sp_spaceused, one shouldn’t think of shrinking the database or data file or log file. Many a time, we are unaware of what we are doing. Many a time, we don’t know what would be the aftereffects of doing such resource intrinsic operations. The output of sp_spaceused tells us a lot about the current performance of the database. The unallocated column and the unused column tell us the free space left at the database and the table levels.

This article considers:

  1. A peek into sp_spaceused
  2. Impact of the auto-growth setting on the columns, unallocated and unused
  3. Finding the space usage details at the database and the instance levels
  4. Measuring the auto-growth events
  5. Finding the mdf and ldf file sizes
  6. Factors determining the performance of the database
  7. And more…

Internals of sp_spaceused

Capture space usage details of all the tables

In the below T-SQL, the undocumented stored procedure sp_MSforeachtable is used to traverse all the tables within the scope of the current database context to get the space usage metrics of all the tables in the context.

Capture space usage details of all the databases

The undocumented stored procedure sp_MSforeachDB is used to traverse the entire database within the scope of the current SQL instance to get the space usage information of all the databases.

Here, database_name is the name of the database; in this case, PythonSample. database_size is the Unallocated+Reserved+Data+Index+Unused = MDF +LDF (= 848 MB in this case). The unallocated space here is 51.94 MB.

This is, in reality, the disk boundary that’s been marked for the database. The sp_spaceused outputs unallocated column defined at the database level and it’s not reserved for any table and may be taken by the first object that claims more room to grow.

The unallocated space is the free space inside the data file so that it doesn’t have to auto-grow every time you issue a query; usually, the SQL Server Storage Engine manages the autogrowth using a mechanism known as Proportional Fill Algorithm. The management of the extents is done effectively based on the number of writes happening on the files. And at the same time, when the used space reaches a threshold, an event is triggered for further auto-growth. Setting the right value of unallocated space depends on the needs and the situations, and the nature of use of the database. Unallocated space is the space that is not-yet-in-use, and is “up for grabs”. In essence, these extents are marked with bit 1 in the GAM page. Understanding the concept of autogrowth from above, any type of growth may generate further unallocated extents.

Using the following SQL query, we can see the number of times the auto-growth event was generated, along with the amount of time the database has kept on hold for the process.

Let’s look at what each of the terms means:

Reserved: The space reserved for use by database objects = (Data +Index + Unused) = 476704 + 1280 + 1312 = 479296 KB. This indicates how full the objects are; ideally, 10% of unused space is expected for transactional tables.

Data: The actual size of the data. This is the sum of all the data files of the database.

Index: The amount of space used by the index.

Note: In some cases, I’ve seen that the size of the index size is greater than the size of actual data. As far as indexes are concerned, what is needed for the system is always dependent on the performance of the database. Many times, the read operations are more important than the write operations. And in some other cases, writes are more important than the reads. In cases where the business has decided that reads are far more important than writes, that system might need tons of indexes in order to satisfy the performance requirements of the business and the users.

Unused: A portion of the reserved space, which is not yet used

Unused are pages on allocated extents but are not yet used by any object. As soon as an extent is allocated (either as a uniform or a shared extent), we get eight reserved pages on that extent. Some pages are used, and some are unused.

The unused and unallocated columns in the output may be confusing. To clarify, the unused column output doesn’t show the amount of free space left in the whole database. It is, instead, the total amount of space reserved for tables but not filled with data. In many cases, the unused space may be reclaimed by creating a clustered index or managing the existing indexes.

The output of sp_spaceused can further be simplified to find the size of the .mdf file and .log files. The sum of the reserved space and unallocated space is more or less equal to the size of the data—or MDF—file. Also, subtracting the MDF file size from the database size gives the log file size.

So here are two formulas:

Size of the MDF file = reserved + unallocated space

Size of the log file = Database_Size – MDF File size

The aforementioned points tell us how each of the columns in the output of sp_spaceused is interpreted, calculated and analyzed.

Impact of auto-growth setting

The initial sizes and the auto-growth configuration have a significant effect on the unused space. Setting the right values for these is a challenge. I have seen many cases where the auto-growth was set to grow in terms of percentages. Let’s assume that the auto-growth is set at 25% for a data file size of 100 GB. It takes only 4 auto-growth events to fill the disk drive.

The other case is rebuilding the indexes. This operation has a direct impact on the unused space of the table, as the data is reshuffled between the uniform and mixed extents. In few cases, while reshuffling the pages, the operation may induce unallocated space because of the auto-growth setting of the data file.

Let’s consider a scenario where the auto-growth setting is not rightly set on the database. This is again an issue: If auto-growth is enabled on the database, it means that the drive expansion takes place automatically during some event even if the data does not use all the space.

It’s always a good practice to set the appropriate auto-growth setting for the data file. Sometimes, the incorrect setting of the data file may inject physical fragmentation, resulting in severe performance degradation of the system. In other words, if you do not have unallocated space, the new data will try to sit in empty locations that may be scattered. This applies to the log file as well. The unallocated space in the database indirectly influences the auto-growth setting of the data file and the log file and directly influences the performance. They key is finding the right balance.

Wrapping Up

  1. In the process of database creation, the defined size (i.e. is the initial size) is nothing but the actual size of the database. This initial size is recorded in the Page Header. During a database shrink process, the process uses the minimum size property as the reference, only if the actual data size is less than the minimum size—minimum size is also found in the Page Header, and can be seen using the DBCC PAGE command. Also, the same process holds good for DBCC SHRINKFILE, which shrinks files to less than their initial sizes.
  2. It is not a recommended practice to shrink the database to free up the drive space, although, the decision depends on the scenario—unusual scenarios may warrant an unconventional action. However, one must remember that shrinking a database introduces fragmentation in the database. It is always a good practice to analyze the root cause of the unallocated space and unused space of the objects. In many cases, expanding the disk to handle the data growth would be a viable/recommended option.
  3. Auto-growth configuration: When SQL Server performs an auto-grow operation, the transaction that triggered the auto-grow event will have to wait until the auto-grow event completes. Only then can the transaction itself complete.
  4. It is always recommended to set the auto-growth options in numbers instead of percentages.
  5. The induction of unused space in the table could be because of the following reasons:
    • Fragmentation
      When data is fragmented due to its nature and type of the definition, some unused space is generated. Also, a frequent modification of the data (All UPDATE, INSERT OR DELETE operations) leads to more number of page splits, which is more likely to generate unused space in the table.
    • No clustered index on the table
      To reduce fragmentation in a Heap, one can think of creating a clustered index on the table. To reduce index fragmentation, perform index maintenance by determining the avg_fragmentation_in_percent value.
    • Size of the data
      In some cases, using appropriate Data Types yields smaller data rows which in its turn allows placing more rows in a page. It not only reduces the internal unused space but also has an impact on the performance by reducing the number of page splits.
  6. The unused space may also be a result of dropping the variable-length column. Use DBCC CLEANTABLE after you make significant changes to the variable-length columns in a table or indexed view in order to immediately reclaim the unused space. Alternatively, you can rebuild the indexes on the table or view; however, this is a more resource-intensive operation.
  7. The unused space is relatively larger when we end up in loading relatively larger data (>8 KB). In such cases, we end up with large amounts of unused space on the data pages.
  8. After a SharePoint Migration, one can see a significant amount of the unused space introduced in the databases. The reclamation is a slower process, the ghost cleanup process removes these pages and the free-up happens over a period of time.
  9. In some cases, the values of sp_spaceused might not be correct. Although sp_spaceused gets its information from the system object which holds all the estimates, it can sometimes be inaccurate. One of the reasons for this is that during a database migration, or in case of outdated statistics, or when the system is undergoing frequent DDL modifications, or after performing huge bulk-copy operations. To synchronize system objects, use the DBCC updateusage(0) or DBCC CHECKTABLE statements to ensure that sp_spaceused returns up-to-date accurate data. However, remember that DBCC commands are resource-intensive; have a good understanding of the implication of its usage. When we execute the DBCC updateusage command, the SQL Server Database Engine scans the data pages in the database and makes necessary corrections to the sys.allocation_units and sys.partitions catalog views regarding the storage space used by each table.

References

Prashanth Jayaram

Prashanth Jayaram

Prashanth Jayaram is working as a DB Manager for a leading MNC, Database Technologist, Author, Blogger, Automation Expert, Technet WIKI Ninja, and Powershell Geek.
Prashanth Jayaram

Latest posts by Prashanth Jayaram (see all)

Prashanth Jayaram

Prashanth Jayaram is working as a DB Manager for a leading MNC, Database Technologist, Author, Blogger, Automation Expert, Technet WIKI Ninja, and Powershell Geek.