Storing XML data in SQL Server

Total: 16 Average: 3.7

When working on the release of dbForge Transaction Log, among other tasks, our team had to puzzle out how to properly store typed XML data.

To start with, it is worth mentioning that SQL Server does not store XML in the format it was entered. An XML string is parsed, split to tags, and thus is stored in a compressed format. Description elements that the server considers unnecessary are discarded.

It also should be kept in mind that, if the data type of a column is specified as simple XML, the server will store this data as Unicode strings.
Example 1.

CREATE TABLE XmlValuesTable (
  [uid] [int] IDENTITY PRIMARY KEY,
  v XML NOT NULL );
GO
INSERT INTO XmlValuesTable (v)
VALUES ('<note><float>123.456</float><time>01:23:45.789</time></note>');
INSERT INTO XmlValuesTable (v)
VALUES ('<note><float>4.0000000000</float><time>01:23:45Z</time></note>');

The server will store the insert data as follows:

F0 04 6E006F0074006500 <- Name "note"
EF 000001 <- Namespace 01
F8 01 <- tag 01
F0 05 66006C006F0061007400 <- Name "float"
EF 000002 <- Namespace 02
F8 02 <- tag 02
11 07 3100320033002E00340035003600 <- string "123.456"
F7 <- closing tag
F0 04 740069006D006500 <- Name "time"
EF 000003 <- Namespace 02
F8 03 <- tag 03
11 0C 300031003A00320033003A00340035002E00370038003900 <- string "01:23:45.789"
F7 <- closing tag
F7 <- closing tag

In the following example, the column data type is specified as typed through XML Schema Collection.

Example 2.

CREATE XML SCHEMA COLLECTION [XmlValuesSchemaCollection_datetime2] AS
'<?xml version="1.0"?> 
<xsd:schema
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  xmlns:sqltypes="http://schemas.microsoft.com/sqlserver/2004/sqltypes"

  <xsd:element name="datetime2" type="sqltypes:datetime2"/> 
</xsd:schema>';
GO

CREATE TABLE XmlValuesTable_datetime2 (
  [uid] [int] IDENTITY PRIMARY KEY,
  v XML(XmlValuesSchemaCollection_datetime2) NOT NULL
);
GO

INSERT INTO XmlValuesTable_datetime2 (v)
VALUES (N'<datetime2>2014-06-18T06:39:05.190</datetime2>');
GO

In this particular case, the server will store the insert data as follows:

EA 09 014C010015 1A000000 <- type info 0x14C (332) “datetime2”, 0x15 (21) “dateTime” + offset
F0 09 6400610074006500740069006D0065003200 <- Name "datetime2"
EF 000001 <- Namespace 01
F8 01 <- tag 01
EA 05 004C010015 <- type info
7E 02978924A9380B <- "2014-06-18T06:39:05.190"
F7 <- closing tag

In this way, the server converts the stored data to types specified in the addendum to this article (you can see the list of all data types by running the “select * from sys.xml_schema_types” query on the server).

Let’s take a look at how the server will save a more complex structure similar to the one in Example 1 and described with XML Schema Collection.

Example 3.

CREATE XML SCHEMA COLLECTION [XmlValuesSchemaCollection] AS
'<?xml version="1.0"?>
<xsd:schema
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  xmlns:sqltypes="http://schemas.microsoft.com/sqlserver/2004/sqltypes"
  attributeFormDefault="unqualified" elementFormDefault="qualified"> 
  <xsd:import namespace="http://schemas.microsoft.com/sqlserver/2004/sqltypes"
schemaLocation="http://schemas.microsoft.com/sqlserver/2004/sqltypes/sql2008/sqltypes.xsd"/>

    <xsd:element name="note"> 
     <xsd:complexType> 
       <xsd:sequence> 
          <xsd:element name="float" type="xsd:float"/> 
          <xsd:element name="time" type="xsd:time"/> 
       </xsd:sequence> 
     </xsd:complexType> 
  </xsd:element> 
</xsd:schema>'; 
GO 

CREATE TABLE XmlValuesTable (
  [uid] [int] IDENTITY PRIMARY KEY,
  v XML(XmlValuesSchemaCollection) NOT NULL
);
GO

INSERT INTO XmlValuesTable (v)
VALUES ('<note><float>123.456</float><time>01:23:45.789</time></note>');

The server will save the insert data as follows:

EA 05 0001000100 <- type info
F0 04 6E006F0074006500 <- Name "note"
EF 000001 <- Namespace
F8 01 <- tag 01
EA 09 0111000011 12000000 <- type info 0x11 (17) "float" + offset
F0 05 66006C006F0061007400 <- Name "float"
EF 000002 <- Namespace
F8 02 <- tag 02
EA 05 0011000011 <- type info 0x11 (17) "float"
03 79E9F642 <- "123.456"
F7 <- closing tag
EA 09 0116000016 10000000 <- type info 0x16 (22) "time" + offset
F0 04 740069006D006500 <- Name "time"
EF 000003 <- Namespace
F8 03 <- tag 03
EA 05 0016000016 <- type info 0x16 (22) "time"
7D 03FDAF4C005B950A <- "01:23:45.789"
F7 <- closing tag
F7 <- closing tag

Let’s try adding a schema link to the insert.

Example 4.

INSERT INTO XmlValuesTable (v)
VALUES ('<note xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><float>123.456</float><time>01:23:45.789</time></note>');
EA 05 0001000100 <- type info
F0 04 6E006F0074006500 <- Name "note"
EF 000001 <- Namespace
F8 01 <- tag 01
F0 09 78006D006C006E0073003A00780073006900 <- Name "xmlns:xsi"
EF 000200 <- Namespace "xmlns:xsi"
F6 02 <- Attribute
11 29 68007400740070003A002F002F007700770077002E00770033002E006F00720067002F0032003000300031002F0058004D004C0053006300680065006D0061002D0069006E007300740061006E0063006500 <- "http://www.w3.org/2001/XMLSchema-instance"
F5 <- closing bracket
EA 09 0111000011 12000000 <- type info 0x11 (17) "float" + offset
F0 05 66006C006F0061007400 <- Name "float"
EF 000003 <- Namespace
F8 03 <- tag 03
EA 05 0011000011 <- type info 0x11 (17) "float"
03 79E9F642 <- "123.456"
F7 <- closing tag
EA 09 0116000016 10000000 <- type info 0x16 (22) "time" + offset
F0 04 740069006D006500  <- Name "time"
EF 000004 <- Namespace
F8 04 <- tag 08
EA 05 0016000016 <- type info 0x16 (22) "time"
7D 03FDAF4C005B950A <- "01:23:45.789"
F7 <- closing tag
F7 <- closing tag

As you can see, the server has carefully saved the namespace as an attribute and used almost half the space for this even despite the fact that the namespace doesn’t really serve any useful purpose here – the data was saved the same way it would be saved without the namespace.

Conclusion

From the above, it may seem that you can reduce the size of a database by storing some data types (e.g. float) as typed values since 4 bytes require significantly less storage than the same value saved as a Unicode string. However, you should keep in mind that additional 7-18 bytes are used for each value to describe its type and move it to the necessary position.

Addendum

Correlation of XML types, base types, and data types that the server uses to store typed values.

 

XML type Base type Stored as type Size in bytes
anyType string 2 * characters
anySimpleType anyType string
string anySimpleType string
boolean anySimpleType boolean 1
float anySimpleType float 4
double anySimpleType double 8
decimal anySimpleType SqlDecimal 20
duration anySimpleType string
dateTime anySimpleType *1
time anySimpleType *1
date anySimpleType *1
gYearMonth anySimpleType string
gYear anySimpleType string
gMonthDay anySimpleType string
gDay anySimpleType string
gMonth anySimpleType string
hexBinary anySimpleType array of bytes
base64Binary anySimpleType array of bytes
anyURI anySimpleType string
QName anySimpleType string
normalizedString string string
token string string
language string string
Name string string
NCName string string
ENTITY string string
NMTOKEN string string
integer decimal SqlDecimal 20
nonPositiveInteger integer SqlDecimal 20
negativeInteger nonPositiveInteger SqlDecimal 20
long integer SqlDecimal 20
int long SqlDecimal 20
short int SqlDecimal 20
byte short SqlDecimal 20
nonNegativeInteger integer SqlDecimal 20
unsignedLong nonNegativeInteger SqlDecimal 20
unsignedInt unsignedLong SqlDecimal 20
unsignedShort unsignedInt SqlDecimal 20
unsignedByte unsignedShort SqlDecimal 20
positiveInteger nonNegativeInteger SqlDecimal 20
char string string
nchar string string
varchar string string
nvarchar string string
text string string
ntext string string
varbinary base64Binary array of bytes
binary base64Binary array of bytes
image base64Binary array of bytes
timestamp base64Binary array of bytes
timestampNumeric long SqlDecimal 20
numeric decimal SqlDecimal 20
bigint long SqlDecimal 20
smallint short SqlDecimal 20
tinyint unsignedByte SqlDecimal 20
bit boolean boolean 1
real float float 4
datetime dateTime *1
smalldatetime dateTime *1
money decimal SqlDecimal
smallmoney decimal SqlDecimal
uniqueidentifier decimal string
datetime2 dateTime *1
datetimeoffset dateTime *1
hierarchyid string string
dbobject anyURI string

*1 – data/time information. The specific type is defined by the value.

Value Stored as type Size in bytes
DateOffset Date (amount of days) 3
DateOffset (2019-09-16+02:00) DateTimeOffset 11
DateTime DateTime 7-9 depends from precision
DateTimeOffset DateTimeOffset 9
Time DateTime 7-9 depends from precision
TimeOffset (01:23:45Z) DateTimeOffset 9
Vladimir Klimov
Latest posts by Vladimir Klimov (see all)