When working on the release of dbForge Transaction Log, among other tasks, our team had to puzzle out how to properly store typed XML data.
To start with, it is worth mentioning that SQL Server does not store XML in the format it was entered. An XML string is parsed, split to tags, and thus is stored in a compressed format. Description elements that the server considers unnecessary are discarded.
It also should be kept in mind that, if the data type of a column is specified as simple XML, the server will store this data as Unicode strings.
Example 1.
CREATE TABLE XmlValuesTable ( [uid] [int] IDENTITY PRIMARY KEY, v XML NOT NULL ); GO INSERT INTO XmlValuesTable (v) VALUES ('<note><float>123.456</float><time>01:23:45.789</time></note>'); INSERT INTO XmlValuesTable (v) VALUES ('<note><float>4.0000000000</float><time>01:23:45Z</time></note>');
The server will store the insert data as follows:
F0 04 6E006F0074006500 <- Name "note" EF 000001 <- Namespace 01 F8 01 <- tag 01 F0 05 66006C006F0061007400 <- Name "float" EF 000002 <- Namespace 02 F8 02 <- tag 02 11 07 3100320033002E00340035003600 <- string "123.456" F7 <- closing tag F0 04 740069006D006500 <- Name "time" EF 000003 <- Namespace 02 F8 03 <- tag 03 11 0C 300031003A00320033003A00340035002E00370038003900 <- string "01:23:45.789" F7 <- closing tag F7 <- closing tag
In the following example, the column data type is specified as typed through XML Schema Collection.
Example 2.
CREATE XML SCHEMA COLLECTION [XmlValuesSchemaCollection_datetime2] AS '<?xml version="1.0"?> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sqltypes="http://schemas.microsoft.com/sqlserver/2004/sqltypes" <xsd:element name="datetime2" type="sqltypes:datetime2"/> </xsd:schema>'; GO CREATE TABLE XmlValuesTable_datetime2 ( [uid] [int] IDENTITY PRIMARY KEY, v XML(XmlValuesSchemaCollection_datetime2) NOT NULL ); GO INSERT INTO XmlValuesTable_datetime2 (v) VALUES (N'<datetime2>2014-06-18T06:39:05.190</datetime2>'); GO
In this particular case, the server will store the insert data as follows:
EA 09 014C010015 1A000000 <- type info 0x14C (332) “datetime2”, 0x15 (21) “dateTime” + offset F0 09 6400610074006500740069006D0065003200 <- Name "datetime2" EF 000001 <- Namespace 01 F8 01 <- tag 01 EA 05 004C010015 <- type info 7E 02978924A9380B <- "2014-06-18T06:39:05.190" F7 <- closing tag
In this way, the server converts the stored data to types specified in the addendum to this article (you can see the list of all data types by running the “select * from sys.xml_schema_types” query on the server).
Let’s take a look at how the server will save a more complex structure similar to the one in Example 1 and described with XML Schema Collection.
Example 3.
CREATE XML SCHEMA COLLECTION [XmlValuesSchemaCollection] AS '<?xml version="1.0"?> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sqltypes="http://schemas.microsoft.com/sqlserver/2004/sqltypes" attributeFormDefault="unqualified" elementFormDefault="qualified"> <xsd:import namespace="http://schemas.microsoft.com/sqlserver/2004/sqltypes" schemaLocation="http://schemas.microsoft.com/sqlserver/2004/sqltypes/sql2008/sqltypes.xsd"/> <xsd:element name="note"> <xsd:complexType> <xsd:sequence> <xsd:element name="float" type="xsd:float"/> <xsd:element name="time" type="xsd:time"/> </xsd:sequence> </xsd:complexType> </xsd:element> </xsd:schema>'; GO CREATE TABLE XmlValuesTable ( [uid] [int] IDENTITY PRIMARY KEY, v XML(XmlValuesSchemaCollection) NOT NULL ); GO INSERT INTO XmlValuesTable (v) VALUES ('<note><float>123.456</float><time>01:23:45.789</time></note>');
The server will save the insert data as follows:
EA 05 0001000100 <- type info F0 04 6E006F0074006500 <- Name "note" EF 000001 <- Namespace F8 01 <- tag 01 EA 09 0111000011 12000000 <- type info 0x11 (17) "float" + offset F0 05 66006C006F0061007400 <- Name "float" EF 000002 <- Namespace F8 02 <- tag 02 EA 05 0011000011 <- type info 0x11 (17) "float" 03 79E9F642 <- "123.456" F7 <- closing tag EA 09 0116000016 10000000 <- type info 0x16 (22) "time" + offset F0 04 740069006D006500 <- Name "time" EF 000003 <- Namespace F8 03 <- tag 03 EA 05 0016000016 <- type info 0x16 (22) "time" 7D 03FDAF4C005B950A <- "01:23:45.789" F7 <- closing tag F7 <- closing tag
Let’s try adding a schema link to the insert.
Example 4.
INSERT INTO XmlValuesTable (v) VALUES ('<note xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><float>123.456</float><time>01:23:45.789</time></note>');
EA 05 0001000100 <- type info F0 04 6E006F0074006500 <- Name "note" EF 000001 <- Namespace F8 01 <- tag 01 F0 09 78006D006C006E0073003A00780073006900 <- Name "xmlns:xsi" EF 000200 <- Namespace "xmlns:xsi" F6 02 <- Attribute 11 29 68007400740070003A002F002F007700770077002E00770033002E006F00720067002F0032003000300031002F0058004D004C0053006300680065006D0061002D0069006E007300740061006E0063006500 <- "http://www.w3.org/2001/XMLSchema-instance" F5 <- closing bracket EA 09 0111000011 12000000 <- type info 0x11 (17) "float" + offset F0 05 66006C006F0061007400 <- Name "float" EF 000003 <- Namespace F8 03 <- tag 03 EA 05 0011000011 <- type info 0x11 (17) "float" 03 79E9F642 <- "123.456" F7 <- closing tag EA 09 0116000016 10000000 <- type info 0x16 (22) "time" + offset F0 04 740069006D006500 <- Name "time" EF 000004 <- Namespace F8 04 <- tag 08 EA 05 0016000016 <- type info 0x16 (22) "time" 7D 03FDAF4C005B950A <- "01:23:45.789" F7 <- closing tag F7 <- closing tag
As you can see, the server has carefully saved the namespace as an attribute and used almost half the space for this even despite the fact that the namespace doesn’t really serve any useful purpose here – the data was saved the same way it would be saved without the namespace.
Conclusion
From the above, it may seem that you can reduce the size of a database by storing some data types (e.g. float) as typed values since 4 bytes require significantly less storage than the same value saved as a Unicode string. However, you should keep in mind that additional 7-18 bytes are used for each value to describe its type and move it to the necessary position.
Addendum
Correlation of XML types, base types, and data types that the server uses to store typed values.
XML type | Base type | Stored as type | Size in bytes | |
anyType | string | 2 * characters | ||
anySimpleType | anyType | string | ||
string | anySimpleType | string | ||
boolean | anySimpleType | boolean | 1 | |
float | anySimpleType | float | 4 | |
double | anySimpleType | double | 8 | |
decimal | anySimpleType | SqlDecimal | 20 | |
duration | anySimpleType | string | ||
dateTime | anySimpleType | *1 | ||
time | anySimpleType | *1 | ||
date | anySimpleType | *1 | ||
gYearMonth | anySimpleType | string | ||
gYear | anySimpleType | string | ||
gMonthDay | anySimpleType | string | ||
gDay | anySimpleType | string | ||
gMonth | anySimpleType | string | ||
hexBinary | anySimpleType | array of bytes | ||
base64Binary | anySimpleType | array of bytes | ||
anyURI | anySimpleType | string | ||
QName | anySimpleType | string | ||
normalizedString | string | string | ||
token | string | string | ||
language | string | string | ||
Name | string | string | ||
NCName | string | string | ||
ENTITY | string | string | ||
NMTOKEN | string | string | ||
integer | decimal | SqlDecimal | 20 | |
nonPositiveInteger | integer | SqlDecimal | 20 | |
negativeInteger | nonPositiveInteger | SqlDecimal | 20 | |
long | integer | SqlDecimal | 20 | |
int | long | SqlDecimal | 20 | |
short | int | SqlDecimal | 20 | |
byte | short | SqlDecimal | 20 | |
nonNegativeInteger | integer | SqlDecimal | 20 | |
unsignedLong | nonNegativeInteger | SqlDecimal | 20 | |
unsignedInt | unsignedLong | SqlDecimal | 20 | |
unsignedShort | unsignedInt | SqlDecimal | 20 | |
unsignedByte | unsignedShort | SqlDecimal | 20 | |
positiveInteger | nonNegativeInteger | SqlDecimal | 20 | |
char | string | string | ||
nchar | string | string | ||
varchar | string | string | ||
nvarchar | string | string | ||
text | string | string | ||
ntext | string | string | ||
varbinary | base64Binary | array of bytes | ||
binary | base64Binary | array of bytes | ||
image | base64Binary | array of bytes | ||
timestamp | base64Binary | array of bytes | ||
timestampNumeric | long | SqlDecimal | 20 | |
numeric | decimal | SqlDecimal | 20 | |
bigint | long | SqlDecimal | 20 | |
smallint | short | SqlDecimal | 20 | |
tinyint | unsignedByte | SqlDecimal | 20 | |
bit | boolean | boolean | 1 | |
real | float | float | 4 | |
datetime | dateTime | *1 | ||
smalldatetime | dateTime | *1 | ||
money | decimal | SqlDecimal | ||
smallmoney | decimal | SqlDecimal | ||
uniqueidentifier | decimal | string | ||
datetime2 | dateTime | *1 | ||
datetimeoffset | dateTime | *1 | ||
hierarchyid | string | string | ||
dbobject | anyURI | string |
*1 – data/time information. The specific type is defined by the value.
Value | Stored as type | Size in bytes |
DateOffset | Date (amount of days) | 3 |
DateOffset (2019-09-16+02:00) | DateTimeOffset | 11 |
DateTime | DateTime | 7-9 depends from precision |
DateTimeOffset | DateTimeOffset | 9 |
Time | DateTime | 7-9 depends from precision |
TimeOffset (01:23:45Z) | DateTimeOffset | 9 |