Learn to Store and Analyze Documents on Windows File System with SQL Server Semantic Search – Part 1

Total: 1 Average: 5

The article examines storing unstructured data using native windows directory structure and managing it through the SQL Server database. Subsequently, we demonstrate how to perform comparative analysis with the help of Semantic Search in order to get valuable information out of the data. Additionally, you are going to get some hands-on experience of storing and linking documents on Windows File System and using Semantic Search.

It is of great importance for both database beginners and professionals to know how to apply file storage strategies such as File Table to searching technologies like Semantic Search for unstructured data analysis.

Prerequisites

The prerequisites mentioned in this section apply to the complete article, which consists of three parts.

File Table Concepts

It wouldn’t hurt to have basic knowledge of File Table used in SQL Server databases. Nonetheless, this part of the article provides detailed information on how to use File Tables for storing documents.

Semantic Search Concepts

Since the ultimate goal of this article is to perform comparative analysis using Semantic Search, it is highly recommended to be well aware of the basic concepts of the following, particularly, if you are interested in implementing the walkthrough(s):

  1. Semantic Search concepts.
  2. Documents comparative analysis basics.
  3. Basic use of Semantic Search.

Please refer to the article How to use SQL Server Semantic Search to fulfill the above requirements provided you have basic T-SQL skills.

Basics of Full-Text Search

Additionally, you are encouraged to have knowledge of Full-Text Search, which is offered as a special add-on alongside Semantic Search during the SQL Server installation.

Please note that Semantic Search may also be considered as an extended feature of Full-Text Search as it cannot be implemented without fulfilling the basic requirements of Full-Text Search.

Please refer to the article Implementing Full-Text Search in SQL Server 2016 for beginners to get more information about Full-Text Search.

SQL Instance in use

We are using SQL Server 2016 version in this article but following the steps should not be problem with the newer versions with few exceptions.

Setup FILESTREAM Enabled Database

We have to use a special kind of database called FILESTREAM Enabled Database to store documents on Windows File System and manage them by the SQL database.

There are some steps that must be completed before we can create this special type of database.

Enable FILESTREAM at Instance Level

One of the first things to do is to enable FILESTREAM at the SQL Server instance level.

Type SQL Server 2016 Configuration (or your SQL instance version if other than SQL 2016) in the Windows Search Box, generally located at the bottom left corner of the taskbar, as shown below:

Click to open SQL Server Configuration Manager.

Locate and right-click on your SQL instance, which is SQLTAB2016 in our case, and click Properties under SQL Services:

Check Enable FILESTREAM for Transaction-SQL access and Enable FILESTREAM for file I/O access under the FILESTREAM tab. Enter a suitable Windows share name as well and click OK, as shown below:

Change FILESTREAM Access Level

The next thing is to change the FILESTREAM access level from 0 to 2 if it has not been already changed.

Run the following T-SQL script against the master database:

-- Enabling FILESTREAM for SQL instance in use
EXEC sp_configure filestream_access_level, 2  
RECONFIGURE

The output is as follows:

Please remember it is fine if you run the above script even if FILESTREAM is already set to 2.

Create a Windows Folder to store the database

Please create a windows folder C:\Sample\Data where you are going to physically store your SQL database:

You can choose any drive other than C drive.

Create FILESTREAM Enabled Database

Let us now create a new FILESTREAM enabled database named EmployeesFilestreamSample by running the following T-SQL script against the master database:

-- Creating FILESTREAM enabled database
CREATE DATABASE EmployeesFilestreamSample
ON
PRIMARY ( NAME = EmployeesSampleData,
    FILENAME = 'c:\sample\data\EmployeesFilestreamSample.mdf'),
FILEGROUP FileStreamGroup1 CONTAINS FILESTREAM( NAME = EmployeesSampleFilestreamGroup,
    FILENAME = 'c:\sample\data\EmployeesSampleFilestreamGroup')
LOG ON  ( NAME = EmployeesSampleLog,
    FILENAME = 'c:\sample\data\EmployeesFilestreamLog.ldf')
GO

The output is as follows:

FILESTREAM enabled database has been successfully created.

Check FILESTREAM Enabled Database

Go to Object Explorer and expand the EmployeesFilestreamSample database under the Databases node to view the newly created FILESTREAM Enabled database.

Set up File Table

Once we have created FILESTREAM Enabled database, the next step is to set up a File Table, which is going to help us save our documents on to the Windows File System.

Name FILESTREAM Directory

In order to create File Table, we need to make some changes to the main database properties.

Right Click the database (EmployeesFilestreamSample) in Object Explorer and click Properties:

Go to the Options Page and name FILESTREAM Directory Name as EmployeesFilestreamDirectory, as shown in the following figure:

Set FILESTREAM Non-Transacted Access to FULL, as shown below, and click OK:

Click Yes when asked if you are fine to change the properties and close all the connections:

Create File Table

Create a new File Table to store the documents by typing and executing the following T-SQL script:

USE EmployeesFilestreamSample

-- Creating Employees FileTable sample
CREATE TABLE EmployeesDocumentStore AS FileTable  
    WITH (   
          FileTable_Directory = 'EmployeesSampleFilestreamGroup',  
          FileTable_Collate_Filename = database_default  
         );  
GO

Explore File Table

Select Databases=> dbo.EmployeeDocumentStore=>Tables =>FileTables=> EmployeesDocumentStore.

Right-click on EmployeesDocumentStore and click Explore FileTable Directory, as shown below:

Fixing File Location Error (Microsoft bug)

If you are able to view the folder where documents are supposed to be stored, then you are good to go, but if you encounter the following error, then you need to fix it first:

FIX: “The File location cannot be opened” error occurs when you try to open a FileTable directory in SQL Server

This is a Microsoft bug as confirmed by Microsoft:

To fix this bug, you have to run Cumulative Update 11 for SQL Server 2016 SP2.

Please run the fix that suits your SQL instance based on its version if you are not using the SQL Server instance used in this article and face the same problem.

Download the update if it is applicable to your SQL instance, as shown below:

You have to accept terms and conditions in order to proceed further as follows:

Once the update is successfully installed, you should see the following message:

Explore File Table after Resolving Location Error

Now, you should be able to explore FileTable Directory:

The windows folder can now be filled with your unstructured documents such as MS Word documents, as shown below:

Create and Save MS Word Documents for Employees

Create a new MS Word document and name it Asif Permanent Employee, as shown below:

Add the following text to this Word document and ensure the first line is the title of the document:

Asif Permanent Employee
Asif is a very experienced Project Manager with a solid programming background and highly skilled to work on data warehouse business intelligence (BI) solutions.
He is a quick learner and can work on multiple technologies and tools.
He has excellent communication skills and his overall rating is very good.

Remember to save the document.

Create another MS Word document and name it Peter Permanent Employee with the following text where the first line is the title of the document:

Peter Permanent Employee
Peter is a fresh Project Manager with a very strong programming background and highly skilled to work on data warehouse business intelligence (BI) solutions.
He can work with multiple teams and on multiple projects.
He has excellent business skills and his overall rating is good.

Please remember to save the document, which should contain the following text:

View Stored Documents

Type the following SQL code to view how SQL database has stored these documents:

-- View files stored on Windows File System managed by File Table in SQL Database
SELECT stream_id
      ,file_stream
      ,[name]
      ,file_type
      ,cached_file_size
	  ,creation_time
FROM EmployeesFilestreamSample.dbo.EmployeesDocumentStore

The output is as follows:

Congratulations! You have successfully set up a FILESTREAM Enabled Database to store documents on Windows File System managed by SQL database.

In other words, you can now take full advantage of Windows File System by comfortably creating and storing unstructured documents while, on the other side, you can fully manage these stored documents through database consistent architecture.

Please do not remove these documents if you would like to continue the walkthrough and stay in touch as we are going to perform comparative analysis on the stored documents using Semantic Search in the next part of this article.

Haroon Ashraf

Haroon Ashraf

Haroon's deep interest in logic and reasoning at an early age of his academic career paved his path to become a data professional. He began his professional life as a computer programmer more than 10 years ago working on his first data venture to migrate and rewrite a public sector database driven examination system from IBM AS400 (DB2) to SQL Server 2000 using VB 6.0 and Classic ASP along with developing reports and archiving many years of data. His work and interest revolves around Database-Centric Architectures and his expertise include database and reports design, development, testing, implementation and migration along with Database Life Cycle Management (DLM).