A hybrid cloud is a fairly attractive model when implementing cloud computing in enterprise information systems, as this approach combines the advantages of public and private clouds. It is possible to flexibly attract external resources when needed and reduce infrastructure costs. On the other hand, you keep full control over data and applications that the enterprise does not want to outsource. However, in such a scenario, we inevitably face the task of integrating data from various sources.
Suppose there is a table with customers, which is vertically divided into two parts. The depersonalized part is allocated in a public cloud, and the information personalizing the customers remains in a local database. For holistic processing inside the application, you need to combine both parts by CustomerID.
There are various ways to do this. Conventionally, they can be divided into two large categories: data aggregation at the on-premise database server level which, in this case, will be a single sign-on for accessing local and remote data, and data aggregation within the business logic. This article will consider the first approach.
In the case of SQL Server, access to heterogeneous data sources starting from version 7.0 has traditionally been based on the mechanism of linked servers. Using this mechanism, we need to get the data from the cloud database.
As you know, in Windows Azure cloud, the SQL Server database can be represented in two ways: Windows Azure SQL Database (SQL Azure), which is a PaaS solution, and as a regular database on a traditional SQL Server installed on a virtual machine hosted in Azure – IaaS. The latter case is conceptually the same as the connection with another instance of SQL Server. Creating a linked server for it is easy and has been described in the documentation.
Let’s consider the case of creating a linked server on SQL Azure. It is necessary to consider the specifics of SQL Azure:
- Only TCP/IP network library must be used, the port must not be assigned dynamically, it is always 1433.
- Only the standard authentication model is valid in SQL Azure.
- All connections to SQL Azure must be encrypted with TLS (SSL), which means adding Encrypt=True; TrustServerCertificate=true properties to the connection string. They may be not explicitly specified; they will be added automatically.
- Server name = <SQL Azure server name>.database.windows.net.
- There is no USE command in SQL Azure. Strictly speaking, it exists, but it is used purely in the case of sharding, which means that it is necessary to connect to the required base at once.
- You also need to consider the firewall rules in the SQL Azure server configuration that specify the white list of addresses from which you are allowed to connect.
In view of the foregoing, all that is lacking is to select an application access interface. SQL Server Native Client or ODBC can take on this role.
To run the demo, use the cloud version of the AdventureWorks2012 model database, which you can download here.
Open SQL Server Management Studio, connect to local SQL Server, and create an SQL Azure linked server on it, using the native client:
if exists (select 1 from sys.servers where name = 'SQLAzure_NCli') exec sp_dropserver @server = 'SQLAzure_NCli', @droplogins = 'droplogins' go exec sp_addlinkedserver @server='SQLAzure_NCli', @srvproduct='', @provider='sqlncli', @datasrc='u1qgtaf85k.database.windows.net', @location='', @provstr='', @catalog='AdventureWorks2012' go exec sp_addlinkedsrvlogin @rmtsrvname = 'SQLAzure_NCli', @useself = 'false', @rmtuser = 'alexejs', @rmtpassword = 'Password' go
where u1qgtaf85k is the name of the SQL Azure server that was generated automatically when the server was created. We need to test that we really can connect to the server from the local server and get data from it:
select CustomerID, AccountNumber from SQLAzure_NCli.AdventureWorks2012.Sales.Customer where CustomerID between 1 and 100
The result is shown below.
Having the appropriate rights, it is possible not only to read SQL Azure data from the local server but also to request data modification, as well as DDL requests.
For example, create a table in the SQL Azure DB and insert the data into it:
exec sp_serveroption 'SQLAzure_NCli', 'rpc out', true exec ('CREATE TABLE TestTbl(fld1 int not null CONSTRAINT PK_fld1 PRIMARY KEY CLUSTERED (fld1) )') at SQLAzure_Ncli exec ('INSERT INTO TestTbl VALUES (1), (2), (3)') at SQLAzure_NCli
Using Azure Management Portal, you can verify that the table has actually been created and that entries have been added to the table:
Creating a linked server using ODBC requires a DSN to be first created. Go to Control Panel\System and Security\Administrative Tools > Data Sources (ODBC) or simply run odbcad32.exe and create a data source on SQL Azure, as shown in the following illustrations:
You can automate the process by importing the following .REG file into the registry (regedit.exe):
[HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI] [HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI\SQLAzure] "Driver"="C:\\Windows\\system32\\sqlncli10.dll" "Server"="u1qgtaf85k.database.windows.net" "LastUser"=“alexejs" "Database"=“AdventureWorks2012" [HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI\ODBC Data Sources] “SQLAzure"="SQL Server Native Client 10.0“
Creating a linked server on the ODBC source, in this case, will look like this:
if exists (select 1 from sys.servers where name = 'SQLAzure_ODBC') exec sp_dropserver @server = 'SQLAzure_ODBC', @droplogins = 'droplogins‘ go exec sp_addlinkedserver @server = 'SQLAzure_ODBC', @srvproduct = 'Any', @provider = 'MSDASQL', @datasrc = 'SQLAzure', @catalog='AdventureWorks2012' go exec sp_addlinkedsrvlogin @rmtsrvname = 'SQLAzure_ODBC', @useself = 'false', @rmtuser = 'alexejs', @rmtpassword = 'Password'
Check that everything works:
select * from openquery(SQLAzure_ODBC, 'select * from sys.tables')
Regardless of the method of creating a linked server, the following is obvious. Link the client table on the local server database with the client table on the SQL Azure database:
select c.CustomerID, c.AccountNumber, p.FirstName, p.LastName from openquery(SQLAzure_NCli, 'select CustomerID, AccountNumber from Sales.Customer where CustomerID between 1 and 100') c join Person.Person p on c.CustomerID = p.BusinessEntityID order by c.CustomerID
Note: It is preferable to use the OpenQuery() function rather than refer to it by name with a 4-part notation, so as not to drag the maximum recordset to the local server. If possible, parallelize the work by filtering (joins, groupings, etc., if they are on that side) by means of remote resources.
Now, let’s consider the case when both, the local and cloud servers, are equal in terms of the application, and the data merging occurs directly in it, i.e. at the business logic level.
Pulling the data from SQL Azure from the point of view of application code is no different from working with a local SQL Server.
Let’s just say, it is identical up to the connection string. In the code below, u1qgtaf85k is the name of the SQL Azure server (it is generated automatically when it is created). I’ll remind you that the connection with the server is always established on the TCP/IP network library, port 1433. The Trusted_Connection=False parameter is not Integrated Security (it is always standard in SQL Azure), Trust_Server_Certificate=false is meant to avoid a possible man-in-the-middle attack.
using System; using System.Data; using System.Data.SqlClient; using System.Diagnostics; using System.Resources; namespace DevCon2013 { class Program { static void Main(string[] args) { ResourceManager resMan = new ResourceManager("DevCon2013.Properties.Resources", System.Reflection.Assembly.GetExecutingAssembly()); string sqlAzureConnString = String.Format(@"Server=tcp:u1qgtaf85k.database.windows.net,1433;Database=AdventureWorks2012;User ID=alexejs;Password={0};Trusted_Connection=False;Encrypt=True", resMan.GetString("Password")); SqlConnection cnn = new SqlConnection(sqlAzureConnString); cnn.Open(); SqlCommand cmd = cnn.CreateCommand(); cmd.CommandText = "select top 100 CustomerID, AccountNumber from Sales.Customer order by CustomerID"; DataTable tbl = new DataTable(); tbl.Load(cmd.ExecuteReader()); cnn.Close(); foreach (DataRow r in tbl.Rows) { for (int i = 0; i < tbl.Columns.Count; i++) Debug.Write(String.Format("{0}\t", r[i])); Debug.WriteLine(""); } } } }
I will also add the connection with the on-premise resource, i.e. with the local SQL Server. I assume that this process does not require explanations. So, let’s just modify the previous code by adding two methods – ExecuteSQL to connect to the source and execute a query against it, and DumpTable to somehow visualize the results.
Thus, working with SQL Azure and on-premise SQL Server from the point of view of the application will occur absolutely symmetrically.
string sqlOnPremiseConnString = @"Server=(local);Integrated Security=true;Database=AdventureWorks2012"; DataTable resultsOnPremise = ExecuteSQL(sqlOnPremiseConnString, "select BusinessEntityID, FirstName, LastName from Person.Person where BusinessEntityID between 1 and 100"); string sqlAzureConnString = String.Format(@"Server=tcp:u1qgtaf85k.database.windows.net,1433;Database=AdventureWorks2012;User ID=alexejs;Password={0};Trusted_Connection=False;Encrypt=True", resMan.GetString("Password")); DataTable resultsFromAzure = ExecuteSQL(sqlAzureConnString, "select CustomerID, AccountNumber from Sales.Customer where CustomerID between 1 and 100"); ... static DataTable ExecuteSQL(string cnnStr, string query) { SqlConnection cnn = new SqlConnection(cnnStr); cnn.Open(); SqlCommand cmd = cnn.CreateCommand(); cmd.CommandText = query; DataTable tbl = new DataTable(); tbl.Load(cmd.ExecuteReader()); cnn.Close(); return tbl; } static void DumpTable(DataTable tbl) { foreach (DataRow r in tbl.Rows) { for (int i = 0; i < tbl.Columns.Count; i++) Debug.Write(String.Format("{0}\t", r[i])); Debug.WriteLine(""); } }
Now we have both vertical pieces inside the application in two DataTables instead of the single Customers table. One piece if from the local server, the other one is from SQL Azure. We need to unite them by the CustomerID field, which exists in both servers.
For simplicity, we will assume that the connection is made by matching one column in one table with one column in the other table. This is a standard ADO.NET task. There are two most common ways to solve it, which are approximately equivalent in performance.
The first method is using DataRelation. It is implemented in the JoinTablesADO method.
Create a new DataSet, add both tablets to it, and create a relation (DataRelation) between them specifying the field in the parent and the field in the child table upon which JOIN will be built. Which of the two DataTable is the parent table and which one is the child table does not matter in this situation. In our case, the relationship is not one-to-many, but one-to-one.
Create an empty workpiece for the resulting DataTable. Looping through all records of the “child” table, we get the corresponding record of the “parent” table and combine it from the fields of both DataRow records, which we then put in the resulting DataTable.
DumpTable(JoinTablesADO(resultsFromAzure, resultsOnPremise, "CustomerID", "BusinessEntityID")); ... static DataTable JoinTablesADO(DataTable parentTbl, DataTable childTbl, string parentColName, string childColName) { DataSet ds = new DataSet(); ds.Tables.Add(parentTbl); ds.Tables.Add(childTbl); DataRelation dr = new DataRelation("something", parentTbl.Columns[parentColName], childTbl.Columns[childColName]); ds.Relations.Add(dr); DataTable joinedTbl = new DataTable(); foreach (DataColumn c in parentTbl.Columns) joinedTbl.Columns.Add(c.Caption, c.DataType); foreach (DataColumn c in childTbl.Columns) joinedTbl.Columns.Add(c.Caption, c.DataType); //Unfortunately, Clone() over DataColumn is not supported 🙁 foreach (DataRow childRow in childTbl.Rows) { DataRow parentRow = childRow.GetParentRow("something"); DataRow currentRowForResult = joinedTbl.NewRow(); for (int i = 0; i < parentTbl.Columns.Count; i++) currentRowForResult[i] = parentRow[i]; for (int i = 0; i < childTbl.Columns.Count; i++) currentRowForResult[parentTbl.Columns.Count + i] = childRow[i]; joinedTbl.Rows.Add(currentRowForResult); } return joinedTbl; }
The second method is using Linq. The idea is the same as in the first case. The difference is in the implementation details. First, we create a resulting table as a copy of the parent table structure. Then we add fields to it from the child table.
We get the collection of records as a result of the Linq-request to the collection of records of the parent table by the link condition with the collection of records of the child table, which is then added to the resulting table.
DumpTable(JoinTablesLinq(resultsFromAzure, resultsOnPremise, "CustomerID", "BusinessEntityID")); ... static DataTable JoinTablesLinq(DataTable parentTbl, DataTable childTbl, string parentColName, string childColName) { DataTable joinedTbl = parentTbl.Clone(); var childColumns = childTbl.Columns.OfType<DataColumn>().Select(c => new DataColumn(c.ColumnName, c.DataType, c.Expression, c.ColumnMapping)); joinedTbl.Columns.AddRange(childColumns.ToArray()); var joinedTblRows = from parentRow in parentTbl.AsEnumerable() join childRow in childTbl.AsEnumerable() on parentRow.Field<int>(parentColName) equals childRow.Field<int>(childColName) select parentRow.ItemArray.Concat(childRow.ItemArray).ToArray(); foreach (object[] values in joinedTblRows) joinedTbl.Rows.Add(values); return joinedTbl; }Tags: azure, linked server, sql server Last modified: September 24, 2021