Azure SQL DW (SQL Data Warehouse) is a massively parallel, petabyte-scale, cloud solution for data warehousing based on SQL. It is highly elastic and fully managed, allowing you to scale capacity in seconds and set up in minutes. You can scale computing and storage independently by yourself. It will enable you to burst computing of analytical workloads that are complex, or scale down your warehouse for archival scenarios and pay depending on what you are utilizing rather than lock yourself into cluster configurations that are predefined – and obtain better cost efficiency when compared with traditional data warehouse solutions.
Microsoft’s SQL Server Database is the engine behind Azure SQL Data Warehouse that runs queries on each and every individual node. The Azure SQL Data warehouse received an upgrade to SQL Server 2016 with general availability in 2016. This transparently provided a 40-percent performance increase to user workloads consisting of analytic queries.
The batch mode execution engine (vectorized query execution) and columnstore of SQL DW are regarded as its two performance pillars. The enhancements in SQL Server 2016 has taken the performance of SQL Data Warehouse to a newer level. The advanced features were in addition to the existing ones like segment elimination and columnar compression. The batch mode execution that has the ability to process multiple rows at a time rather than one value at a time and that can take advantage of the SIMD hardware innovations that were existing already. SQL Server further extended the batch mode execution to more scenarios and operators.
The columnstore implementation is provided by almost all the commercial database vendors. This implementation has its own benefits and drawbacks but the major thing to consider here is that SQL Server leads the pack for industry standard benchmark known as TPC Benchmark -H (TPC-H) for data warehouses as defined by its performance metric QphH@Size (indicated as TPC-H Composite Query-per-Hour Performance Metric) and $/QphH@Size (indicated as the TPC-H Price/Performance metric) in non-clustered configuration. Now, let’s have a look at the performance innovations in SQL Server 2016 for Columnstore Index and Batch mode Execution.
- High column elimination and data compression to decrease IO predominantly while processing numerous rows.
- A relational operator (e.g. Aggregate, Scan, Join) that is running in batch mode processes a large number of rows(this means, a batch of rows) at the same time, usually around 900 rows in order to speed up the analytic queries by an order of magnitude. This is available only when processing the rows from columnstore indexes. The batch mode processing has been continuously added by SQL Server Team for the common relational operators for processing analytics queries. Refer the below table.
- Till now, the operators and multiple aggregation paths would be instantiated thus resulting in slower performance. But now, on a table scan, SQL Server 2016 can process multiple aggregates more effectively in a single batch mode aggregation operator.
- For window aggregates, SQL Server 2016 introduces batch mode execution. The batch mode has the capability to speed up certain queries by almost 300 times.
- Directly applying predicates on the compressed data.
- Usage of regular btree indexes on Clustered Columnstore Index in SQL Server 2016.
- Elimination of Rowgroup by leveraging metadata stored with each column within a rowgroup.
- When possible, the aggregates are computed at the SCAN node by SQL Server 2016.
- The pushing of string predicates to SCAN node is allowed by SQL Server 2016. This wasn’t allowed in earlier SQL Server versions.
- Faster processing by leveraging SIMD(Single instruction, multiple data).
Apart from the performance innovations in SQL Server 2016 for the columnstore and batch mode that are dealt above, the below mentioned remarkable features are also observed when upgrading to ASDW. All these exciting features will help you achieve optimal performance from your ASDW.
Reduction in Cost with Pause and Scale
Built on SQL 2016 OLTP, ASDW is the next generation of Parallel Data Warehouse(PDW) from Microsoft. One of your main expectations as a user is that the product must be cost-effective. An interesting feature of ASDW is that it is highly cost-effective as with it, you are only paying for the data that is being used and processed.
One of the key features of ASDW is the capability to pause when not using it. This stops the billing of computing resources. The capability to scale resources is another major feature of ASDW. Scaling and Pausing can be done via the PowerShell commands or through Azure Portal. You can become familiar with these features as they have decreased the cost of my data warehouse to a greater extent when it was not in use.
You can load and export data quickly with Polybase
ASDW supports loading and exporting data via various tools that include BCP, Azure Data Factory, and Polybase. For a smaller amount of data where the performance is not critical, any kind of tool is sufficient for your needs. However, you will recognize that SQL Server Polybase is the best option when fast performance is required or when you are loading or exporting a huge volume of data.
SQL Server Polybase is designed to leverage ASDW’s Massively Parallel Processing(MPP) architecture and can load and export data magnitudes faster than any other tool. With the help of INSERT INTO or CTAS, the Polybase loads can be run. You can prefer CTAS as it is the fastest way to load your data and this can minimize the transaction logging as well. The polybase loads are also supported by Azure Data Factory.
Polybase supports different file formats including the Gzip files. When using the gzip text files, if you want to maximize throughput, you can break up the files into 60 or more in order to maximize the parallelism of your load. You can consider loading data concurrently for faster total throughput.
Enhanced SQL tooling, monitoring and migration experience
With introducing updates to SSMS(SQL Server Management Studio) and the Azure Portal, ASDW has provided a seamless experience when developing, monitoring and loading your SQL Data Warehouse. The updates incorporated integrated support for loading from more than 20 data stores in the cloud and on-premise, a simple process for the purpose of troubleshooting the common issues. These updates also brought highly requested functionality within SSMS like allowing the Generate Scripts wizard execution.
You will be able to quickly manage all your databases with the help of a top-level resource blade introduced by ASDW. You can utilize this to scan through your data warehouse very quickly for details such as subscription, location, tier, pricing, server, status, and name.
Unmatched access control and security
The auditing and threat detection are provided by ASDW that is built into the service with highly advanced machine learning in order to detect query patterns that are abnormal and alert us of potential security issues before it is too late. Transparent Data Encryption protects the Data at rest. Role-based access control and SSO are also enabled.
So, you can have finger-granularity permissions that enable you to control what operations a user can perform on individual procedures, views, tables and other database objects. These features further protect the data by making sure only the right users have access to the right data. This kind of access control and security is a wonderful feature as it avoids serious damage to your business and reputation.
You can run your data warehouse workloads in the cloud that involve highly intensive computing
Recently, ASDW has increased its compute scale threefold. This has raised the compute scale to 18,000 DWU(Data Warehouse Units) from 6,000 DWU. With this, now you will be able to run your data warehouse workloads in the cloud that involve highly intensive computing. You can scale your data warehouse workloads in Azure to newer heights; driving answers to the highly demanding analytical questions with the help of enterprise-class, fully-featured SQL engine.
As the volume, variety, and diversity of your data continue to evolve at exponential rates, the unlimited columnar storage is highly important. With Polybase scaling with the system automatically, no extra work is needed to take advantage of all the extra computing horsepower that is available now. For you, these features will make it even easier to serve and curate the large amounts of data being amassed in your cloud infrastructure with ASDW.
For quite some time, you may have advocated for even greater agility and compute power from ASDW and were keen to know what is next, but now you will be a satisfied user with this increment. Now, you am can scale with much more capacity to compute than ever before in order to drive your analytical workloads with the help of the fully featured SQL engine. You are free to use these new and high scale performance levels in order to analyze unbounded volumes of data with unlimited columnar storage.
Thus, the SQL Data Warehouse will get highly benefited from the above list of considerable boosts in performance. The best one of all: in order to get the above performance benefits, no change is needed to the user queries of SQL Data Warehouse – it is all automatic under the hood.
Today, ASDW has thousands of customers who are already utilizing it. Almost all of them are experiencing significant gain in performance over present multi-million dollar on-premise data warehouses. Today, some multi-hour queries are completed now in an hour and few queries that take 5 to 10 minutes are completed in seconds with ASDW. Thus, this article describes how SQL Server 2016 provides a huge boost in performance to Azure SQL Data Warehouse.
- How SQL Server 2016 Took Azure SQL Data Warehouse to Next Level - July 18, 2018