In the article An Overview of Azure Synapse Analytics, we explored the Azure Synapse workspace and its features as an analytics service that combines big data analytics and enterprise data warehousing.
This article is a practical demonstration of deploying an Azure Synapse Analytics workspace using the Azure portal.
Table of contents
- How to Deploy Synapse Analytics Workspace in Azure
- Basics
- Security
- Networking
- Getting Started
- Analyze Azure Open Datasets Using the Serverless SQL Pool
- Conclusion
How to Deploy Synapse Analytics Workspace in Azure
Go to the search box in the Azure portal and look for Azure Synapse Analytics in the services.
We do not have any Synapse workspace deployed yet. Therefore, you get the page “No Azure Synapse Analytics to display”.
Click Create, and you get the Create Synapse workspace page. This page is divided into multiple tabs: Basics, Security, Networking, Tags, and Review + create.
Basics
The Basics tab requires the Azure subscription, resource group, managed resource group, workspace name, region, storage account name, and file system name.
The Synapse workspace requires a Data Lake Storage Gen2 account and the Storage Blob Data Contributor role on it. You can configure a separate Gen2 storage account and specify its URL using the option – Manually via URL. In this demonstration, we choose From subscription and specify the storage account name and file system name for the new deployment.
As highlighted below, it gives details of a few additional configurations for granting other users permissions on the Synapse workspace.
Security
In the Security tab, enter the credentials that will be the administrator for the Synapse workspace SQL pools. To generate a password automatically, leave the password fields empty.
Scroll down and verify that the following option is checked – Allow pipelines (running as workspace system identity) to access SQL pools.
You also get an option to encrypt the workspace with a customer-managed key. If you choose this option, your data is encrypted twice: with your key at the workspace level and with platform-managed keys at the infrastructure layer. We can skip workspace encryption for this article.
Networking
You can configure public or private network connectivity for the Azure Synapse workspace. By default, the option Managed virtual network is disabled. If you want traffic between the Synapse workspace and its data sources to use the Azure internal network instead of the open internet, enable the Synapse managed virtual network.
Click Review + create, and it validates the Synapse workspace configuration. Once the validation is successful, you get the estimated serverless SQL cost per TB of data processed.
Scroll down in the review page and note down the Data Lake Storage Gen2 account URL (it has the form https://<storageaccount>.dfs.core.windows.net).
Click Create, and it starts the Synapse workspace deployment.
As shown below, it creates a storage account and a Synapse workspace.
Click the Synapse workspace, and you get the following details.
- Workspace web URL
- Primary ADLS Gen2 account
- Primary ADLS Gen2 file system
- SQL admin username
- SQL active directory admin
- Dedicated SQL endpoint
- Serverless SQL endpoint
- Development endpoint
In the Analytics pools section, you get the deployed SQL pools or Apache Spark pools. By default, it deploys a built-in SQL pool in serverless mode.
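Once the workspace is deployed, you could connect to the serverless SQL endpoint with any SQL client and run a quick sanity check. The following is a minimal sketch, assuming an arbitrary database name demo:
-- Confirm connectivity and the engine version of the built-in serverless pool
SELECT @@VERSION;
-- Serverless databases hold only metadata (for example, views and external tables), not data
CREATE DATABASE demo; -- 'demo' is a placeholder name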
Getting Started
In this section, click Open Synapse Studio to build a fully integrated analytics solution.
It opens the following Synapse Analytics workspace with a few tiles – Ingest, Explore & Analyze, Visualize.
You also get an option Knowledge center that gives Azure Open Datasets and sample code to learn about the Synapse workspace:
- Use samples immediately
- Browse gallery
- Tour Synapse Studio
To understand the Synapse analytics solution, let’s use the samples with the option Use samples immediately. Here, you get the following options:
- Explore sample data with Spark: It creates a sample script along with a new or existing Spark pool.
  - Name: SampleSpark
  - Size: Medium (8 vCores, 64 GB), 3 nodes
- Query data with SQL: It creates a sample script that you can use with the SQL on-demand (default) pool.
- Create an external table with SQL: It creates a standard SQL pool for accessing external data through an external table (see the sketch after this list).
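To give an idea of what such an external table script involves, here is a hedged sketch of the typical objects on a serverless SQL pool. The object names and the two columns are illustrative assumptions, not the sample’s exact contents, and the statements must run in a user database rather than master:
-- External data source pointing at the NYC Taxi open dataset container
CREATE EXTERNAL DATA SOURCE NycOpenData
WITH ( LOCATION = 'https://azureopendatastorage.blob.core.windows.net/nyctlc' );
-- File format definition for the Parquet files
CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH ( FORMAT_TYPE = PARQUET );
-- External table over the Parquet files (column list is an assumption)
CREATE EXTERNAL TABLE YellowTaxiTrips (
    passengerCount INT,
    tripDistance FLOAT
)
WITH (
    LOCATION = 'yellow/puYear=*/puMonth=*/*.parquet',
    DATA_SOURCE = NycOpenData,
    FILE_FORMAT = ParquetFormat
);
Once created, the external table can be queried like a regular table, for example SELECT TOP 10 * FROM YellowTaxiTrips;.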
Go back and click Browse gallery. It gives a wide range of datasets, notebooks, SQL scripts, and pipelines.
Click Notebooks, and you get sample notebooks in different languages such as PySpark (Python), Scala, and .NET Spark (C#).
Analyze Azure Open Datasets Using the Serverless SQL Pool
For this article, we use the sample script Analyze Azure Open Datasets using serverless SQL pool. It demonstrates exploratory data analysis using serverless SQL pools and visualizes the results in Azure Synapse Studio.
Note: You can refer to the serverless SQL pool samples for more details.
Select the script and click Continue to open the following page with the SQL script. For the demo, the data is stored in an Azure Storage blob in the Parquet file format. Therefore, the query can use automatic schema inference, where you do not need to specify the data types of the columns in the file.
For example, the below SQL script returns the NYC Taxi data from the PARQUET file:
SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'https://azureopendatastorage.blob.core.windows.net/nyctlc/yellow/puYear=*/puMonth=*/*.parquet',
    FORMAT='PARQUET'
) AS [nyc];
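If you prefer explicit column types over automatic schema inference, OPENROWSET also accepts a WITH clause that lists only the columns you need. A minimal sketch follows; the column names are assumptions based on the NYC Taxi open dataset schema:
SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'https://azureopendatastorage.blob.core.windows.net/nyctlc/yellow/puYear=*/puMonth=*/*.parquet',
    FORMAT='PARQUET'
)
WITH (
    tpepPickupDateTime DATETIME2, -- pickup timestamp (assumed column name)
    passengerCount INT,
    tripDistance FLOAT
) AS [nyc];
Reading fewer columns also reduces the amount of data scanned, which is what the serverless SQL pool bills on.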
Click the Open script option and select the parameter values from the drop-downs.
- Connect to: Select the SQL pool
- Use database: Select the default database, master
We can select the options to return the first 5000 rows (the default) or all the properties.
Click Run to execute the script.
It starts populating the query results in tabular format.
If there are multiple queries in the script, each query gives its own result set. You can view the results in chart format as well. For example, the following figure displays the result in chart format for query 0.
The Messages section gives information such as the total size of data scanned, data moved, and data written.
Conclusion
This article explored the Azure Synapse workspace deployment, the different configurations for networking, and the Azure Data Lake Storage Gen2 account. We also used the Azure samples for processing data stored in Azure Storage Parquet files.