This post provides an overview of Azure Synapse Analytics. It explains what the service is and how it can be used.
Table of contents
- What is Azure Synapse Analytics?
- When to use Azure Synapse Analytics?
- Main features of Azure Synapse Analytics
- Unified Analytics Platform
- Enterprise Data Warehousing
- Code-Free Hybrid Data Integration
- Serverless and Dedicated Options
- Data Lake Exploration
- Integrated Apache Spark and SQL Engines
- Choice of Language
- Integrated Artificial Intelligence and Business Integration
- Azure Synapse Data Integration
- HTAP Implementation
- Components of Azure Synapse Analytics
- Synapse Workspace
- Linked Services
- Synapse SQL
- Apache Spark for Synapse
- Data Flow
What is Azure Synapse Analytics?
Azure Synapse Analytics is an analytics service that brings together big data analytics and enterprise data warehousing. It combines SQL technologies, Spark technologies, ETL, ELT, Power BI, Cosmos DB, and Azure Machine Learning. The service is an enhanced successor to Azure SQL Data Warehouse (SQL DW).
Many organizations use data lakes to store practically unlimited volumes of data. The data can be structured (CSV, Excel, relational databases) or unstructured (text files, audio, video). A data lake provides low-cost storage for data in any format. A data warehouse, in turn, processes the data so that business applications can consume it.
Organizations face the following challenges with data warehouses:
- Requirement of processing a large amount of data
- Data integration with various data sources and formats
- Data security
- Infrastructure Scalability based on data volume
Azure Synapse Analytics offers data integration, processing, and visualization as a limitless analytics service. It is a PaaS (Platform as a Service) offering with either serverless or provisioned resources at scale.
When to use Azure Synapse Analytics?
Azure Synapse Analytics is a good fit when you want to store and analyze your data in a scalable way. As a data warehousing tool, it is especially helpful in the following scenarios:
- You have a large amount of data, and you want to find patterns in the data.
- You want to create a data warehouse that can store and analyze the data in a secure and scalable way.
- You want to build a predictive model.
Main features of Azure Synapse Analytics
Azure Synapse Analytics can store and process all types of data. Its built-in SQL engine lets users run both simple and complex queries against the data they need. The main features are as follows.
Unified Analytics Platform
It supports data integration, data exploration, data warehousing, big data analytics, and machine learning on a single unified platform.
- Perform key tasks such as ingesting, exploring, preparing, orchestrating, and visualizing data in a single user experience.
- Monitor resources and their usage across SQL and Spark.
- Write SQL or Spark code and integrate it with enterprise CI/CD processes.
Enterprise Data Warehousing
It uses Synapse SQL as a distributed query system with extended T-SQL support.
- It has built-in streaming capabilities for landing data from cloud data sources in SQL tables.
- It can load data into a managed table for best query performance. Also, it can query data directly in Azure Data Lake or Azure Cosmos DB without importing it into specific tables.
- It can integrate artificial intelligence by using machine learning models with SQL functions.
Code-Free Hybrid Data Integration
It offers built-in ETL and ELT processes that do not require writing any code. Various connectors let you quickly ingest data from different sources. You can also orchestrate notebooks, Spark jobs, SQL scripts, and stored procedures.
Serverless and Dedicated Options
It has both serverless and dedicated resource models. Dedicated resource pools give you predictable performance and cost. For unplanned workloads, use the serverless endpoints. You can therefore choose the most cost-effective pricing for each workload.
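The trade-off can be sketched in a few lines of Python. This is only an illustration, assuming the usual billing shape: serverless is charged by the amount of data processed, and a dedicated pool is charged for the time it is running. The prices and function names below are hypothetical placeholders, not actual Azure rates.

```python
# All prices are hypothetical placeholders; check the Azure pricing page for real rates.
SERVERLESS_PRICE_PER_TB = 5.0    # assumed cost per TB of data processed
DEDICATED_PRICE_PER_HOUR = 1.2   # assumed cost per hour for a small dedicated pool


def serverless_cost(tb_processed: float) -> float:
    """Cost of a serverless workload, billed by data processed."""
    return tb_processed * SERVERLESS_PRICE_PER_TB


def dedicated_cost(hours_running: float) -> float:
    """Cost of a dedicated pool, billed for every hour it is running."""
    return hours_running * DEDICATED_PRICE_PER_HOUR


def cheaper_option(tb_processed: float, hours_running: float) -> str:
    """Return which resource model is cheaper for the given workload."""
    if serverless_cost(tb_processed) <= dedicated_cost(hours_running):
        return "serverless"
    return "dedicated"


# An occasional workload: 2 TB scanned, versus keeping a pool up for 100 hours.
print(cheaper_option(tb_processed=2, hours_running=100))
# A heavy workload: 500 TB scanned, versus a pool running for about a month (730 hours).
print(cheaper_option(tb_processed=500, hours_running=730))
```

With these assumed prices, the occasional workload comes out cheaper on serverless and the heavy one on a dedicated pool. Real numbers will shift the break-even point.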
Data Lake Exploration
Azure Synapse Analytics can query both relational and non-relational data stored in the data lake. With SQL and Spark integrated, you can analyze CSV, TSV, Delta Lake, JSON, and Parquet files stored in a data lake.
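As a rough illustration of querying files in place, the Python helper below builds a T-SQL statement that uses OPENROWSET, the function serverless SQL pools provide for reading files directly from storage. The helper name, the storage URL, and the chosen CSV options are assumptions made for this sketch. It only covers the CSV, Parquet, and Delta formats.

```python
# Formats handled by this sketch (a subset of what the service can read).
SUPPORTED_FORMATS = {"CSV", "PARQUET", "DELTA"}


def build_openrowset_query(location: str, file_format: str, top: int = 10) -> str:
    """Build a T-SQL query that reads data lake files without importing them."""
    fmt = file_format.upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported format: {file_format}")
    if "'" in location:
        # Keep the generated SQL literal well-formed.
        raise ValueError("location must not contain single quotes")

    options = [f"BULK '{location}'", f"FORMAT = '{fmt}'"]
    if fmt == "CSV":
        # Assumed defaults: parser 2.0 and a header row in the file.
        options += ["PARSER_VERSION = '2.0'", "HEADER_ROW = TRUE"]

    option_lines = ",\n    ".join(options)
    return f"SELECT TOP {int(top)} *\nFROM OPENROWSET(\n    {option_lines}\n) AS [result];"


# Hypothetical storage account and path.
print(build_openrowset_query(
    "https://mydatalake.dfs.core.windows.net/sales/2022/*.parquet", "parquet"
))
```

The generated statement can be pasted into a SQL script in the workspace and run against the serverless SQL pool.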
Integrated Apache Spark and SQL Engines
Azure Synapse seamlessly integrates with Apache Spark.
Apache Spark is an open-source big data engine for data preparation, data engineering, machine learning, and more. The integration includes built-in .NET support for C#, SparkML algorithms, Azure ML integration, and support for Linux Foundation Delta Lake.
Choice of Language
It gives you the flexibility to choose T-SQL, Scala, Python, Spark SQL, or .NET as your preferred language for both dedicated and serverless resources.
Integrated Artificial Intelligence and Business Integration
Azure Synapse Analytics provides an end-to-end analytics solution that integrates with Azure Machine Learning, Cognitive Services, and Power BI.
Azure Synapse Analytics also offers security and privacy features such as transparent data encryption, threat detection, and dynamic data masking.
- It complies with 30 industry-leading standards such as ISO, SOC, FedRAMP, DISA, HIPAA, and FIPS.
- It provides SQL authentication, Azure AD authentication, and multi-factor authentication.
- You can configure network-level security using firewalls and virtual networks.
Azure Synapse Data Integration
It has an integrated orchestration engine for loading data from external sources, with support for more than 90 data sources across Azure, file systems, open-source and cloud databases, NoSQL stores, and ODBC.
Because ingestion is built in, you do not need to manage separate tools and resources for it, which also helps reduce data redundancy.
You can easily integrate it with Adobe, Microsoft, and SAP technologies such as Microsoft Office, Dynamics 365, Azure Data Lake, Azure Active Directory, Azure Machine Learning, Azure Blob Storage, and Power BI.
HTAP Implementation
Azure Synapse Analytics uses Hybrid Transactional and Analytical Processing (HTAP) and Azure Synapse Link to achieve real-time data integration with Azure databases. It makes the most recent data available through a simple, low-cost cloud solution.
Azure Synapse Link allows you to run analytical workloads against the Azure Cosmos DB analytical store. The analytical workload is served independently, without affecting transactional workload traffic.
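A minimal sketch of what this looks like from a Synapse Spark notebook is shown below. It assumes the `cosmos.olap` data source format and the two options used by the Synapse Spark connector for Cosmos DB. The function name and the linked service and container names are hypothetical.

```python
def read_analytical_store(spark, linked_service: str, container: str):
    """Load a Cosmos DB container's analytical store into a Spark DataFrame.

    `spark` is the SparkSession that a Synapse notebook provides.
    """
    return (
        spark.read.format("cosmos.olap")  # analytical store, not the transactional store
        .option("spark.synapse.linkedService", linked_service)
        .option("spark.cosmos.container", container)
        .load()
    )


# In a Synapse notebook (hypothetical names):
# orders = read_analytical_store(spark, "CosmosDbOrders", "orders")
# orders.groupBy("status").count().show()
```

Because the read goes to the analytical store, an aggregation like the commented example would not consume the throughput that the transactional workload relies on.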
Components of Azure Synapse Analytics
Let’s get a high-level overview of the Azure Synapse architecture components.
Synapse Workspace
The Azure Synapse workspace is a secure platform for performing cloud-based enterprise analytics in Azure. A workspace is associated with a specific resource group, Azure region, ADLS Gen2 account, and file system. You can use SQL and Apache Spark to perform data analytics in it.
Linked Services
The Synapse workspace contains linked services, which define the connection strings for various external resources.
Synapse SQL
Synapse SQL provides T-SQL-based analytics inside the workspace in both the serverless and dedicated resource consumption models. The resource model consists of dedicated SQL pools and serverless SQL pools. You can use SQL scripts to work with these SQL pools.
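Each resource model is reached through its own SQL endpoint. The small helper below is a sketch that assumes the common naming pattern for workspace endpoints, where the serverless endpoint carries an `-ondemand` suffix. The function and workspace names are hypothetical, and the actual endpoints should be confirmed on the workspace overview page.

```python
def sql_endpoint(workspace: str, serverless: bool) -> str:
    """Return the assumed SQL endpoint host name for a Synapse workspace."""
    suffix = "-ondemand" if serverless else ""
    return f"{workspace}{suffix}.sql.azuresynapse.net"


# Hypothetical workspace name.
print(sql_endpoint("contoso", serverless=True))   # serverless SQL pool endpoint
print(sql_endpoint("contoso", serverless=False))  # dedicated SQL pool endpoint
```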
Apache Spark for Synapse
We can configure serverless Apache Spark pools for Spark analytics and use them in two ways:
- Spark notebooks: Notebooks contain Scala, PySpark, C#, and Spark SQL code.
- Spark job definitions: These define Spark jobs that run from JAR files.
Pipelines are logical groups of activities that carry out tasks such as data ingestion and analytics.
An activity is an action within a pipeline, such as copying data or executing a SQL script or a notebook.
A data flow is an activity that transforms data using built-in connectors, without any coding.
Triggers define how a pipeline is executed, either manually or automatically. An automatic trigger can run on a specific schedule, on a tumbling window, or in response to an event.
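Of the trigger types, the tumbling window is the least self-explanatory: it fires over a series of fixed-size, contiguous, non-overlapping time intervals. The Python sketch below only illustrates that idea. It is not the service's trigger API, and the function name is made up for this example.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def tumbling_windows(
    start: datetime, end: datetime, size: timedelta
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield contiguous, non-overlapping [window_start, window_end) intervals."""
    if size <= timedelta(0):
        raise ValueError("size must be positive")
    window_start = start
    # Only complete windows that fit before `end` are produced.
    while window_start + size <= end:
        yield window_start, window_start + size
        window_start += size


# Three one-hour windows; a pipeline run would process one window each.
for w_start, w_end in tumbling_windows(
    datetime(2022, 9, 15, 0), datetime(2022, 9, 15, 3), timedelta(hours=1)
):
    print(w_start.isoformat(), "->", w_end.isoformat())
```

Each window ends exactly where the next one begins, so every record falls into one and only one pipeline run.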
This article has provided an overview of Azure Synapse Analytics, including its basic architecture, features, and components. It builds a foundation for a deeper dive into the Azure Synapse service and its implementation, which we will cover in further articles.
Tags: azure, azure dw, azure sql
Last modified: September 15, 2022