An Overview of the Azure Synapse Analytics

Azure Synapse Analytics is an analytics service that brings together big data analytics and enterprise data warehousing. It combines SQL technologies, Spark technologies, and ETL and ELT pipelines with Power BI, Cosmos DB, and Azure Machine Learning integration. The service is the enhanced successor to Azure SQL Data Warehouse (SQL DW).

Azure Synapse Analytics

Many organizations use data lakes to store virtually unlimited volumes of data. The data can be structured (CSV, Excel, relational databases) or unstructured (text files, audio, video), and the data lake provides low-cost storage for any format. A data warehouse, in turn, processes that data so that business applications can consume it.

Organizations commonly face the following challenges with data warehouses:

  • Processing large amounts of data
  • Integrating data from various sources and formats
  • Data security
  • Scaling infrastructure with data volume

Azure Synapse Analytics offers data integration, processing, and visualization as a limitless analytics service. It is a PaaS (Platform as a Service) offering that runs on either serverless or provisioned resources at scale.

The main features of Azure Synapse Analytics are described below.

Unified Analytics Platform

It can perform data integration, exploration, data warehousing, big data analytics, and machine learning on a single unified platform.

  • Perform key tasks such as ingesting, exploring, preparing, orchestrating, and visualizing data in a single user experience.
  • Monitor resources and their usage across SQL and Spark.
  • Write SQL or Spark code and integrate it with enterprise CI/CD processes.

Enterprise Data Warehousing

It uses Synapse SQL as a distributed query system with extended T-SQL support.

  • It has built-in streaming capabilities for landing data from cloud data sources in SQL tables.
  • It can load data into a managed table for the best query performance. It can also query data directly in Azure Data Lake or Azure Cosmos DB without importing it into tables first.
  • It can integrate artificial intelligence by using machine learning models with SQL functions.

Code-Free Hybrid Data Integration

It offers built-in ETL and ELT processes that do not require writing any code. Various connectors let you quickly ingest data from different sources. You can also orchestrate notebooks, Spark jobs, SQL scripts, and stored procedures.

Serverless and Dedicated Options

It has both serverless and dedicated resource models. Dedicated resource pools give you predictable performance and cost. For unplanned workloads, use the serverless endpoints. You can therefore choose the most cost-effective pricing for each workload.
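The trade-off above can be sketched as simple arithmetic, assuming serverless usage is billed by the amount of data processed and dedicated usage by the hours a pool is provisioned. The function names and both prices below are hypothetical placeholders, not published rates.

```python
def serverless_cost(tb_processed: float, price_per_tb: float) -> float:
    """Cost when billing follows the volume of data processed."""
    return tb_processed * price_per_tb


def dedicated_cost(hours_provisioned: float, price_per_hour: float) -> float:
    """Cost when billing follows the hours a pool stays provisioned."""
    return hours_provisioned * price_per_hour


def cheaper_option(tb_processed: float, hours_provisioned: float,
                   price_per_tb: float, price_per_hour: float) -> str:
    """Return the name of the cheaper model for one billing period."""
    serverless = serverless_cost(tb_processed, price_per_tb)
    dedicated = dedicated_cost(hours_provisioned, price_per_hour)
    return "serverless" if serverless <= dedicated else "dedicated"


# Hypothetical prices, for illustration only.
PRICE_PER_TB = 5.0
PRICE_PER_HOUR = 1.5

# Occasional ad hoc queries: little data scanned during the month.
light = cheaper_option(20, 200, PRICE_PER_TB, PRICE_PER_HOUR)
# Heavy, steady reporting: a lot of data scanned during the month.
heavy = cheaper_option(100, 200, PRICE_PER_TB, PRICE_PER_HOUR)
print(light, heavy)
```

With real prices substituted, the same comparison shows roughly where a steady workload starts to justify a dedicated pool.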

Data Lake Exploration

Azure Synapse Analytics can query both relational and non-relational data stored in the data lake. With SQL and Spark, you can analyze CSV, TSV, Delta Lake, JSON, and Parquet files stored in a data lake.
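As a rough illustration of querying files in place, a serverless SQL pool accepts T-SQL that reads lake files through `OPENROWSET`. The sketch below only builds such a query string. The helper name `build_openrowset_query` and the storage URL are hypothetical placeholders, and the sketch is limited to Parquet and Delta Lake because other formats such as CSV usually need extra options that are omitted here.

```python
def build_openrowset_query(file_url: str, file_format: str = "PARQUET",
                           top: int = 10) -> str:
    """Build a T-SQL query that reads data lake files in place.

    Hypothetical helper for illustration: it only formats text and
    never connects to Azure.
    """
    fmt = file_format.upper()
    if fmt not in {"PARQUET", "DELTA"}:
        raise ValueError(f"format not covered by this sketch: {file_format}")
    if top <= 0:
        raise ValueError("top must be a positive integer")
    # Double any single quotes so the URL stays inside the SQL string literal.
    safe_url = file_url.replace("'", "''")
    return (
        f"SELECT TOP {top} *\n"
        f"FROM OPENROWSET(\n"
        f"    BULK '{safe_url}',\n"
        f"    FORMAT = '{fmt}'\n"
        f") AS [result];"
    )


# Placeholder URL: substitute your own storage account, container, and path.
query = build_openrowset_query(
    "https://<account>.dfs.core.windows.net/<container>/sales/*.parquet"
)
print(query)
```

The resulting text could then be run from a SQL script in the workspace or from any SQL client connected to the serverless endpoint.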

Integrated Apache Spark and SQL Engines

Azure Synapse seamlessly integrates with Apache Spark.

Apache Spark is an open-source big data engine for data preparation, data engineering, machine learning, and similar workloads. In Synapse, it comes with .NET support for C#, SparkML algorithms, Azure Machine Learning integration, and built-in support for Linux Foundation Delta Lake.

Choice of Language

It offers the flexibility to choose T-SQL, Scala, Python, Spark SQL, or .NET as your preferred language across dedicated and serverless resources.

Integrated Artificial Intelligence and Business Integration

Azure Synapse Analytics provides an end-to-end analytics solution that integrates with Azure Machine Learning, Azure Cognitive Services, and Power BI.

Security

Azure Synapse Analytics offers security and privacy features such as data encryption (including transparent data encryption), threat detection, and dynamic data masking.

  • It complies with 30 industry standards and certifications, such as ISO, SOC, FedRAMP, DISA, HIPAA, and FIPS.
  • It provides SQL authentication, Azure AD authentication, and multi-factor authentication.
  • You can configure network-level security using firewalls and virtual networks.
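To make the firewall idea concrete, here is a minimal sketch of how an IP firewall rule defined by a start and end address decides whether a client may connect. The `FirewallRule` class, the rule name, and the addresses (taken from documentation-only ranges) are hypothetical; the sketch models the concept and does not call any Azure API.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass(frozen=True)
class FirewallRule:
    """Hypothetical model of a rule that allows one inclusive IP range."""
    name: str
    start_ip: str
    end_ip: str

    def allows(self, client_ip: str) -> bool:
        # IPv4Address objects support ordering, so a range check is direct.
        start = IPv4Address(self.start_ip)
        end = IPv4Address(self.end_ip)
        return start <= IPv4Address(client_ip) <= end


def is_allowed(client_ip: str, rules: list) -> bool:
    """A client is admitted when at least one rule covers its address."""
    return any(rule.allows(client_ip) for rule in rules)


# Example rule using a documentation-only address range.
rules = [FirewallRule("office-network", "203.0.113.0", "203.0.113.255")]

print(is_allowed("203.0.113.42", rules))
print(is_allowed("198.51.100.7", rules))
```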

Azure Synapse Data Integration

It has an integrated orchestration engine that loads data from external sources, with support for more than 90 data sources, including Azure services, file systems, open-source and cloud databases, NoSQL stores, and ODBC.

Because ingestion is built in, you do not need to manage separate tools and resources for it, which also helps reduce data redundancy.

You can easily integrate it with Adobe, Microsoft, and SAP technologies such as Microsoft Office, Dynamics 365, Azure Data Lake, Azure Active Directory, Azure Machine Learning, Azure Blob Storage, and Power BI.

HTAP Implementation

Azure Synapse Analytics uses Hybrid Transactional and Analytical Processing (HTAP) through Azure Synapse Link to achieve near real-time data integration with Azure databases. This makes the most recent operational data available for analytics through a simple, low-cost cloud solution.

Azure Synapse Link lets you run analytical workloads against the Azure Cosmos DB analytical store. Analytical queries are served independently, so they do not affect transactional workload traffic.
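The separation described above can be pictured with a toy in-memory model: writes go to a row-oriented transactional store, changes are copied to a column-oriented analytical store in a separate step, and aggregate queries read only the analytical copy. Everything here, including the `ToyHtapStore` class, is a hypothetical teaching aid and says nothing about how Azure Cosmos DB implements its analytical store.

```python
class ToyHtapStore:
    """Toy model of a transactional store with a synced analytical copy."""

    def __init__(self):
        self._rows = {}      # transactional store: document id -> document
        self._pending = []   # ids changed since the last sync
        self._columns = {}   # analytical store: column -> {document id: value}

    def upsert(self, doc_id, doc):
        """Transactional write; it never touches the analytical copy."""
        self._rows[doc_id] = dict(doc)
        self._pending.append(doc_id)

    def sync(self):
        """Copy pending changes into the column-oriented analytical store."""
        for doc_id in self._pending:
            for column, value in self._rows[doc_id].items():
                self._columns.setdefault(column, {})[doc_id] = value
        self._pending.clear()

    def column_sum(self, column):
        """Analytical query; it reads only the analytical copy."""
        return sum(self._columns.get(column, {}).values())


store = ToyHtapStore()
store.upsert("order-1", {"amount": 20})
store.upsert("order-2", {"amount": 30})
before_sync = store.column_sum("amount")  # analytical copy is still empty
store.sync()
after_sync = store.column_sum("amount")
print(before_sync, after_sync)
```

In this model the analytical view lags the transactional one until the next sync, which mirrors the "near real-time" character of the approach.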

Azure Synapse Components and Features

Let’s get a high-level overview of the Azure Synapse architecture components.

Synapse Workspace

The Azure Synapse workspace is a secure platform for cloud-based enterprise analytics in Azure. A workspace is associated with a specific resource group, Azure region, and ADLS Gen2 account and file system. Within it, you can use SQL and Apache Spark for data analytics.

Linked Services

The Synapse workspace contains linked services, which define the connection information (much like connection strings) for various external resources.
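As a hedged sketch, a linked service for an ADLS Gen2 account can be described by a small JSON document. The layout below follows the general `name` / `properties` / `typeProperties` shape used for such definitions; the service name and the URL are placeholders, authentication settings are deliberately left out, and the exact fields should be checked against the official reference before use.

```python
import json

# Illustrative linked service definition; name and URL are placeholders.
linked_service = {
    "name": "ExampleDataLake",
    "properties": {
        "type": "AzureBlobFS",  # connector type for ADLS Gen2
        "typeProperties": {
            "url": "https://<account>.dfs.core.windows.net",
        },
    },
}

# Serialize to the JSON text that would be stored with the workspace.
definition = json.dumps(linked_service, indent=2)
print(definition)
```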

Synapse SQL

Synapse SQL provides T-SQL-based analytics inside the workspace through two resource consumption models: dedicated SQL pools and serverless SQL pools. You can use SQL scripts to work with these pools.

Apache Spark for Synapse

You can configure serverless Apache Spark pools for Spark analytics and use them in two ways:

  • Spark notebooks: Notebooks contain Scala, PySpark, C#, and Spark SQL code.
  • Spark job definitions: These define Spark jobs packaged as JAR files.

Pipelines

A pipeline is a logical grouping of activities that together perform a task such as data ingestion or analytics.

Activities

An activity is a single action within a pipeline, such as copying data or running a SQL script or a notebook.
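The relationship between a pipeline and its activities can be sketched as a small dependency graph. The definition below imitates the JSON shape of a pipeline, where each activity lists the activities it depends on; the pipeline name, activity names, and `type` values are illustrative assumptions and are not validated against the service. The helper `execution_order` is likewise hypothetical and uses the standard library's `graphlib` (Python 3.9+) to derive a valid run order.

```python
from graphlib import TopologicalSorter

# Illustrative pipeline: copy raw data, transform it, then load the warehouse.
pipeline = {
    "name": "DailySalesLoad",
    "properties": {
        "activities": [
            {"name": "CopyRawSales", "type": "Copy", "dependsOn": []},
            {
                "name": "TransformSales",
                "type": "SynapseNotebook",
                "dependsOn": [
                    {"activity": "CopyRawSales",
                     "dependencyConditions": ["Succeeded"]}
                ],
            },
            {
                "name": "LoadWarehouse",
                "type": "SqlPoolStoredProcedure",
                "dependsOn": [
                    {"activity": "TransformSales",
                     "dependencyConditions": ["Succeeded"]}
                ],
            },
        ]
    },
}


def execution_order(pipeline_definition: dict) -> list:
    """Return activity names in an order that respects their dependencies."""
    graph = {
        activity["name"]: {dep["activity"] for dep in activity.get("dependsOn", [])}
        for activity in pipeline_definition["properties"]["activities"]
    }
    # TopologicalSorter expects a mapping of node -> predecessors.
    return list(TopologicalSorter(graph).static_order())


order = execution_order(pipeline)
print(order)
```

Because the three activities form a single chain, there is only one valid order here; activities without dependencies on each other could run in parallel.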

Data Flow

A data flow is an activity that transforms data using built-in connectors, without any coding.

Triggers

Triggers define when a pipeline runs. A pipeline can be run manually or automatically, on a specific schedule, on a tumbling window, or in response to an event.
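A tumbling window differs from a plain schedule in that it divides time into fixed-size, contiguous, non-overlapping intervals, and each run is tied to exactly one interval. The sketch below computes such intervals with the standard library; the function name `tumbling_windows` and the sample dates are hypothetical, and no trigger is actually created.

```python
from datetime import datetime, timedelta


def tumbling_windows(start: datetime, end: datetime, size: timedelta) -> list:
    """Return contiguous, non-overlapping (window_start, window_end) pairs.

    Only windows that fit completely before `end` are returned.
    """
    if size <= timedelta(0):
        raise ValueError("window size must be positive")
    windows = []
    window_start = start
    while window_start + size <= end:
        windows.append((window_start, window_start + size))
        window_start += size
    return windows


# Six hours split into two-hour windows.
windows = tumbling_windows(
    datetime(2022, 2, 15, 0, 0),
    datetime(2022, 2, 15, 6, 0),
    timedelta(hours=2),
)
for window_start, window_end in windows:
    print(window_start.isoformat(), "->", window_end.isoformat())
```

Each pair could then serve as the time slice that a single pipeline run processes.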

Conclusion

This article has provided an overview of Azure Synapse Analytics, including its basic architecture, features, and components. It lays a foundation for a deeper look at the Azure Synapse service and its implementation, which we will cover in further articles.

Last modified: February 15, 2022