Azure Analytics End to End with Azure Synapse - Deployment Accelerator

Overview

This is a deployment accelerator based on the reference architecture described in the Azure Architecture Center article Analytics end-to-end with Azure Synapse. It automates not only the deployment of the services covered by the reference architecture, but also the configuration and permissions required for the services to work together. The deployed architecture provides an end-to-end analytics platform capable of handling the most common use cases for most organizations.

The deployment accelerator is implemented in Azure Bicep, a domain-specific language (DSL) that uses declarative syntax to deploy Azure resources.

Reference Architecture

Deploy

Before you hit the deploy button, make sure you review the details about the services deployed.

Deploy to Azure

Note: The "Deploy to Azure" button above will redirect you to the Azure Portal with a reference to the resulting ARM template file generated by the build of the Bicep code. Please refer to Bicep files for the true source of the code for this accelerator.
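If you want to inspect the ARM template that the Bicep build produces, a minimal sketch follows. It assumes the Azure CLI with Bicep support is installed and that you run it from the folder containing AzureAnalyticsE2E.bicep. The function name and the default output path are illustrative and not part of the accelerator.

```shell
# build_arm_template [bicep-file] [output-file]
# Compiles the Bicep file into an ARM JSON template for local review.
# Both defaults are assumptions for illustration.
build_arm_template() {
  az bicep build \
    --file "${1:-./AzureAnalyticsE2E.bicep}" \
    --outfile "${2:-./AzureAnalyticsE2E.json}"
}
```

Calling build_arm_template with no arguments writes ./AzureAnalyticsE2E.json next to the Bicep file.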

You can also use Azure CLI to deploy the services:

For a full deployment of all workloads with public endpoints use the command below:

az deployment group create --resource-group resource-group-name --template-file ./AzureAnalyticsE2E.bicep --parameters synapseSqlAdminPassword=use-complex-password-here

For a full deployment of all workloads with vNet integrated endpoints use the command below:

az deployment group create --resource-group resource-group-name --template-file ./AzureAnalyticsE2E.bicep --parameters networkIsolationMode=vNet synapseSqlAdminPassword=use-complex-password-here

You can have more control over the deployment by providing values to optional template parameters in the form of:

az deployment group create --resource-group resource-group-name --template-file ./AzureAnalyticsE2E.bicep --parameters synapseSqlAdminPassword=use-complex-password-here param1=value1 param2=value2...
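The commands above can be wrapped in small helper functions so that a what-if preview runs before the real deployment. This is a sketch, not part of the accelerator: the function names and the TEMPLATE variable are illustrative, and only the networkIsolationMode and synapseSqlAdminPassword parameters are taken from this README.

```shell
# Path to the Bicep entry point; adjust if you run from another folder.
TEMPLATE="./AzureAnalyticsE2E.bicep"

# preview_deployment <resource-group> <param=value>...
# Runs a what-if operation, which reports the changes the deployment
# would make without applying them.
preview_deployment() {
  rg="$1"; shift
  az deployment group what-if \
    --resource-group "$rg" \
    --template-file "$TEMPLATE" \
    --parameters "$@"
}

# run_deployment <resource-group> <param=value>...
# Runs the actual deployment with the same arguments.
run_deployment() {
  rg="$1"; shift
  az deployment group create \
    --resource-group "$rg" \
    --template-file "$TEMPLATE" \
    --parameters "$@"
}
```

For example, preview_deployment resource-group-name networkIsolationMode=vNet synapseSqlAdminPassword="$SYNAPSE_SQL_ADMIN_PASSWORD" previews a vNet deployment, where SYNAPSE_SQL_ADMIN_PASSWORD is an environment variable you set yourself so the password does not have to be typed inline.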

Important: This deployment accelerator is meant to run without interference from Azure Policies that deny certain configurations, as they might prevent its successful completion. Use a sandbox environment if you need to validate the resulting configuration before you run the deployment against other environments governed by Azure Policies.

Important: This deployment accelerator implements some service features that are still in Public Preview. Please consider those before you plan for a production deployment.

Required Resource Providers

The target subscription for the deployment accelerator needs to have the following resource providers enabled before the deployment execution:

  • Microsoft.Synapse
  • Microsoft.Purview
  • Microsoft.MachineLearningServices
  • Microsoft.ContainerRegistry
  • Microsoft.Network
  • Microsoft.DataShare
  • Microsoft.Authorization
  • Microsoft.CognitiveServices
  • Microsoft.ManagedIdentity
  • Microsoft.KeyVault
  • Microsoft.Storage
  • Microsoft.StreamAnalytics
  • Microsoft.Devices
  • Microsoft.Insights
  • Microsoft.EventHub
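One way to enable the providers listed above is to loop over them with the Azure CLI. The sketch below assumes you are already logged in and have selected the target subscription. The function and variable names are illustrative.

```shell
# Resource providers listed in this README as prerequisites.
PROVIDERS="Microsoft.Synapse Microsoft.Purview Microsoft.MachineLearningServices
Microsoft.ContainerRegistry Microsoft.Network Microsoft.DataShare
Microsoft.Authorization Microsoft.CognitiveServices Microsoft.ManagedIdentity
Microsoft.KeyVault Microsoft.Storage Microsoft.StreamAnalytics
Microsoft.Devices Microsoft.Insights Microsoft.EventHub"

# register_providers: request registration of every provider in the
# currently selected subscription. Registration is asynchronous, so a
# provider may report "Registering" for a few minutes afterwards.
register_providers() {
  for ns in $PROVIDERS; do
    az provider register --namespace "$ns"
  done
}

# show_provider_states: print "<namespace> <state>" for each provider.
show_provider_states() {
  for ns in $PROVIDERS; do
    state=$(az provider show --namespace "$ns" \
      --query registrationState --output tsv)
    echo "$ns $state"
  done
}
```

Run register_providers once, then call show_provider_states until every line ends in Registered before starting the deployment.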

Deployment Details

The deployment accelerator can be deployed in two network isolation modes: default or vNet.

| Network Isolation Mode | Description |
| --- | --- |
| default | Deploys the selected components to Azure using public endpoints. |
| vNet | Deploys the selected components to Azure and the additional services to support private connectivity and restricted inter-service connectivity where possible. This includes provisioning and configuration of virtual networks, managed virtual network deployments for Azure Synapse Analytics, the private endpoints for all services that support Private Link and the supporting Private DNS Zones. |

Azure Services Provisioned

The scope of this deployment accelerator is illustrated in the diagram below.

Architecture Components

Important: All services are deployed in a single resource group and in the same region as the resource group. Before creating the resource group that will host the workloads, check the Azure Products by Region and select a region that has all selected services available. The deployment will fail if any of the services is not available in the chosen region.
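Region availability can also be checked from the command line before you create the resource group. The helper below is a sketch under the assumption that az provider show reports locations as display names such as "West Europe". The function name is illustrative.

```shell
# region_supports <namespace> <resource-type> <region-display-name>
# Succeeds (exit status 0) when the provider lists the region for the
# given resource type, and fails otherwise. The match is a
# case-insensitive comparison against the whole location name.
region_supports() {
  az provider show --namespace "$1" \
    --query "resourceTypes[?resourceType=='$2'].locations[]" \
    --output tsv | grep -Fxiq "$3"
}
```

For example, region_supports Microsoft.Synapse workspaces "West Europe" || echo "Synapse is not available there" checks the mandatory workload. Repeat the call for the resource types of each optional workload you plan to deploy.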

Important: For a fully automated deployment and configuration of Synapse Analytics and Purview, the deployment accelerator uses post-deployment PowerShell scripts to perform data plane operations. These scripts complete the environment configuration, because not every setting is available through Bicep. Because the scripts execute imperative actions, the template is not idempotent and should only be used for initial deployment and configuration. For more details about the scripts, see the deployment accelerator documentation.

By default, all services are provisioned at the lowest pricing tier that meets the initial deployment requirements. If you provide different values for the input parameters, review the pricing information for each service in the tables below.

If explicit names are not provided, all service names will be appended with a unique 5-letter suffix to ensure name uniqueness in Azure.
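If a post-deployment script fails, its output can be inspected with the Azure CLI deployment-scripts commands. The sketch below uses the script resource names listed in the workload tables (SynapsePostDeploymentScript and PurviewPostDeploymentScript). The function names are illustrative.

```shell
# list_post_deployment_scripts <resource-group>
# Lists the deployment script resources in the resource group.
list_post_deployment_scripts() {
  az deployment-scripts list --resource-group "$1" --output table
}

# show_post_deployment_log <resource-group> <script-name>
# Prints the execution log of one deployment script.
show_post_deployment_log() {
  az deployment-scripts show-log --resource-group "$1" --name "$2"
}
```

For example, show_post_deployment_log resource-group-name SynapsePostDeploymentScript prints the Synapse script log. The script resources are removed automatically after 24 hours, so check the logs soon after the deployment finishes.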

The Azure services used in the architecture above have been divided into workloads (see workload tables below) that can be conditionally deployed based on input parameters. The only mandatory workload is Synapse Analytics represented in the grey box in the diagram above.

Platform Services

| Name | Type | Default Pricing Tier | Conditional | Notes |
| --- | --- | --- | --- | --- |
| az-resource group name-uami | Managed Identity | N/A | No | Required to run post-deployment scripts. It is deleted by the clean-up post-deployment script. |
| azkeyvaultsuffix | Key vault | Standard A | No | |

Synapse Analytics

| Name | Type | Default Pricing Tier | Conditional | Notes |
| --- | --- | --- | --- | --- |
| azsynapsewkssuffix | Synapse workspace | N/A | No | Default workspace deployment doesn't incur costs. |
| SparkCluster | Apache Spark pool | Small (3 nodes) | Yes | |
| EnterpriseDW | Synapse SQL pool | DW100 | Yes | |
| adxpoolsuffix | Data Explorer pool | Extra Small (2 nodes) | Yes | |
| azwksdatalakesuffix | Storage account | Standard LRS | No | |
| azrawdatalakesuffix | Storage account | Standard GRS | No | |
| azcurateddatalakesuffix | Storage account | Standard GRS | No | |
| SynapsePostDeploymentScript | Deployment Script | N/A | No | Deployment script resources will be automatically deleted after 24 hours. |

Data Governance

| Name | Type | Default Pricing Tier | Conditional | Notes |
| --- | --- | --- | --- | --- |
| azpurviewsuffix | Purview account | 1 Capacity Unit | Yes | |
| PurviewPostDeploymentScript | Deployment Script | N/A | Yes | Deployment script resources will be automatically deleted after 24 hours. |

Artificial Intelligence (AI)

| Name | Type | Default Pricing Tier | Conditional | Notes |
| --- | --- | --- | --- | --- |
| azanomalydetectorsuffix | Anomaly detector | Standard | Yes | |
| aztextanalyticssuffix | Language | Standard | Yes | |
| azmlwkssuffix | Machine learning workspace | N/A | Yes | Default workspace deployment doesn't incur costs. |
| azmlstoragesuffix | Storage account | Standard LRS | Yes | |
| azmlcontainerregsuffix | Container registry | Basic or Premium (see notes) | Yes | Premium service tier required for private link support |
| azmlappinsightssuffix | Application Insights | On-demand data ingestion charges | Yes | |

Data Sharing

| Name | Type | Default Pricing Tier | Conditional | Notes |
| --- | --- | --- | --- | --- |
| azdatasharesuffix | Data Share | On-demand data processing charges | Yes | |

Streaming

| Name | Type | Default Pricing Tier | Conditional | Notes |
| --- | --- | --- | --- | --- |
| azeventhubnssuffix | Event Hub namespace | Basic | Yes | |
| aziothubsuffix | IoT Hub | Free | Yes | |
| azstreamjobsuffix | Stream Analytics job | Standard | Yes | |

Integration and Permissions

Beyond the deployment of the services that make up the reference architecture, this template also automates the configuration of connections and permissions between the services for them to work properly. Every arrow in the diagram above represents a configuration step that has been automated for you, shortening the time to get to insights.

Each connection and permission listed below has been implemented following the technical documentation for the services involved. Check the reference documentation links in each table for more information.

Service Connections

These are the service connections explicitly defined in the deployment accelerator template. They represent the configuration necessary for the services to be fully integrated and work well together. Note that these connections may result in implicit RBAC permissions between the participating resources that are not in the permission list below. Check the reference documentation of each service connection for more information.

| ID | From Service | To Service | Connection Type | Reference Documentation |
| --- | --- | --- | --- | --- |
| Blue01 | azsynapsewkssuffix | azwksdatalakesuffix | Linked Service | Control storage account access for serverless SQL pool in Azure Synapse Analytics |
| Blue02 | azsynapsewkssuffix | azpurviewsuffix | Workspace Connection | Connect a Synapse workspace to an Azure Purview account |
| Blue03 | azsynapsewkssuffix | azkeyvaultsuffix | Linked Service | Store credential in Azure Key Vault |
| Blue04 | azsynapsewkssuffix | azanomalydetectorsuffix, aztextanalyticssuffix | Linked Service | Configure prerequisites for using Cognitive Services in Azure Synapse Analytics |
| Blue05 | azsynapsewkssuffix | azmlwkssuffix | Linked Service | Create a new Azure Machine Learning linked service in Synapse |
| Blue06 | azsynapsewkssuffix | azrawdatalakesuffix, azcurateddatalakesuffix | Linked Service | Control storage account access for serverless SQL pool in Azure Synapse Analytics |
| Blue07 | azpurviewsuffix | azkeyvaultsuffix | Service Connection | Credentials for source authentication in Azure Purview |
| Blue08 | azpurviewsuffix | azdatasharesuffix | Service Connection | How to connect Azure Data Share and Azure Purview |
| Blue09 | azpurviewsuffix | azrawdatalakesuffix, azcurateddatalakesuffix | Data Source | Connect to Azure Data Lake Gen2 in Azure Purview |
| Blue10 | azmlwkssuffix | azmlstoragesuffix | Linked Service | Connect to storage services on Azure |
| Blue11 | azmlwkssuffix | azmlappinsightssuffix | Linked Service | Monitor and collect data from ML web service endpoints |
| Blue12 | azmlwkssuffix | azmlcontainerregsuffix | Linked Service | Manage Azure Machine Learning workspaces in the portal or with the Python SDK |
| Blue13 | azmlwkssuffix | azkeyvaultsuffix | Linked Service | Use authentication credential secrets in Azure Machine Learning training runs |
| Blue14 | azpurviewsuffix | azsynapsewkssuffix | Data Source | Connect to and manage Azure Synapse Analytics workspaces in Azure Purview |
| Blue15 | azmlwkssuffix | azsynapsewkssuffix | Linked Service | Link Azure Synapse Analytics and Azure Machine Learning workspaces and attach Apache Spark pools |
| Blue16 | azmlwkssuffix | azrawdatalakesuffix, azcurateddatalakesuffix | Datastore | Connect to storage services on Azure |
| Blue17 | azeventhubnssuffix | azrawdatalakesuffix | Event Capture | Capture events through Azure Event Hubs in Azure Blob Storage or Azure Data Lake Storage |

Azure Role Based Access Control (RBAC) Permissions

Beyond the service connections above, the deployment accelerator template defines Azure RBAC permissions between the services. These are the minimum permissions granted to each service's system-assigned managed identity (MSI) for the integration to function properly. The table lists the Azure RBAC permissions explicitly set by the template. The reason for each permission is described in its reference documentation.

| ID | Granted To Service | Granted On Service | Permission Level | Reference Documentation |
| --- | --- | --- | --- | --- |
| Green01 | azsynapsewkssuffix | azwksdatalakesuffix | Storage Blob Data Contributor | Grant permissions to workspace managed identity |
| Green02 | azpurviewsuffix | azsynapsewkssuffix | Reader | Connect to and manage Azure Synapse Analytics workspaces in Azure Purview |
| Green03 | azsynapsewkssuffix | azrawdatalakesuffix, azcurateddatalakesuffix | Storage Blob Data Contributor | Grant permissions to workspace managed identity |
| Green04 | azsynapsewkssuffix | azmlwkssuffix | Contributor | Create a new Azure Machine Learning linked service in Synapse |
| Green05 | azpurviewsuffix | azrawdatalakesuffix, azcurateddatalakesuffix | Storage Blob Data Reader | Connect to Azure Data Lake Gen2 in Azure Purview |
| Green06 | azdatasharesuffix | azrawdatalakesuffix, azcurateddatalakesuffix | Storage Blob Data Reader | Roles and requirements for Azure Data Share |
| Green07 | azmlwkssuffix | azrawdatalakesuffix, azcurateddatalakesuffix | Storage Blob Data Reader | Connect to storage by using identity-based data access |
| Green08 | azstreamjobsuffix | azrawdatalakesuffix, azcurateddatalakesuffix | Storage Blob Data Contributor | Use Managed Identity to authenticate your Azure Stream Analytics job to Azure Blob Storage |
| Green09 | aziothubsuffix | azrawdatalakesuffix, azcurateddatalakesuffix | Storage Blob Data Contributor | |
| Green10 | azstreamjobsuffix | azeventhubnssuffix | Event Hub Data Owner | Use managed identities to access Event Hub from an Azure Stream Analytics job |
| Green11 | azstreamjobsuffix | aziothubsuffix | IoT Hub Data Receiver | Control access to IoT Hub by using Azure Active Directory |
| Green12 | azpurviewsuffix | Resource Group | Storage Blob Data Reader | Connect to and manage Azure Synapse Analytics workspaces in Azure Purview |
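After a deployment you may want to confirm that the role assignments in the table were created. The sketch below lists the assignments made directly on one storage account. The function name is illustrative, and the storage account name you pass is whichever name your deployment produced.

```shell
# list_storage_role_assignments <resource-group> <storage-account-name>
# Resolves the storage account's resource ID, then lists the role
# assignments made at that scope (inherited assignments are not shown).
list_storage_role_assignments() {
  scope=$(az storage account show --resource-group "$1" --name "$2" \
    --query id --output tsv)
  az role assignment list --scope "$scope" \
    --query "[].{principalId:principalId, role:roleDefinitionName}" \
    --output table
}
```

For the raw data lake account you would expect to see Storage Blob Data Contributor and Storage Blob Data Reader entries matching rows Green03 and Green05 to Green09, depending on which optional workloads were deployed.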

Data Plane Permissions

| ID | Granted To Service | Granted On Service | Permission Level | Reference Documentation |
| --- | --- | --- | --- | --- |
| Red01 | azsynapsewkssuffix | azkeyvaultsuffix | Get and List Secrets | Use Azure Key Vault secrets in pipeline activities |
| Red02 | azpurviewsuffix | azkeyvaultsuffix | Get and List Secrets | Credentials for source authentication in Azure Purview |
| Red03 | azmlwkssuffix | azsynapsewkssuffix | Synapse Apache Spark Administrator | Link Azure Synapse Analytics and Azure Machine Learning workspaces and attach Apache Spark pools |
| Red04 | azsynapsewkssuffix | azpurviewsuffix | Data Curator | Connect a Synapse workspace to an Azure Purview account |
| Red05 | azdatasharesuffix | azpurviewsuffix | Data Curator | How to connect Azure Data Share and Azure Purview |

Networking Architecture

If you choose the vNet network isolation mode, the following applies:

  • The Synapse Workspace will be deployed with a Managed Virtual Network.
  • Managed private endpoints for some of the services will be created in the Synapse Workspace managed virtual network.
  • Either a new or an existing virtual network will be used to deploy the private endpoints for all services in the architecture that support Private Link.
  • Public access will be disabled and firewall rules will be set to restrict connectivity to and from the virtual network and between the services in the architecture.
  • Private DNS zones required by the different private link domains can be optionally deployed and linked to the selected virtual network.

Networking Architecture

The following extra services will be deployed to support the private connectivity configuration:

| Component | Name | Type | Optional |
| --- | --- | --- | --- |
| Synapse Analytics | privatelink.azuresynapse.net | Private DNS Zone | Yes |
| Synapse Analytics | privatelink.dev.azuresynapse.net | Private DNS Zone | Yes |
| Synapse Analytics | privatelink.sql.azuresynapse.net | Private DNS Zone | Yes |
| Synapse Analytics | privatelink.dfs.core.windows.net | Private DNS Zone | Yes |
| Synapse Analytics | privatelink.vaultcore.azure.net | Private DNS Zone | Yes |
| AI | privatelink.api.azureml.ms | Private DNS Zone | Yes |
| AI | privatelink.azurecr.io | Private DNS Zone | Yes |
| AI | privatelink.file.core.windows.net | Private DNS Zone | Yes |
| AI | privatelink.notebooks.azure.net | Private DNS Zone | Yes |
| Data Governance | privatelink.queue.core.windows.net | Private DNS Zone | Yes |
| Data Governance | privatelink.servicebus.windows.net | Private DNS Zone | Yes |
| Data Governance | privatelink.blob.core.windows.net | Private DNS Zone | Yes |
| Data Governance | privatelink.purview.azure.com | Private DNS Zone | Yes |
| Streaming | privatelink.azure-devices.net | Private DNS Zone | Yes |
| Synapse Analytics | azvnetsuffix | Virtual Network | No |
| Synapse Analytics | azsynapsehubsuffix | Synapse private link hub | No |
| Synapse Analytics | azsynapsewkssuffix-web | Private Endpoint | No |
| Synapse Analytics | azsynapsewkssuffix-sqlserverless | Private Endpoint | No |
| Synapse Analytics | azsynapsewkssuffix-sql | Private Endpoint | No |
| Synapse Analytics | azsynapsewkssuffix-dev | Private Endpoint | No |
| Synapse Analytics | azkeyvaultsuffix | Private Endpoint | No |
| Synapse Analytics | azwksdatalakesuffix-dfs | Private Endpoint | No |
| Synapse Analytics | azrawdatalakesuffix-dfs | Private Endpoint | No |
| Synapse Analytics | azcurateddatalakesuffix-dfs | Private Endpoint | No |
| Data Governance | azpurviewsuffix-queue | Private Endpoint | No |
| Data Governance | azpurviewsuffix-portal | Private Endpoint | No |
| Data Governance | azpurviewsuffix-namespace | Private Endpoint | No |
| Data Governance | azpurviewsuffix-blob | Private Endpoint | No |
| Data Governance | azpurviewsuffix-account | Private Endpoint | No |
| AI | aztextanalyticssuffix-account | Private Endpoint | No |
| AI | azanomalydetectorsuffix-account | Private Endpoint | No |
| AI | azmlwkssuffix-amlworkspace | Private Endpoint | No |
| AI | azmlstoragesuffix-file | Private Endpoint | No |
| AI | azmlstoragesuffix-blob | Private Endpoint | No |
| AI | azmlcontainerregsuffix-registry | Private Endpoint | No |
| Streaming | azeventhubnssuffix-namespace | Private Endpoint | No |
| Streaming | aziothubsuffix-iothub | Private Endpoint | No |
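To compare a vNet deployment against the table above, the private endpoints and Private DNS zones in the resource group can be listed with the Azure CLI. This is a sketch with illustrative function names. It assumes the Private DNS zones were deployed into the same resource group, which is optional according to the table.

```shell
# list_private_endpoints <resource-group>
# Prints the name of every private endpoint in the resource group.
list_private_endpoints() {
  az network private-endpoint list --resource-group "$1" \
    --query "[].name" --output tsv
}

# list_private_dns_zones <resource-group>
# Prints the name of every Private DNS zone in the resource group.
list_private_dns_zones() {
  az network private-dns zone list --resource-group "$1" \
    --query "[].name" --output tsv
}

# missing_dns_zones <resource-group> <zone>...
# Prints each expected zone that is not present in the resource group.
missing_dns_zones() {
  rg="$1"; shift
  found=$(list_private_dns_zones "$rg")
  for zone in "$@"; do
    printf '%s\n' "$found" | grep -Fxq "$zone" || echo "$zone"
  done
}
```

For example, missing_dns_zones resource-group-name privatelink.azuresynapse.net privatelink.dfs.core.windows.net prints nothing when both zones exist. Zones that you chose to host elsewhere will be reported as missing.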

Beyond the extra services above required to support the network isolation mode, the following network settings are applied to the services:

1

| Workload | Name | Type | Network Settings | Notes | Reference Documentation |
| --- | --- | --- | --- | --- | --- |
| Platform Services | azkeyvaultsuffix | Key vault | 1 2 | 'Allow Azure Services' required for access from Azure Purview and Azure ML | Configure Azure Key Vault networking settings |
| Synapse Analytics | azsynapsewkssuffix | Synapse workspace | 1 | Managed Virtual Network enabled | Understanding Azure Synapse Private Endpoints |
| Synapse Analytics | azwksdatalakesuffix | Storage account | 1 3 | | Configure Azure Storage firewalls and virtual networks |
| Synapse Analytics | azrawdatalakesuffix | Storage account | 1 2 3 | 'Allow Azure Services' enabled only when deploying Streaming workloads with Event Hubs | Configure Azure Storage firewalls and virtual networks |
| Synapse Analytics | azcurateddatalakesuffix | Storage account | 1 2 3 | 'Allow Azure Services' enabled only when deploying Streaming workloads with Event Hubs | Configure Azure Storage firewalls and virtual networks |
| Data Governance | azpurviewsuffix | Purview account | 1 | | Connect to your Azure Purview and scan data sources privately and securely |
| AI | azanomalydetectorsuffix | Anomaly detector | 1 | | Configure Azure Cognitive Services virtual networks |
| AI | aztextanalyticssuffix | Language | 1 | | Configure Azure Cognitive Services virtual networks |
| AI | azmlwkssuffix | Machine learning workspace | 1 | | Secure Azure Machine Learning workspace resources using virtual networks (VNets) |
| AI | azmlstoragesuffix | Storage account | 1 2 | | Secure an Azure Machine Learning workspace with virtual networks |
| AI | azmlcontainerregsuffix | Container registry | 1 2 | | Secure an Azure Machine Learning workspace with virtual networks |
| Streaming | azeventhubnssuffix | Event Hub namespace | 1 2 | | Network security for Azure Event Hubs |
| Streaming | aziothubsuffix | IoT Hub | 1 | | IoT Hub support for virtual networks with Private Link and Managed Identity |
| Streaming | azstreamjobsuffix | Stream Analytics job | | Stream Analytics Jobs don't support vNet integration. For that you should use Stream Analytics Clusters. | |

Contributing

If you would like to contribute to the solution (log bugs, issues, or add code) we have details on how to do that in our CONTRIBUTING.md file.

License

Details on licensing for the project can be found in the LICENSE file.