Below is a full Azure Data Factory (ADF) project with a sample dataset, code, and detailed steps for building a pipeline that moves and transforms data from a CSV file in Azure Blob Storage into an Azure SQL Database.

Save the dataset below as sample_data.csv and upload it to an Azure Blob Storage container.
```csv
EmployeeID,Name,Department,Salary
101,John Doe,Engineering,60000
102,Jane Smith,Marketing,55000
103,Michael Brown,Sales,45000
104,Linda White,HR,50000
```
Before you start, prepare the following:

- Azure Subscription: Ensure you have access.
- Azure Blob Storage: Create a storage account and a container, then upload the sample_data.csv file.
- Azure SQL Database: Create a table for the data:

```sql
CREATE TABLE Employee (
    EmployeeID INT PRIMARY KEY,
    Name NVARCHAR(50),
    Department NVARCHAR(50),
    Salary INT
);
```

- Azure Data Factory Studio: No installation is needed; it opens from the Data Factory resource in the Azure Portal.
With the prerequisites in place, create the Data Factory:

- Navigate to the Azure Portal.
- Search for Data Factory and click Create.
- Fill in:
- Subscription: Select your subscription.
- Resource Group: Create or select one.
- Region: Choose a nearby region.
- Data Factory Name: Provide a unique name.
- Click Review + Create, then Create.
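If you prefer to script this step instead of clicking through the portal, a minimal ARM template resource block for the factory looks roughly like the following sketch (the name and location are placeholder values to replace with your own):

```json
{
  "type": "Microsoft.DataFactory/factories",
  "apiVersion": "2018-06-01",
  "name": "my-adf-demo-factory",
  "location": "eastus",
  "identity": {
    "type": "SystemAssigned"
  },
  "properties": {}
}
```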
Linked Services are used to connect ADF to external resources.
- In ADF Studio, go to Manage > Linked Services.
- Click + New and select Azure Blob Storage.
- Configure:
- Account Selection Method: Choose From Azure subscription or Enter manually.
- Storage Account Name: Select or enter the storage account name.
- Test the connection and save.
- Create another Linked Service for Azure SQL Database.
- Configure:
- Server Name: Enter the server address.
- Database Name: Enter the database name.
- Authentication Type: SQL authentication.
- Username and Password: Enter credentials.
- Test the connection and save.
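For reference, the resulting linked service definitions look roughly like the JSON sketches below. The names match the AzureBlobStorageLinkedService and AzureSQLDatabaseLinkedService references used in the dataset JSON later in this project; the connection strings are placeholders you would replace with your own values (or, better, with Key Vault references).

```json
{
  "name": "AzureBlobStorageLinkedService",
  "properties": {
    "type": "AzureBlobStorage",
    "typeProperties": {
      "connectionString": "DefaultEndpointsProtocol=https;AccountName=<storage-account>;AccountKey=<account-key>;EndpointSuffix=core.windows.net"
    }
  }
}
```

```json
{
  "name": "AzureSQLDatabaseLinkedService",
  "properties": {
    "type": "AzureSqlDatabase",
    "typeProperties": {
      "connectionString": "Server=tcp:<server>.database.windows.net,1433;Database=<database>;User ID=<user>;Password=<password>;Encrypt=True;Connection Timeout=30"
    }
  }
}
```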
Datasets represent the data structure in the source and destination.
- Go to Author > Datasets, click + New Dataset.
- Select Azure Blob Storage and DelimitedText.
- Configure:
- Linked Service: Select the Blob Storage linked service.
- File Path: Point to sample_data.csv.
- Enable First Row as Header.
- Save the dataset as BlobInputDataset.
- Add another dataset for the Azure SQL Database.
- Select Azure SQL Database and configure:
- Linked Service: Select the SQL Database linked service.
- Table Name: Choose Employee.
- Save the dataset as SQLSinkDataset.
- In Author > Pipelines, click + New Pipeline.
- Drag and drop the Copy Data activity onto the canvas.
- Configure the activity:
- Source: Select BlobInputDataset.
- Sink: Select SQLSinkDataset.
- Mapping: Map source columns to sink columns (EmployeeID → EmployeeID, Name → Name, Department → Department, Salary → Salary); a sketch of the resulting mapping JSON follows this list.
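When you map columns explicitly, ADF records the mapping in the copy activity's typeProperties as a translator section. A minimal sketch, assuming the one-to-one mapping above (the exact JSON is generated for you when you map columns in the UI):

```json
"translator": {
  "type": "TabularTranslator",
  "mappings": [
    { "source": { "name": "EmployeeID" }, "sink": { "name": "EmployeeID" } },
    { "source": { "name": "Name" },       "sink": { "name": "Name" } },
    { "source": { "name": "Department" }, "sink": { "name": "Department" } },
    { "source": { "name": "Salary" },     "sink": { "name": "Salary" } }
  ]
}
```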
- Click Debug to test the pipeline.
- Monitor the progress in the Output window.
- Click Publish All to save changes.
- Add a trigger:
- Manual: Use Trigger Now.
- Scheduled: Configure a schedule in Add Trigger > New/Edit.
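For the scheduled option, the trigger definition ADF generates looks roughly like the following sketch (DailyTrigger and the start time are placeholder values; the pipeline reference matches the CopyPipeline definition below):

```json
{
  "name": "DailyTrigger",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Day",
        "interval": 1,
        "startTime": "2025-01-01T06:00:00Z",
        "timeZone": "UTC"
      }
    },
    "pipelines": [
      {
        "pipelineReference": {
          "referenceName": "CopyPipeline",
          "type": "PipelineReference"
        }
      }
    ]
  }
}
```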
The complete pipeline JSON definition (as shown in the pipeline's code view):

```json
{
  "name": "CopyPipeline",
  "properties": {
    "activities": [
      {
        "name": "Copy Data from Blob to SQL",
        "type": "Copy",
        "typeProperties": {
          "source": {
            "type": "DelimitedTextSource",
            "additionalColumns": []
          },
          "sink": {
            "type": "AzureSqlSink"
          }
        },
        "inputs": [
          {
            "referenceName": "BlobInputDataset",
            "type": "DatasetReference"
          }
        ],
        "outputs": [
          {
            "referenceName": "SQLSinkDataset",
            "type": "DatasetReference"
          }
        ]
      }
    ]
  }
}
```
The BlobInputDataset JSON definition:

```json
{
  "name": "BlobInputDataset",
  "properties": {
    "linkedServiceName": {
      "referenceName": "AzureBlobStorageLinkedService",
      "type": "LinkedServiceReference"
    },
    "type": "DelimitedText",
    "typeProperties": {
      "location": {
        "type": "AzureBlobStorageLocation",
        "container": "sample-container",
        "fileName": "sample_data.csv"
      },
      "columnDelimiter": ",",
      "firstRowAsHeader": true
    }
  }
}
```
The SQLSinkDataset JSON definition:

```json
{
  "name": "SQLSinkDataset",
  "properties": {
    "linkedServiceName": {
      "referenceName": "AzureSQLDatabaseLinkedService",
      "type": "LinkedServiceReference"
    },
    "type": "AzureSqlTable",
    "typeProperties": {
      "tableName": "Employee"
    }
  }
}
```
To extend this project further:

- Transformation: Add a Mapping Data Flow for transformations more complex than a straight copy.
- Error Handling: Chain activities on failure dependency paths (a try-catch-style pattern) to handle pipeline errors; see the sketch after this list.
- Parameterization: Add pipeline or dataset parameters for dynamic file paths and table names; see the sketch after this list.
- Monitoring: Enable alerts on pipeline failures via Azure Monitor.
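For error handling, ADF expresses the try-catch-style pattern through activity dependency conditions rather than a dedicated try-catch block. A minimal sketch, assuming a hypothetical follow-up activity (for example, one named Log Copy Failure) that should run only when the copy fails, would carry this dependency on the copy activity defined above:

```json
"dependsOn": [
  {
    "activity": "Copy Data from Blob to SQL",
    "dependencyConditions": [ "Failed" ]
  }
]
```

For parameterization, the source dataset can take a file name parameter and resolve it with an expression. A sketch of BlobInputDataset reworked this way (the fileName parameter is an illustrative addition, not part of the original definition):

```json
{
  "name": "BlobInputDataset",
  "properties": {
    "linkedServiceName": {
      "referenceName": "AzureBlobStorageLinkedService",
      "type": "LinkedServiceReference"
    },
    "parameters": {
      "fileName": { "type": "string" }
    },
    "type": "DelimitedText",
    "typeProperties": {
      "location": {
        "type": "AzureBlobStorageLocation",
        "container": "sample-container",
        "fileName": {
          "value": "@dataset().fileName",
          "type": "Expression"
        }
      },
      "columnDelimiter": ",",
      "firstRowAsHeader": true
    }
  }
}
```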
This full project demonstrates creating a functional Azure Data Factory pipeline with datasets, linked services, and transformations. Let me know if you'd like additional customizations!