Azure Data Factory Project

Below is a full Azure Data Factory (ADF) project with a sample dataset, code, and detailed steps for creating a pipeline that copies data from Azure Blob Storage to an Azure SQL Database; transformations and other extensions are covered under Enhancements.


Objective

Move and, optionally, transform data from a CSV file in Azure Blob Storage to an Azure SQL Database using Azure Data Factory.


Sample Dataset

Save this dataset as a sample_data.csv file and upload it to an Azure Blob Storage container.

EmployeeID,Name,Department,Salary
101,John Doe,Engineering,60000
102,Jane Smith,Marketing,55000
103,Michael Brown,Sales,45000
104,Linda White,HR,50000

Steps to Create the ADF Project

Step 1: Prerequisites

  1. Azure Subscription: Ensure you have an active subscription with permission to create resources.
  2. Azure Blob Storage: Create a storage account, container, and upload the sample_data.csv file.
  3. Azure SQL Database:
    • Create a logical server and database, then create a table for the data:
      CREATE TABLE Employee (
          EmployeeID INT PRIMARY KEY,
          Name NVARCHAR(50),
          Department NVARCHAR(50),
          Salary INT
      );
  4. Azure Data Factory Studio: Opened from your Data Factory resource in the Azure Portal; no installation is required.

Step 2: Create the Data Factory

  1. Navigate to the Azure Portal.
  2. Search for Data Factory and click Create.
  3. Fill in:
    • Subscription: Select your subscription.
    • Resource Group: Create or select one.
    • Region: Choose a nearby region.
    • Data Factory Name: Provide a unique name.
  4. Click Review + Create, then Create. (Alternatively, the factory can be deployed from an ARM template, as sketched below.)
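
If you prefer infrastructure-as-code over the portal, the factory itself can be declared as an ARM template resource. This is a minimal sketch, assuming API version 2018-06-01 and placeholder values for the name and region:

{
  "type": "Microsoft.DataFactory/factories",
  "apiVersion": "2018-06-01",
  "name": "<data-factory-name>",
  "location": "<region>",
  "identity": {
    "type": "SystemAssigned"
  },
  "properties": {}
}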

Step 3: Create Linked Services

Linked Services are used to connect ADF to external resources.

1. Blob Storage Linked Service

  1. In ADF Studio, go to Manage > Linked Services.
  2. Click + New and select Azure Blob Storage.
  3. Configure:
    • Account Selection Method: Enter the account details manually or select the account from your Azure subscription.
    • Storage Account Name: Enter the storage account.
  4. Test the connection and save (a sample definition follows).
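
For reference, a minimal Azure Blob Storage linked service definition using key-based authentication might look like the sketch below; the account name and key are placeholders, and the name matches the reference used in the dataset JSON later in this README:

{
  "name": "AzureBlobStorageLinkedService",
  "properties": {
    "type": "AzureBlobStorage",
    "typeProperties": {
      "connectionString": "DefaultEndpointsProtocol=https;AccountName=<storage-account-name>;AccountKey=<account-key>;EndpointSuffix=core.windows.net"
    }
  }
}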

2. Azure SQL Database Linked Service

  1. Create another Linked Service for Azure SQL Database.
  2. Configure:
    • Server Name: Enter the server address.
    • Database Name: Enter the database name.
    • Authentication Type: SQL authentication.
    • Username and Password: Enter credentials.
  3. Test the connection and save (a sample definition follows).
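
A corresponding Azure SQL Database linked service with SQL authentication might look like this sketch; the server, database, and credentials are placeholders:

{
  "name": "AzureSQLDatabaseLinkedService",
  "properties": {
    "type": "AzureSqlDatabase",
    "typeProperties": {
      "connectionString": "Server=tcp:<server-name>.database.windows.net,1433;Database=<database-name>;User ID=<username>;Password=<password>;Encrypt=true;Connection Timeout=30;"
    }
  }
}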

Step 4: Create Datasets

Datasets describe the structure and location of the data in the source and the destination.

1. Blob Dataset

  1. Go to Author > Datasets, click + New Dataset.
  2. Select Azure Blob Storage and DelimitedText.
  3. Configure:
    • Linked Service: Select Blob Storage.
    • File Path: Point to sample_data.csv.
    • Enable First Row as Header.
  4. Save as BlobInputDataset.

2. SQL Dataset

  1. Add another dataset for the Azure SQL Database.
  2. Select Azure SQL Database and configure:
    • Linked Service: Select SQL Database.
    • Table Name: Choose Employee.
  3. Save as SQLSinkDataset.

Step 5: Create the Pipeline

  1. In Author > Pipelines, click + New Pipeline.
  2. Drag and drop the Copy Data activity onto the canvas.
  3. Configure the activity:
    • Source:
      • Select BlobInputDataset.
    • Sink:
      • Select SQLSinkDataset.
    • Mapping:
      • Map source columns to sink columns (the corresponding translator JSON is shown under Code Snippets):
        • EmployeeID → EmployeeID
        • Name → Name
        • Department → Department
        • Salary → Salary.

Step 6: Debug and Run

  1. Click Debug to test the pipeline.
  2. Monitor the progress in the Output window.

Step 7: Publish and Trigger

  1. Click Publish All to save changes.
  2. Add a trigger:
    • Manual: Use Trigger Now.
    • Scheduled: Configure a schedule in Add Trigger > New/Edit (a sample trigger definition follows).
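
For a scheduled run, a daily schedule trigger attached to the pipeline could be defined roughly as follows; the trigger name and start time are placeholders, and CopyPipeline matches the pipeline JSON below:

{
  "name": "DailyTrigger",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Day",
        "interval": 1,
        "startTime": "2025-01-01T00:00:00Z",
        "timeZone": "UTC"
      }
    },
    "pipelines": [
      {
        "pipelineReference": {
          "referenceName": "CopyPipeline",
          "type": "PipelineReference"
        }
      }
    ]
  }
}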

Code Snippets

Pipeline JSON

{
  "name": "CopyPipeline",
  "properties": {
    "activities": [
      {
        "name": "Copy Data from Blob to SQL",
        "type": "Copy",
        "typeProperties": {
          "source": {
            "type": "DelimitedTextSource",
            "additionalColumns": []
          },
          "sink": {
            "type": "AzureSqlSink"
          }
        },
        "inputs": [
          {
            "referenceName": "BlobInputDataset",
            "type": "DatasetReference"
          }
        ],
        "outputs": [
          {
            "referenceName": "SQLSinkDataset",
            "type": "DatasetReference"
          }
        ]
      }
    ]
  }
}
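
If you configure the explicit column mapping from Step 5, the Copy activity's typeProperties also carry a translator section. A sketch for the four Employee columns, mapped one-to-one:

"translator": {
  "type": "TabularTranslator",
  "mappings": [
    { "source": { "name": "EmployeeID" }, "sink": { "name": "EmployeeID" } },
    { "source": { "name": "Name" }, "sink": { "name": "Name" } },
    { "source": { "name": "Department" }, "sink": { "name": "Department" } },
    { "source": { "name": "Salary" }, "sink": { "name": "Salary" } }
  ]
}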

Blob Dataset JSON

{
  "name": "BlobInputDataset",
  "properties": {
    "linkedServiceName": {
      "referenceName": "AzureBlobStorageLinkedService",
      "type": "LinkedServiceReference"
    },
    "type": "DelimitedText",
    "typeProperties": {
      "location": {
        "type": "AzureBlobStorageLocation",
        "container": "sample-container",
        "fileName": "sample_data.csv"
      },
      "columnDelimiter": ",",
      "firstRowAsHeader": true
    }
  }
}

SQL Dataset JSON

{
  "name": "SQLSinkDataset",
  "properties": {
    "linkedServiceName": {
      "referenceName": "AzureSQLDatabaseLinkedService",
      "type": "LinkedServiceReference"
    },
    "type": "AzureSqlTable",
    "typeProperties": {
      "tableName": "Employee"
    }
  }
}

Enhancements

  1. Transformation:
    • Add a Data Flow for complex transformations.
  2. Error Handling:
    • Handle failures with activity dependency conditions (On Failure / On Completion) and retry policies; ADF has no try-catch construct.
  3. Parameterization:
    • Add pipeline or dataset parameters for dynamic file paths or table names (see the sketch after this list).
  4. Monitoring:
    • Enable alerts via Azure Monitor.

This project demonstrates a working Azure Data Factory pipeline built from linked services, datasets, and a Copy Data activity, and outlines enhancements such as transformations, error handling, parameterization, and monitoring.
