In this project, I designed and implemented a robust data management pipeline using Microsoft Azure’s cloud services, and I provisioned the infrastructure with Terraform. The complete project can be reviewed in my GitHub repository.
Project Overview
This Azure Data Management Pipeline project integrates several Azure services into a scalable, efficient pipeline for data ingestion, processing, and storage. The pipeline supports advanced data analysis and is designed to help enterprises make agile, informed business decisions. Terraform is used to programmatically create, modify, and remove resources on the Microsoft Azure cloud platform.
Technologies Employed
- Terraform: For programmatically managing cloud infrastructure.
- Azure Data Factory: For orchestrating and automating data flows between various Azure services.
- Azure Databricks: Utilized for data processing and running big data analytics.
- Azure SQL Database: Used for storing processed data in a structured format.
- Azure Storage: Employed for durable, scalable storage of raw data.
Execution Instructions
To run this project, follow these steps:
- Clone the repository:
git clone https://github.com/drjodyannjones/azure-data-management-pipeline.git
- Set up the Azure services as detailed in the project’s documentation.
- Deploy the Azure Data Factory pipelines and monitor the workflow execution in the Azure portal.
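The steps above can be sketched as a typical Terraform workflow run from the cloned repository. This is a minimal sketch, not the project's documented procedure: it assumes the Azure CLI and Terraform are installed and that main.tf sits at the repository root. The `run` helper, the `DRY_RUN` variable, and the `tfplan` file name are hypothetical and exist only for illustration. By default the script only prints the commands, so it is safe to try first.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: print each command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Directory name follows from the clone URL; main.tf at its root is an assumption.
run cd azure-data-management-pipeline

# Authenticate so Terraform can reach the Azure subscription.
run az login

# Download providers, preview the changes, then apply exactly that saved plan.
run terraform init
run terraform plan -out=tfplan
run terraform apply tfplan
```

Setting DRY_RUN=0 in the environment executes the commands for real. Saving the plan to a file and applying that file means the applied changes match what was reviewed. When the resources are no longer needed, `terraform destroy` removes them.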
Sample Code Snippet: main.tf
Here is the main Terraform script that manages the infrastructure: