Data Lake Starter Kit: 1-Hour Implementation


A Terraform module that automatically deploys a complete, opinionated data lake on your own Microsoft Azure environment in less than one hour.

This Terraform module deploys a complete, opinionated data lake on Microsoft Azure: all the Azure components needed to stand up a data platform and unlock the power of big data. Code available via
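As a sketch of how such a Terraform module might be consumed (the module source path and variable names below are illustrative assumptions, not the module's actual interface):

```hcl
# Hypothetical consumption of the data lake module.
# Source path and variable names are assumptions for illustration only;
# check the module's documentation for the real interface.
module "data_lake" {
  source = "./modules/data-lake" # replace with the actual module source

  resource_group_name = "rg-datalake-dev"
  location            = "westeurope"
  prefix              = "dlstarter"
}
```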

Traditional data platforms were not designed for big data: streaming/real-time data, unstructured data, or large data volumes. A data lake is designed to cope with these aspects of big data and enables organizations to benefit from its potential. This module gets you started establishing a data lake on Azure and realizing the benefits of AI use cases on big data.

Using Infrastructure as Code, all initial components of an Azure data lake are deployed:

  • Azure Data Factory for data ingestion from various sources
  • Three or more Azure Data Lake Storage Gen2 containers to store raw, clean, and curated data
  • Azure Databricks to clean and transform the data
  • Azure Synapse Analytics to store presentation data
  • Azure Cosmos DB to store metadata
  • Credentials and access management, configured and ready to go
  • A sample data pipeline (optional)
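The raw/clean/curated storage zones above could be expressed with the Terraform `azurerm` provider roughly as follows (resource names and the surrounding resource group are assumptions; this is a minimal sketch, not the module's actual code):

```hcl
# Sketch: a storage account with hierarchical namespace enabled
# (required for ADLS Gen2) plus one filesystem per data zone.
# All names here are illustrative assumptions.
resource "azurerm_storage_account" "lake" {
  name                     = "dlstarterlake" # must be globally unique
  resource_group_name      = azurerm_resource_group.lake.name
  location                 = azurerm_resource_group.lake.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
  account_kind             = "StorageV2"
  is_hns_enabled           = true # enables Data Lake Storage Gen2
}

resource "azurerm_storage_data_lake_gen2_filesystem" "zones" {
  for_each           = toset(["raw", "clean", "curated"])
  name               = each.key
  storage_account_id = azurerm_storage_account.lake.id
}
```

Using `for_each` over the zone names keeps the three containers in one resource block, so adding a fourth zone is a one-word change.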

The Infrastructure as Code module needs some configuration to adapt it to your Azure environment. All configuration options can be found in the guide on
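A typical way to supply such configuration is a `terraform.tfvars` file; the variable names below are assumptions for illustration, so consult the configuration guide for the actual options:

```hcl
# Example terraform.tfvars (hypothetical variable names).
resource_group_name    = "rg-datalake-dev"
location               = "westeurope"
prefix                 = "dlstarter"
deploy_sample_pipeline = true # the optional sample data pipeline
```

With the values in place, the usual `terraform init`, `terraform plan`, and `terraform apply` workflow performs the deployment.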

After configuring and running the Terraform module, the result is a ready-to-use Azure data lake. The package provides sample data, but you can use the Azure Data Factory component to connect to other data sources. The data stored in Azure Data Lake Storage Gen2 can then feed an Artificial Intelligence use case. Databricks notebooks are deployed for preprocessing and transformation activities on the data. Via the Azure Synapse component, the data and the results of the AI use case can be organized and prepared for visualization when connecting to Power BI.
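Connecting Data Factory to an additional source is itself a small piece of Terraform. As a hedged sketch (the linked-service name, the referenced factory, and the connection string are placeholders, not part of the module):

```hcl
# Illustrative sketch: wiring an external Azure SQL database into the
# deployed Data Factory as a linked service. All values are placeholders.
resource "azurerm_data_factory_linked_service_azure_sql_database" "source" {
  name            = "ls-source-sql"
  data_factory_id = azurerm_data_factory.lake.id

  # Placeholder connection string; in practice this should come from
  # a secret store such as Azure Key Vault, not plain text.
  connection_string = "Server=tcp:example.database.windows.net;Database=sales;"
}
```

From there, Data Factory pipelines can copy the source data into the raw zone of the lake for the Databricks notebooks to pick up.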