Data Lake Starter Kit: 1-Hour Implementation


A Terraform module that automatically deploys a complete, opinionated data lake on your own Microsoft Azure environment in less than one hour.

This Terraform module deploys a complete, opinionated data lake on Microsoft Azure: all the Azure components needed to stand up a data platform and unlock the power of big data. Code available via
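As a sketch of how such a Terraform module might be consumed (the module source path and variable names below are illustrative assumptions, not the module's actual interface):

```hcl
# Hypothetical consumption of the data lake module.
# Source path and variable names are assumptions for illustration only;
# check the module's documentation for the real interface.
module "data_lake" {
  source = "./modules/data-lake" # replace with the actual module source

  resource_group_name = "rg-datalake-dev"
  location            = "westeurope"
  prefix              = "dlstarter"
}
```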

Traditional data platforms were not designed for big data: streaming/real-time data, unstructured data, or large data volumes. A data lake is designed to cope with these aspects of big data and enables organizations to benefit from its potential. This module gets you started establishing a data lake on Azure and realizing the benefits of AI use cases on big data.

Using Infrastructure as Code, all initial components of an Azure data lake are deployed:

  • Azure Data Factory for data ingestion from various sources
  • Three or more Azure Data Lake Storage Gen2 containers to store raw, clean, and curated data
  • Azure Databricks to clean and transform the data
  • Azure Synapse Analytics to store presentation data
  • Azure Cosmos DB to store metadata
  • Credentials and access management, configured and ready to go
  • A sample data pipeline (optional)
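The raw/clean/curated storage zones above could be expressed with the Terraform `azurerm` provider roughly as follows (resource names and the surrounding resource group are assumptions; this is a minimal sketch, not the module's actual code):

```hcl
# Sketch: a storage account with hierarchical namespace enabled
# (required for ADLS Gen2) plus one filesystem per data zone.
# All names here are illustrative assumptions.
resource "azurerm_storage_account" "lake" {
  name                     = "dlstarterlake" # must be globally unique
  resource_group_name      = azurerm_resource_group.lake.name
  location                 = azurerm_resource_group.lake.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
  account_kind             = "StorageV2"
  is_hns_enabled           = true # enables Data Lake Storage Gen2
}

resource "azurerm_storage_data_lake_gen2_filesystem" "zones" {
  for_each           = toset(["raw", "clean", "curated"])
  name               = each.key
  storage_account_id = azurerm_storage_account.lake.id
}
```

Using `for_each` over the zone names keeps the three containers in one resource block, so adding a fourth zone is a one-word change.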

The Infrastructure as Code module needs some configuration to adapt it to your Azure environment. All configuration options can be found in the guide on
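A typical way to supply such configuration is a `terraform.tfvars` file; the variable names below are assumptions for illustration, so consult the configuration guide for the actual options:

```hcl
# Example terraform.tfvars (hypothetical variable names).
resource_group_name    = "rg-datalake-dev"
location               = "westeurope"
prefix                 = "dlstarter"
deploy_sample_pipeline = true # the optional sample data pipeline
```

With the values in place, the usual `terraform init`, `terraform plan`, and `terraform apply` workflow performs the deployment.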

After configuring and running the Terraform module, the result is a ready-to-use Azure data lake. The package provides sample data, but you can use the Azure Data Factory component to connect to other data sources. The data stored in Azure Data Lake Storage Gen2 can then feed an Artificial Intelligence use case. Databricks notebooks are deployed for preprocessing and transformation activities on the data. Via the Azure Synapse component, the data and the results of the AI use case can be organized and prepared for visualization when connecting to Power BI.
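Connecting Data Factory to an additional source is itself a small piece of Terraform. As a hedged sketch (the linked-service name, the referenced factory, and the connection string are placeholders, not part of the module):

```hcl
# Illustrative sketch: wiring an external Azure SQL database into the
# deployed Data Factory as a linked service. All values are placeholders.
resource "azurerm_data_factory_linked_service_azure_sql_database" "source" {
  name            = "ls-source-sql"
  data_factory_id = azurerm_data_factory.lake.id

  # Placeholder connection string; in practice this should come from
  # a secret store such as Azure Key Vault, not plain text.
  connection_string = "Server=tcp:example.database.windows.net;Database=sales;"
}
```

From there, Data Factory pipelines can copy the source data into the raw zone of the lake for the Databricks notebooks to pick up.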