12-week Azure Databricks Data Lakehouse implementation

Fractal Analytics Inc.

Rapidly implement an Azure Databricks Data Lakehouse with Fractal's Enterprise Data Lakehouse Migration Accelerator

According to research sponsored by Databricks, 73% of organizations link their data warehouses to data lakes so that all of their data is available for BI and machine learning analytics. This approach works for most of these organizations, but it comes with challenges. Running both a Data Lake and a Data Warehouse requires a complicated and costly architecture. That complexity creates further work, such as maintaining the database connections and the connections to the BI and data science solutions and services that consume the data.

A Data Lakehouse built on Azure Databricks helps address these challenges by combining the cost-efficient data storage of a Data Lake with the data management and ACID-compliant transaction capabilities of a Data Warehouse. This allows organizations to leverage all of their data in BI visualizations and machine learning analytics from a single source of truth with a simplified architecture that helps reduce costs and the risk of errors.
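To make the idea of ACID-style commits over low-cost file storage more concrete, the following is a minimal toy sketch in Python. It is not how Azure Databricks or any specific table format is implemented. The class `ToyTable`, the file name `_commit_log.json`, and the methods `stage`, `commit`, and `read` are all hypothetical names chosen for illustration. The sketch shows only atomicity: data files are written first and become visible to readers only when a commit log is swapped in atomically. It does not handle concurrent writers.

```python
import json
import os
import tempfile


class ToyTable:
    """Hypothetical toy table: plain data files plus one commit log.

    Readers only see files listed in the commit log, so a half-finished
    write is never visible. This illustrates atomic commits only.
    """

    def __init__(self, root):
        self.root = root
        self.log_path = os.path.join(root, "_commit_log.json")
        os.makedirs(root, exist_ok=True)
        if not os.path.exists(self.log_path):
            self._write_log([])

    def _write_log(self, files):
        # Write the new log to a temp file, then swap it in atomically.
        fd, tmp_path = tempfile.mkstemp(dir=self.root)
        with os.fdopen(fd, "w") as handle:
            json.dump(files, handle)
        os.replace(tmp_path, self.log_path)

    def _read_log(self):
        with open(self.log_path) as handle:
            return json.load(handle)

    def stage(self, name, rows):
        # The data file exists on disk but is not yet visible to readers.
        with open(os.path.join(self.root, name), "w") as handle:
            json.dump(rows, handle)
        return name

    def commit(self, staged_names):
        # Publishing is a single atomic replacement of the commit log.
        self._write_log(self._read_log() + list(staged_names))

    def read(self):
        rows = []
        for name in self._read_log():
            with open(os.path.join(self.root, name)) as handle:
                rows.extend(json.load(handle))
        return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        table = ToyTable(folder)
        table.stage("batch_1.json", [{"id": 1}, {"id": 2}])
        print(table.read())   # staged data is not visible yet
        table.commit(["batch_1.json"])
        print(table.read())   # committed data is now visible
```

The design choice worth noting is that the data files and the record of which files count are kept separate, which is what lets inexpensive file storage behave more like a managed table.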

Enterprise Data Lakehouse Accelerator

The Enterprise Data Lakehouse Accelerator offer from Fractal Analytics is designed to help organizations stand up and migrate data to a simple, open Data Lakehouse built on Azure Databricks in 12 weeks. During this engagement, Fractal will leverage predefined templates, migration patterns, and consulting practices to:
  • Understand client challenges and desired outcomes
  • Build a migration roadmap designed to achieve the client's end goals
  • Make infrastructure and architecture recommendations
  • Implement and scale the Data Lakehouse
  • Migrate data to the Data Lakehouse
  • Train the client to manage the Data Lakehouse and hand over documentation

Deliverables and timelines

Azure Databricks Data Lakehouse implementations typically follow the timeline below:

Weeks 1-2:

  • Understand key requirements and review documentation
  • Key stakeholder and SME interviews
  • Review customer DataOps environment

Weeks 3-10:

  • Provision cloud environment (including Azure Databricks)
  • Deploy DataOps tools and services
  • Review maturity gaps and validate designs
  • Create a pilot framework based on an existing model
  • Update and finalize recommendations

Weeks 11-12:

  • Finalize documentation
  • Test framework and present results
  • Review progress and move to production

At the end of the 12-week period, the client should expect to receive a fully functional Data Lakehouse operating on Azure Databricks.
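As a quick check of the timeline arithmetic, the three phases above can be written down as data and verified to cover the 12 weeks with no gaps or overlaps. This is only an illustrative sketch: the phase labels ("Discovery", "Build and pilot", "Finalize and hand over") and the names `Phase` and `total_weeks` are hypothetical and do not come from the offer itself; only the week ranges do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str        # hypothetical label, not taken from the offer
    first_week: int  # inclusive
    last_week: int   # inclusive

    @property
    def weeks(self) -> int:
        return self.last_week - self.first_week + 1


# Week ranges as listed in the timeline; names are illustrative only.
PHASES = [
    Phase("Discovery", 1, 2),
    Phase("Build and pilot", 3, 10),
    Phase("Finalize and hand over", 11, 12),
]


def total_weeks(phases) -> int:
    """Return the total duration, raising if phases leave a gap or overlap."""
    expected_start = 1
    for phase in phases:
        if phase.first_week != expected_start:
            raise ValueError(f"{phase.name} should start in week {expected_start}")
        expected_start = phase.last_week + 1
    return expected_start - 1


if __name__ == "__main__":
    for phase in PHASES:
        print(f"{phase.name}: weeks {phase.first_week}-{phase.last_week} ({phase.weeks} weeks)")
    print("Total:", total_weeks(PHASES), "weeks")
```

Most of the schedule (8 of the 12 weeks) falls in the middle phase, where the environment is provisioned and the pilot framework is built.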

Sample Deployment

A GitHub repository for this solution provides step-by-step instructions for deploying the resources.

Click here to access the GitHub repository.