Data Lake for RGM: 3-Wk Pilot Implementation

Tredence Inc

Build a centralized data platform across markets, together with a consumption layer, to support efficient RGM decisions on distribution, assortment, pricing and promotions.

Objective: Develop a robust, centralized and scalable data platform spanning regions and products, together with a consumption layer, so that RGM decisions can be made efficiently and effectively.

Key Challenges Addressed:

  1. Information across the organization is stored in silos, which gives an incomplete picture of the business
  2. Datasets across business units are diverse and sometimes contain the same data elements under different labels. Such datasets are difficult to map across functions and practices
  3. Managing data access and linking internal data to external sources
  4. Compromised ML / AI model accuracy due to smaller data sets

How we address your challenges:

  1. Data element discovery is performed by analyzing RGM playbook requirements and by scrutinizing exploratory and advanced analytics use cases, not just the documented ones
  2. The data model is iteratively evaluated against analytics use cases to identify gaps in entities and data elements
  3. Persona-based access and security through fine-grained control
  4. Extended semantic layer to interact with external sources

Pilot Outcome: Scope - 2-3 countries; up to 4 sources of input data; ~100k entries

Data cleansing, harmonization and consolidation for 1 use case. For example: creating golden records for customers from multiple data sources that contain overlapping information

Mapped, cleansed and consolidated data output for the given use case, along with golden record IDs, delivered as a spreadsheet or database table
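To make the golden record example concrete, the sketch below shows one possible way to consolidate overlapping customer records from two sources. It is illustrative only and is not the pilot's actual implementation: the source names (`crm`, `erp`), the field names, the sample values, the matching rule (normalized name plus country) and the golden ID format are all assumptions introduced for this example.

```python
from collections import defaultdict

# Hypothetical source records; sources, fields and values are illustrative only.
crm = [
    {"source": "crm", "name": "Acme Stores Ltd.", "country": "UK",
     "phone": "", "email": "buyer@acme.example"},
]
erp = [
    {"source": "erp", "name": "ACME STORES LTD", "country": "uk",
     "phone": "+44 20 0000 0000", "email": ""},
    {"source": "erp", "name": "Borealis Market", "country": "DE",
     "phone": "", "email": "info@borealis.example"},
]


def match_key(record):
    """Assumed matching rule: lowercased alphanumeric name plus uppercased country."""
    name = "".join(ch for ch in record["name"].lower() if ch.isalnum())
    return (name, record["country"].strip().upper())


def build_golden_records(*sources):
    """Group records by match key and merge each group into one golden record.

    Sources listed first take priority; later sources only fill empty fields.
    """
    groups = defaultdict(list)
    for source in sources:
        for record in source:
            groups[match_key(record)].append(record)

    golden = []
    for i, (_, records) in enumerate(sorted(groups.items()), start=1):
        merged = {
            "golden_id": f"G{i:05d}",  # assumed ID format
            "sources": [r["source"] for r in records],
        }
        for field in ("name", "country", "phone", "email"):
            # Take the first non-empty value in source-priority order.
            merged[field] = next((r[field] for r in records if r[field]), "")
        golden.append(merged)
    return golden


golden_records = build_golden_records(crm, erp)
for row in golden_records:
    print(row)
```

Here the two Acme rows collapse into a single golden record that keeps the email from the first source and the phone number from the second, while the unmatched record receives its own golden ID. Real customer data would typically need fuzzier matching and survivorship rules than this exact-key approach.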

Implementation Plan

The break-up of the implementation plan is as below:

  Week 1 - Data source identification and ingestion
  Week 2 - Data harmonization process implementation
  Week 3 - Sanitize and catalog data. Stage data for queries

This implementation uses the following native Azure components:

  1. ADF pipelines
  2. Azure MS-SQL Database
  3. Azure Databricks
  4. Power BI embedded