Sogeti Artificial Data Amplifier (ADA): 4 Week Proof of Concept

Capgemini Group

A synthetic data generation solution built on Azure that applies advanced deep learning techniques to all forms of data (text, image, speech, etc.).

Synthetic means not real, or artificial, and the same applies to data. Synthetic data is nothing new: it has long been used in quality assurance and testing, where traditional tools let you create your own rule-based data that stands in for production data and serves many testing purposes.

The biggest drawback of rule-based synthetic data is that it does not necessarily represent real-world data, because the defined rules cannot cover all the different use cases and business behaviors. Many firms face further issues: data is unavailable, data is too sensitive to share, or more data is needed than exists.

Artificial Data Amplifier (ADA) is designed to address the issues above. ADA is an AI solution that generates synthetic data that looks and feels like real data. ADA unlocks analytics and software development by supplying full access to data without compromising customer trust, compliance, privacy, or security. ADA uses advanced neural networks to map real data and generate synthetic data that preserves the relationships and characteristics of the original. ADA can generate data in various forms, such as tabular data (databases), images, and free text.
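To make the difference between rule-based and learned synthetic data concrete, here is a minimal sketch. It is not ADA's implementation: ADA is described as using neural networks, whereas this toy example swaps in a simple fitted Gaussian as the learned generator. The table (age and income), the column ranges, and all function names are hypothetical and chosen only for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def make_real_data(n, rng):
    """Hypothetical 'production' table: income rises with age, plus noise."""
    ages = [rng.uniform(20, 65) for _ in range(n)]
    incomes = [1000 * a + rng.gauss(0, 8000) for a in ages]
    return ages, incomes


def rule_based(n, ages, incomes, rng):
    """Rule-based generator: each column is drawn independently within its observed range."""
    return (
        [rng.uniform(min(ages), max(ages)) for _ in range(n)],
        [rng.uniform(min(incomes), max(incomes)) for _ in range(n)],
    )


def fitted_gaussian(n, ages, incomes, rng):
    """Toy learned generator: fit means, spreads, and correlation, then sample a bivariate Gaussian."""
    m = len(ages)
    ma, mi = sum(ages) / m, sum(incomes) / m
    sa = math.sqrt(sum((a - ma) ** 2 for a in ages) / m)
    si = math.sqrt(sum((i - mi) ** 2 for i in incomes) / m)
    rho = pearson(ages, incomes)
    out_a, out_i = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        out_a.append(ma + sa * z1)
        # Mix z1 into the second column so the fitted correlation is reproduced.
        out_i.append(mi + si * (rho * z1 + math.sqrt(1 - rho ** 2) * z2))
    return out_a, out_i


rng = random.Random(0)
real = make_real_data(5000, rng)
rules = rule_based(5000, *real, rng)
fitted = fitted_gaussian(5000, *real, rng)

print("age/income correlation, real data:      ", round(pearson(*real), 2))
print("age/income correlation, rule-based data:", round(pearson(*rules), 2))
print("age/income correlation, fitted data:    ", round(pearson(*fitted), 2))
```

The rule-based columns each look plausible on their own, but the relationship between age and income is lost, while the fitted generator reproduces it. A neural generator aims at the same goal for far richer relationships than a single correlation.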

ADA uses the cognitive capabilities of Microsoft Azure services to train and deploy the solution and its models. Various PaaS components host the Docker model on Azure, e.g. Azure App Service, Azure Blob Storage, Azure Container Registry, and Azure DevOps. ADA also leverages components such as Azure Computer Vision, ML Studio, and an MLOps framework to extract information and learn the details from unstructured data. ADA enables our customers to create high-volume, relevant data that resembles production data. ADA also brings the following value to our customers:

  1. Risk-free and scalable datasets
  2. Exponential opportunities: the dataset is interchangeable with real data and can unlock and accelerate many complex AI solutions
  3. High quality and speed
  4. ETL and DB agnostic

This solution is available in the cloud, on premises, and as a service.

The ADA business model includes the following tasks:

Discovery

  1. Set up the work environment
  2. Workshop with the customer to understand the requirements and existing landscape
  3. Set up Azure services
  4. Create the project plan
  5. Gather data for training


  1. Deploy the ADA components to Azure
  2. Train custom models with the ADA framework
  3. Implement the UI component for using ADA
  4. Implement the API for the end-to-end pipeline
  5. Model visualization and fine-tuning
  6. QA and bug fixes


  1. User Acceptance testing
  2. Deployment and training