Data Protection Engine PoC for all structured and semi-structured data sets
Built in Apache Spark and deployed to Azure Databricks, the Data Protection Engine (DPE) helps ensure your business complies with GDPR, CCPA and other data privacy regulations. The DPE can handle the following data formats: Parquet, CSV, XML, JSON, Avro and SQL, with more being added.
The PoC includes deploying the DPE platform to your environment, including the following resources: Azure Data Factory, Azure Data Lake Store and Azure Cosmos DB. The DPE is fully configurable through JSON, allowing multiple types of data protection, including Pseudonymization, Anonymization, Generalization and more. For Pseudonymization workflows, de-tokenization is also supported, so protected values can be turned back into their original values.
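The DPE's actual policy schema is not described here, so the following is only a minimal sketch of how a JSON policy could drive several protection types over plain Python records. The policy fields (`rules`, `column`, `method`, `bucketSize`), the method names, and the in-memory `TokenVault` class are all hypothetical stand-ins, not the DPE's API. Pseudonymization stores a token-to-value mapping so it can be reversed; "anonymize" is modeled here simply as suppression (the value is dropped and nothing is kept in the vault).

```python
import json
import secrets

# Hypothetical policy document; the real DPE policy schema is not shown in the source.
POLICY_JSON = """
{
  "policyName": "customers",
  "rules": [
    {"column": "email", "method": "pseudonymize"},
    {"column": "name", "method": "anonymize"},
    {"column": "age", "method": "generalize", "bucketSize": 10}
  ]
}
"""


class TokenVault:
    """Hypothetical in-memory stand-in for the token vault (token <-> original value)."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value):
        # Reuse an existing token so the same input always maps to the same pseudonym.
        if value not in self._value_to_token:
            token = "tok_" + secrets.token_hex(8)
            self._value_to_token[value] = token
            self._token_to_value[token] = value
        return self._value_to_token[value]

    def detokenize(self, token):
        # De-tokenization: look the original value back up in the vault.
        return self._token_to_value[token]


def generalize_number(value, bucket_size):
    # Replace an exact value with a range, e.g. 37 -> "30-39" for bucket_size 10.
    low = (value // bucket_size) * bucket_size
    return f"{low}-{low + bucket_size - 1}"


def apply_policy(record, policy, vault):
    protected = dict(record)
    for rule in policy["rules"]:
        col = rule["column"]
        if col not in protected:
            continue
        method = rule["method"]
        if method == "pseudonymize":
            protected[col] = vault.tokenize(protected[col])
        elif method == "anonymize":
            # Irreversible suppression: the value is discarded and not stored anywhere.
            protected[col] = None
        elif method == "generalize":
            protected[col] = generalize_number(protected[col], rule["bucketSize"])
        else:
            raise ValueError(f"unknown method: {method}")
    return protected


policy = json.loads(POLICY_JSON)
vault = TokenVault()
row = {"email": "ada@example.com", "name": "Ada", "age": 37}
safe = apply_policy(row, policy, vault)
print(safe["age"])                     # 30-39
print(vault.detokenize(safe["email"]))  # ada@example.com
```

Keeping the rules in data rather than code is what lets a single engine serve many data sets: adding a column to protect means editing the policy document, not redeploying the job.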
Our tool follows patterns and practices highlighted by the European Union Agency for Cybersecurity (ENISA) to protect your data in ways that maintain compliance with GDPR and CCPA data processing requirements. Right to Erasure mechanisms are built in and exposed via a Data Factory pipeline.
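How the DPE's erasure pipeline works internally is not described here. One common approach for pseudonymized data, sketched below purely as an assumption, is to delete the data subject's rows and drop their entry from the token vault, so any copies of their token that remain elsewhere can no longer be linked back to the original value. The `TokenVault` class and the `erase_subject` helper are hypothetical illustrations, not part of the DPE.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory token vault (token <-> original value)."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value):
        if value not in self._value_to_token:
            token = "tok_" + secrets.token_hex(8)
            self._value_to_token[value] = token
            self._token_to_value[token] = value
        return self._value_to_token[value]

    def detokenize(self, token):
        # Returns None once the mapping for this token has been erased.
        return self._token_to_value.get(token)

    def erase(self, value):
        # Drop both directions of the mapping for this data subject.
        token = self._value_to_token.pop(value, None)
        if token is not None:
            del self._token_to_value[token]
        return token


def erase_subject(rows, key_column, subject_value, vault):
    """Hypothetical erasure step: remove the subject's vault entry and their rows."""
    token = vault.erase(subject_value)
    remaining = [r for r in rows if r[key_column] != token]
    return remaining, token


vault = TokenVault()
rows = [
    {"email": vault.tokenize("ada@example.com"), "spend": 120},
    {"email": vault.tokenize("bob@example.com"), "spend": 80},
]
ada_token = rows[0]["email"]
rows, erased = erase_subject(rows, "email", "ada@example.com", vault)
print(len(rows))                    # 1
print(vault.detokenize(ada_token))  # None
```

If the subject is unknown, `erase` returns `None` and no rows are removed, which keeps the step safe to re-run.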
Cosmos DB is used as a configuration store for your Data Protection Policies, which tell the DPE how to process the data. Data Factory reads and processes this configuration before calling Azure Databricks to protect the data according to the policy stored in Cosmos DB. Logs are pushed into Azure SQL DB for lineage reporting, and into Log Analytics to track issues and errors. The deployment also defines ACLs to protect the highly sensitive data in the "Token Vault", which holds the Pseudonymization key-value mapping.
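The orchestration flow above can be sketched in plain Python, assuming a dictionary stands in for the Cosmos DB policy container and two lists stand in for the Azure SQL DB lineage table and Log Analytics. The policy fields and the lineage record's column names are hypothetical; the sketch only shows the order of steps (read and validate the policy, run the protection, log lineage or errors), and the Databricks protection step itself is omitted.

```python
import json
from datetime import datetime, timezone

# Stand-in for the Cosmos DB policy container, keyed by policy id (hypothetical schema).
POLICY_STORE = {
    "customers-v1": {
        "id": "customers-v1",
        "source": "raw/customers.parquet",
        "target": "protected/customers.parquet",
        "rules": [{"column": "email", "method": "pseudonymize"}],
    }
}

REQUIRED_FIELDS = ("id", "source", "target", "rules")


def load_policy(store, policy_id):
    # Orchestration step (Data Factory in the DPE): read and validate the policy.
    policy = store[policy_id]
    missing = [f for f in REQUIRED_FIELDS if f not in policy]
    if missing:
        raise ValueError(f"policy {policy_id} missing fields: {missing}")
    return policy


def lineage_record(policy, rows_processed, status):
    # Hypothetical shape of a row written to the reporting database for lineage.
    return {
        "policy_id": policy["id"],
        "source": policy["source"],
        "target": policy["target"],
        "columns": [r["column"] for r in policy["rules"]],
        "rows_processed": rows_processed,
        "status": status,
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }


def run(store, policy_id, rows, lineage_log, error_log):
    try:
        policy = load_policy(store, policy_id)
    except (KeyError, ValueError) as exc:
        # Failures go to the error log (Log Analytics in the DPE).
        error_log.append({"policy_id": policy_id, "error": str(exc)})
        return None
    # The protection itself (run on Azure Databricks in the DPE) is omitted here.
    record = lineage_record(policy, len(rows), "Succeeded")
    lineage_log.append(record)
    return record


lineage, errors = [], []
run(POLICY_STORE, "customers-v1", [{"email": "a"}, {"email": "b"}], lineage, errors)
run(POLICY_STORE, "unknown", [], lineage, errors)
print(json.dumps(lineage[0]["columns"]))  # ["email"]
print(len(errors))                        # 1
```

Separating lineage records from error records mirrors the split described above: one store answers "what was protected, from where, to where", the other answers "what went wrong".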