- Consulting services
Reduce the time to implement an ETL solution with a proven, standardized method.
Cover 60 to 70+ percent of your ETL needs on the Azure Data Platform, excluding complex business rules.
Starting a cloud journey from scratch can be a long path. IBM can help you save time and money with packaged accelerators that cut months of engineering and build in best practices from many Azure implementations.
IBM Azure Data Platform Accelerators Summary
Install a big data analytical processing system tomorrow with IBM and leap-frog the process of building mature data-processing capabilities on the cloud. You own the code, implemented on Azure native resources through an IBM services implementation engagement. Cover 60 to 70+ percent of your ETL needs on the Azure Data Platform, excluding complex business rules. Choose which Azure resources your solution needs, because the accelerator is driven by configuration. For example, the tabular model that Azure Analysis Services would host in the Semantic tier may instead be stored in Power BI; in that case, this tier can be skipped and the tabular model refreshed directly on Power BI capacity. Security best practices are pre-configured and ready for deployment on day one. The accelerator and framework work with both pre- and post-Synapse versions, where Spark notebooks and ADF are more tightly integrated with SQL DW.
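As a minimal sketch of that configuration-driven approach (the table and column names below are hypothetical, not the accelerator's actual schema), a tier entry in T-SQL might record whether the Semantic tier is deployed to Azure Analysis Services or skipped in favor of a Power BI-hosted tabular model:

-- Hypothetical tier configuration: which Azure resources a deployment uses.
-- Names are illustrative only, not the accelerator's actual schema.
CREATE TABLE dbo.PlatformTierConfig
(
    TierName        VARCHAR(50)  NOT NULL PRIMARY KEY,   -- e.g. 'Ingest', 'Semantic'
    IsEnabled       BIT          NOT NULL,                -- 0 = skip this tier entirely
    TargetResource  VARCHAR(100) NULL                     -- e.g. 'AzureAnalysisServices', 'PowerBI'
);

-- Semantic tier skipped: the tabular model is refreshed directly on Power BI capacity.
INSERT INTO dbo.PlatformTierConfig (TierName, IsEnabled, TargetResource)
VALUES ('Semantic', 0, 'PowerBI');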
Use Azure Data Factory (ADF) with only configuration entries in a set of database tables to control how your data pipelines operate. A patent-pending set of algorithms in TSQL and ADF allows development teams to create pipelines that run dynamically, covering thousands of permutations with no pipeline development or unit testing. You also gain deeper insight into operations such as data lake placement of files, any errors, the configuration used for the run, overall status, operation status, and content auditing. A project at Nestle created 50 pipelines to handle approximately 150 datasets in 2 days; creating those pipelines from scratch with unit testing would have taken 45 to 60 days.
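For illustration only, a configuration table along these lines (hypothetical names, not the accelerator's actual schema) could hold one row per dataset, which a single parameterized ADF pipeline reads at run time to decide how to operate:

-- Hypothetical configuration table read by a parameterized ADF pipeline at run time.
-- Adding a row onboards a new dataset with no new pipeline development or unit testing.
CREATE TABLE dbo.DatasetPipelineConfig
(
    DatasetId       INT          NOT NULL PRIMARY KEY,
    DatasetName     VARCHAR(100) NOT NULL,
    SourceSystem    VARCHAR(100) NOT NULL,
    LakeFolderPath  VARCHAR(400) NOT NULL,   -- where landed files are placed in the data lake
    OperationOrder  VARCHAR(200) NOT NULL,   -- ordered list of operations to run for this dataset
    IsActive        BIT          NOT NULL DEFAULT 1
);

-- An ADF Lookup activity could run a query like this to drive each dynamic run.
SELECT DatasetName, SourceSystem, LakeFolderPath, OperationOrder
FROM   dbo.DatasetPipelineConfig
WHERE  IsActive = 1;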
ETL means a process extracts, transforms, and then loads the data into a data repository: three operations in a specific order. Some data repositories run more efficiently with an ELT approach, which extracts from the source, loads into the destination, and then performs the transformation. This invention includes nine operations out of the box and can run them in any order. Depending on the most efficient way to handle the data, the developer configures the orchestration of the pipeline for the dataset without having to develop and unit test the pipeline. Another way to describe the invention is the automation of dataset logistics throughout a data platform. Additionally, an identifier is appended to each dataset when it is received so it can be tracked and traced through the stages of data platform processing. Furthermore, one of the operations performs content auditing of the dataset, confirming that aggregated metrics are the same when the dataset enters the data platform and after transformations. Unexpected values in the incoming dataset can lead to unexpected results in calculations, business rules, and/or mappings, which can create a false sense of the truth. With this process, the single-pane-of-glass dashboard explicitly shows unequal values so proactive action can be taken before the business, which relies on the numbers being accurate, reports a defect in the system. Finally, the asset includes a dependency checker that processes datasets in sequence when their dependencies have not finished, which makes it possible to process all other data asynchronously and shortens the processing window; pipelines with dependencies are held in a suspended state until those dependencies have finished.
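A simplified sketch of the content-auditing idea, using hypothetical table and column names rather than the asset's actual schema, might compare aggregated metrics captured at landing against those captured after transformation:

-- Hypothetical audit table: one metrics row per dataset, per run, per processing stage.
CREATE TABLE dbo.DatasetAuditMetrics
(
    RunId          UNIQUEIDENTIFIER NOT NULL,   -- identifier appended when the dataset is received
    DatasetId      INT              NOT NULL,
    Stage          VARCHAR(30)      NOT NULL,   -- e.g. 'Landed', 'PostTransform'
    RowCountValue  BIGINT           NOT NULL,
    AmountTotal    DECIMAL(18, 2)   NULL,
    CONSTRAINT PK_DatasetAuditMetrics PRIMARY KEY (RunId, DatasetId, Stage)
);

-- Surface runs where the aggregated metrics no longer match after transformation,
-- so the dashboard can flag them before the business reports a defect.
SELECT  pre.RunId, pre.DatasetId,
        pre.RowCountValue AS RowsLanded,   post.RowCountValue AS RowsTransformed,
        pre.AmountTotal   AS AmountLanded, post.AmountTotal   AS AmountTransformed
FROM    dbo.DatasetAuditMetrics AS pre
JOIN    dbo.DatasetAuditMetrics AS post
          ON  post.RunId = pre.RunId
          AND post.DatasetId = pre.DatasetId
          AND post.Stage = 'PostTransform'
WHERE   pre.Stage = 'Landed'
  AND  (pre.RowCountValue <> post.RowCountValue
        OR ISNULL(pre.AmountTotal, 0) <> ISNULL(post.AmountTotal, 0));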
BENEFITS:
- SPEED-TO-MARKET
- FLEXIBLE
- TOTAL COST OF OWNERSHIP
- SCALABLE
- EXTENSIBLE
- ENTERPRISE GRADE