ASR-0001 Self-Service ETL
Originally ASR-0001-SELF-SERVICE-ETL (v4) · Source on Confluence

Summary
Each DroneUp league holds unique data as an output of its processes. This data is a valuable resource that can be harnessed to provide crucial business insights. However, before these insights can be extracted and used effectively, an Extract, Transform, Load (ETL) process must be performed: the relevant data is extracted, transformed into a suitable format, and loaded into a system where detailed analysis can take place.
To extract the business value, the data first needs to be moved to the chosen data management solution (Data Warehouse/Data Lake/Data Lakehouse), where it is later processed and BI/ML outputs are produced.
In the default flow, a dedicated team is responsible for managing the data of the whole organization. This approach creates a tight coupling between this team and the other leagues.
However, there could be scenarios where, to avoid creating a bottleneck, a league decides to integrate with the data management layer and build an Extract, Transform, Load (ETL) process by itself. On one side, this approach brings significant value to the company, because the team that produces the data understands it best and can provide accurate analytics on it. On the other side, there is a risk that the team may not be aware of good data management practices, which could lower data quality.
Requirements
- Each team should have the capability to create and implement an ETL process for their data within a robust data management solution.
- A comprehensive guideline should be established to address the handling of real-time data within the system. This guideline should outline the necessary procedures, protocols, and best practices for efficiently managing and processing real-time data streams.
- The system should possess the capability to handle data that does not have specific requirements or predefined structures. It should be designed to accommodate diverse data formats, allowing for seamless integration and analysis of various data types. This flexibility will enable the system to adapt to evolving business needs, as well as incorporate new sources of data without significant modifications or disruptions.
- A guideline and a set of tools should be provided that support external teams in creating an ETL while preserving core data quality features such as traceability, maintaining a clear and auditable lineage of data throughout the ETL.
- A guideline and a set of tools should be provided to implement data quality checks and validations at various stages of the ETL process. This functionality ensures that data meets predefined quality standards and criteria, including accuracy, completeness, consistency, and integrity.
- An external team should be able to self-orchestrate any self-created analytics within the chosen data management solution.
- The chosen data management solution must support concurrent data loading and analytics processing. Creating a new ETL cannot interrupt the execution of existing ETLs.
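To make the data quality requirement above more concrete, the following is a minimal sketch of stage-level quality checks a team-built ETL could run before loading a batch. It is illustrative only: the function names, field names (`league_id`, `start_time`, `end_time`), and rules are assumptions, not part of any existing DroneUp tooling or guideline.

```python
def check_completeness(record: dict, required_fields: list) -> list:
    """Completeness check: return an issue per missing required field."""
    issues = []
    for field in required_fields:
        if record.get(field) in (None, ""):
            issues.append(f"missing required field: {field}")
    return issues


def check_consistency(record: dict) -> list:
    """Consistency check: illustrative rule that an event cannot end before it starts."""
    issues = []
    if record.get("start_time") is not None and record.get("end_time") is not None:
        if record["end_time"] < record["start_time"]:
            issues.append("end_time precedes start_time")
    return issues


def validate_batch(records: list, required_fields: list) -> tuple:
    """Split a batch into records that pass all checks and rejected
    records annotated with the reasons, so failures stay auditable."""
    valid, rejected = [], []
    for record in records:
        issues = check_completeness(record, required_fields) + check_consistency(record)
        if issues:
            rejected.append({"record": record, "issues": issues})
        else:
            valid.append(record)
    return valid, rejected


valid, rejected = validate_batch(
    [
        {"league_id": "L1", "start_time": 1, "end_time": 2},
        {"league_id": "", "start_time": 5, "end_time": 3},
    ],
    ["league_id", "start_time", "end_time"],
)
```

In this sketch, the first record passes, while the second is rejected with two recorded issues. Keeping rejected records with their reasons, rather than silently dropping them, is one way to satisfy the auditability and lineage requirements.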