ATLAS Releasestrategy 5 Dataauditability

Andi Lamprecht · 3 min read · Accepted
ADR-0042 · Author: Sybil Melton · Date: 2025-02-07 · Products: platform
Originally ADR-0045 ATLAS-ReleaseStrategy-5-DataAuditability (v3) · Source on Confluence ↗

Release Strategy - Data Auditability

Context

Data auditability refers to the ability to track, review, and verify the history, usage, and changes made to data within the DroneUp organization. It is a crucial aspect of data governance and compliance, ensuring that data is accurate and that its past state can be reviewed.

To achieve high data auditability, the following properties must be met:

Enablement of data tracking

Data auditability involves tracking the lifecycle of data, from its creation or acquisition to its eventual deletion. This includes monitoring where the data is stored, who has access to it, and how it is used.

Capturing of data changes

It’s essential to record any changes made to data over time. This includes modifications, updates, and deletions. Knowing who made these changes and when they occurred is crucial for maintaining data integrity.
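As an illustration only, a change record of this kind could be modelled as below. This is a minimal sketch in plain Python, not part of Atlas or Delta Lake; the names `ChangeRecord`, `AuditLog`, and all field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ChangeRecord:
    """One immutable audit entry: what changed, how, by whom, and when."""
    record_id: str
    operation: str                 # "insert", "update", or "delete"
    changed_by: str
    changed_at: datetime
    before: Optional[Dict]         # None for inserts
    after: Optional[Dict]          # None for deletes


class AuditLog:
    """Append-only list of change records (hypothetical helper)."""

    def __init__(self) -> None:
        self._entries: List[ChangeRecord] = []

    def record(self, record_id: str, operation: str, changed_by: str,
               before: Optional[Dict] = None,
               after: Optional[Dict] = None) -> ChangeRecord:
        # The timestamp is taken at write time, in UTC.
        entry = ChangeRecord(record_id, operation, changed_by,
                             datetime.now(timezone.utc), before, after)
        self._entries.append(entry)
        return entry

    def history(self, record_id: str) -> List[ChangeRecord]:
        """Return every change to one record, oldest first."""
        return [e for e in self._entries if e.record_id == record_id]


log = AuditLog()
log.record("row-1", "insert", "alice", after={"status": "new"})
log.record("row-1", "update", "bob",
           before={"status": "new"}, after={"status": "done"})
for entry in log.history("row-1"):
    print(entry.operation, entry.changed_by, entry.changed_at.isoformat())
```

Entries are frozen and only ever appended, so the log itself cannot be silently rewritten, which is the property an audit trail depends on.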

Data Retention and Archiving

Data auditability also extends to data retention and archiving policies. It ensures that data is stored for the required duration, and historical data can be retrieved when needed.

Atlas Application

To enable data auditing, Atlas stores its data in Delta Lake, an open lakehouse storage format. Delta Lake is a data storage and management layer built on top of Apache Parquet files, and it is designed to meet many of the criteria for data auditability.

Delta Lake tables have the following properties:

  • ACID Transactions

Delta Lake supports ACID (Atomicity, Consistency, Isolation, Durability) transactions, which ensure that data operations are reliable and that changes to data are either fully completed or fully rolled back. This is essential for maintaining data integrity and consistency.

  • Data Versioning

Delta Lake versions data automatically. Every committed change is recorded as a new table version, allowing users to track changes over time. This versioning capability is crucial for data auditability and makes it possible to roll back to a previous state if necessary.

  • Schema Evolution and Enforcement

Delta Lake allows for schema evolution, which means that columns can be added, modified, or deleted without breaking existing pipelines. This ensures that data structures can evolve over time while maintaining data lineage and auditability. By default, Delta Lake enforces the table schema on write, which guarantees that any commit containing data with an invalid schema is rejected and leaves the table unchanged.
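The idea behind schema enforcement can be sketched in plain Python as follows. This is a toy model of the concept, not Delta Lake's implementation or API; `SCHEMA`, `SchemaMismatchError`, `validate_batch`, and `commit` are hypothetical names, and the schema is assumed to be a simple mapping of column name to Python type.

```python
from typing import Dict, List

# Hypothetical table schema: column name -> expected Python type.
SCHEMA: Dict[str, type] = {"id": int, "name": str}


class SchemaMismatchError(ValueError):
    """Raised when a batch does not match the table schema."""


def validate_batch(rows: List[dict], schema: Dict[str, type]) -> None:
    """Check every row in the batch; raise on the first mismatch."""
    for i, row in enumerate(rows):
        if set(row) != set(schema):
            raise SchemaMismatchError(
                f"row {i}: columns {sorted(row)} do not match {sorted(schema)}")
        for column, expected in schema.items():
            if not isinstance(row[column], expected):
                raise SchemaMismatchError(
                    f"row {i}: column {column!r} is not {expected.__name__}")


def commit(table: List[dict], rows: List[dict], schema: Dict[str, type]) -> None:
    """All-or-nothing append: validate the whole batch before touching the table."""
    validate_batch(rows, schema)
    table.extend(rows)


table: List[dict] = []
commit(table, [{"id": 1, "name": "a"}], SCHEMA)
try:
    # The second row has a string id, so the whole batch is rejected.
    commit(table, [{"id": 2, "name": "b"}, {"id": "3", "name": "c"}], SCHEMA)
except SchemaMismatchError as err:
    print("rejected:", err)
print(len(table))
```

Validating the full batch before writing anything is what makes the rejection atomic: the valid first row of the bad batch never reaches the table.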

  • Metadata Management

Delta Lake maintains metadata that tracks changes to the data, including information about when the changes were made. This metadata is valuable for auditing data usage and ensuring accountability.

  • Time Travel

Delta Lake’s time travel feature lets you query data as it existed at a specific point in time. This is useful for auditing because it shows how the data looked at a given historical moment, making it easier to investigate issues or compliance violations.
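Versioning, time travel, and rollback fit together roughly as in the sketch below. It is a simplified, in-memory model for illustration only: `VersionedTable` and its methods are hypothetical, and storing a full snapshot per commit is an assumption made for brevity, not a description of how Delta Lake stores data.

```python
from typing import Dict, List, Optional

Row = Dict[str, object]


class VersionedTable:
    """Toy table that keeps one full snapshot per commit."""

    def __init__(self) -> None:
        # Version 0 is the empty table.
        self._versions: List[List[Row]] = [[]]

    @property
    def latest_version(self) -> int:
        return len(self._versions) - 1

    def commit(self, rows: List[Row]) -> int:
        """Store a new snapshot and return its version number."""
        self._versions.append([dict(r) for r in rows])
        return self.latest_version

    def read(self, version: Optional[int] = None) -> List[Row]:
        """Read the latest snapshot, or an older one when a version is given."""
        v = self.latest_version if version is None else version
        if not 0 <= v <= self.latest_version:
            raise ValueError(f"unknown version {v}")
        return [dict(r) for r in self._versions[v]]

    def restore(self, version: int) -> int:
        """Roll back by committing an old snapshot as a new version."""
        return self.commit(self.read(version))


table = VersionedTable()
table.commit([{"id": 1}])                  # version 1
table.commit([{"id": 1}, {"id": 2}])       # version 2
print(table.read(version=1))               # state as of version 1
table.restore(1)                           # version 3 repeats version 1
print(table.latest_version, table.read())
```

In this sketch a rollback adds a new version instead of deleting later ones, so the audit trail keeps the fact that the rollback happened. In Spark, Delta Lake exposes comparable reads through the `versionAsOf` and `timestampAsOf` read options.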

References

Delta Lake Website
