ATLAS Release Strategy 7: Data Handling
Originally ADR-0047 ATLAS-ReleaseStrategy-7-DataHandling (v3), published on Confluence as "Release Strategy - Data Handling".
Handling data across three environments - development, staging, and production - involves meticulous care, especially when dealing with two distinct types of data:
- sensitive
- nonsensitive
Nonsensitive data
For nonsensitive data, a streamlined approach is adopted: the same dataset flows through all three environments. This allows solutions to be fine-tuned against real use cases and speeds up the data discovery phase.
This results in the following data structure:
| Environment | Data | Dataset size |
|---|---|---|
| Development | real data | full dataset |
| Staging | real data | full dataset |
| Production | real data | full dataset |
Sensitive data
When it comes to sensitive data, heightened security and privacy measures are implemented. In the development environment, sensitive data undergoes anonymization.
Data anonymization must be applied before ingestion of sensitive data into the development environment.
Anonymization ensures that personally identifiable information is replaced with pseudonymous or placeholder values; the same applies to any location data that cannot be processed directly. In the staging and production environments, the actual sensitive data is processed, but within a fortified security framework that safeguards it against unauthorized access and breaches. This careful data handling strategy keeps sensitive information protected throughout its lifecycle while enabling robust development and testing in a controlled environment.
With this approach, the data solution cannot be fully fine-tuned during the development phase, since developers may be missing some dataset information, but it remains compliant with the regulations.
| Environment | Data | Dataset size |
|---|---|---|
| Development | anonymized data | full dataset |
| Staging | real data | full dataset |
| Production | real data | full dataset |
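The anonymize-before-ingestion rule can be enforced as a simple pipeline guard. A minimal sketch in Python, assuming hypothetical environment names and classification labels that are not part of the ATLAS platform:

```python
# Illustrative policy table: which form of sensitive data each
# environment is allowed to receive. Names are assumptions, not
# actual ATLAS configuration.
SENSITIVE_ENV_POLICY = {
    "development": "anonymized",  # raw sensitive data must never land here
    "staging": "real",
    "production": "real",
}


def check_ingestion(environment: str, data_classification: str) -> None:
    """Reject raw sensitive data before it reaches the dev environment.

    `data_classification` is either "raw" or "anonymized" in this sketch.
    Raises ValueError when the policy would be violated.
    """
    required = SENSITIVE_ENV_POLICY[environment]
    if required == "anonymized" and data_classification != "anonymized":
        raise ValueError(
            f"Sensitive data must be anonymized before ingestion "
            f"into {environment}"
        )
```

A guard like this would typically run as the first step of each ingestion job, so a misrouted raw dataset fails fast instead of silently landing in dev.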
Data anonymization
At its core, data anonymization is a technique employed to safeguard the privacy and confidentiality of sensitive information while still allowing for meaningful analysis and research.
There are multiple data anonymization techniques; below are a few examples:
- Data Masking/Redaction:
- Original Data: John Doe’s social security number: 123-45-6789
- Anonymized Data: John Doe’s social security number: XXX-XX-XXXX
- Generalization:
- Original Data: Exact birthdate (e.g., 1990-05-15)
- Anonymized Data: Birth year (e.g., 1990)
- Original Data: Exact location (e.g., lat: 34.0522 lon: -118.2437)
- Anonymized Data: Geohash that contains this location (e.g., 9q5exr3h)
- Pseudonymization:
- Original Data: Full names (e.g., Sarah Johnson)
- Anonymized Data: Assigning unique pseudonyms (e.g., User1, User2)
- Tokenization:
- Original Data: Credit card number (e.g., 1234-5678-9012-3456)
- Anonymized Data: Replaced with a token or unique identifier (e.g., TOKEN-123456)
- Aggregation:
- Original Data: Individual salaries (e.g., $52,000; $61,000; $58,000)
- Anonymized Data: Average salary per group (e.g., $57,000)
- Data Encryption:
- Encrypting data in such a way that it can only be decrypted with a specific key, protecting the data’s confidentiality.
- Noise Addition:
- Original Data: Precise location coordinates
- Anonymized Data: Adding random noise to coordinates to obfuscate the exact location
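Several of the techniques above can be sketched in a few lines of Python. The function names, salt, and noise scale below are illustrative assumptions, not a production implementation: real tokenization would use a token vault, and encryption a vetted cryptography library.

```python
import hashlib
import random


def mask_ssn(ssn: str) -> str:
    """Masking/redaction: replace every digit with a placeholder."""
    return "XXX-XX-XXXX"


def generalize_birthdate(date_iso: str) -> str:
    """Generalization: keep only the birth year from a YYYY-MM-DD date."""
    return date_iso.split("-")[0]


def generalize_location(lat: float, lon: float, decimals: int = 1) -> tuple:
    """Generalization: round coordinates to a coarse grid cell
    (a geohash prefix achieves the same effect)."""
    return round(lat, decimals), round(lon, decimals)


def pseudonymize(name: str, registry: dict) -> str:
    """Pseudonymization: map each distinct name to a stable pseudonym."""
    if name not in registry:
        registry[name] = f"User{len(registry) + 1}"
    return registry[name]


def tokenize(card_number: str) -> str:
    """Tokenization: replace the value with an opaque token.
    Illustrative only -- a real system keeps the mapping in a token vault."""
    digest = hashlib.sha256(("demo-salt:" + card_number).encode()).hexdigest()
    return "TOKEN-" + digest[:6].upper()


def add_noise(lat: float, lon: float, scale: float = 0.01) -> tuple:
    """Noise addition: jitter coordinates to obfuscate the exact point."""
    return (lat + random.uniform(-scale, scale),
            lon + random.uniform(-scale, scale))
```

Note that the pseudonym registry keeps the mapping stable within a run, so the same person always receives the same pseudonym and cross-record analysis remains possible.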