ATLAS Releasestrategy 7 Datahandling

Andi Lamprecht · 3 min read · Accepted
ADR-0040 · Author: Sybil Melton · Date: 2025-02-07 · Products: platform
Originally ADR-0047 ATLAS-ReleaseStrategy-7-DataHandling (v3) · Source on Confluence ↗

Release Strategy - Data Handling

Handling data across the three environments (development, staging, and production) requires care, because two distinct types of data are involved:

  • sensitive
  • nonsensitive

Nonsensitive data


For nonsensitive data, a streamlined approach is adopted: the same dataset flows through all three environments. This lets teams fine-tune solutions against real use cases and speeds up the data discovery phase.

This results in the following data structure:

| Environment | Data values | Dataset size |
|-------------|-------------|--------------|
| Development | real data   | full dataset |
| Staging     | real data   | full dataset |
| Production  | real data   | full dataset |

Sensitive data


For sensitive data, a heightened level of security and privacy measures is implemented. In the development environment, sensitive data undergoes anonymization.

Data anonymization must be applied before the sensitive data is ingested into the development environment.

For example, anonymization ensures that personally identifiable information is replaced with pseudonymous or placeholder values. The same applies to any location data that cannot be processed directly.

In the staging and production environments, the actual sensitive data is processed, but within a hardened security framework that protects it against unauthorized access and breaches. This strategy keeps sensitive information protected throughout its lifecycle while still enabling development and testing in a controlled environment.

With this approach, the data solution cannot be fully fine-tuned during the development phase, since developers might be missing some information from the dataset, but it remains compliant with regulations.

| Environment | Data values     | Dataset size |
|-------------|-----------------|--------------|
| Development | anonymized data | full dataset |
| Staging     | real data       | full dataset |
| Production  | real data       | full dataset |
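
The rule that sensitive data is anonymized before it reaches development, and passed through unchanged to staging and production, can be sketched as a small gate in front of ingestion. This is only an illustration under stated assumptions: the function names (`redact`, `prepare_for_ingestion`), the environment keys, and the `REDACTED` placeholder are hypothetical and not part of the ADR, and simple redaction stands in for whichever anonymization technique is actually chosen.

```python
from typing import Iterable

Record = dict[str, str]

# Hypothetical policy table mirroring the sensitive-data rules:
# only development receives anonymized values.
ANONYMIZE_BEFORE_INGEST = {
    "development": True,
    "staging": False,
    "production": False,
}


def redact(record: Record, sensitive_fields: set[str]) -> Record:
    """Return a copy of the record with sensitive fields replaced by a placeholder."""
    return {
        key: ("REDACTED" if key in sensitive_fields else value)
        for key, value in record.items()
    }


def prepare_for_ingestion(
    records: Iterable[Record], environment: str, sensitive_fields: set[str]
) -> list[Record]:
    """Anonymize records if the target environment requires it, keeping the full dataset size."""
    if environment not in ANONYMIZE_BEFORE_INGEST:
        raise ValueError(f"unknown environment: {environment}")
    if ANONYMIZE_BEFORE_INGEST[environment]:
        return [redact(record, sensitive_fields) for record in records]
    return [dict(record) for record in records]
```

Running the gate before the data leaves its source, rather than inside the development environment, keeps the real values from ever being stored there. The number of records is unchanged in every environment, which matches the "full dataset" column.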

Data anonymization


At its core, data anonymization is a technique employed to safeguard the privacy and confidentiality of sensitive information while still allowing for meaningful analysis and research.

There are multiple data anonymization techniques. A few examples follow:

  1. Data Masking/Redaction:
     • Original Data: John Doe’s social security number: 123-45-6789
     • Anonymized Data: John Doe’s social security number: XXX-XX-XXXX
  2. Generalization:
     • Original Data: Exact birthdate (e.g., 1990-05-15)
     • Anonymized Data: Birth year (e.g., 1990)
     • Original Data: Exact location (e.g., lat: 34.0522 lon: -118.2437)
     • Anonymized Data: Geohash that contains this location (e.g., 9q5exr3h)
  3. Pseudonymization:
     • Original Data: Full names (e.g., Sarah Johnson)
     • Anonymized Data: Unique pseudonyms assigned to each name (e.g., User1, User2)
  4. Tokenization:
     • Original Data: Credit card number (e.g., 1234-5678-9012-3456)
     • Anonymized Data: Replaced with a token or unique identifier (e.g., TOKEN-123456)
  5. Aggregation
  6. Data Encryption:
     • Encrypting data so that it can only be decrypted with a specific key, which protects the data’s confidentiality.
  7. Noise Addition:
     • Original Data: Precise location coordinates
     • Anonymized Data: Random noise added to the coordinates to obfuscate the exact location
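
Several of the techniques listed above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a vetted implementation: all function and class names are hypothetical, the pseudonym is derived from a keyed hash (HMAC-SHA256) rather than the sequential `User1`, `User2` scheme shown in the list, the token vault is an in-memory dictionary, and the noise range is an arbitrary default.

```python
import hashlib
import hmac
import random
import re
import secrets
from datetime import date
from typing import Optional


def mask_ssn(text: str) -> str:
    """Data masking: replace any SSN-shaped value (NNN-NN-NNNN) with X characters."""
    return re.sub(r"\b\d{3}-\d{2}-\d{4}\b", "XXX-XX-XXXX", text)


def generalize_birthdate(birthdate: date) -> int:
    """Generalization: keep only the birth year."""
    return birthdate.year


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Pseudonymization: derive a stable pseudonym from a keyed hash (an assumed scheme)."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"user-{digest[:12]}"


class TokenVault:
    """Tokenization: swap values for random tokens and keep the mapping in memory."""

    def __init__(self) -> None:
        self._value_to_token: dict[str, str] = {}
        self._token_to_value: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same value always maps to one token.
        if value not in self._value_to_token:
            token = f"TOKEN-{secrets.token_hex(8)}"
            self._value_to_token[value] = token
            self._token_to_value[token] = value
        return self._value_to_token[value]

    def detokenize(self, token: str) -> str:
        return self._token_to_value[token]


def add_noise(
    lat: float,
    lon: float,
    max_offset_deg: float = 0.01,
    rng: Optional[random.Random] = None,
) -> tuple[float, float]:
    """Noise addition: shift each coordinate by a uniform random offset."""
    rng = rng or random.Random()
    return (
        lat + rng.uniform(-max_offset_deg, max_offset_deg),
        lon + rng.uniform(-max_offset_deg, max_offset_deg),
    )
```

The keyed hash gives the same pseudonym for the same input across runs, which keeps joins between anonymized tables working, but anyone holding the key can recompute the mapping, so the key has to stay outside the development environment. The same holds for the token vault: the tokens are random, but the vault itself contains the real values.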