QA ATLAS Releasestrategy QA LVL2 Integrationtests

Andi Lamprecht · 3 min read · Accepted
ADR-0029 · Author: Sybil Melton · Date: 2025-02-07 · Products: platform
Originally ADR-0049 QA_ATLAS-ReleaseStrategy-QA-LVL2-IntegrationTests (v7) · Source on Confluence ↗

Release Strategy - QA - Level 2 - Integration Tests

Context

Integration testing is a crucial phase in the software development lifecycle that focuses on verifying the interactions between different components or modules of a system. It ensures that these components work harmoniously together as intended and identifies any issues that may arise when they are integrated.

In the Atlas context, two critical scenarios for integration testing emerge:

  • Apache Airflow Integration Testing
  • PySpark Integration Testing

Application

Apache Airflow Integration Testing:

Integration testing for Apache Airflow primarily revolves around the orchestration and scheduling of tasks within workflows. It aims to verify that tasks are correctly triggered, dependencies are managed, and data flows smoothly through the pipeline. In this scenario, the scope includes testing the integration of various operators, sensors, and custom components. Objectives encompass detecting issues related to task dependencies, data exchange between tasks, and the proper execution of workflows. The testing process validates that Airflow orchestrates tasks as expected, ensuring that scheduled workflows meet their deadlines, error handling is effective, and the system operates reliably.
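The sketch below illustrates one way such a test can look, assuming pytest is used as the runner; the DAG folder, DAG id, and task ids (`atlas_daily_pipeline`, `extract`, `transform`) are hypothetical placeholders, not the real Atlas workflows. It loads every DAG definition with Airflow's `DagBag`, asserts that they import cleanly, and checks that a task's downstream dependencies are wired as designed.

```python
import pytest
from airflow.models import DagBag

# Hypothetical DAG folder and DAG/task ids, used purely for illustration.
DAG_FOLDER = "dags/"
EXAMPLE_DAG_ID = "atlas_daily_pipeline"


@pytest.fixture(scope="session")
def dag_bag():
    # Load every DAG definition, skipping the bundled Airflow examples.
    return DagBag(dag_folder=DAG_FOLDER, include_examples=False)


def test_dags_import_without_errors(dag_bag):
    # Syntax errors, missing imports and broken operators all surface here.
    assert dag_bag.import_errors == {}, f"DAG import failures: {dag_bag.import_errors}"


def test_task_dependencies_are_wired_as_expected(dag_bag):
    # Verify that the orchestration (triggering order) matches the design.
    dag = dag_bag.get_dag(EXAMPLE_DAG_ID)
    assert dag is not None, f"{EXAMPLE_DAG_ID} not found in {DAG_FOLDER}"

    extract = dag.get_task("extract")
    assert "transform" in extract.downstream_task_ids
```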

PySpark Integration Testing

PySpark integration testing focuses on two aspects of data processing:

  • Integration with Sinks/Outputs

The primary objective of this sub-group of tests is to validate the seamless integration between PySpark and the data storage layers (sinks/outputs).
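A minimal sketch of such a test, assuming a local SparkSession and a temporary Parquet directory as a stand-in for the real sink (the actual storage layer and paths would come from the Atlas environment): write a small DataFrame through the same write path the job uses, read it back, and verify nothing was lost or altered.

```python
import pytest
from pyspark.sql import SparkSession


@pytest.fixture(scope="session")
def spark():
    # Local session as a stand-in for the real cluster.
    return SparkSession.builder.master("local[2]").appName("sink-it").getOrCreate()


def test_roundtrip_to_parquet_sink(spark, tmp_path):
    # Hypothetical sink: a Parquet directory standing in for the real storage layer.
    sink_path = str(tmp_path / "customers")

    source = spark.createDataFrame(
        [(1, "alice"), (2, "bob")], ["customer_id", "name"]
    )

    # Write through the same code path the production job would use.
    source.write.mode("overwrite").parquet(sink_path)

    # Read back and verify nothing was lost or mangled on the way to the sink.
    restored = spark.read.parquet(sink_path)
    assert sorted(restored.collect()) == sorted(source.collect())
```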

  • Validating data transformation outputs

These are crucial tests that validate each Spark script as a whole. The main idea of such a test (sketched below) is to:

  1. Prepare fake input data for a script
  2. Prepare expected output data that script should produce
  3. Run the spark script with the fake data injected
  4. Validate that the script output matches the expected output data

These integration tests are performed at this level because the size and complexity of the fake input data or expected output data can be substantial. By rigorously validating data transformation outputs, PySpark integration testing ensures that the data processing scripts perform accurately and consistently under diverse scenarios, guaranteeing data quality and reliability within the data processing pipelines.
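Following the four steps above, a test of this kind can be sketched roughly as follows; the `transform` function is a stand-in for the real Spark script's entry point (in practice it would be imported from the job under test), and the tiny in-line datasets take the place of the larger fixture files mentioned above.

```python
import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F


def transform(orders_df):
    # Stand-in for the real script's transformation; in the actual test this
    # would be imported from the Spark job under test (hypothetical API).
    return orders_df.groupBy("order_date").agg(F.sum("amount").alias("total_amount"))


@pytest.fixture(scope="session")
def spark():
    # Local session used only for the test run.
    return SparkSession.builder.master("local[2]").appName("transform-it").getOrCreate()


def test_transform_matches_expected_output(spark):
    # 1. Prepare fake input data for the script.
    fake_input = spark.createDataFrame(
        [("2025-02-01", 10.0), ("2025-02-01", 5.0)], ["order_date", "amount"]
    )

    # 2. Prepare the expected output data the script should produce.
    expected = spark.createDataFrame(
        [("2025-02-01", 15.0)], ["order_date", "total_amount"]
    )

    # 3. Run the transformation with the fake data injected.
    actual = transform(fake_input)

    # 4. Validate that the output matches the expected output data.
    assert actual.columns == expected.columns
    assert sorted(actual.collect()) == sorted(expected.collect())
```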

Incorporating these two aspects of integration testing into the PySpark workflows contributes significantly to the robustness and dependability of the data processing pipeline and helps maintain data integrity.

Main goals of this test level

Integration tests verify the seamless collaboration of individual components within a system, ensuring they work cohesively as a unified whole. These tests aim to detect and address integration issues early in the development process, reducing the risk of critical defects emerging in production. Ultimately, integration tests play a pivotal role in enhancing system reliability, stability, and overall software quality.

This test level answers the following questions:

Do the integrations between the services work as we expect them to?
Do the Spark scripts, as a whole, produce the expected results?