AOI Management Service (Atlas)

Andi Lamprecht · 8 min read · Draft
Status: Draft
Owner: Szymon Sikora
Contributors: TBD
Date: 2026-04-20

PRD: AOI Management Service — Demand-Driven Data Acquisition

1. Executive Summary

Problem Statement: The Atlas pipeline has no geographic scoping mechanism, creating four compounding problems for the data ops team:

  1. Requesting new data is a manual engineering process — adding coverage for a new area requires a code change and a PR, introducing days of lead time for what should be a self-service operation.
  2. No visibility into what data we have — there is no inventory of which geographic areas are currently covered, making it hard to answer “do we have data for X?” without querying raw tables.
  3. Data readiness is opaque — knowing whether a pipeline has finished processing for a given area requires direct Airflow monitoring and knowledge of which DAGs to watch.
  4. OSM updates are all-or-nothing and risky — refreshing obstacle data updates every site simultaneously, which can silently introduce new obstacles into already-validated areas or remove existing ones, with no way to scope the update to only areas that need it.

Proposed Solution: A new Go microservice and a map-based admin UI that let internal operators define Areas of Interest (AOIs). AOI geometries constrain both scheduled and on-demand (OSM refresh) pipeline acquisition, with per-pipeline status visibility.

Success Criteria:

  1. All active pipeline DAGs acquire data only within defined AOI geometries
  2. Data refresh for a specific AOI runs in isolation — does not affect other AOI pipeline runs
  3. Zero data loss: AOI deletion is soft-delete only

Scope Clarity: A — requirements clear, stakeholders aligned.


2. User Experience & Functionality

User Personas:

  • Data Ops Analyst — defines AOIs for new operational areas; requests OSM data refreshes
  • Data Engineer — monitors per-pipeline status; investigates failures
  • Atlas Pipeline (Airflow) — machine consumer of the API

User Stories:

  1. As a Data Ops Analyst, I want to draw a polygon on a map or upload a GeoJSON file to define an AOI.

    • Map renders with polygon/rectangle drawing tools
    • File upload accepts .geojson/.json; invalid GeoJSON rejected with a descriptive error before submit
    • Drawn/uploaded geometry rendered on map for confirmation before save
    • AOI created with derived status pending, empty pipeline list
  2. As a Data Ops Analyst, I want to see AOI processing status broken down by individual pipeline.

    • AOI detail view lists all pipelines: name and status (pending | processing | success | failed), plus error message if failed
    • Derived AOI status: all pipelines success → ready; any failed → failed; any processing → processing; none started → pending
    • UI refreshes without full page reload (polling or SSE)
  3. As a Data Ops Analyst, I want to request a data refresh for a specific AOI so updated OSM data is loaded only for that area.

    • “Refresh Data” action available on AOIs whose latest run status is ready or failed; returns 409 if latest run is still processing
    • AOI service generates a new run_id UUID, creates an aoi_runs record, then calls Airflow with conf: {aoi_id, run_id}
    • New run starts with all pipelines pending; previous run’s records are retained for history
    • Refresh run does not affect other AOIs’ pipeline runs
  4. As a Data Ops Analyst, I want pipeline processing to start automatically when I create an AOI so I don’t have to wait for the next scheduled run.

    • On POST /areas-of-interest, the AOI service calls Airflow REST API immediately after persisting the record
    • AOI status transitions to processing once Airflow confirms the DAG run was accepted
    • If Airflow trigger fails, AOI remains pending and an error is surfaced in the UI; user can retry via “Refresh Data”
    • Trigger uses the same Airflow REST API call as the refresh flow: POST /api/v1/dags/{osm_dag_id}/dagRuns with conf: {aoi_id, run_id}
  5. As a Data Ops Analyst, I want to update an AOI’s name or geometry.

    • Name update does not affect pipeline status
    • Geometry update resets all pipeline records to pending (fresh run on next schedule)
  6. As a Data Ops Analyst, I want to soft-delete an AOI.

    • Sets deleted_at; record not removed from database
    • Deleted AOIs excluded from pipeline queries
    • Visible in UI with “deleted” indicator
  7. As an Airflow DAG, I want to query all active AOIs on schedule.

    • GET /areas-of-interest returns non-deleted AOIs with GeoJSON geometry
    • Response ≤ 200ms for up to 500 AOIs
  8. As an Airflow DAG, I want to report per-pipeline status back to the AOI service so operators can track readiness.

    • PATCH /areas-of-interest/{id}/runs/{run_id}/pipelines/{pipeline_name} accepts {status: "processing"|"success"|"failed", error?: string}
    • run_id is the UUID generated by the AOI service at trigger time and passed to Airflow via conf
    • Endpoint is idempotent — re-sending the same status is a no-op
    • AOI service creates the pipeline record on first callback if it does not yet exist
    • Valid transitions: pending → processing → success, pending → processing → failed; invalid transitions return 409 Conflict
    • Readiness for a run available at GET /areas-of-interest/{id}/runs/{run_id}/readiness → {ready: bool, details: {pipeline_name: "success"|"failed"|"pending"|"processing"}}
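The callback semantics in story 8 (idempotent re-sends, 409 on invalid jumps) can be sketched as a small Go helper. The names `validNext` and `applyStatus` are illustrative, not from the PRD; this is a sketch of the intent, not the implementation:

```go
package main

import "fmt"

// validNext encodes the allowed transitions:
// pending → processing → success | failed.
var validNext = map[string][]string{
	"pending":    {"processing"},
	"processing": {"success", "failed"},
	"success":    {},
	"failed":     {},
}

// applyStatus returns the status to store after a callback.
// Re-sending the current status is a no-op (idempotent); an
// invalid jump returns an error, which the handler would map to 409.
func applyStatus(current, requested string) (string, error) {
	if current == requested {
		return current, nil // idempotent no-op
	}
	for _, next := range validNext[current] {
		if next == requested {
			return requested, nil
		}
	}
	return current, fmt.Errorf("invalid transition %s -> %s", current, requested)
}

func main() {
	s, _ := applyStatus("pending", "processing")
	fmt.Println(s) // processing
	_, err := applyStatus("pending", "success")
	fmt.Println(err != nil) // true: skipping the processing step is rejected
}
```

Terminal states (success, failed) deliberately have no outgoing transitions here; a new run, not a transition, is how a failed pipeline gets retried.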

Non-Goals:

  • No overlap/conflict resolution between AOIs — pipeline handles union
  • No customer-facing access
  • No RBAC beyond Okta authentication (v1)
  • No geometry history / versioning (v1)
  • Gold table writers out of scope — obstacle/OSM pipelines only

Example UI

(UI mockup image not included in this export)

3. Regulatory & Compliance

No direct FAA or DO-178C applicability — internal data management tool, not flight-critical.

  • Soft-delete preserves audit trail for downstream regulated systems
  • Okta SSO required; no anonymous access
  • OSM data is public; no ITAR/PII concerns
  • All data stored in existing GCP project; no new data residency concerns

4. Technical Specifications

Architecture:

    flowchart LR
    UI["Admin UI\n(React)"]
    Okta["Okta OIDC"]
    SVC["AOI Service\n(Go)"]
    DB["PostgreSQL\n+ PostGIS"]
    AF["Airflow\n(Atlas Pipeline)"]

    UI -->|"OIDC auth"| Okta
    UI -->|"CRUD + refresh"| SVC
    SVC -->|"persist AOIs + runs"| DB
    SVC -->|"trigger DAG run\nconf: {aoi_id, run_id}"| AF
    AF -->|"query active AOIs"| SVC
    AF -->|"PATCH runs/{run_id}/pipelines/{name}"| SVC
  

Trigger & callback flow:

    sequenceDiagram
    actor Operator
    participant UI as Admin UI
    participant SVC as AOI Service
    participant DB as PostgreSQL
    participant AF as Airflow

    Operator->>UI: Create AOI / Request Refresh
    UI->>SVC: POST /areas-of-interest
    SVC->>DB: persist AOI + generate run_id + create aoi_runs record
    SVC->>AF: POST /api/v1/dags/{dag}/dagRuns\nconf: {aoi_id, run_id}
    AF-->>SVC: 200 OK
    SVC-->>UI: 201 Created {aoi, run_id}

    loop per pipeline
        AF->>SVC: PATCH /{id}/runs/{run_id}/pipelines/{name}\n{status: processing|success|failed}
        SVC->>DB: upsert aoi_run_pipelines
        SVC-->>AF: 200 OK
    end

    Operator->>UI: Poll readiness
    UI->>SVC: GET /{id}/runs/{run_id}/readiness
    SVC->>DB: aggregate pipeline statuses
    SVC-->>UI: {ready: true, details: {...}}
  

Affected Repos:

| Repo | Language | Change |
|------|----------|--------|
| droneup/dataanalytics-atlas-pipeline | Python/Airflow | Add AOI API queries to download tasks + per-pipeline status callbacks |
| droneup/dataanalytics-atlas-aoi-service (new) | Go | New microservice |
| droneup/dataanalytics-atlas-aoi-ui (new) | React/TypeScript | New admin UI |

Data Model:

CREATE TABLE areas_of_interest (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  name        TEXT NOT NULL,
  geometry    GEOMETRY(GEOMETRY, 4326) NOT NULL,  -- PostGIS WGS84
  created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
  deleted_at  TIMESTAMPTZ
);
CREATE INDEX ON areas_of_interest USING GIST (geometry);
CREATE INDEX ON areas_of_interest (deleted_at) WHERE deleted_at IS NULL;

CREATE TABLE aoi_runs (
  id           UUID PRIMARY KEY,               -- generated by AOI service, passed to Airflow as run_id
  aoi_id       UUID NOT NULL REFERENCES areas_of_interest(id),
  triggered_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX ON aoi_runs (aoi_id, triggered_at DESC);

CREATE TABLE aoi_run_pipelines (
  id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  run_id        UUID NOT NULL REFERENCES aoi_runs(id),
  pipeline_name TEXT NOT NULL,
  status        TEXT NOT NULL DEFAULT 'pending'
                  CHECK (status IN ('pending', 'processing', 'success', 'failed')),
  error         TEXT,
  updated_at    TIMESTAMPTZ NOT NULL DEFAULT now(),
  UNIQUE (run_id, pipeline_name)
);

Derived statuses are computed in the service layer (not stored):

  • Run readiness: all pipelines success → ready: true; otherwise ready: false
  • AOI status (shown on list/detail): derived from the latest aoi_runs record for that AOI
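The service-layer derivation above can be sketched in Go. The function names (`runReadiness`, `aoiStatus`) are illustrative, not from the PRD; the precedence (ready, then failed, then processing, then pending) follows the rules stated in the user stories:

```go
package main

import "fmt"

// runReadiness: a run is ready only when every pipeline succeeded.
func runReadiness(details map[string]string) bool {
	if len(details) == 0 {
		return false // no pipeline records yet
	}
	for _, s := range details {
		if s != "success" {
			return false
		}
	}
	return true
}

// aoiStatus derives the AOI-level status from the latest run's
// pipeline statuses: all success → ready, any failed → failed,
// any processing → processing, otherwise pending.
func aoiStatus(details map[string]string) string {
	if runReadiness(details) {
		return "ready"
	}
	for _, s := range details {
		if s == "failed" {
			return "failed"
		}
	}
	for _, s := range details {
		if s == "processing" {
			return "processing"
		}
	}
	return "pending"
}

func main() {
	fmt.Println(aoiStatus(map[string]string{"osm": "success", "obstacles": "success"}))   // ready
	fmt.Println(aoiStatus(map[string]string{"osm": "processing", "obstacles": "pending"})) // processing
}
```

Computing this on read (rather than storing it) avoids a second write path that could drift from the per-pipeline records.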

API:

POST   /areas-of-interest                                          → 201  includes run_id  [Okta]
GET    /areas-of-interest                                          → 200  [Okta + service account]
GET    /areas-of-interest/{id}                                     → 200  includes latest run_id + derived status  [Okta]
PATCH  /areas-of-interest/{id}                                     → 200  [Okta]
DELETE /areas-of-interest/{id}                                     → 204  soft delete  [Okta]
POST   /areas-of-interest/{id}/refresh                             → 202  includes new run_id  [Okta]
PATCH  /areas-of-interest/{id}/runs/{run_id}/pipelines/{name}      → 200  idempotent  [service account]
GET    /areas-of-interest/{id}/runs/{run_id}/readiness             → 200  {ready, details}  [Okta + service account]

Trigger flow (shared by create and refresh):

  1. Generate run_id UUID; persist aoi_runs record
  2. Call Airflow REST API: POST /api/v1/dags/{osm_dag_id}/dagRuns with conf: {aoi_id, run_id}
  3. On Airflow failure: record is retained with no pipeline entries; run_id returned so client can poll or retry
  4. Return run_id in response body
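Step 2's request body can be sketched as follows. The endpoint path and conf shape come from this PRD; the struct and helper names (`dagRunConf`, `buildDagRunBody`) are illustrative assumptions:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dagRunConf mirrors the conf payload the PRD passes to Airflow.
type dagRunConf struct {
	AoiID string `json:"aoi_id"`
	RunID string `json:"run_id"`
}

// dagRunRequest is the body for POST /api/v1/dags/{osm_dag_id}/dagRuns.
type dagRunRequest struct {
	Conf dagRunConf `json:"conf"`
}

// buildDagRunBody serializes the trigger payload for one AOI run.
func buildDagRunBody(aoiID, runID string) ([]byte, error) {
	return json.Marshal(dagRunRequest{Conf: dagRunConf{AoiID: aoiID, RunID: runID}})
}

func main() {
	body, err := buildDagRunBody("aoi-123", "run-456")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
	// {"conf":{"aoi_id":"aoi-123","run_id":"run-456"}}
}
```

Because the aoi_runs record is persisted before this call (step 1), an Airflow outage leaves a run with no pipeline entries, matching the failure behavior in step 3.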

Create flow (POST /areas-of-interest):

  1. Validate and persist AOI record
  2. Execute trigger flow above
  3. Return 201 Created with AOI + run_id

Refresh flow (POST /{id}/refresh):

  1. Return 409 if latest run for this AOI is processing
  2. Execute trigger flow above
  3. Return 202 Accepted with new run_id

Auth:

  • Admin UI: Okta OIDC (authorization code flow)
  • Airflow → AOI service callbacks: bearer token stored in GCP Secret Manager, rotated on schedule
  • GeoJSON validated server-side (ST_IsValid) before PostGIS insert; reject with 422 on invalid geometry
  • No PII stored
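The server-side geometry check could be layered: a cheap structural pre-check in Go before the authoritative ST_IsValid pass in PostGIS, so obviously malformed payloads get a 422 without a database round trip. This is a sketch under that assumption; `checkGeoJSON` and `geomTypes` are illustrative names, and full geometric validity is still left to `SELECT ST_IsValid(ST_GeomFromGeoJSON($1))`:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// geomTypes lists the geometry types defined by GeoJSON (RFC 7946).
var geomTypes = map[string]bool{
	"Point": true, "MultiPoint": true, "LineString": true,
	"MultiLineString": true, "Polygon": true, "MultiPolygon": true,
	"GeometryCollection": true,
}

// checkGeoJSON rejects payloads that are not even structurally a
// GeoJSON geometry; self-intersection etc. is left to ST_IsValid.
func checkGeoJSON(raw []byte) error {
	var g struct {
		Type        string          `json:"type"`
		Coordinates json.RawMessage `json:"coordinates"`
	}
	if err := json.Unmarshal(raw, &g); err != nil {
		return fmt.Errorf("not valid JSON: %w", err)
	}
	if !geomTypes[g.Type] {
		return fmt.Errorf("unknown GeoJSON geometry type %q", g.Type)
	}
	if g.Type != "GeometryCollection" && len(g.Coordinates) == 0 {
		return fmt.Errorf("missing coordinates")
	}
	return nil
}

func main() {
	ok := []byte(`{"type":"Polygon","coordinates":[[[0,0],[1,0],[1,1],[0,0]]]}`)
	fmt.Println(checkGeoJSON(ok))                                  // <nil>
	fmt.Println(checkGeoJSON([]byte(`{"type":"Circle"}`)) != nil)  // true
}
```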

5. Risks & Phased Rollout

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Airflow REST API not reachable from AOI service in-cluster | Medium | High | Verify network policy with platform team before design finalization |
| Geometry update mid-pipeline run resets in-flight records | Low | Medium | In-flight Airflow runs use a snapshot of AOI; reset only affects next pickup |

Phased Rollout:

  • MVP: CRUD API + pipeline status callbacks + pipeline integration in dataanalytics-atlas-pipeline. Admin UI — map draw, GeoJSON upload, status display, refresh button. Okta SSO.
  • v2.0: RBAC, per-source resolution tiers, AOI templates.

Dependencies:

  • Okta application registration for dataanalytics-atlas-aoi-ui
  • Airflow REST API reachable from K8s cluster (verify with platform team)
  • New repo provisioning: dataanalytics-atlas-aoi-service, dataanalytics-atlas-aoi-ui

6. Estimation Input

prd_sizing_input:
  feature: "AOI Management Service — demand-driven data acquisition"
  scope_clarity: "A"
  key_terms:
    - "area of interest"
    - "geojson"
    - "osm obstacle pipeline"
    - "pipeline status"
    - "data refresh"
  risk_flags:
    - "new-service"
    - "geospatial"
    - "okta-integration"
    - "airflow-integration"
  affected_repos:
    - "dataanalytics-atlas-pipeline"
    - "dataanalytics-atlas-aoi-service"
    - "dataanalytics-atlas-aoi-ui"
  domains:
    - "backend-go"
    - "frontend-react"
    - "data-pipeline-python"
    - "infrastructure-gcp"
  regulatory: false
  discovery_needed: false