Multi-Tenant Deployment & Release Strategy

Andi Lamprecht · 11 min read · Draft
Traceability Links
Jama Requirements
Jira Tasks: CORE-2084

Context

The need to onboard multiple Organizations and Tenants, and to release changes continuously in an automated, standard way, requires a safe and reliable Release process that keeps us and our clients compliant with all applicable regulations and with our own internal processes.

Decision Drivers

  • Ability to compose a Global Production Release that controls all components of the system and includes a Software Configuration Index (SCI) with detailed information about upcoming changes and V&V evidence.
  • Maintain multiple Release Candidates in a Global Production Release in parallel, so each agent can control when it switches versions.
  • Organize multi-tenant infrastructure in which different Tenant-Agents work on a shared platform without impacting each other or noticing each other’s presence.
  • Define the shared services that run as a single replica and provide platform capabilities to multiple agents within each organization.
  • Gate upgrades of the Production Organization environment behind CCB approval.

Key Entities in Deployment & Release Process

  • Release Blueprint: lists the components (repository URLs) that will be included in one Release Candidate.
  • Release Candidate: lists all components from the Release Blueprint with their corresponding versions, and carries a Version for the software bundle as a whole that Agents can select on demand without a redeploy.
  • Global Production Release: describes what a single self-sufficient, independent Environment looks like.
  • Environment: selects a Global Production Release plus any custom settings required for an individual deployment.
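
The relationship between these four entities can be sketched as a small data model. This is only an illustration under assumptions: the class and field names are hypothetical, the repository URLs are placeholders (the real locations are not given here), and the version numbers are taken from the example diagram in the Overview.

```python
from dataclasses import dataclass, field


@dataclass
class ReleaseBlueprint:
    """Names the components (and their repositories) of one product line."""
    name: str
    components: dict[str, str]  # component name -> repository URL


@dataclass
class ReleaseCandidate:
    """Pins every Blueprint component to a version and versions the bundle."""
    blueprint: ReleaseBlueprint
    version: str
    component_versions: dict[str, str]  # component name -> version
    compatible_onboard_versions: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # A candidate must pin exactly the components its Blueprint lists.
        if set(self.component_versions) != set(self.blueprint.components):
            raise ValueError("component set does not match the blueprint")


@dataclass
class GlobalProductionRelease:
    """The Release Candidates that one Environment runs in parallel."""
    name: str
    candidates: list[ReleaseCandidate]

    def versions_of(self, product: str) -> list[str]:
        return [c.version for c in self.candidates if c.blueprint.name == product]


@dataclass
class Environment:
    """Selects a Global Production Release plus deployment-specific settings."""
    name: str
    release: GlobalProductionRelease
    settings: dict[str, str] = field(default_factory=dict)


# Placeholder URLs (assumption): real repository locations are not specified.
blueprint = ReleaseBlueprint(
    name="uncrew-versioned",
    components={
        name: f"https://git.example.invalid/{name}"
        for name in ("apollo-frontend", "avatar-controller",
                     "mission-service", "pathfinding")
    },
)
rc_222 = ReleaseCandidate(
    blueprint=blueprint,
    version="v2.2.2",
    component_versions={
        "apollo-frontend": "v2.2.3",
        "avatar-controller": "v1.0.3",
        "mission-service": "v2.2.4",
        "pathfinding": "v4.2.4",
    },
    compatible_onboard_versions=["v1.1.1", "v1.3.3", "v1.5.5"],
)
prod = GlobalProductionRelease(name="prod", candidates=[rc_222])
production = Environment(name="production", release=prod)
```

Validating the component set in the constructor is one possible way to keep a Release Candidate from drifting away from its Blueprint; the actual enforcement point is a design choice this document leaves open.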

Complete System Deployment

Overview

The deployment of the system will be managed using three separate repositories. The first two repositories will each contain a Helm chart that encapsulates all necessary dependencies and services, while the third will be responsible for selecting specific versions of those Helm charts and deploying them to the various environments.

    graph LR
    subgraph Repos["Uncrew Versioned Helm Chart AND Uncrew Shared Helm Chart"]
        direction LR
        subgraph Entities_Row[" "]
            direction LR
            BP["<b>Release Blueprint</b>
            • name
            • components
              ◦ name
              ◦ repo_url"]
            RC["<b>Release Candidate</b>
            • name
            • version
            • components
              ◦ name
              ◦ repo_url
              ◦ version
            • compatible onboard versions"]
            BP --> RC
        end
        subgraph Examples_Row[" "]
            direction LR
            BPex["<b>uncrew-versioned:</b>
            • apollo-frontend
            • avatar-controller
            • mission-service
            • pathfinding
            • compatible onboard version"]
            RCex["<b>uncrew-versioned - v2.2.2:</b>
            • apollo-frontend v2.2.3
            • avatar-controller v1.0.3
            • mission-service v2.2.4
            • pathfinding v4.2.4
            • compatible onboard versions:
              ◦ v1.1.1
              ◦ v1.3.3
              ◦ v1.5.5"]
            BPex --> RCex
        end
    end

    subgraph GKE["GKE Deployments"]
        direction LR
        GPR["<b>Global Production Release</b>
        • release candidates for each
          product and each version
          to be supported
            ◦ name/id
            ◦ version"]
        GPRex["<b>Global Production Release - prod:</b>
        • uncrew-versioned v2.2.2
        • uncrew-versioned v2.1.1
        • uncrew-versioned v1.5.5
        • uncrew-shared v1.4.5"]
    end

    RC --> GPR
    RCex --> GPRex
  

Repository Structure & Deployment Process

1. Uncrew Versioned Helm Chart Repository

This repository will include the Helm chart that defines the services and their dependencies for a multi-version approach.

  • Each service will be specified with its corresponding version tag.
  • The Helm chart itself will be tagged to facilitate building a complete release of the system.
  • Successful creation of a new version in this Helm chart repo can trigger an automated PR in the gke-deployments repo that adds the new version to the stage/prod config file and attaches “Deployment notes” to the PR.
    graph LR
    subgraph PR["PR in bundle repo"]
        direction LR
        V222["<b>uncrew-versioned - v2.2.2:</b>
        • apollo-frontend v2.2.3
        • avatar-controller v1.0.3
        • mission-service v2.2.4
        • pathfinding v4.2.4
        • compatible onboard versions:
          ◦ v1.1.1
          ◦ v1.3.3
          ◦ v1.5.5"]

        V230["<b>uncrew-versioned - v2.3.0:</b>
        • apollo-frontend v2.2.5
        • avatar-controller v1.1.0
        • mission-service v2.3.0
        • pathfinding v4.2.4
        • compatible onboard versions:
          ◦ v1.1.1
          ◦ v1.3.3
          ◦ v1.5.5
          ◦ v1.5.8"]

        V222 --> V230
    end
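
The component-level difference between two chart versions, which an automated PR could attach as part of its “Deployment notes”, can be sketched as follows. The note format and the function name are assumptions made for illustration; the versions come from the diagram above.

```python
def component_changes(previous: dict[str, str], new: dict[str, str]) -> list[str]:
    """Describe how component versions differ between two chart versions."""
    lines = []
    for name in sorted(set(previous) | set(new)):
        old, cur = previous.get(name), new.get(name)
        if old == cur:
            continue  # unchanged components are left out of the notes
        if old is None:
            lines.append(f"{name}: added at {cur}")
        elif cur is None:
            lines.append(f"{name}: removed (was {old})")
        else:
            lines.append(f"{name}: {old} -> {cur}")
    return lines


v2_2_2 = {
    "apollo-frontend": "v2.2.3",
    "avatar-controller": "v1.0.3",
    "mission-service": "v2.2.4",
    "pathfinding": "v4.2.4",
}
v2_3_0 = {
    "apollo-frontend": "v2.2.5",
    "avatar-controller": "v1.1.0",
    "mission-service": "v2.3.0",
    "pathfinding": "v4.2.4",
}

notes = component_changes(v2_2_2, v2_3_0)
for line in notes:
    print(line)
# apollo-frontend: v2.2.3 -> v2.2.5
# avatar-controller: v1.0.3 -> v1.1.0
# mission-service: v2.2.4 -> v2.3.0
```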
  

Key Components:

  • Chart.yaml: Contains metadata about the Helm chart.
  • values.yaml: Defines the default configuration values for the chart.
  • templates/: Directory containing the Kubernetes resource templates.
  • dependencies.yaml: Lists all dependencies with their specific versions.

2. Uncrew Shared Helm Chart Repository

  • This Helm chart will include only the components that are shared across tenants and versions: those that do not need to be replicated once per version and do not support version switching.
  • The repo structure is the same as in the Uncrew Versioned Helm Chart Repository.
  • Successful creation of a new version in this Helm chart repo can trigger an automated PR in the gke-deployments repo that replaces the shared chart version in the stage/prod config file and attaches “Deployment notes” to the PR.

3. GKE Deployments Repository

This repository will manage the deployment configurations for different environments.

  • It will specify which version of the Helm chart to deploy to each environment (e.g., production, staging).
  • Each configuration file will contain a list of Uncrew Versioned Helm Chart versions (v2.2.1, v2.2.2, v3.3.3) and a single Uncrew Shared Helm Chart version (v1.1.1).
  • Using the Helm CLI, the specified versions of the Helm charts will be deployed to the corresponding environments.
  • Automation can walk through the expected versions and surface the corresponding “Deployment notes” from the Helm repos.
  • Cleanup: when a version is to be deprecated, the corresponding config file in gke-deployments is updated to remove the line with the deprecated version. Once the PR is opened and approved, the version is uninstalled and its namespace is deleted from the GKE cluster; the version then becomes unavailable in the UI.
    graph LR
    subgraph PR["PR in deployment repo"]
        direction LR
        Before["<b>Global Production Release
        (production env):</b>
        • uncrew-versioned v1.1.0
        • uncrew-versioned v2.1.0
        • uncrew-shared v1.4.5"]

        After["<b>Global Production Release
        (production env):</b>
        • uncrew-versioned v1.1.0
        • uncrew-versioned v2.1.0
        • uncrew-versioned v2.2.2
        • uncrew-shared v1.4.6"]

        Before --> After
    end
  

Key Components:

  • prod-deployment.yaml: Configuration file for deploying the Helm chart version to the production environment.
  • stage-deployment.yaml: Configuration file for deploying the Helm chart version to the staging environment.
  • etc.
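
How a change to one of these configuration files translates into deployment actions can be sketched as a simple before/after comparison. The `EnvironmentConfig` shape and the action wording are assumptions (the real file schema is not defined here); in practice each action would map to a Helm CLI call such as `helm install`, `helm upgrade`, or `helm uninstall`. The example values are the ones shown in the diagram above.

```python
from dataclasses import dataclass


@dataclass
class EnvironmentConfig:
    """Hypothetical in-memory form of a per-environment deployment file."""
    versioned: list[str]  # Uncrew Versioned chart versions running in parallel
    shared: str           # the single Uncrew Shared chart version


def plan(current: EnvironmentConfig, desired: EnvironmentConfig) -> list[str]:
    """List the actions needed to move an environment to the desired config."""
    actions = []
    for version in desired.versioned:
        if version not in current.versioned:
            actions.append(f"install uncrew-versioned {version}")
    for version in current.versioned:
        if version not in desired.versioned:
            # Cleanup path: a removed line means uninstall plus namespace deletion.
            actions.append(
                f"uninstall uncrew-versioned {version} and delete its namespace"
            )
    if desired.shared != current.shared:
        actions.append(f"upgrade uncrew-shared {current.shared} -> {desired.shared}")
    return actions


before = EnvironmentConfig(versioned=["v1.1.0", "v2.1.0"], shared="v1.4.5")
after = EnvironmentConfig(versioned=["v1.1.0", "v2.1.0", "v2.2.2"], shared="v1.4.6")

release_plan = plan(before, after)
for action in release_plan:
    print(action)
# install uncrew-versioned v2.2.2
# upgrade uncrew-shared v1.4.5 -> v1.4.6
```

Deriving the plan from the difference between two file states keeps the PR itself as the single approval point, which fits the CCB gating described in the decision drivers.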

Testing Approaches

  1. Stable versions. All environments should contain the same set of Production supported Helm Releases to provide effective testing capabilities and allow smooth troubleshooting of any production issues.

  2. Latest trunk from all repos. Stable trunk version is deployed and accessible on Dev/Sandbox/Staging in parallel with Production stable versions.

  3. Draft Release version. When Release Draft is created, all latest trunk versions are combined in an early version of Helm Chart and deployed to Dev/Sandbox/Staging for Automated System Testing and any required manual Product Testing.

  4. Feature Development. One of two options:

    • One Unstable Sandbox version. Code changes from all short-lived branches are deployed into a single Unstable version on Sandbox, which allows combined testing across repos.
    • New temp version. Every branch in every repo gets its own version on Sandbox, in parallel with all the versions mentioned above. The branch deploys the changed version of one service plus all other services from trunk. This allows branch changes to be tested against the latest stable code, and it prevents problems when features in development conflict or when a branch change introduces a bug that affects all other components. Once the PR is reviewed, approved and merged, the temporary version is deleted from the environment. This is the costlier option overall, but a custom tag in the commit message could be used to skip deployment for most commits.
    graph TB
    subgraph ApolloRepo["Apollo Frontend Repo"]
        A_Main["Main Branch"]
        A_Feature["New feature Branch"]
        A_Main --> A_Feature
        A_Feature --> A_PR["PR is opened"]
        A_PR --> A_Main
    end

    subgraph MissionRepo["Mission Repo"]
        M_Main["Main Branch"]
        M_Feature["New feature Branch"]
        M_Main --> M_Feature
        M_Feature --> M_PR["PR is opened"]
        M_PR --> M_Main
    end

    subgraph Sandbox["Deployment Options"]
        Unstable["<b>Unstable sandbox:</b>
        • mission-service
        • apollo-frontend
        • avatar-controller
        • pathfinding"]

        OR_label["OR"]

        TmpVer["<b>New tmp version:</b>
        • mission-service (trunk)
        • apollo-frontend
        • avatar-controller (trunk)
        • pathfinding (trunk)"]

        AND_label["AND"]

        TmpVer2["<b>New tmp version:</b>
        • mission-service
        • apollo-frontend (trunk)
        • avatar-controller (trunk)
        • pathfinding (trunk)"]
    end

    A_PR -.-> Unstable
    A_PR -.-> TmpVer
    M_PR -.-> Unstable
    M_PR -.-> TmpVer2
  
  5. Security patches in current and older versions. For security patches, a custom branch naming scheme can encode the Helm Chart version and the current Service version. When the fix is developed and a PR is opened against the release branch, a new temporary Version is deployed to GKE for System Testing to confirm that the security patch does not affect functionality. Once testing is finished and the PR is reviewed and merged, a new tag is created and the temporary branches are deleted.
    graph TB
    subgraph Repo["Apollo Frontend Repo"]
        NewTag["New tag"]
        RelBranch["Release Branch"]
        FixBranch["New fix branch"]
        PR["PR is opened"]

        NewTag --> RelBranch
        RelBranch --> FixBranch
        FixBranch --> PR
        PR --> RelBranch
    end

    TmpVer["<b>New tmp version:</b>
    • apollo-frontend (PR)
    • avatar-controller (release)
    • mission-service (release)
    • pathfinding (release)"]

    Label["Hotfix, security patch
    development and testing"]

    PR -.-> TmpVer
    Label ~~~ TmpVer
  
  6. Custom automation. A separate automation process, outside the repo pipelines, allows custom selection of versions for individual components: for example, most components taken from trunk and a few from feature branches.
    graph RL
    Automation["Automation"]
    CustomVer["<b>Custom manual version:</b>
    • mission-service branch-ABC
    • apollo-frontend branch-XYZ
    • avatar-controller v2.2.4
    • pathfinding v4.2.4"]

    Automation --> CustomVer
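
The temporary, security-patch, and custom versions described in this section all follow the same pattern: start from a baseline set of component versions and override a few of them. A minimal sketch of that pattern, with a hypothetical function name and the labels used in the diagrams above:

```python
def compose_version(baseline: dict[str, str], overrides: dict[str, str]) -> dict[str, str]:
    """Return the baseline component versions with selected ones replaced."""
    unknown = set(overrides) - set(baseline)
    if unknown:
        # Refuse overrides for components that the baseline does not contain.
        raise KeyError(f"unknown components: {sorted(unknown)}")
    return {**baseline, **overrides}


COMPONENTS = ("apollo-frontend", "avatar-controller", "mission-service", "pathfinding")

# Feature development: one service from its branch, everything else from trunk.
trunk = {name: "trunk" for name in COMPONENTS}
tmp_version = compose_version(trunk, {"apollo-frontend": "branch-XYZ"})

# Security patch: one service from the PR, everything else from the release.
release = {name: "release" for name in COMPONENTS}
patch_version = compose_version(release, {"apollo-frontend": "PR"})

# Custom automation: a manual mix of tags and feature branches.
tagged = {
    "apollo-frontend": "v2.2.3",
    "avatar-controller": "v2.2.4",
    "mission-service": "v2.2.4",
    "pathfinding": "v4.2.4",
}
custom_version = compose_version(
    tagged,
    {"mission-service": "branch-ABC", "apollo-frontend": "branch-XYZ"},
)
```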
  

Formal Impact

This structured approach using three repositories allows clear separation of concerns and enables efficient management of dependencies and deployment versions across different environments. Each repository serves a distinct purpose, ensuring that the system can be deployed reliably and consistently. All application and infrastructure components will need to be reviewed and integrated under the Versioned or Shared Helm Charts, with ad-hoc installation and with shared infra components serving multiple versions.

Maintaining old versions involves more steps, so some effort is needed to prepare automation and to train the team on the new process. Automation is required to offload manual work, such as preparing sets of branches for security fixes to older versions and cleaning up once changes are merged and tagged.

Staged Approach

Iteration 1: All services deploy up to production the same way as now, while individual services are added to the bundle. Once a service is added, its stable deployment moves to the bundle pipeline.

Iteration 2: Services are grouped in bundles, and each bundle is deployed to the existing fixed environments: staging and production. Some services can stay unbundled as exceptional cases.

Iteration 3: All bundles are grouped in the deployment repo so that ephemeral environments can be spun up from a compatible combination of bundles.

    graph LR
    subgraph Iter1["Iteration 1"]
        direction LR
        I1_A["Service A"] --> I1_Stg["Staging"]
        I1_A --> I1_Prod["Production"]
        I1_B["Service B"] --> I1_Stg
        I1_B --> I1_Prod
        I1_B --> I1_Bundle["Bundle repo"]
        I1_Bundle --> I1_Stg
        I1_Bundle --> I1_Prod
    end
  
    graph LR
    subgraph Iter2["Iteration 2"]
        direction LR
        I2_A["Service A"] --> I2_Bundle["Bundle repo"]
        I2_B["Service B"] --> I2_Bundle
        I2_Bundle --> I2_Stg["Staging"]
        I2_Bundle --> I2_Prod["Production"]
    end
  
    graph LR
    subgraph Iter3["Iteration 3"]
        direction LR
        I3_A["Service A"] --> I3_Bundle["Bundle repo"]
        I3_B["Service B"] --> I3_Bundle
        I3_Bundle --> I3_Deploy["Deployment repo
        Bundle v1
        Bundle v2"]
        I3_Deploy --> I3_Stg["Staging"]
        I3_Deploy --> I3_Prod["Production"]
    end
  

Alternatives Considered

GitFlow Branching and Release Strategy

Pros that GitFlow can provide:

  • Easily seeing what code is in Production for troubleshooting. Refutation: when multiple production-facing versions of the software are maintained, a set of versions is in Production at the same time, so knowing the code of the latest one does not give the whole picture. If all Production versions are also installed in the dev environment, troubleshooting can happen in real time on the same code base.
  • Avoiding code freezes. Refutation: when a new version is prepared for release from develop and some early features are not meant to be in it, GitFlow still needs either a comparable code freeze on develop or a release branch cut from an earlier commit. Trunk-based development can apply the same strategy: if trunk contains more than the release should, a new branch is created from an earlier commit/tag, the necessary changes are cherry-picked from trunk, and a new tag is created for inclusion in the Release.

Cons:

  • Maintaining multiple branches can introduce more places for human error.
  • The same changes must be merged into multiple branches: develop, release, main.

Long-Lived Branches vs Tag-Based Maintenance with Short-Lived Branches

| Aspect | Long-Lived Branches | Short-Lived Branches |
| --- | --- | --- |
| Branch clutter | ❌ Multiple permanent branches | ✅ Only trunk + short-lived |
| Patching process | ✅ Simple (just push) in long-lived branch | ⚠️ More steps: Tag → Temp Branch → Fix → Tag → Delete |
| Mental model | ✅ Branches = versions | ✅ Tags = versions |
| Automation needed | ⚠️ Some | ✅ More critical |
| Git complexity | ⚠️ Merges between branches | ⚠️ Orphaned commits |
| Team onboarding | ❌ Requires training on branch management | ⚠️ New aspect of working with tags, but short-lived branches are same as now |
| Scheduled flow (like security scanning) | ✅ GHA automation can look into set of stable release branches, do scanning and prepare PRs automatically | ❌ Separate process needed for scanning versions in prod |
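
The tag-based patching flow from the comparison above (Tag → Temp Branch → Fix → Tag → Delete) can be sketched as a generated list of git commands. The `fix/<tag>` branch name is a hypothetical convention, and the sketch assumes that tags follow `vMAJOR.MINOR.PATCH` and that the next patch number is still free.

```python
import re


def next_patch_tag(tag: str) -> str:
    """Increment the patch number of a vMAJOR.MINOR.PATCH tag."""
    match = re.fullmatch(r"v(\d+)\.(\d+)\.(\d+)", tag)
    if match is None:
        raise ValueError(f"not a vMAJOR.MINOR.PATCH tag: {tag!r}")
    major, minor, patch = map(int, match.groups())
    return f"v{major}.{minor}.{patch + 1}"


def patch_steps(tag: str) -> list[str]:
    """Commands for patching an already released tag on a temporary branch."""
    branch = f"fix/{tag}"  # hypothetical naming convention
    new_tag = next_patch_tag(tag)
    return [
        f"git checkout -b {branch} {tag}",  # temp branch from the released tag
        "# commit the fix, open a PR, test the temporary version",
        f"git tag {new_tag}",               # tag the patched commit
        f"git push origin {new_tag}",
        f"git branch -D {branch}",          # delete the temp branch
    ]


steps = patch_steps("v2.2.4")
for step in steps:
    print(step)
```

After the temporary branch is deleted, the patched commit stays reachable only through its tag, which is the "orphaned commits" trade-off noted in the table.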

Appendix: Key Terms

Mostly taken from the Organizations & Tenancy concept.

| Term | Definition |
| --- | --- |
| Operator | Operators hold certificates. For example, DroneUp has a Part 135 Certification that affords certain privileges. They control which versions of software are made available to tenants in Global Production Releases. They have operational control over their child-agents. |
| Agent | Agents leverage an Operator’s certification (e.g. Part 135 Certification) for their operations. Agents must adhere to the GOM, GMM, and SMS of the Operator to be compliant. Both Agents and Operators are Tenants, but with different privileges. |
| Tenancy | A Tenant is a group of users who share common access, with specific privileges, to the software instance. |
| Multi-Tenancy | Multi-tenancy is an architecture wherein a single occurrence of a software application serves numerous clients. It is the illusion of a standalone application. |
| Instance | Separate and independent copies of a software application or service that run on the same physical or virtual infrastructure (also Environment). |
| Multi-Instance | The same source code deployed into separate environments. These environments are isolated such that the services contained within are unable to communicate across environments. |
| Release Blueprint | Specifies the repositories and components that constitute a manageable product line; includes the name and repository URL of each component. |
| Release Candidate | A selection of specific versions for each component defined in a Release Blueprint. |
| Global Production Release | A selection of specific, versioned releases from multiple, independent software and infrastructure bundles. |
