Multi-Tenant Deployment & Release Strategy
| Traceability Links | |
|---|---|
| Jama Requirements | |
| Jira Tasks | CORE-2084 |
Context
The requirement to onboard multiple Organizations and Tenants, and to provide an automated, standard way of releasing changes continuously, creates a need to establish a safe and reliable release process that keeps us and our clients compliant with all required regulations and with our own internal processes.
Decision Drivers
- Ability to compose a Production Global Release that controls all components of the system and includes a Software Configuration Index (SCI) with detailed information about upcoming changes and V&V evidence.
- Maintain multiple Release Candidates in the Global Production Release in parallel, so each agent can control when it switches versions.
- Organize a multi-tenant infrastructure where different Tenant-Agents work on a shared platform without impacting each other or noticing each other’s presence.
- Define the shared services that run as a single replica and provide platform capabilities for multiple agents in each organization.
- Add the ability to gate upgrades of a Production Organization environment by requiring CCB approval.
Key Entities in Deployment & Release Process
- Release Blueprint — lists the components (repository URLs) that will be included in one Release Candidate.
- Release Candidate — lists all components from the Release Blueprint with their corresponding versions; also carries the version of this software bundle, which Agents will be able to select on demand without a redeploy.
- Global Production Release — describes what a single self-sufficient and independent Environment looks like.
- Environment — selects a Global Production Release and any custom settings required for an individual deployment.
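The relationships between these entities can be made concrete with a purely illustrative YAML sketch (the file names, field names, and URLs below are assumptions, not a finalized schema):

```yaml
# release-blueprint.yaml (hypothetical schema)
# Names the components and where they live; no versions yet.
name: uncrew-versioned
components:
  - name: apollo-frontend
    repo_url: https://github.com/example-org/apollo-frontend
  - name: mission-service
    repo_url: https://github.com/example-org/mission-service
---
# release-candidate.yaml (hypothetical schema)
# Pins a version for every component in the blueprint and
# versions the bundle itself.
name: uncrew-versioned
version: v2.2.2
components:
  - name: apollo-frontend
    version: v2.2.3
  - name: mission-service
    version: v2.2.4
compatible_onboard_versions: [v1.1.1, v1.3.3, v1.5.5]
```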
Complete System Deployment
Overview
The deployment of the system will be managed using three separate repositories. The first and second repositories will each contain a Helm chart that encapsulates all necessary dependencies and services, while the third repository will be responsible for selecting and deploying specific versions of those Helm charts to the various environments.
```mermaid
graph LR
subgraph Repos["Uncrew Versioned Helm Chart AND Uncrew Shared Helm Chart"]
direction LR
subgraph Entities_Row[" "]
direction LR
BP["<b>Release Blueprint</b>
• name
• components
◦ name
◦ repo_url"]
RC["<b>Release Candidate</b>
• name
• version
• components
◦ name
◦ repo_url
◦ version
• compatible onboard versions"]
BP --> RC
end
subgraph Examples_Row[" "]
direction LR
BPex["<b>uncrew-versioned:</b>
• apollo-frontend
• avatar-controller
• mission-service
• pathfinding
• compatible onboard version"]
RCex["<b>uncrew-versioned - v2.2.2:</b>
• apollo-frontend v2.2.3
• avatar-controller v1.0.3
• mission-service v2.2.4
• pathfinding v4.2.4
• compatible onboard versions:
◦ v1.1.1
◦ v1.3.3
◦ v1.5.5"]
BPex --> RCex
end
end
subgraph GKE["GKE Deployments"]
direction LR
GPR["<b>Global Production Release</b>
• release candidates for each
product and each version
to be supported
◦ name/id
◦ version"]
GPRex["<b>Global Production Release - prod:</b>
• uncrew-versioned v2.2.2
• uncrew-versioned v2.1.1
• uncrew-versioned v1.5.5
• uncrew-shared v1.4.5"]
end
RC --> GPR
RCex --> GPRex
```
Repository Structure & Deployment Process
1. Uncrew Versioned Helm Chart Repository
This repository will include the Helm chart that defines the services and their dependencies for a multi-version approach.
- Each service will be specified with its corresponding version tag.
- The Helm chart itself will be tagged to facilitate building a complete release of the system.
- The successful creation of a new version in this Helm chart repo can trigger an automated PR in the gke-deployments repo that adds the new version to the stage/prod config file and attaches “Deployment notes” to the PR.
```mermaid
graph LR
subgraph PR["PR in bundle repo"]
direction LR
V222["<b>uncrew-versioned - v2.2.2:</b>
• apollo-frontend v2.2.3
• avatar-controller v1.0.3
• mission-service v2.2.4
• pathfinding v4.2.4
• compatible onboard versions:
◦ v1.1.1
◦ v1.3.3
◦ v1.5.5"]
V230["<b>uncrew-versioned - v2.3.0:</b>
• apollo-frontend v2.2.5
• avatar-controller v1.1.0
• mission-service v2.3.0
• pathfinding v4.2.4
• compatible onboard versions:
◦ v1.1.1
◦ v1.3.3
◦ v1.5.5
◦ v1.5.8"]
V222 --> V230
end
```
Key Components:
- `Chart.yaml`: Contains metadata about the Helm chart.
- `values.yaml`: Defines the default configuration values for the chart.
- `templates/`: Directory containing the Kubernetes resource templates.
- `dependencies.yaml`: Lists all dependencies with their specific versions.
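As a hedged sketch of the umbrella chart’s metadata, assuming Helm 3 conventions (Helm 3 declares dependencies directly in `Chart.yaml`; keeping them in a separate `dependencies.yaml` would be this repo’s own convention). All service names, versions, and the registry URL are illustrative:

```yaml
# Chart.yaml — minimal sketch for the versioned umbrella chart
apiVersion: v2
name: uncrew-versioned
version: 2.2.2            # the Release Candidate version of the bundle
dependencies:
  - name: apollo-frontend
    version: 2.2.3
    repository: oci://registry.example.com/charts   # assumed registry
  - name: mission-service
    version: 2.2.4
    repository: oci://registry.example.com/charts
```

Pinning every dependency to an exact version is what makes the chart tag a reproducible Release Candidate.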
2. Uncrew Shared Helm Chart Repository
- This Helm chart will include only the components shared across tenants and versions, which do not need to be replicated per version and do not support switching.
- The repo structure is the same as in the Uncrew Versioned Helm Chart Repository.
- The successful creation of a new version in this Helm chart repo can trigger an automated PR in the gke-deployments repo that replaces the shared chart version in the stage/prod config file and attaches “Deployment notes” to the PR.
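The automated PR trigger described above could be wired up roughly like this GitHub Actions sketch (workflow name, repo name, secret name, and file names are all assumptions):

```yaml
# .github/workflows/propose-deploy.yaml (hypothetical)
# On a new chart tag, open a PR in gke-deployments that adds or
# replaces the chart version in the stage config file.
name: propose-deploy
on:
  push:
    tags: ['v*']
jobs:
  open-pr:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          repository: example-org/gke-deployments   # assumed repo name
          token: ${{ secrets.DEPLOY_REPO_TOKEN }}   # assumed PAT secret
      - run: |
          # Append the newly tagged chart version to the stage config;
          # the PR action below commits the change and opens the PR.
          echo "    - ${GITHUB_REF_NAME}" >> stage-deployment.yaml
      - uses: peter-evans/create-pull-request@v6
        with:
          title: "Add chart ${{ github.ref_name }} to stage"
          body: "Automated version bump; see Deployment notes in the chart repo."
```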
3. GKE Deployments Repository
This repository will manage the deployment configurations for different environments.
- It will specify which version of the Helm chart to deploy to each environment (e.g., production, staging).
- Each configuration file will contain a list of Uncrew Versioned Helm Chart versions (e.g. `v2.2.1`, `v2.2.2`, `v3.3.3`) and a single Uncrew Shared Helm Chart version (e.g. `v1.1.1`).
- Using the Helm CLI, the specified versions of the Helm charts will be deployed to the corresponding environments.
- Automation can look through the expected versions and surface the corresponding “Deployment notes” from the Helm chart repos.
- Cleanup: when a version is planned for deprecation, the corresponding config file in gke-deployments should be updated to remove the line with the deprecated version. Once the PR is opened and approved, the release is uninstalled, its namespace is deleted from the GKE cluster, and the version becomes unavailable in the UI.
```mermaid
graph LR
subgraph PR["PR in deployment repo"]
direction LR
Before["<b>Global Production Release
(production env):</b>
• uncrew-versioned v1.1.0
• uncrew-versioned v2.1.0
• uncrew-shared v1.4.5"]
After["<b>Global Production Release
(production env):</b>
• uncrew-versioned v1.1.0
• uncrew-versioned v2.1.0
• uncrew-versioned v2.2.2
• uncrew-shared v1.4.6"]
Before --> After
end
```
Key Components:
- `prod-deployment.yaml`: Configuration file for deploying the Helm chart versions to the production environment.
- `stage-deployment.yaml`: Configuration file for deploying the Helm chart versions to the staging environment.
- etc.
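A hedged sketch of what such a config file might contain (the schema is an assumption): exactly one shared chart version plus the list of versioned chart releases to keep installed, so that removing a line is the deprecation step described above.

```yaml
# prod-deployment.yaml (hypothetical schema)
environment: production
shared:
  chart: uncrew-shared
  version: v1.4.5        # exactly one shared version per environment
versioned:
  chart: uncrew-versioned
  versions:              # every version listed here stays installed;
    - v1.5.5             # deleting a line triggers uninstall and
    - v2.1.1             # namespace cleanup once the PR is approved
    - v2.2.2
```

Automation could then iterate over `versions` and run, for each entry, something like `helm upgrade --install uncrew-v2-2-2 uncrew-versioned --version 2.2.2 --namespace uncrew-v2-2-2` (release and namespace naming here is an assumption).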
Testing Approaches
Stable versions. All environments should contain the same set of Production-supported Helm Releases to provide effective testing capabilities and allow smooth troubleshooting of any production issues.
Latest trunk from all repos. The stable trunk version is deployed and accessible on Dev/Sandbox/Staging in parallel with the Production stable versions.
Draft Release version. When a Release Draft is created, the latest trunk versions are combined into an early version of the Helm chart and deployed to Dev/Sandbox/Staging for Automated System Testing and any required manual Product Testing.
Feature Development.
- One unstable Sandbox version. Code changes from all short-lived branches are deployed into one unstable Sandbox version to allow combined testing across repos.
- New temp version. Every branch in every repo gets its own version on Sandbox, in parallel with all the options above. The branch deploys a changed version of one service plus all other services from trunk. This allows testing branch changes against the latest stable code and prevents issues where features in development conflict, or where a branch change introduces a bug that impacts all other components. Once the PR is reviewed, approved, and merged, the temporary version is deleted from the environment. This is a costly option overall, but it can be configured so that a custom tag in the commit message opts a commit in, ignoring most commits.
```mermaid
graph TB
subgraph ApolloRepo["Apollo Frontend Repo"]
A_Main["Main Branch"]
A_Feature["New feature Branch"]
A_Main --> A_Feature
A_Feature --> A_PR["PR is opened"]
A_PR --> A_Main
end
subgraph MissionRepo["Mission Repo"]
M_Main["Main Branch"]
M_Feature["New feature Branch"]
M_Main --> M_Feature
M_Feature --> M_PR["PR is opened"]
M_PR --> M_Main
end
subgraph Sandbox["Deployment Options"]
Unstable["<b>Unstable sandbox:</b>
• mission-service
• apollo-frontend
• avatar-controller
• pathfinding"]
OR_label["OR"]
TmpVer["<b>New tmp version:</b>
• mission-service (trunk)
• apollo-frontend
• avatar-controller (trunk)
• pathfinding (trunk)"]
AND_label["AND"]
TmpVer2["<b>New tmp version:</b>
• mission-service
• apollo-frontend (trunk)
• avatar-controller (trunk)
• pathfinding (trunk)"]
end
A_PR -.-> Unstable
A_PR -.-> TmpVer
M_PR -.-> Unstable
M_PR -.-> TmpVer2
```
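The commit-message opt-in for per-branch temporary versions could be expressed as a job condition, sketched here as a GitHub Actions fragment (the `[tmp-env]` tag and the helper script are assumptions):

```yaml
# Hypothetical job: only build a per-branch temporary version when
# the commit message opts in with a custom tag, so most commits
# skip this costly step.
jobs:
  deploy-tmp-version:
    if: contains(github.event.head_commit.message, '[tmp-env]')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/deploy-tmp-version.sh   # assumed helper script
```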
- Security patches in current and older versions. For security patches, a custom branch naming scheme can be used that specifies the Helm chart version and the current service version. When the fix is developed and a PR is opened against the release branch, a new temporary version is deployed into GKE for System Testing to make sure no functionality is impacted by the security patch. When testing is finished, the PR is reviewed and merged, a new tag is created, and the temporary branches are deleted.
```mermaid
graph TB
subgraph Repo["Apollo Frontend Repo"]
NewTag["New tag"]
RelBranch["Release Branch"]
FixBranch["New fix branch"]
PR["PR is opened"]
NewTag --> RelBranch
RelBranch --> FixBranch
FixBranch --> PR
PR --> RelBranch
end
TmpVer["<b>New tmp version:</b>
• apollo-frontend (PR)
• avatar-controller (release)
• mission-service (release)
• pathfinding (release)"]
Label["Hotfix, security patch
development and testing"]
PR -.-> TmpVer
Label ~~~ TmpVer
```
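The security-patch flow above could be triggered by the branch naming itself; a sketch assuming GitHub Actions and a `release/**` branch pattern (e.g. `release/uncrew-versioned-v2.2.2`, which is an assumed naming convention):

```yaml
# Hypothetical trigger: when a security-fix PR targets a release
# branch, deploy a temporary version for System Testing.
on:
  pull_request:
    branches:
      - 'release/**'
jobs:
  tmp-version-for-testing:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Pass the targeted release branch so the helper can pick the
      # matching released versions of the other components.
      - run: ./scripts/deploy-tmp-version.sh "${{ github.base_ref }}"  # assumed helper
```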
- Custom automation. A separate automation process outside of the repo pipelines allows a custom selection of versions for different components: for example, most repos use trunk while a few components are taken from feature branches.
```mermaid
graph RL
Automation["Automation"]
CustomVer["<b>Custom manual version:</b>
• mission-service branch-ABC
• apollo-frontend branch-XYZ
• avatar-controller v2.2.4
• pathfinding v4.2.4"]
Automation --> CustomVer
```
Formal Impact
This structured approach using three repositories allows a clear separation of concerns and enables efficient management of dependencies and deployment versions across different environments. Each repository serves a distinct purpose, ensuring that the system can be deployed reliably and consistently. All application and infrastructure components will require review and integration under the Versioned or Shared Helm Charts, with ad-hoc installation and infrastructure components shared across multiple versions.
Maintaining old versions involves more steps, so some effort will be required to prepare automation and to onboard the team onto the new process. Automation is needed to offload the manual burden, such as preparing sets of branches for security fixes in older versions and cleaning up once changes are merged and tagged.
Staged Approach
Iteration 1: All services deploy all the way to prod the same way as now, while individual services are added to the bundle. Once a service is added, its stable deployment moves to the bundle pipeline.
Iteration 2: Services are grouped into bundles, and the bundle is deployed to the existing fixed environments: staging and production. Some services can stay unbundled as exceptional cases.
Iteration 3: All bundles are grouped in the deployment repo to allow ephemeral environments to be spun up from a compatible combination of bundles.
```mermaid
graph LR
subgraph Iter1["Iteration 1"]
direction LR
I1_A["Service A"] --> I1_Stg["Staging"]
I1_A --> I1_Prod["Production"]
I1_B["Service B"] --> I1_Stg
I1_B --> I1_Prod
I1_B --> I1_Bundle["Bundle repo"]
I1_Bundle --> I1_Stg
I1_Bundle --> I1_Prod
end
```

```mermaid
graph LR
subgraph Iter2["Iteration 2"]
direction LR
I2_A["Service A"] --> I2_Bundle["Bundle repo"]
I2_B["Service B"] --> I2_Bundle
I2_Bundle --> I2_Stg["Staging"]
I2_Bundle --> I2_Prod["Production"]
end
```

```mermaid
graph LR
subgraph Iter3["Iteration 3"]
direction LR
I3_A["Service A"] --> I3_Bundle["Bundle repo"]
I3_B["Service B"] --> I3_Bundle
I3_Bundle --> I3_Deploy["Deployment repo
Bundle v1
Bundle v2"]
I3_Deploy --> I3_Stg["Staging"]
I3_Deploy --> I3_Prod["Production"]
end
```
Alternatives Considered
GitFlow Branching and Release Strategy
Pros that GitFlow can provide:
- Easily seeing what code is in Production for troubleshooting. Refutation: when multiple production-facing versions of the software are maintained, there is a set of versions in Production at the same time, and knowing the code of only the latest one will not provide the whole picture. But if we also install all Production versions in the dev environment, it allows real-time troubleshooting on the same code base.
- Avoid code freezes. When a new version is being prepared for release from `develop` and some new early features are not expected in it, this requires either a de-facto code freeze to prevent unexpected changes landing in `develop`, or cutting a `release` branch from an earlier commit in `develop`. A similar strategy can be applied to trunk-based development: if `trunk` contains more than expected for the release, a new branch can be created from an earlier commit/tag and the necessary changes cherry-picked from `trunk`. A new tag is then created and included in the Release.
Cons:
- Maintaining multiple branches can introduce more places for human error.
- The same changes must be merged into multiple branches: `develop`, `release`, `main`.
Long-Lived Branches vs Tag-Based Maintenance with Short-Lived Branches
| Aspect | Long-Lived Branches | Short-Lived Branches |
|---|---|---|
| Branch clutter | ❌ Multiple permanent branches | ✅ Only trunk + short-lived |
| Patching process | ✅ Simple (just push) in long-lived branch | ⚠️ More steps: Tag → Temp Branch → Fix → Tag → Delete |
| Mental model | ✅ Branches = versions | ✅ Tags = versions |
| Automation needed | ⚠️ Some | ✅ More critical |
| Git complexity | ⚠️ Merges between branches | ⚠️ Orphaned commits |
| Team onboarding | ❌ Requires training on branch management | ⚠️ New aspect of working with tags, but short-lived branches are same as now |
| Scheduled flow (like security scanning) | ✅ GHA automation can look into set of stable release branches, do scanning and prepare PRs automatically | ❌ Separate process needed for scanning versions in prod |
Appendix: Key Terms
Mostly taken from the Organizations & Tenancy concept.
| Term | Definition |
|---|---|
| Operator | Operators hold certificates. For example, DroneUp has a Part 135 Certification that affords certain privileges. They have control over which versions of software are made available to tenants in Global Production Releases. They have operational control over their child-agents. |
| Agent | Agents leverage an Operator’s certification (e.g. Part 135 Certification) for their operations. Agents must adhere to the GOM, GMM, and SMS of the Operator to be compliant. Both Agents and Operators are Tenants, but with different privileges. |
| Tenancy | A Tenant is a group of users who share a common access with specific privileges to the software instance. |
| Multi-Tenancy | Multi-tenancy is an architecture wherein a single occurrence of a software application serves numerous clients, each with the illusion of a standalone application. |
| Instance | Separate and independent copies of a software application or service that run on the same physical or virtual infrastructure (also Environment). |
| Multi-Instance | The same source code deployed into separate environments. These environments are isolated such that the services contained within are unable to communicate across environments. |
| Release Blueprint | Specifies repositories and components that constitute a manageable product line, includes name of components and repository URL. |
| Release Candidate | Selection of specific versions for each component defined in a Release Blueprint. |
| Global Production Release | A selection of specific, versioned releases from multiple, independent software and infrastructure bundles. |