Cluster Networking with Apigee
ADR-0010-CLUSTER-NETWORKING-WITH-APIGEE (v6)
Networking setup in GKE cluster with routes from Apigee/Cato and the public internet
Context
Apigee has been selected as the API Gateway solution for internal and external REST/gRPC API communication, which requires evaluating a new approach to networking configuration across GKE clusters.
Several use cases were outlined while exploring networking solutions, to make sure the capabilities currently provided by the infrastructure remain covered:
- external users/services (REST/gRPC)
- cluster to cluster (REST/gRPC)
- internal (Cato) users (REST/gRPC/FE)
- external users (mTLS)
- external users (FE)
- service to service (REST/gRPC)
Out of scope of this ADR:
- mTLS termination at the Gateway level (this requires deeper research; currently mTLS is terminated and processed at the service level and traffic is passed through)
- Server-side TLS (users get a secure TLS connection on client-facing domains proxied through Cloudflare, but end-to-end encryption is out of scope for this ADR)
- Automation of API spec deployment on Apigee (CircleCI, apigeecli)
Decision drivers
- Scalability
- Maintainability
- Automation
- Security
- GCP-native
- Cost
Decision - Apigee + GKE API Gateway

To cover all the required use cases, a hybrid approach combines the best of the different options.
Below is a breakdown of each use case with a short summary and brief implementation details:
External users (REST/gRPC):
- A few central domains that are not auto-generated or auto-assigned (for example, api.droneup.com/api-dev.droneup.com) will point to a central External Application Load Balancer (L7), which contains a rule that allows public traffic on the public client-facing domains and passes the requests to Apigee;
- Apigee validates requests and routes them through Private Service Connect to the GKE cluster where the application is hosted;
- Private Service Connect points to the Internal Application Load Balancer created by the GKE Gateway resource in the GKE cluster;
- GKE Gateway routes traffic inside the cluster based on HTTPRoutes created during application deployment.
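The last step relies on each application publishing its own HTTPRoute. The following is a minimal sketch of what such a route could look like, built as a plain Python dict so it can be dumped to JSON or YAML by deployment tooling. The application name, namespaces, Gateway name, path prefix and port are hypothetical placeholders, and the `gateway.networking.k8s.io/v1` API version is an assumption about what the target clusters serve.

```python
import json


def build_http_route(app: str, namespace: str, gateway: str,
                     gateway_namespace: str, hostname: str,
                     path_prefix: str, port: int) -> dict:
    """Return a Gateway API HTTPRoute manifest as a plain dict."""
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",  # assumed API version
        "kind": "HTTPRoute",
        "metadata": {"name": f"{app}-route", "namespace": namespace},
        "spec": {
            # Attach the route to the cluster's Gateway (hypothetical name).
            "parentRefs": [{"name": gateway, "namespace": gateway_namespace}],
            "hostnames": [hostname],
            "rules": [{
                # Send every request under the prefix to the app's Service.
                "matches": [{"path": {"type": "PathPrefix",
                                      "value": path_prefix}}],
                "backendRefs": [{"name": app, "port": port}],
            }],
        },
    }


if __name__ == "__main__":
    route = build_http_route(
        app="orders",                      # hypothetical service
        namespace="orders",
        gateway="internal-gateway",        # hypothetical Gateway name
        gateway_namespace="gateway-infra",
        hostname="api.droneup.com",
        path_prefix="/orders",
        port=8080,
    )
    print(json.dumps(route, indent=2))
```

In practice the same structure would more likely live as a template in the shared helm chart; the function form is only meant to show which fields vary per application.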
Cluster to cluster (REST/gRPC):
- A few central domains that are not auto-generated or auto-assigned (for example, api-internal.droneup.com/api-internal-dev.droneup.com) will point to a central External Application Load Balancer (L7), which contains a rule that allows private traffic on the private domains and passes the requests to Apigee;
- All subsequent steps are the same as for external users (use case 1).
Internal users (REST/gRPC/FE):
- The Internal Application Load Balancer (L7) created by GKE Gateway will be accessible to Cato users via the private network.
- GKE Gateway routes traffic inside the cluster based on HTTPRoutes created during application deployment.
External users (mTLS). This path is currently an edge case and may change in the future.
- The External Load Balancer (L4) created by GKE Gateway will be accessible from the public internet without certificate validation, because TLS termination happens at the service level.
- GKE Gateway routes traffic inside the cluster based on HTTPRoutes created during application deployment.
External users (FE):
- The External Application Load Balancer (L7) created by GKE Gateway will be accessible with whitelisting of traffic from Cloudflare.
- GKE Gateway routes traffic inside the cluster based on HTTPRoutes created during application deployment.
Service to service (REST/gRPC):
- Service-to-service traffic inside the cluster is routed by Kubernetes by default. No additional action or alternative setup is needed.
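The six paths above can be summarised as a small lookup table. This is only an illustrative sketch: the use-case keys and the `EntryPoint` type are hypothetical names introduced here, and the values restate the decision text rather than any real configuration.

```python
from typing import NamedTuple


class EntryPoint(NamedTuple):
    load_balancer: str  # load balancer that first receives the traffic
    via_apigee: bool    # True when requests pass through Apigee


# Keys are hypothetical identifiers for the use cases in this ADR.
ENTRY_POINTS = {
    "external-api": EntryPoint("central External Application LB (L7)", True),
    "cluster-to-cluster": EntryPoint("central External Application LB (L7)", True),
    "internal-cato": EntryPoint("Internal Application LB (L7) from GKE Gateway", False),
    "external-mtls": EntryPoint("External LB (L4) from GKE Gateway", False),
    "external-fe": EntryPoint("External Application LB (L7) from GKE Gateway", False),
    "service-to-service": EntryPoint("none (Kubernetes in-cluster routing)", False),
}


def apigee_use_cases() -> list[str]:
    """Return the use cases whose traffic is validated by Apigee."""
    return sorted(name for name, ep in ENTRY_POINTS.items() if ep.via_apigee)


if __name__ == "__main__":
    for name, ep in ENTRY_POINTS.items():
        print(f"{name:20} apigee={ep.via_apigee!s:5} {ep.load_balancer}")
```

Only the first two paths go through Apigee and Private Service Connect; the remaining ones reach the cluster directly through a load balancer created by GKE Gateway, or never leave the cluster.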
Consequences
The main focus was to do most of the heavy lifting in the initial setup of the integrations between Apigee, the clusters and the network, leaving only small steps to follow during service/application setup.
As a result, the initial setup will involve heavy changes across different infrastructure components.
Automation changes need to be implemented at various levels:
- droneup-shared-infrastructure (central initial setup of purchased Apigee organization)
- droneup-shared-infrastructure (central networking)
- league creation module (base league infra to add K8S Gateway setup and connection to Apigee)
- K8S module (supporting K8S gateway changes)
- shared helm chart (adding HTTPRoute)
- CICD platform kit (any application dependencies)
The need to serve multiple use cases will make it challenging to build automated tests for deploying and validating each use case.
Cost
The Apigee pricing plan requires purchasing an Apigee subscription for 3 years. The currently selected plan is 1 Apigee organization with 3 environments. This supports deploying 3 instances for some traffic separation, which can help separate dev from prod or internal from external, but 3 environments will not allow both separations at once.
Alternatives Considered
Kong ingress controller
Pros:
- Provides capabilities for customizing the authentication flow using custom and built-in plugins
- Already implemented and actively used in development and production environments
- Supports FE/REST/gRPC in current flow
- Parsing/validation of HTTP authentication headers
- Allows controlling the setup at the application level without changing anything outside the application repository
- The running cost covers only incoming traffic, as the Kong deployment provides high performance without creating significant overhead or load on the cluster
- Supports REST/gRPC/FE
Cons:
- Requires maintaining and updating the Kong/Nginx deployment as part of the GKE workload
- Not a GCP-native solution
GCP-native GKE Ingress resource
Pros:
- GCP-native
- Supports using different Load Balancer resources as needed (L7 or L4, external or internal)
- Supports all required use cases
Cons:
- Adding a new application to an existing Ingress requires changing and maintaining a long central configuration file that contains the setup for multiple endpoints
- Maintaining an Ingress per service requires constant networking updates
GKE Gateway resource
Pros:
- GCP-native
- Supports using different Load Balancer resources as needed (L7 or L4, external or internal)
- Allows controlling individual endpoint configuration at the service level: each service adds its own route under the GKE Gateway
- Supports attaching HTTP and security policies for compatibility with Cloud Armor and Cloud IAP
- Simplicity
- Supports all required use cases
Cons:
- No authentication/API request validation, only routing
- Each Private Service Connect connection to Apigee requires a separate IP range (/29) to be allocated and assigned to a subnet
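To make the /29 requirement concrete: a /29 holds 8 addresses, so a /24 set aside for this purpose yields 32 such blocks. The sketch below carves consecutive /29 ranges out of a parent range using Python's standard `ipaddress` module. The parent range `10.100.0.0/24` and the cluster names are hypothetical, and real allocations would have to follow the central networking plan.

```python
import ipaddress


def allocate_psc_ranges(parent_cidr: str, clusters: list[str]) -> dict:
    """Assign one /29 from parent_cidr to each cluster, in order."""
    parent = ipaddress.ip_network(parent_cidr)
    blocks = parent.subnets(new_prefix=29)  # generator of consecutive /29s
    allocation = {}
    for cluster in clusters:
        try:
            allocation[cluster] = next(blocks)
        except StopIteration:
            # The parent range has no free /29 left for this cluster.
            raise ValueError(
                f"{parent_cidr} is too small for {len(clusters)} clusters"
            ) from None
    return allocation


if __name__ == "__main__":
    # Hypothetical parent range and league cluster names.
    ranges = allocate_psc_ranges("10.100.0.0/24",
                                 ["league-a", "league-b", "league-c"])
    for cluster, block in ranges.items():
        print(f"{cluster}: {block} ({block.num_addresses} addresses)")
```

Keeping the allocation deterministic (same input order, same ranges) is one way to avoid overlapping ranges when league projects are reconfigured.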
Multi-cluster service setup (Anthos based)
Pros:
- Provides an ability to create cross-cluster resources in GKE-native way
- Service-mesh support
Cons:
- The multi-cluster Gateway resource is not GA yet, so it is not recommended for production workloads. An alternative would be to use Ingress or Service, but they would require more networking/firewall configuration to fit the self-service model
- Significant initial effort to reconfigure all clusters under one fleet (per environment: dev and prod)
- Initial setup requires less secure networking configuration - “any” to “any” between clusters
- Making each league cluster Anthos-compatible adds a cost per cluster/vCPU
Apigee
Pros:
- GCP-native
- HTTP header parsing
- Monetization
- External/Internal API management portal
- Traffic analysis
- Supports REST/gRPC
Cons:
- Manual maintenance of the Apigee organization through the UI is well supported, while automating the setup can be challenging when it comes to connecting and outlining all dependencies and requirements
- Doesn’t provide a way to route traffic directly into a GKE cluster
- Doesn’t support connecting to more than 2 shared VPCs
- Connecting to GKE Gateway requires setting up Private Service Connect for each cluster, which necessitates careful dependency ordering during league project reconfiguration and networking setup
- Adds overhead to the initial setup, where multiple moving parts can be misconfigured, requiring deeper troubleshooting and networking knowledge
- Cost
Links
- Miro board
- [Apigee Re-search](confluence-title://PE/API Gateway evaluation)