Autoscale and Add Flex Gateway Replicas to a Namespace

For high availability (HA) environments, horizontal scaling distributes load across multiple Flex Gateway instances. By default, the Helm chart for Flex Gateway is configured with one replica. To deploy more Flex Gateway replicas into a namespace, provide one of the following configurations:

Set a replica count: Set a fixed number of Flex Gateway replicas to create.
Configure autoscaling: Generate replicas automatically based on CPU, memory, and other settings.

Find the default Helm chart settings for Flex Gateway (flex-gateway) through one of the following methods:

Open the flex-gateway page in ArtifactHUB
Run helm show values <helm-repository-name>/<helm-chart-name> from a terminal window, for example, helm show values flex-gateway/flex-gateway.

Before You Begin

Ensure that the following prerequisites are in place:

A Flex Gateway deployment or a registered Flex Gateway instance (gateway)

For gateway registration and deployment processes, see Setting Up Flex Gateway or Getting Started with Flex Gateway in a Kubernetes Cluster.
A Kubernetes cluster for your Flex Gateway deployment

Set a Replica Count

Configure and verify the number of replicas that you require. Note that new replicas replace existing replicas. If a newer version of Flex Gateway is available, the command replaces your older replica versions with the latest version.

The steps to take depend on whether a Helm chart is installed.

If a Helm chart for your namespace is installed, run the following command:

Syntax:

helm -n <namespace> upgrade \
<release-name> <helm-repository-name>/<helm-chart-name> \
--reuse-values \
--set replicaCount=<number of replicas>

Example:

helm -n gateway upgrade \
ingress flex-gateway/flex-gateway \
--reuse-values \
--set replicaCount=2

When successful, the command prints output similar to this example:

Release "ingress" has been upgraded. Happy Helming!
NAME: ingress
LAST DEPLOYED: Mon Apr 17 15:00:09 2023
NAMESPACE: gateway
STATUS: deployed
REVISION: 27
TEST SUITE: None

If the Helm chart is not installed, run the following command:

Syntax:

helm -n <namespace> \
upgrade -i --create-namespace \
<release-name> <helm-repository-name>/<helm-chart-name> \
--set-file registration.content=<registration file> \
--set replicaCount=<number of replicas>

Example:

helm -n gateway \
upgrade -i --create-namespace \
ingress flex-gateway/flex-gateway \
--set-file registration.content=registration.yaml \
--set replicaCount=2

When successful, the command prints output similar to this example:

Release "ingress" does not exist. Installing it now.
NAME: ingress
LAST DEPLOYED: Mon Apr 17 15:32:50 2023
NAMESPACE: gateway
STATUS: deployed
REVISION: 1
TEST SUITE: None

Verify creation of the replicas by running the following command:

Syntax:

kubectl get rs -n <namespace>

Example:

kubectl get rs -n gateway

When successful, the command prints output similar to this example:

NAME                  DESIRED   CURRENT   READY   AGE
ingress-5b7474b8f6    2         2         2       70s
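
In a script, the same check can be automated by comparing the DESIRED and READY columns. The following is a minimal sketch, not part of the Flex Gateway tooling: the replicas_ready helper is a hypothetical name, and the sample text is the output shown above. Against a live cluster, pipe the output of kubectl get rs -n <namespace> into the helper instead.

```shell
#!/bin/sh
# Hypothetical helper: reads `kubectl get rs` output on stdin and succeeds
# only if every ReplicaSet reports READY equal to DESIRED.
replicas_ready() {
  # Skip the header row; column 2 is DESIRED and column 4 is READY.
  awk 'NR > 1 && $2 != $4 { bad = 1 } END { exit bad }'
}

# Sample output copied from this page, used here in place of a live cluster.
sample='NAME                  DESIRED   CURRENT   READY   AGE
ingress-5b7474b8f6    2         2         2       70s'

if printf '%s\n' "$sample" | replicas_ready; then
  status="ready"
else
  status="not ready"
fi
echo "$status"
```

ReplicaSets left over from earlier revisions typically show 0 for both columns, so they do not cause the check to fail.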

Configure Autoscaling

Horizontal Pod autoscaling generates Flex Gateway replicas automatically based on CPU, memory, and other target and behavioral settings. For descriptions of these settings, see Autoscaling Parameters.

Update Your Helm Chart with an Autoscaling Configuration

To update your Helm chart with autoscaling parameters:

Create a YAML file with your autoscaling configuration, modifying the settings to meet your requirements.

Example:

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 11
  targetCPUUtilizationPercentage: 50
  targetMemoryUtilizationPercentage: 50
  behavior:
    scaleDown:
      selectPolicy: Min
      stabilizationWindowSeconds: 100
      policies:
      - type: Percent
        value: 70
        periodSeconds: 30
    scaleUp:
      selectPolicy: Max
      stabilizationWindowSeconds: 100
      policies:
      - type: Percent
        value: 70
        periodSeconds: 30

If a Helm chart for your namespace is installed, run the following command to apply your autoscaling configuration:

Syntax:

helm -n <namespace> upgrade \
<release-name> <helm-repository-name>/<helm-chart-name> \
--reuse-values \
-f <your-autoscaling-configuration-yaml>

Example:

helm -n gateway upgrade \
ingress flex-gateway/flex-gateway \
--reuse-values \
-f autoscaling.yaml

When successful, the command prints output similar to this example:

Release "ingress" has been upgraded. Happy Helming!
NAME: ingress
LAST DEPLOYED: Tue Apr 18 15:46:38 2023
NAMESPACE: gateway
STATUS: deployed
REVISION: 33
TEST SUITE: None

If the Helm chart is not installed, run the following command to apply your autoscaling configuration:

Syntax:

helm -n <namespace> \
upgrade -i --create-namespace \
<release-name> <helm-repository-name>/<helm-chart-name> \
--set-file registration.content=<registration file> \
-f <your-autoscaling-configuration-yaml>

Example:

helm -n gateway \
upgrade -i --create-namespace \
ingress flex-gateway/flex-gateway \
--set-file registration.content=registration.yaml \
-f autoscaling.yaml

When successful, the command prints output similar to this example:

NAME: ingress
LAST DEPLOYED: Tue Apr 18 15:50:29 2023
NAMESPACE: gateway
STATUS: deployed
REVISION: 1
TEST SUITE: None

Verify your autoscaling configuration by running the following command:

Syntax:

kubectl get hpa -n <namespace>

Example:

kubectl get hpa -n gateway

The command prints output similar to this example:

NAME     REFERENCE           TARGETS          MINPODS  MAXPODS  REPLICAS  AGE
ingress  Deployment/ingress  66%/50%, 5%/50%  2        11       2         59s
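
To read the TARGETS column, it can help to work through the replica calculation that the Kubernetes documentation describes for the HPA: desired replicas = ceil(current replicas x current metric / target metric), taking the largest result across metrics and keeping it between MINPODS and MAXPODS. The sketch below applies that formula to the sample values above. It is an illustration only: the desired_replicas function name is hypothetical, and the sketch ignores the HPA tolerance band and the stabilization window.

```shell
#!/bin/sh
# Hypothetical helper: ceil(current * metric / target) with integer arithmetic.
desired_replicas() {
  current=$1; metric=$2; target=$3
  echo $(( (current * metric + target - 1) / target ))
}

min=2; max=11; current=2                    # MINPODS, MAXPODS, REPLICAS
first=$(desired_replicas "$current" 66 50)  # first metric: 66% against a 50% target
second=$(desired_replicas "$current" 5 50)  # second metric: 5% against a 50% target

# With several metrics, the largest proposal is used, then clamped to min and max.
desired=$first
if [ "$second" -gt "$desired" ]; then desired=$second; fi
if [ "$desired" -lt "$min" ]; then desired=$min; fi
if [ "$desired" -gt "$max" ]; then desired=$max; fi

echo "first=$first second=$second desired=$desired"
```

For the sample values, the first metric alone calls for 3 replicas (2 x 66 / 50 = 2.64, rounded up), so the sketch arrives at 3 even though the second metric is far below its target.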

Autoscaling Parameters

To use autoscaling, enable the Horizontal Pod Autoscaler (HPA), and configure the other Helm chart settings that define autoscaling behavior.

autoscaling.enabled

Boolean that indicates whether the Horizontal Pod Autoscaler (HPA) is enabled. Defaults to false.

autoscaling.minReplicas

The minimum number of replicas that the HPA scaler is allowed to create. Defaults to 2.

autoscaling.maxReplicas

The maximum number of replicas that the HPA scaler is allowed to create. Defaults to 11.

autoscaling.targetCPUUtilizationPercentage

The target average CPU utilization, as a percentage, across all deployed Pods. The HPA adds or removes replicas to keep average usage near this value. Defaults to 50.

autoscaling.targetMemoryUtilizationPercentage

The target average memory utilization, as a percentage, across all deployed Pods. Defaults to nil.

autoscaling.behavior

Settings for the behavior field of the HorizontalPodAutoscaler (HPA) object in Kubernetes. These settings control how quickly the HPA adds or removes replicas in response to changes in the workload.

autoscaling.behavior.scaleUp, autoscaling.behavior.scaleDown

Settings for autoscaling behavior when the HPA scaler increases (scaleUp) or decreases (scaleDown) the number of replicas. Define scaling behavior with nested parameters:

selectPolicy: If more than one policy is listed under policies, this setting determines which policy applies. Valid values are Disabled, Min, and Max. Defaults to Max. Max applies the policy that allows the largest change in the number of replicas, Min applies the policy that allows the smallest change, and Disabled turns off scaling in that direction.
stabilizationWindowSeconds: The number of seconds of past scaling recommendations that the HPA considers before it changes the number of replicas. A longer window prevents the replica count from fluctuating when workload metrics change frequently.
policies: A list of policies that determine scaling behavior:
- type: The type of value for a given policy, Percent or Pods.
- value: The amount of change that the policy permits: a percentage of the current replicas for Percent, or a number of Pods for Pods.
- periodSeconds: The length, in seconds, of the period over which the policy's limit applies.
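
As a rough illustration of how a Percent policy bounds each scaling step, the sketch below uses the value of 70 from the example configuration on this page. The function names are hypothetical. The rounding directions (up for the scale-up limit, down for the scale-down limit) are assumptions based on the Kubernetes HPA documentation, and the sketch assumes that no other scaling event has occurred within the same period.

```shell
#!/bin/sh
# Hypothetical helpers; arguments: replicas at the start of the period, policy value (percent).

# Largest replica count reachable in one period when scaling up (rounded up).
scale_up_limit() {
  echo $(( ($1 * (100 + $2) + 99) / 100 ))
}

# Smallest replica count reachable in one period when scaling down (rounded down).
scale_down_limit() {
  echo $(( $1 * (100 - $2) / 100 ))
}

up=$(scale_up_limit 2 70)       # 2 * 1.7 = 3.4
down=$(scale_down_limit 10 70)  # 10 * 0.3 = 3.0
echo "up=$up down=$down"
```

Under these assumptions, a deployment of 2 replicas can grow to at most 4 within one 30-second period, and a deployment of 10 replicas can shrink to no fewer than 3. The minReplicas and maxReplicas settings still bound the final count.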

For more information about autoscaling, see the Kubernetes documentation.