Configuring Horizontal Autoscaling (HPA) for CloudHub 2.0 Deployments

Configure CPU-based horizontal scaling for Mule applications to make them responsive to CPU usage by automatically scaling up or down the deployment replicas as needed.

This feature applies only to customers who are opted into the new pricing and packaging model. For more details, see Anypoint Platform Pricing

In Kubernetes, a Horizontal Pod Autoscaler (HPA) automatically updates a workload resource to scale the workload to match demand. Horizontal scaling automatically deploys more pods as a response to an increased load. For more information, visit the Kubernetes documentaton.

The Global Resource Pool Limit does not limit the applications configured with Horizontal Pod Autoscaling.

Configure Horizontal Pod Autoscaling

To configure horizontal autoscaling for Mule apps deployed to CloudHub 2.0, follow these steps:

From Anypoint Platform, select Runtime Manager > Applications.
Click Deploy application.
In the Runtime tab, check the Enable CPU Based Horizontal Autoscaling box.
Set the minimum and maximum Replica Count limits.

You can deploy your application with up to 8 replicas, or you can deploy with up to 16 replicas if you have the Anypoint Integration Advanced package or a Platinum or Titanium subscription to Anypoint Platform.
Select the Replica Size from the dropdown menu.

If you have the Anypoint Integration Advanced package, you can select from Micro, Micro.Mem, and Small replica sizes. If you have a different pricing plan, select Micro or Micro.Mem replica sizes. For more information, see CloudHub 2.0 Application Deployment.
Click Deploy Application.

Autoscaling Status and Logs

When an autoscaling event occurs and your Mule application with horizontal autoscaling scales up, you can check the Scaling status by clicking View status in your application’s details window. You can also see your application’s Scaling status in the Applications list.

To track the scaled-up replicas startup and the number of replicas your application scaled from and to, check the application’s logs:

From Anypoint Platform, select Runtime Manager > Applications.
Click the row of the application with autoscaling.
Click Manage application.
Select the Logs tab.

Example application log

Info	8 minutes ago - 2023-11-08 14:35:01.466 PST - Runtime Manager
Application id:<app-ID> scaled UP from 1 to 2 replicas.
Info	a minute ago - 2023-11-08 14:41:24.819 PST - Runtime Manager
Application id:<app-ID> scaled DOWN from 2 to 1 replicas. :

You can track autoscaling events through Audit Logs in Access Management. Each time an application deployment scales, an audit log is published under the product Runtime Manager, by user Anypoint Staff. The log has Action set to Scaling, and the Object is the application ID.

The following is an example log payload:

{"properties":{"organizationId":"my-orgID-abc","environmentId":"my-envID-xyz","response":{"message":{"message":"Application id:my-appID-123 scaled DOWN from 3 to 2 replicas.","logLevel":"INFO","context":{"logger":"Runtime Manager"},"timestamp":1700234556678}},"deploymentId":"my-appID-123","initialRequest":"/organizations/my-orgID-abc/environments/my-envID-xyz/deployments/my-appID-123/specs/my-specID-456"},"subaction":"Scaling"}

Understand CPU-Based Autoscaling Policy

MuleSoft owns and applies the autoscaling policy for your Mule application deployments.

The CPU-based HPA policy used for all Mule apps deployed to CloudHub 2.0 is as follows:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
  namespace: app-namespace
spec:
  behavior:
    scaleDown:
      policies:
      - periodSeconds: 15
        type: Percent
        value: 100
      selectPolicy: Max
      stabilizationWindowSeconds: 1800
    scaleUp:
      policies:
      - periodSeconds: 180
        type: Percent
        value: 100
      selectPolicy: Max
      stabilizationWindowSeconds: 0
  maxReplicas: 3
  metrics:
  - resource:
      name: cpu
      target:
        averageUtilization: 70
        type: Utilization
    type: Resource
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app

Some points to consider:

Scale-up can occur, at most, every 180 seconds. Horizontal autoscaling might add up to 100% of the currently running replicas until reaching the maximum configured replicas. There is no stabilization window for scaling up. The target scales up immediately after the metrics indicate a scale-up is needed.
Scale-down can occur, at most, every 15 seconds. Horizontal autoscaling might remove up to 100% of the currently running replicas, scaling down the target to the minimum allowed replicas. The number of replicas removed is based on the aggregated calculations over the past 1800 seconds of the stabilization window.

Min replicas:

The minimum number of replicas guaranteed to run at any given moment.
Scale-down policy should never remove replicas below this number.

Max replicas:

The maximum number of capped replicas. No more replicas can be added above this number for scaling up.
Scale-up policy should never add replicas above this number.

Enabling HPA can result in customers incurring additional flow usage when your application scales horizontally. To avoid overages from unpredicted scaling, configure the maximum configured replicas judiciously to stay within purchased flow limits. Track your incurred flow usage through usage reports.

Performance Considerations

For a successful horizontal autoscaling of your Mule apps, review the following performance considerations:

Mule apps that scale based on CPU usage are a good fit for CPU-based HPA. For example:
- HTTP/HTTPS applications with async requests.
- Reverse Proxies.
- Applications with low latency and high throughput.
- DataWeave Transformations.
- APIKit Routing.
- API Gateways with policies.
Non-reentrant applications that don’t have built-in parallel processing such as batch jobs, scheduler applications without re-entrance and duplicate scheduling across applications, and applications with low throughput and high latency with large requests are not a good fit for CPU-based HPA.