Resource Allocation and Performance on Anypoint Runtime Fabric on VMs/Bare Metal

Before deploying a Mule application to Anypoint Runtime Fabric, determine the amount of resources to allocate. Resource allocation is also important when configuring the internal load balancer on Anypoint Runtime Fabric.

When a Mule application is deployed to Runtime Fabric, the application is deployed with its own Mule runtime engine (Mule). The number of replicas, or instances of that application and runtime, is also specified. The resources available for each replica are determined by the values you set when deploying an application.

The performance information provided in this topic is based on a Runtime Fabric cluster with controller nodes using AWS EC2 M4 instances. The cluster was configured with three m4.large controller nodes and three r4.large worker nodes. M4 instances feature a custom Intel Xeon E5-2676 v3 Haswell processor optimized specifically for EC2. The processors run at a base clock rate of 2.4 GHz, and can go as high as 3.0 GHz with Intel Turbo Boost. The load generator used was hosted on a separate AWS instance in the same region.

You can allocate the following resources when deploying an application:

- vCPU cores
  - Reserved vCPU

    The amount of vCPU guaranteed to the application and reserved for its use.
  - vCPU Limit

    The maximum amount of vCPU the application can use (the level to which it can burst). This is shared CPU on the worker node.
- Memory

MuleSoft License Compliance

You are responsible for ensuring that you remain compliant with your MuleSoft licensing. To remain compliant, ensure one of the following:

- The sum of CPU Limits allocated to each Mule application replica is less than or equal to the total license quantity.
- The total CPU capacity provided by Runtime Fabric worker nodes running Mule applications is less than or equal to the licensed quantity.
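
The two compliance conditions above can be checked with simple arithmetic. The following is a minimal sketch, not an official tool: the `AppDeployment` class, the function names, and the sample applications are hypothetical, and the vCPU Limit values are assumed to be collected by hand or with the Knowledge Base scripts referenced below.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AppDeployment:
    """Hypothetical record of one deployed Mule application."""
    name: str
    replicas: int
    vcpu_limit: float  # vCPU Limit set for each replica


def total_vcpu_limit(apps: List[AppDeployment]) -> float:
    """Sum of the vCPU Limits across every replica of every application."""
    return sum(app.replicas * app.vcpu_limit for app in apps)


def is_compliant(apps: List[AppDeployment],
                 worker_node_vcpus: List[float],
                 licensed_vcpu: float) -> bool:
    """True if either condition from the text holds."""
    limits_within_license = total_vcpu_limit(apps) <= licensed_vcpu
    capacity_within_license = sum(worker_node_vcpus) <= licensed_vcpu
    return limits_within_license or capacity_within_license


apps = [
    AppDeployment("orders-api", replicas=2, vcpu_limit=0.5),
    AppDeployment("billing-batch", replicas=1, vcpu_limit=1.0),
]
print(total_vcpu_limit(apps))                                   # 2.0
print(is_compliant(apps, [2, 2, 2], licensed_vcpu=2.0))         # True
print(is_compliant(apps, [2, 2, 2], licensed_vcpu=1.5))         # False
```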

For instructions on how to track Reserved vCPU and vCPU Limit, refer to How to Get Runtime Fabric (RTF) CPU Reserved and Limit for Self Managed Kubernetes and RTF Appliance.

For instructions on how to monitor your usage and get alerts if your usage exceeds your license, refer to Cronjob Script to Monitor RTF Appliance CPU of Mule Apps from the MuleSoft Knowledge Base.

vCPU Allocation

When the Reserved vCPU and vCPU Limit are equal, the CPU on the worker node is allocated in a guaranteed model. The container always has the specified amount of CPU available. The vCPU is reserved for each replica of the application and is guaranteed regardless of what else is running on Runtime Fabric.

Guaranteed CPU allocation allows applications to perform consistently by ensuring there is no CPU contention with other applications. It guarantees performance in all cases.

When the value of vCPU Limit is set higher than the value of Reserved vCPU, the application can burst up to either the vCPU Limit value or the amount of unreserved vCPU on the worker node, whichever is less.
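
The guaranteed and bursting models described above can be illustrated with a short sketch. It follows the wording of this section literally (the burst ceiling is the smaller of the vCPU Limit and the unreserved vCPU on the node); the function names are hypothetical and are not part of any Runtime Fabric API.

```python
def allocation_model(reserved_vcpu: float, vcpu_limit: float) -> str:
    """Classify a deployment as guaranteed or burstable."""
    if vcpu_limit < reserved_vcpu:
        raise ValueError("vCPU Limit cannot be lower than Reserved vCPU")
    return "guaranteed" if vcpu_limit == reserved_vcpu else "burstable"


def burst_ceiling(vcpu_limit: float, unreserved_on_node: float) -> float:
    """Level to which a burstable replica can burst, per the rule above."""
    return min(vcpu_limit, unreserved_on_node)


print(allocation_model(0.5, 0.5))   # guaranteed
print(allocation_model(0.1, 1.0))   # burstable
print(burst_ceiling(1.0, 0.6))      # 0.6
```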

When estimating the CPU and memory required for worker nodes, note that the Anypoint Monitoring agent requires at least 0.05 vCPU cores and 50 Mi of memory. This container streams application metrics to the control plane and cannot be disabled. These core and memory allocations are not counted toward Mule licensing.

CPU Bursting

When using CPU bursting, you must consider the following factors:

- Minimum reserved vCPU: 0.02 vCPU.
- Minimum vCPU limit: 0.07 vCPU.

  Resource-intensive applications might require additional CPU capacity. If an application deployment fails using the minimum vCPU limit, increase the vCPU limit and redeploy the application.
- The CPU limit is upper-bounded by the CPU cores provided on the worker nodes.
- The maximum recommended utilization per CPU core is 20 to 25 simple applications and API gateways. More complex applications with higher usage might need more capacity.
- Runtime Fabric runs a small number of services on worker nodes. These services consume between 0.3 and 0.5 vCPU, depending on whether log forwarding is enabled for Anypoint Monitoring.
- Each application replica must reserve 50m CPU and 50Mi memory for the container that runs Anypoint Monitoring. This container streams application metrics to the control plane and cannot be disabled.
- Applications compete for the unallocated CPU remaining on their worker node. As more applications are deployed on a worker node, less unallocated CPU is available to share among them, which can result in poor performance. Consider the following strategies for minimizing CPU contention among your applications:
  - Deploy nightly batch applications with other applications that have peak load during the day.
  - Deploy multiple replicas of an application so that they are distributed across worker nodes to maximize the surface area for unallocated CPU.

  Regardless of the approach you choose, run tests to validate performance.
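
The factors above can be combined into a rough estimate of the CPU left for bursting on one worker node. This is a sketch under stated assumptions: the 50m monitoring reservation is treated as additional to each replica's Reserved vCPU, the platform services are budgeted at the upper end of the 0.3 to 0.5 vCPU range, and the function name and sample values are hypothetical. Values are in millicores (1000m = 1 vCPU) to avoid floating-point rounding.

```python
from typing import List

MONITORING_MILLICORES = 50          # per replica, from the text (50m)
PLATFORM_SERVICES_MILLICORES = 500  # assumption: upper end of 0.3 to 0.5 vCPU


def unallocated_millicores(node_cores: int,
                           reserved_per_replica: List[int]) -> int:
    """Estimate the CPU (in millicores) left for bursting on one worker node."""
    capacity = node_cores * 1000
    reserved = sum(r + MONITORING_MILLICORES for r in reserved_per_replica)
    return capacity - PLATFORM_SERVICES_MILLICORES - reserved


# A 2-core worker node running three replicas that reserve
# 0.5, 0.2, and 0.02 vCPU respectively.
print(unallocated_millicores(2, [500, 200, 20]))  # 630
```

A result near zero or below suggests that replicas on that node have little or no room to burst, which is the contention scenario the strategies above aim to avoid.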

Memory Allocation

The minimum values for the amount of memory allocated to each replica of a Mule application or API gateway are the following:

- 0.7 GiB memory for Mule 4
- 0.5 GiB memory for Mule 3

When Anypoint Monitoring is enabled, 50MB memory is reserved for running the Anypoint Monitoring sidecar containers, which can burst up to 70MB for each application deployed when processing metrics data.

Calculating Memory Allocation on CloudHub and Runtime Fabric

Anypoint Platform allocates native and heap memory for a deployed application. Heap memory is the portion of the total memory that is made available to the Mule runtime and the application. Heap memory is used for tasks like handling payload.

Both CloudHub and Anypoint Runtime Fabric allocate both types of memory. However, they differ in how the memory allocation of each memory type is calculated.

- Runtime Fabric lists the total memory available for an application. The available heap memory is approximately half of the total memory allocated to an application.
- CloudHub describes minimum memory requirements in terms of the heap memory available to an application.
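
As a rough illustration of the Runtime Fabric rule above, the following sketch estimates heap from total memory and checks the per-replica minimums listed under Memory Allocation. The one-half ratio is the approximation given in this topic, not an exact JVM setting, and the function names and runtime keys are hypothetical.

```python
# Minimum total memory per replica, in GiB, as listed under Memory Allocation.
MIN_MEMORY_GIB = {"mule4": 0.7, "mule3": 0.5}


def approximate_heap_gib(total_memory_gib: float) -> float:
    """Heap is approximately half of the total memory on Runtime Fabric."""
    return total_memory_gib / 2


def meets_minimum(runtime: str, total_memory_gib: float) -> bool:
    """Check a planned allocation against the documented minimum."""
    return total_memory_gib >= MIN_MEMORY_GIB[runtime]


print(approximate_heap_gib(2.0))      # 1.0
print(meets_minimum("mule4", 0.5))    # False
print(meets_minimum("mule3", 0.5))    # True
```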

Application Startup Times

The startup time for a Mule application is correlated to the total number of vCPU cores the application has access to. The following lists approximate startup times for a simple Mule proxy application performing processing on a payload:

vCPU Cores	Approximate Startup Time
`1.00`	Less than 1 minute
`0.50`	Under 2 minutes
`0.10`	6 to 8 minutes
`0.07`	10 to 14 minutes

Application Performance

The resources allocated to your Mule application determine the application’s performance. The following table lists approximate values for throughput based on the total number of vCPU cores allocated for a single Mule application performing simple processing on a 10-KB payload:

vCPU Cores	Concurrent Connections	Avg Response Time (ms)
`1.00`	10	15
`0.50`	5	15
`0.10`	1	25
`0.07`	1	78

Run performance and load testing on your Mule applications to determine the amount of resources to allocate.
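
The table reports concurrency and response time, not throughput directly. An approximate throughput can be derived with Little's law (throughput = concurrent requests / response time). The sketch below assumes each connection sends its next request as soon as the previous response arrives, which this topic does not state, so treat the results as rough upper estimates. The function name is hypothetical.

```python
def estimated_throughput(concurrent_connections: int,
                         avg_response_ms: float) -> float:
    """Requests per second implied by Little's law."""
    return concurrent_connections * 1000 / avg_response_ms


# (vCPU cores, concurrent connections, average response time in ms)
rows = [(1.00, 10, 15), (0.50, 5, 15), (0.10, 1, 25), (0.07, 1, 78)]
for cores, connections, response_ms in rows:
    rate = estimated_throughput(connections, response_ms)
    print(f"{cores:.2f} vCPU: about {round(rate)} requests/sec")
# 1.00 vCPU: about 667 requests/sec
# 0.50 vCPU: about 333 requests/sec
# 0.10 vCPU: about 40 requests/sec
# 0.07 vCPU: about 13 requests/sec
```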

Internal Load Balancer Memory Allocation

Internal load balancer memory requirements are impacted by the number of threads, response time latency, and message sizes. Use the following guidelines when allocating memory:

- 0.5 GB (Default): For fewer than 500 simultaneous active connections.
- 1.5 GB (Large): For one or both of the following scenarios:
  - 500 or more simultaneous active connections.
  - Security policies are enabled.

These are general guidelines, and individual environments might require adjustments.
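
The guideline above reduces to a simple decision rule, sketched here for illustration. The function name is hypothetical, and the result is only a starting point to validate against your own environment.

```python
def load_balancer_memory_gb(active_connections: int,
                            security_policies_enabled: bool) -> float:
    """Pick the internal load balancer memory size from the guideline above."""
    if active_connections >= 500 or security_policies_enabled:
        return 1.5  # Large
    return 0.5      # Default


print(load_balancer_memory_gb(200, False))  # 0.5
print(load_balancer_memory_gb(200, True))   # 1.5
print(load_balancer_memory_gb(500, False))  # 1.5
```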

Internal Load Balancer

Inbound traffic is processed using an internal load balancer managed by Anypoint Runtime Fabric. Because this load balancer is responsible for TLS termination, the amount of resources required scales based on the number of incoming connections and the average payload size for each request.

Performance test results are based on a Runtime Fabric cluster with controller nodes using AWS EC2 M4 instances. The cluster was configured with three m4.large controller nodes and three r4.large worker nodes. The load generator used in the performance test was hosted on a separate AWS instance in the same region. The M4 instances featured a custom Intel Xeon E5-2676 v3 Haswell processor optimized specifically for EC2, which ran at a base clock rate of 2.4 GHz and could reach 3.0 GHz with Intel Turbo Boost.

A load generator based on C++, which is more efficient with SSL connections, was used to yield the maximum throughput.

The following table summarizes the approximate number of requests (averaging 10 KB) that can be served with a single replica of the internal load balancer, based on the number of CPU cores. In most cases, Elliptic Curve Digital Signature Algorithm (ECDSA) provides double the performance of a 2K RSA key. Supported curves are secp521r1 (P-521), secp384r1 (P-384), and secp256r1, also known as prime256v1 (P-256).

Key Type	CPU	TLS Without Connection Reuse	TLS with Connection Reuse
RSA 2K	0.25	94 msg/sec	1100 msg/sec
RSA 2K	0.5	189 msg/sec	2250 msg/sec
RSA 2K	1	380 msg/sec	4000 msg/sec
RSA 4K*	0.25	14 msg/sec	1048 msg/sec
RSA 4K*	0.5	30 msg/sec	2087 msg/sec
RSA 4K*	1	59 msg/sec	3700 msg/sec
ECDSA P-256	0.25	234 msg/sec	1150 msg/sec
ECDSA P-256	0.5	451 msg/sec	2257 msg/sec
ECDSA P-256	1	860 msg/sec	4100 msg/sec

*Doubling the RSA key length degrades performance by at least a factor of 6.

The internal load balancer runs on the controller VMs of Runtime Fabric. Size the VMs based on the amount and type of inbound traffic. You can allocate only half of the available CPU cores on each VM to the internal load balancer.
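
The throughput table and the half-of-VM rule above can be turned into a small sizing helper. This is a sketch only: the figures are copied from the table in this topic, which reflects one specific test cluster, and the function names are hypothetical. It returns the smallest tested CPU allocation that meets a target rate, or `None` if a single replica at the tested sizes is not enough.

```python
from typing import Optional

# msg/sec for a single load balancer replica, keyed by
# (key type, TLS connection reuse) and then by CPU cores.
THROUGHPUT = {
    ("RSA 2K", False): {0.25: 94, 0.5: 189, 1.0: 380},
    ("RSA 2K", True): {0.25: 1100, 0.5: 2250, 1.0: 4000},
    ("RSA 4K", False): {0.25: 14, 0.5: 30, 1.0: 59},
    ("RSA 4K", True): {0.25: 1048, 0.5: 2087, 1.0: 3700},
    ("ECDSA P-256", False): {0.25: 234, 0.5: 451, 1.0: 860},
    ("ECDSA P-256", True): {0.25: 1150, 0.5: 2257, 1.0: 4100},
}


def smallest_cpu_for(key_type: str, connection_reuse: bool,
                     target_msg_per_sec: int) -> Optional[float]:
    """Smallest tested CPU allocation that serves the target rate."""
    rates = THROUGHPUT[(key_type, connection_reuse)]
    for cpu, rate in sorted(rates.items()):
        if rate >= target_msg_per_sec:
            return cpu
    return None


def max_load_balancer_cores(vm_cores: float) -> float:
    """Only half of a controller VM's cores can go to the load balancer."""
    return vm_cores / 2


print(smallest_cpu_for("RSA 2K", False, 150))       # 0.5
print(smallest_cpu_for("ECDSA P-256", True, 5000))  # None
print(max_load_balancer_cores(2))                   # 1.0
```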

CPU Requirements for Keys and Certificates

Ensure that you allocate enough CPU resources to support a minimum of 10 PEM/P12 or 8 JKS/JCEKS certificates. The recommended number of cores is:

Cores	PEM/P12	JKS/JCEKS
0.25	8	2
0.5	10	4
0.75	10	6
<= 1	10	8

Key Types

RSA keys are the most common type of keys. RSA keys of 2K length offer the best compromise between security and performance.

RSA keys larger than 2K protect against brute-force cracking and are appropriate for certificates that have expirations of many years. However, whenever key length is doubled, for example, from 2K to 4K, performance is reduced by a factor greater than 6.

ECDSA keys are also supported. In most cases, ECDSA doubles the performance of a 2K RSA key. Supported curves are:

- secp521r1 (P-521)
- secp384r1 (P-384)
- secp256r1, also known as prime256v1 (P-256)
