CloudHub High Availability and Disaster Recovery
CloudHub provides high availability (HA) and disaster recovery for application and hardware failures.
CloudHub uses Amazon AWS for its cloud infrastructure, so availability is dependent on Amazon. The availability and deployments in CloudHub are separated into different regions, which in turn point to the corresponding Amazon regions. If an Amazon region goes down, the applications within the region are unavailable and not automatically replicated in other regions.
For example, if the US East region is unavailable, the CloudHub management UI, as well as the various REST services that enable deployments, are unavailable until the region’s availability is restored. New applications can’t be deployed while US East is down.
While the control plane is unavailable, the runtime plane continues to send log data and other telemetry data, which the worker buffers (up to 1 GB) until availability is restored.
CloudHub provides an internal messaging mechanism, in the form of persistent queues, that is used for message reliability. While persistent queues are highly available within a region, they might not be accessible if the region or part of the region is unavailable (usually a few seconds or minutes), which could result in some data loss. After the region is available again, CloudHub resumes communication with the queues.
Some CloudHub modules, such as Anypoint Object Store v1, application settings, and Insight-related information, are maintained in the US East region for all applications regardless of the region where they are deployed. Anypoint Object Store v2 is maintained in the same region as the deployed CloudHub application. For both Anypoint Object Store v1 and v2, if a region is unavailable, the data persists and becomes available again after the region returns to service.
Anypoint Virtual Private Cloud (Anypoint VPC) is set up at the region level. If a region is unavailable, Anypoint VPC is unavailable unless a previous Anypoint VPC instance is set up for the other region.
If the application uses multiple workers, CloudHub deploys the workers in separate availability zones by default, providing HA across availability zones. The distance between the availability zones is variable and generally doesn’t exceed 350 miles.
If an application uses a single worker, when the availability zone is unavailable, CloudHub automatically restarts the application in a different availability zone. In this case, the application might experience downtime.
You can set up
status.mulesoft.com to receive alerts when a failure occurs in an availability zone or region.
You can use a load balancer (cloud or on-premises) for applications deployed to different regions to provide a better disaster recovery strategy.
Ensure that integrations are stateless. Transactional information isn’t shared between client invocations or executions (in case of scheduled services). If the middleware must maintain some data because of a system limitation, ensure that the data persists in an external store, such as a database or a messaging queue, and not within the middleware infrastructure or memory.
As you scale, especially in the cloud, ensure that the state of and resources used by each worker or node are independent of other workers. This model provides better performance and scalability, as well as reliability.