Troubleshooting Guide for Runtime Fabric
This topic describes common errors and steps to resolve them when installing Anypoint Runtime Fabric on VMs / Bare Metal.
Obtain a Full Network Assessment
Run the following command for an overall health assessment of the network:
/opt/anypoint/runtimefabric/rtfctl status
Troubleshoot Network Connectivity Using rtfctl
Every Anypoint Runtime Fabric cluster requires connectivity with the Anypoint control plane. Any interference with that connectivity can limit functionality, resulting in application deployment failures or a degraded status in Anypoint Runtime Manager.
You can use rtfctl to verify that Runtime Fabric has the required outbound connectivity and to troubleshoot connectivity issues.
Verify Outbound Connectivity
On each node, follow the instructions in Install rtfctl to install rtfctl.
Run the following command on every controller and worker node in the cluster to verify the required outbound connectivity:
sudo ./rtfctl test outbound-network
Sample output:
[root@rtf-controller-1 runtimefabric]# sudo ./rtfctl test outbound-network
Using proxy configuration from Runtime Fabric (proxy "", no proxy "")
Using 'US' region
transport-layer.prod.cloudhub.io:443 ✔
https://anypoint.mulesoft.com ✔
https://worker-cloud-helm-prod.s3.amazonaws.com ✔
https://exchange2-asset-manager-kprod.s3.amazonaws.com ✔
https://ecr.us-east-1.amazonaws.com ✔
https://494141260463.dkr.ecr.us-east-1.amazonaws.com ✔
https://prod-us-east-1-starport-layer-bucket.s3.amazonaws.com ✔
https://runtime-fabric.s3.amazonaws.com ✔
tcp://dias-ingestor-nginx.prod.cloudhub.io:443 ✔
If you have outbound connectivity issues that prevent Runtime Fabric from reaching any of the required Anypoint control plane services, work with your network team to verify that you have added the required port IPs and hostnames to the allowlist as described in Port IP Addresses and Hostnames to Add to the Allowlist.
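When the test is run on many nodes, it can help to reduce the output to the endpoints that did not pass. The following is a minimal sketch and not part of rtfctl. It assumes that a successful endpoint line ends with ✔, as in the sample output above, and treats any other endpoint line as failed. The function name failed_endpoints is hypothetical, and the appearance of a failing line is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print endpoint lines that lack the success mark.
# Assumption: endpoint lines contain "://" or "<host>:<port> ", as in the
# sample output; how rtfctl renders a failing line is not documented here.
failed_endpoints() {
  grep -v '^Using ' \
    | grep -E '(://|:[0-9]+[[:space:]])' \
    | grep -v '✔' || true   # an empty result is not an error
}
```

A possible use is `sudo ./rtfctl test outbound-network | failed_endpoints`. Empty output then means every endpoint line carried the success mark.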
Troubleshoot Chrony Synchronization Check Failure
Runtime Fabric requires chrony as its time synchronization daemon. If the init.sh installation script does not find chrony, or cannot confirm that chrony is synchronized, Runtime Fabric installation fails with the following error:
============================================================================================
1 / 10: Install required packages
================================================
chrony-3.4-1.el7.x86_64
Checking chrony sync status...Retrying in 30 seconds...
Retrying in 30 seconds...
Error: chrony sync check failed 3 times, giving up.
***********************************************************
** Oh no! Your installation has stopped due to an error. **
***********************************************************
1. Visit the troubleshooting guide for help:
xref:runtime-fabric::troubleshoot-guide.adoc#troubleshoot-install-package-issues[Troubleshoot Install Package Issues].
2. Resume installation by running /opt/anypoint/runtimefabric/init.sh
Additional information: Error code: 1; Step: install_required_packages; Line: -;
Perform the following steps to fix this issue:
-
Verify that chrony is enabled:
systemctl enable chronyd
-
Verify that Network Time Protocol (NTP) is disabled and is not running:
systemctl stop ntpd; systemctl disable ntpd
-
Contact your network team to verify that the time servers in /etc/chrony.conf are reachable.
-
Verify that chrony is synced with sources:
chronyc sourcestats -v
Number of sources = 4
Name/IP Address            NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==============================================================================
ns3.turbodns.co.uk         16  12   58m     -0.006      0.014  +370us    14us
test.diarizer.com          16   9  327m     -0.184      0.144 -1618us   799us
ntp1.vmar.se                6   3   21m     +0.117      0.074  +228us    10us
time.cloudflare.com         6   5   86m     +0.014      0.214 +2040ns    94us
-
Verify that the Leap status value of chronyc tracking is Normal:
chronyc tracking
Reference ID    : A29FC87B (time.cloudflare.com)
Stratum         : 4
Ref time (UTC)  : Mon Jul 20 15:42:57 2020
System time     : 0.000000003 seconds slow of NTP time
Last offset     : +0.000344024 seconds
RMS offset      : 0.000172783 seconds
Frequency       : 2.265 ppm slow
Residual freq   : +0.149 ppm
Skew            : 0.124 ppm
Root delay      : 0.003362593 seconds
Root dispersion : 0.000759320 seconds
Update interval : 1031.1 seconds
Leap status     : Normal
-
Retry the installation by running init.sh.
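The Leap status check can also be scripted before retrying the installation. The following is a minimal sketch that parses the output of chronyc tracking in the format shown above. The function names leap_status and chrony_is_synced are hypothetical and are not part of chrony or Runtime Fabric.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `chronyc tracking` output on stdin and print
# the value of the "Leap status" field with surrounding whitespace removed.
leap_status() {
  awk -F':' '/^Leap status/ { gsub(/^[ \t]+|[ \t]+$/, "", $2); print $2 }'
}

# Hypothetical helper: succeed only when the leap status is exactly "Normal".
chrony_is_synced() {
  [ "$(leap_status)" = "Normal" ]
}
```

For example, `chronyc tracking | chrony_is_synced && echo "chrony is synchronized"` prints the message only when the field reads Normal.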
Troubleshoot Cluster Issues
When filing a support case, the support team might ask you to run one or both of the following commands to generate debugging information:
-
The rtfctl report command to generate an archive containing only Kubernetes objects and logs.
-
The rtfctl appliance report command to collect diagnostics from all cluster nodes.
The support team might also ask you to download information through Ops Center as described in Download Debug Info.
Troubleshoot Application Deployment Issues
In rare situations, the Anypoint Monitoring agent might prevent an application from deploying. In these situations, you might see the following messages:
-
The application remains in the Deploying state, or
-
Error starting monitoring agent (code -1)
In this situation, redeploy your application and set the following custom property:
anypoint.platform.config.analytics.agent.enabled=false
The Anypoint Monitoring agent might also change the state of a deployed application. If you see one of the following:
-
The application transitions from Running to Applying, or
-
Monitoring agent has exited with code -1
This indicates that the agent is restarting. The running application should not be affected. Application metrics are queued and are collected again after the agent restarts.
Troubleshoot Application Runtime Issues
If any of the following Runtime Fabric alert messages are reported, you might need to recover one or more controller nodes.
Management plane is unreachable
InfluxDB is down or no connection between Kapacitor and InfluxDB
Node is down
CRITICAL / Kubernetes node is not ready: <ip_address>
CRITICAL / etcd: cluster is unhealthy
Open a terminal and run the gravity status command to obtain the health status of the cluster as well as individual components.
To recover a node, follow the instructions provided in Add or Remove a Node from a Runtime Fabric.
Troubleshoot Environment Variable Issues
This step detects the variables that the installation process requires. How you provide these variables depends on where the installation runs.
-
For AWS, the variables are set in the Terraform script and written to a file located in /opt/anypoint/runtimefabric/env.
-
For Azure, the variables are set when running the ARM template, and are retrieved as tags on the Virtual Machine instances.
-
For manual installations, the user creates a file with the values located in /opt/anypoint/runtimefabric/env.
After these properties are retrieved, a procedure runs to connect to Anypoint Platform and retrieve additional values based upon the RTF_ACTIVATION_DATA value.
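For AWS and manual installations, a quick pre-check is to confirm that the env file exists and defines RTF_ACTIVATION_DATA. The following is a minimal sketch. The function name check_activation_data is hypothetical, and the sketch assumes that the file uses plain KEY=value lines, optionally prefixed with export. It does not validate the activation data itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether an env file defines a non-empty
# RTF_ACTIVATION_DATA. Assumes KEY=value lines (an assumption about the format).
check_activation_data() {
  local env_file="$1"
  if [ ! -r "$env_file" ]; then
    echo "missing-file"
  elif grep -Eq '^(export[[:space:]]+)?RTF_ACTIVATION_DATA=.+' "$env_file"; then
    echo "present"
  else
    echo "missing-variable"
  fi
}
```

A possible call is `check_activation_data /opt/anypoint/runtimefabric/env`. On Azure the values come from VM tags, so this check does not apply there.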
Common errors
The following error may occur if there is an issue with the activation data value, or if there is trouble reaching the internet on the instance:
curl: (7) Failed connect to anypoint.mulesoft.com:443; Operation now in progress
Error: Failed to fetch registration properties. (000). Please check your token is valid
============ Error ============
Exit code: 1
Line:
If this error is observed, try the following:
-
Ensure your instance has outbound internet connectivity. A simple way to validate is to run the following command and verify a 301 response is returned:
curl https://anypoint.mulesoft.com
-
Retry the installation procedure, in case network connectivity had not finished initializing:
-
On Azure, run the script as follows:
/opt/anypoint/runtimefabric/script.sh foreground
-
On AWS and manual installations, run the script as follows:
/opt/anypoint/runtimefabric/init.sh foreground
-
Validate the activation data value is correct by comparing with the Runtime Fabric created in Anypoint Runtime Manager.
If you are still encountering issues, file a support ticket for further assistance.
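The connectivity check can be made explicit by asking curl to print only the HTTP status code. The following is a minimal sketch. The function names probe_url and classify_http_code are hypothetical. The sketch relies on curl's --write-out '%{http_code}' variable, which reports 000 when no response was received, and it treats 301 as the expected response because that is the code this guide says to look for.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the HTTP status code for a URL,
# without following redirects. curl prints 000 if the connection fails.
probe_url() {
  curl --silent --output /dev/null --max-time 10 --write-out '%{http_code}' "$1"
}

# Hypothetical helper: map a status code to a short label.
classify_http_code() {
  case "$1" in
    000) echo "no-connection" ;;
    301) echo "reachable" ;;
    *)   echo "unexpected-$1" ;;
  esac
}
```

For example, `classify_http_code "$(probe_url https://anypoint.mulesoft.com)"` prints reachable when the 301 response is returned.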
Resume a Failed Installation
You can resume an installation at the point where it failed by running the init script:
-
AWS and manual installations:
/opt/anypoint/runtimefabric/init.sh
-
Azure installations:
/opt/anypoint/runtimefabric/script.sh
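If the same runbook is used across installation types, the choice of script can be made by checking which file is present. The following is a minimal sketch. The function name find_resume_script is hypothetical, and checking script.sh first rests on the assumption that only Azure installations provide that file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the installer script found in a directory.
# Assumption: script.sh exists only on Azure installations, so it is checked first.
find_resume_script() {
  local dir="${1:-/opt/anypoint/runtimefabric}"
  if [ -f "$dir/script.sh" ]; then
    echo "$dir/script.sh"
  elif [ -f "$dir/init.sh" ]; then
    echo "$dir/init.sh"
  else
    return 1   # neither script is present
  fi
}
```

A possible use is `script="$(find_resume_script)" && sudo "$script"`.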
Troubleshoot Install Package Issues
This step installs the required packages on the instance. It uses the yum package repository to download and install them.
Common errors
If there is a failure on this step, verify the following:
-
Ensure your instance has outbound internet connectivity. A simple way to validate is to run the following command and verify a 301 response is returned:
curl https://anypoint.mulesoft.com
-
If running a manual installation, ensure the init.sh script is run with root privileges:
sudo ./init.sh foreground
-
Manually install one of the required packages to determine if it is successful outside of the installation script:
sudo yum install -y chrony
-
If not successful, work with your operations team for assistance. You may need to ask for elevated access to the instance.
-
If manual installation of a package was successful, or if you still require assistance, file a support ticket.
Troubleshoot Ops Center Monitoring and Logs Issues
If Ops Center monitoring and logging fail to restart after you restart one or more nodes, ensure port forwarding rules are applied on all VMs so that traffic can reach the Kubernetes pods running on the VMs. Refer to Enable Forwarding When Using firewalld for additional information.
Format and Mount Disks
This step performs the following tasks on the block devices or disks provided with the RTF_DOCKER_DEVICE and/or RTF_ETCD_DEVICE variables:
-
Performs a check to confirm the values map to block devices available on the instance.
-
Unmounts the disks in case they were previously mounted.
-
Formats the disks.
-
Adds a mount entry in the /etc/fstab file.
-
Creates directories based upon the values in $DOCKER_MOUNT and/or $ETCD_MOUNT.
-
Mounts the disks to the expected directories created above.
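If this step fails, two of the tasks above can be checked by hand: whether a value maps to a block device, and whether /etc/fstab received an entry for the mount directory. The following is a minimal sketch of the second check. The function name fstab_has_mount is hypothetical and this is not the logic used by the installation script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when an fstab-style file has an uncommented
# entry whose second field (the mount point) equals the given directory.
fstab_has_mount() {
  local fstab_file="$1" mount_point="$2"
  awk -v mp="$mount_point" \
    '$1 !~ /^#/ && $2 == mp { found = 1 } END { exit !found }' "$fstab_file"
}
```

For example, `fstab_has_mount /etc/fstab "$DOCKER_MOUNT" && echo "fstab entry found"`. The block device check can be done with the standard test `[ -b "$RTF_DOCKER_DEVICE" ]`, which succeeds only when the path is a block device.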
Install RTF Components
This step connects to Anypoint Platform to download and install the Runtime Fabric components on the cluster.
In some cases, this step may return an error if the deployment failed to complete within the time limit:
...
[OK]
Installing Runtime Fabric Agent. This may take several minutes...
configmap "grafana-dashboards" deleted
configmap "kapacitor-alerts" deleted
Release "runtime-fabric" does not exist. Installing it now.
The following deployments failed to become ready within the time limit: monitor
---
Name:           monitor-79c7564b77-wb9c6
Namespace:      rtf
Node:           10.165.12.87/10.165.12.87
Start Time:     Thu, 13 Dec 2018 20:23:59 +0000
Labels:         app=monitor
                pod-template-hash=3573120633
Annotations:    checksum/config=4c4aac48d9cc8b24828b38ba0eb587840bc17b2449a54d593f74e2d58e5c12ae
                kubernetes.io/psp=privileged
                seccomp.security.alpha.kubernetes.io/pod=docker/default
Status:         Running
IP:             10.244.82.17
Controlled By:  ReplicaSet/monitor-79c7564b77
Containers:
...
<< More information displayed that describes the deployment manifest and stack trace >>
If this error is observed:
-
Verify outbound TCP port 5672 is open to the Internet. Connections should be allowed from the controller VM(s) running in your internal network to this hostname on the Internet.
-
A TCP proxy may be needed to establish a connection to the Internet. Check with your network team to verify and configure if needed. Refer to Anypoint Runtime Fabric Installation Prerequisites.
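A direct TCP probe from a controller VM can show whether port 5672 is reachable before involving the network team. The following is a minimal sketch that uses bash's /dev/tcp redirection together with the coreutils timeout command. The function name port_status is hypothetical. The probe does not go through a proxy, and it reports closed for refused, filtered, and timed-out connections alike.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "open" if a TCP connection to host:port succeeds
# within the timeout (in seconds, default 5), otherwise print "closed".
port_status() {
  local host="$1" port="$2" wait_s="${3:-5}"
  # In the inner shell, $0 is the host and $1 is the port.
  if timeout "$wait_s" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}
```

A possible call is `port_status <hostname> 5672`, where <hostname> is a placeholder for the endpoint listed in the installation prerequisites. That hostname is not given in this section, so it is left unfilled here.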