Configure Alerts for Anypoint Platform PCE
Anypoint Platform Private Cloud Edition (Anypoint Platform PCE) provides built-in alerts that are triggered when a condition specified in any alert definition is detected.
Measurements are stored in Prometheus and read by Alertmanager. Alertmanager sends emails when an alert is triggered.
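If you want to confirm that the monitoring stack is available before working with alerts, you can list its pods. The following is a minimal sketch; it assumes the stack runs in the `monitoring` namespace and uses the `app=alertmanager` label referenced by the commands later in this guide:

```
# List all monitoring pods, including Prometheus and Alertmanager
# (assumes the bundled stack runs in the "monitoring" namespace).
kubectl get pods -n monitoring

# Narrow the output to the Alertmanager pods
# (label taken from the restart step later in this guide).
kubectl get pods -n monitoring -l app=alertmanager
```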
Alert Definitions
Default alerts are listed in the following table:
Component | Alert | Description |
---|---|---|
CPU | High CPU usage | Triggers a warning when > 75% used; triggers a critical error when > 90% used |
Memory | High memory usage | Triggers a warning when > 80% used; triggers a critical error when > 90% used |
Systemd | Overall systemd health | Triggers an error when systemd detects a failed service |
Systemd | Individual systemd unit health | Triggers an error when a systemd unit is not loaded or active |
Filesystem | High disk space usage | Triggers a warning when > 80% used; triggers a critical error when > 90% used |
Filesystem | High inode usage | Triggers a warning when > 90% used; triggers a critical error when > 95% used |
System | Uptime | Triggers a warning when the uptime for a node is less than five minutes |
System | Kernel parameters | Triggers an error if a parameter is not set. See the value matrix |
Etcd | Etcd instance health | Triggers an error when an etcd leader is down longer than five minutes |
Etcd | Etcd latency check | Triggers a warning when follower-to-leader latency exceeds 500 ms; triggers an error when it exceeds one second over a period of one minute |
Docker | Docker daemon health | Triggers an error when the Docker daemon is down |
Kubernetes | Kubernetes node readiness | Triggers an error when the node is not ready |
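If you want to see the Prometheus rules behind these built-in alerts, the following is a minimal sketch; it assumes the bundled kube-prometheus operator exposes them as PrometheusRule resources in the `monitoring` namespace:

```
# List the rule bundles managed by the Prometheus operator
# (assumes the PrometheusRule custom resource is installed).
kubectl get prometheusrules -n monitoring

# Dump one bundle to inspect individual alerting rules and thresholds.
# <RULE_NAME> is a placeholder for a name returned by the previous command.
kubectl get prometheusrule <RULE_NAME> -n monitoring -o yaml
```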
Configure Alert Definitions
You define new alerts using a gravity resource called `alert`, as shown in the following example:
```yaml
kind: alert
version: v2
metadata:
  name: cpu-alert
spec:
  # the alert name
  alert_name: CPUAlert
  # the rule group the alert belongs to
  group_name: test-group
  # the alert expression
  formula: |
    node:cluster_cpu_utilization:ratio * 100 > 80
  # the alert labels
  labels:
    severity: info
  # the alert annotations
  annotations:
    description: |
      Cluster CPU usage exceeds 80%.
```
See the Alerting Rules documentation for more details about Prometheus alerts.
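The `severity` label controls how an alert is classified. As a sketch of how the warning/critical pairs in the table above could be expressed, the following hypothetical definitions split high CPU usage into two alerts using the same recording rule as the example above; the resource names, alert names, and group name are illustrative:

```yaml
# Hypothetical warning/critical pair mirroring the CPU thresholds in the table
# above (warning > 75%, critical > 90%); adjust names and thresholds as needed.
kind: alert
version: v2
metadata:
  name: cpu-warning
spec:
  alert_name: CPUHighWarning
  group_name: test-group
  formula: |
    node:cluster_cpu_utilization:ratio * 100 > 75
  labels:
    severity: warning
  annotations:
    description: |
      Cluster CPU usage exceeds 75%.
---
kind: alert
version: v2
metadata:
  name: cpu-critical
spec:
  alert_name: CPUHighCritical
  group_name: test-group
  formula: |
    node:cluster_cpu_utilization:ratio * 100 > 90
  labels:
    severity: critical
  annotations:
    description: |
      Cluster CPU usage exceeds 90%.
```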
- To create an alert, run:

  ```
  gravity resource create alert.yaml
  ```

- To view existing alerts, run:

  ```
  gravity resource get alerts
  ```

- To remove an alert, run:

  ```
  gravity resource rm alert cpu-alert
  ```
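Before creating an alert, it can help to evaluate its expression directly against Prometheus to confirm it returns the series you expect. The following is a minimal sketch; the Prometheus service name and port are assumptions based on the Alertmanager service name used later in this guide, so adjust them to match your cluster:

```
# Evaluate the alert expression via the Prometheus HTTP API.
# The service name "monitoring-kube-prometheus-prometheus" and port 9090 are
# assumptions; substitute the Prometheus endpoint in your cluster.
curl -G 'http://monitoring-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090/api/v1/query' \
  --data-urlencode 'query=node:cluster_cpu_utilization:ratio * 100 > 80'
```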
Configure Alerts Delivery
To configure Alertmanager to send email alerts, create the following gravity resources:
- Using the following spec, create a file named `smtp-config.yaml` inside gravity, and replace the placeholder values with those of your SMTP configuration:

  ```yaml
  kind: smtp
  version: v2
  metadata:
    name: smtp
  spec:
    host: <SMTP_HOST>
    port: <SMTP_PORT>
    username: <SMTP_USERNAME>
    password: <SMTP_PASSWORD>
  ---
  kind: alerttarget
  version: v2
  metadata:
    name: email-alerts
  spec:
    # email address of the alerts recipient
    email: <RECIPIENT_EMAIL>
  ```
- Run `gravity resource create smtp-config.yaml`. You should see the following output:

  ```
  Created cluster SMTP configuration
  Created monitoring alert target "email-alerts"
  ```
- Add a default route to the Alertmanager configuration inside gravity:

  ```
  kubectl get secret -n monitoring alertmanager-monitoring-kube-prometheus-alertmanager -o json | jq --arg foo "$(kubectl get secret -n monitoring alertmanager-monitoring-kube-prometheus-alertmanager -o json | jq -r '.data["alertmanager.yaml"]' | base64 -d | yq r - --tojson | jq -r '.route.routes[1] |= . + {"match":{"alertname": "Watchdog", "receiver": "default", "continue": true}}' | jq -r '.route.routes[0].match += {"continue":true}' | yq r - -P | base64 | tr -d '\n')" '.data["alertmanager.yaml"]=$foo' | kubectl apply -f -
  ```
- Configure the FROM email address by replacing the `<SMTP_FROM>` value:

  ```
  kubectl get secret -n monitoring alertmanager-monitoring-kube-prometheus-alertmanager -o json | jq --arg foo "$(kubectl get secret -n monitoring alertmanager-monitoring-kube-prometheus-alertmanager -o json | jq -r '.data["alertmanager.yaml"]' | base64 -d | yq w - 'global.smtp_from' <SMTP_FROM> | base64 | tr -d '\n')" '.data["alertmanager.yaml"]=$foo' | kubectl apply -f -
  ```
- Restart the Alertmanager pods:

  ```
  kubectl delete pod -n monitoring -l app=alertmanager
  ```
- Test Alertmanager by running the following command inside gravity. To confirm that the test alert was received, see the verification sketch after this procedure.

  ```
  curl -H 'Content-Type: application/json' -d '[{"labels":{"alertname":"test-alert","state":"firing"}}]' http://monitoring-kube-prometheus-alertmanager.monitoring.svc.cluster.local:9093/api/v1/alerts
  ```
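After completing these steps, you can verify that your changes were applied and that the test alert reached Alertmanager. This is a minimal sketch that reuses the secret, namespace, and service names from the commands above:

```
# Dump the rendered Alertmanager configuration and check that the SMTP settings
# and default route are present.
kubectl get secret -n monitoring alertmanager-monitoring-kube-prometheus-alertmanager -o json \
  | jq -r '.data["alertmanager.yaml"]' | base64 -d

# List the alerts currently known to Alertmanager; the "test-alert" fired above
# should appear in the output.
curl -s http://monitoring-kube-prometheus-alertmanager.monitoring.svc.cluster.local:9093/api/v1/alerts
```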
Troubleshooting Alerts
Common troubleshooting tasks include the following:
- Verify that your SMTP server can send and receive emails using the addresses you defined as the `FROM` and `TO` addresses when you configured alerts delivery.

- Verify that your cluster nodes can communicate with your SMTP server. For example, use `telnet` to connect to your SMTP server from one of your cluster nodes (if `telnet` is not available, see the sketch after this list):

  ```
  telnet my.smtp.server.com 587
  Trying XXX.XXX.XXX.XXX...
  Connected to my.smtp.server.com.
  Escape character is '^]'.
  220 my.smtp.server.com ESMTP
  ^]
  telnet> quit
  Connection closed.
  ```
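If `telnet` is not installed on your nodes, the following is an alternative sketch that uses `openssl` to open the same connection and additionally negotiate STARTTLS on the submission port:

```
# Connect to the SMTP server and negotiate STARTTLS; a successful handshake
# prints the certificate details followed by a "250 ..." response.
# Replace my.smtp.server.com with your SMTP server.
openssl s_client -starttls smtp -connect my.smtp.server.com:587
```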