
Configure Alerts for Anypoint Platform PCE

Anypoint Platform Private Cloud Edition (Anypoint Platform PCE) provides built-in alerts that are triggered when a condition specified in any alert definition is detected.

Metrics are stored in Prometheus, which evaluates the alert rules and forwards any triggered alerts to Alertmanager. Alertmanager then sends email notifications.
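
For example, you can list the alerts that are currently firing by querying the Alertmanager API from inside the cluster. The service address below is the same one used in the test command later in this topic; adjust it if your deployment uses a different name:

    curl http://monitoring-kube-prometheus-alertmanager.monitoring.svc.cluster.local:9093/api/v1/alerts

The response is a JSON document listing each active alert with its labels and annotations.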

Alert Definitions

Default alerts are listed in the following table:

Table 1. Alert Definitions

| Component  | Alert                          | Description                                                                                                                                 |
|------------|--------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------|
| CPU        | High CPU usage                 | Triggers a warning when > 75% used; triggers a critical error when > 90% used                                                                |
| Memory     | High memory usage              | Triggers a warning when > 80% used; triggers a critical error when > 90% used                                                                |
| Systemd    | Overall systemd health         | Triggers an error when systemd detects a failed service                                                                                      |
| Systemd    | Individual systemd unit health | Triggers an error when a systemd unit is not loaded or active                                                                                |
| Filesystem | High disk space usage          | Triggers a warning when > 80% used; triggers a critical error when > 90% used                                                                |
| Filesystem | High inode usage               | Triggers a warning when > 90% used; triggers a critical error when > 95% used                                                                |
| System     | Uptime                         | Triggers a warning when the uptime for a node is less than five minutes                                                                      |
| System     | Kernel parameters              | Triggers an error if a parameter is not set; see the value matrix for details                                                                |
| Etcd       | Etcd instance health           | Triggers an error when an etcd leader is down longer than five minutes                                                                       |
| Etcd       | Etcd latency check             | Triggers a warning when follower-to-leader latency exceeds 500 ms; triggers an error when it exceeds one second over a period of one minute  |
| Docker     | Docker daemon health           | Triggers an error when the Docker daemon is down                                                                                             |
| Kubernetes | Kubernetes node readiness      | Triggers an error when the node is not ready                                                                                                 |
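
The default alerts above are evaluated by the bundled Prometheus instance. If you want to inspect the underlying rules, one option, assuming the monitoring stack is based on the Prometheus operator (as the monitoring-kube-prometheus resource names later in this topic suggest), is to list the rule resources in the monitoring namespace:

    kubectl get prometheusrules -n monitoring

Add -o yaml to view the full rule expressions and thresholds.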

Configure Alert Definitions

You define new alerts using a gravity resource called alert, as shown in the following example:

kind: alert
version: v2
metadata:
  name: cpu-alert
spec:
  # the alert name
  alert_name: CPUAlert
  # the rule group the alert belongs to
  group_name: test-group
  # the alert expression
  formula: |
    node:cluster_cpu_utilization:ratio * 100 > 80
  # the alert labels
  labels:
    severity: info
  # the alert annotations
  annotations:
    description: |
      Cluster CPU usage exceeds 80%.

See the Alerting Rules documentation for more details about Prometheus alerts.

  • To create an alert, run:

    gravity resource create alert.yaml
  • To view existing alerts, run:

    gravity resource get alerts
  • To remove an alert, run:

    gravity resource rm alert cpu-alert
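
The alert resource is evaluated by Prometheus, so its formula, labels, and annotations map onto a standard Prometheus alerting rule. The following sketch is only illustrative of that mapping; the group and rule names mirror the example above, not necessarily what gravity generates:

    groups:
      - name: test-group
        rules:
          - alert: CPUAlert
            expr: node:cluster_cpu_utilization:ratio * 100 > 80
            labels:
              severity: info
            annotations:
              description: Cluster CPU usage exceeds 80%.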

Configure Alerts Delivery

To configure Alertmanager to send email alerts, create the following gravity resources:

  1. Using the following spec, create a file named smtp-config.yaml inside gravity, and replace the placeholder values with those of your SMTP configuration:

    kind: smtp
    version: v2
    metadata:
      name: smtp
    spec:
      host: <SMTP_HOST>
      port: <SMTP_PORT>
      username: <SMTP_USERNAME>
      password: <SMTP_PASSWORD>
    ---
    kind: alerttarget
    version: v2
    metadata:
      name: email-alerts
    spec:
      # email address of the alerts recipient
      email: <RECIPIENT_EMAIL>
  2. Run gravity resource create smtp-config.yaml. You should see the following output:

    Created cluster SMTP configuration
    Created monitoring alert target "email-alerts"
  3. Add a default route to the Alertmanager configuration inside gravity:

    kubectl get secret -n monitoring alertmanager-monitoring-kube-prometheus-alertmanager -o json | jq --arg foo "$(kubectl get secret -n monitoring alertmanager-monitoring-kube-prometheus-alertmanager -o json | jq -r '.data["alertmanager.yaml"]' | base64 -d | yq r - --tojson | jq -r '.route.routes[1] |= . + {"match":{"alertname": "Watchdog", "receiver": "default", "continue": true}}' | jq -r '.route.routes[0].match += {"continue":true}' | yq r - -P | base64 | tr -d '\n')" '.data["alertmanager.yaml"]=$foo' | kubectl apply -f -
  4. Configure the FROM email address by replacing the <SMTP_FROM> value:

    kubectl get secret -n monitoring alertmanager-monitoring-kube-prometheus-alertmanager -o json | jq --arg foo "$(kubectl get secret -n monitoring alertmanager-monitoring-kube-prometheus-alertmanager -o json | jq -r '.data["alertmanager.yaml"]' | base64 -d | yq w - 'global.smtp_from' <SMTP_FROM> | base64 | tr -d '\n')" '.data["alertmanager.yaml"]=$foo' | kubectl apply -f -
  5. Restart Alertmanager pods:

    kubectl delete pod -n monitoring -l app=alertmanager
  6. Test Alertmanager by running the following command inside gravity:

    curl -H 'Content-Type: application/json' -d '[{"labels":{"alertname":"test-alert","state":"firing"}}]' http://monitoring-kube-prometheus-alertmanager.monitoring.svc.cluster.local:9093/api/v1/alerts
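
After completing these steps, you can confirm that the SMTP settings and the route from step 3 were applied by decoding the Alertmanager configuration secret, using the same pattern as the commands above:

    kubectl get secret -n monitoring alertmanager-monitoring-kube-prometheus-alertmanager -o json | jq -r '.data["alertmanager.yaml"]' | base64 -d

The decoded YAML should show your SMTP settings and the route added in step 3.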

Troubleshooting Alerts

Common troubleshooting tasks include the following:

  • Verify that your SMTP server can send and receive emails using the addresses you defined as the FROM and TO addresses when you configured alerts delivery.

  • Verify that your cluster nodes can communicate with your SMTP server.

    For example, use telnet to connect to your SMTP server from one of your cluster nodes:

    telnet my.smtp.server.com 587
    Trying XXX.XXX.XXX.XXX...
    Connected to my.smtp.server.com.
    Escape character is '^]'.
    220 my.smtp.server.com ESMTP
    ^[^]
    telnet> quit
    Connection closed.
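
If email still does not arrive, it can also help to check the Alertmanager logs for SMTP errors, using the same label selector as the pod restart step above:

    kubectl logs -n monitoring -l app=alertmanager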
