Performing Scheduled Maintenance on Runtime Fabric on VMs / Bare Metal
When using Runtime Fabric on VMs / Bare Metal, use the following procedures to perform scheduled maintenance, such as applying OS patches and security updates.
Controller Node Maintenance
Controller nodes are entry points to your Runtime Fabric cluster; they do not run application workloads.
Update all controller nodes, one at a time:
1. On your upstream TCP load balancer, remove a controller node from your load balancing pool.
2. Perform the required patching and reboot the node.
3. When updates are complete on the controller node, add the node back to the load balancing pool and continue with the next node (a patch-and-reboot example is sketched after this list).
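As an illustration only, the cycle for a single controller node might look like the following sketch. The load balancer commands, backend name (rtf_controllers), and server name (controller-1) are assumptions for an HAProxy TCP load balancer with an admin socket; substitute the equivalent mechanism for your own load balancer, and use your distribution's package manager for patching.

# Remove the controller node from the load balancing pool (assumed HAProxy backend and server names).
$ echo "disable server rtf_controllers/controller-1" | sudo socat stdio /var/run/haproxy.sock
# Apply OS patches and security updates on the controller node, then reboot it.
$ sudo yum update -y
$ sudo reboot
# After the node is back up, return it to the load balancing pool.
$ echo "enable server rtf_controllers/controller-1" | sudo socat stdio /var/run/haproxy.sock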
Worker Node Maintenance
To update worker nodes:
1. On a controller node, run the gravity shell command.
2. Verify that the status for all nodes is Ready:
$ kubectl get nodes
NAME         STATUS   ROLES    AGE   VERSION
172.31.0.9   Ready    node     1d    v1.x.x
172.31.0.8   Ready    master   1d    v1.x.x
3. Retrieve the application namespace (environment ID):
$ kubectl get ns
NAME                                   STATUS   AGE
default                                Active   1d
fff5df7b-c49d-48c9-967d-0071412722x4   Active   1d
kube-node-lease                        Active   1d
kube-public                            Active   1d
kube-system                            Active   1d
monitoring                             Active   1d
rtf                                    Active   1d
In the command output, the application namespace is represented by a long string of characters. In the previous example, the application namespace is fff5df7b-c49d-48c9-967d-0071412722x4. More than one application namespace value is displayed if the RTF cluster is associated with more than one environment.
4. Verify that the pods and nodes are still running in your namespace:
$ kubectl -n <namespace> get pod -owide
The command output displays Running in the STATUS column:
NAME                             READY   STATUS    RESTARTS   AGE     IP             NODE         NOMINATED NODE   READINESS GATES
my-app-name-1-84f95cb7c9-glktk   2/2     Running   0          9m27s   10.244.88.29   172.31.0.9   <none>           <none>
If you have multiple environments, you must perform the remaining steps in each environment.
5. For each worker node that requires updates (a consolidated sketch of steps a through e follows this procedure):
a. Drain the node:
$ kubectl drain <node name> --ignore-daemonsets --delete-local-data
The following actions are performed when a node is drained:
- The application is rescheduled on another node, and its pods on the drained node are deleted.
- The node status is updated to SchedulingDisabled.
Unless you are running at least two replicas of your application and selected Enforce deploying replicas across nodes when you deployed it, service is disrupted during the drain process.
b. Verify that the node's status is Ready,SchedulingDisabled:
$ kubectl get node
NAME         STATUS                     ROLES   AGE   VERSION
172.31.0.9   Ready,SchedulingDisabled   node    1d    v1.x.x
c. Verify that no application pods are running on the drained node:
$ kubectl get pod -n <namespace> -owide
The NODE column in the command output should not list the drained node for any application pod.
d. Proceed with OS patching and reboot the node.
e. After the reboot is complete, reenable scheduling on the node:
$ kubectl uncordon 172.31.0.9
node/172.31.0.9 uncordoned
$ kubectl get node
NAME         STATUS   ROLES   AGE   VERSION
172.31.0.9   Ready    node    1d    v1.x.x
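The worker-node cycle above can be scripted roughly as follows. This is a sketch, not an official tool: the node name and namespace are placeholder values taken from the examples on this page, and the patch-and-reboot step must still be run on the worker node itself rather than from the gravity shell.

# Run inside the gravity shell on a controller node. NODE and NS are placeholder values.
$ NODE=172.31.0.9
$ NS=fff5df7b-c49d-48c9-967d-0071412722x4
# Drain the worker node and confirm that no application pods remain on it.
$ kubectl drain "$NODE" --ignore-daemonsets --delete-local-data
$ kubectl -n "$NS" get pod -owide | grep "$NODE" || echo "no application pods on $NODE"
# Patch and reboot the worker node itself (run on that node, not in the gravity shell),
# for example: sudo yum update -y && sudo reboot
# After the node reboots and reports Ready, reenable scheduling and verify.
$ kubectl uncordon "$NODE"
$ kubectl get node "$NODE"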