Resize Machines
Steps to resize a Machine on an OpenShift cluster.
Important Note
- All steps described here follow a safe way to resize a Machine in OCP 4.x.
- This is not official documentation; these steps were tested on versions 4.9 and 4.10.
Overview of steps:
- Gather cluster information
- Set target Machines to resize
- Set the new size
- Graceful Power off
- Change Machine size
- Power on
- Patch Machine Object spec
Supported/documented platforms:
- AWS
- Azure
Gather cluster information
Check the provider
Make sure you are running the steps for the correct Cloud Provider:
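The platform type can be read from the cluster Infrastructure object. A minimal sketch, assuming `oc` is logged in to the target cluster; the helper name `get_platform` is hypothetical and used only for illustration:

```shell
# Print the platform type reported by the cluster (for example AWS or Azure).
# get_platform is a hypothetical helper name, not part of the oc CLI.
get_platform() {
    oc get infrastructure cluster \
        -o jsonpath='{.status.platformStatus.type}'
}
```

Running `get_platform` should print `AWS` or `Azure` on the platforms covered here.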
Check the cluster version
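A minimal sketch for this check, assuming `oc` is logged in to the target cluster. The helper names `cluster_version` and `is_tested_version` are hypothetical; the second one only compares the version against the releases these steps were tested on (4.9 and 4.10):

```shell
# Print the version the cluster is currently reconciled to.
cluster_version() {
    oc get clusterversion version -o jsonpath='{.status.desired.version}'
}

# Succeed only for the releases these steps were tested on (4.9 and 4.10).
is_tested_version() {
    case "$1" in
        4.9.*|4.10.*) return 0 ;;
        *)            return 1 ;;
    esac
}
```

For example, `is_tested_version "$(cluster_version)" || echo "untested version"` prints a warning when the cluster runs another release.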
Check all the nodes are Ready
Make sure that all nodes in the group to be resized have Status=Ready.
Note
All steps described here were performed on master nodes.
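A minimal sketch of the node check, assuming the group is selected by the `node-role.kubernetes.io/<role>` label. The helper names `list_nodes` and `not_ready_nodes` are hypothetical:

```shell
# List the nodes of one role (defaults to master).
list_nodes() {
    oc get nodes -l "node-role.kubernetes.io/${1:-master}"
}

# Print the names of nodes whose STATUS column is not exactly "Ready".
# Cordoned nodes ("Ready,SchedulingDisabled") are reported as well.
not_ready_nodes() {
    oc get nodes -l "node-role.kubernetes.io/${1:-master}" --no-headers \
        | awk '$2 != "Ready" {print $1}'
}
```

`list_nodes master` shows the full listing; `not_ready_nodes master` should print nothing before you start.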
Sample output:
NAME                     STATUS   ROLES    AGE     VERSION
mrbaz01-2754r-master-0   Ready    master   5h57m   v1.22.0-rc.0+8719299
mrbaz01-2754r-master-1   Ready    master   5h57m   v1.22.0-rc.0+8719299
mrbaz01-2754r-master-2   Ready    master   5h56m   v1.22.0-rc.0+8719299
Check all the machines are Running
Make sure that all Machines in the group to be resized have Phase=Running.
Notes:
- The sample steps filter the group of nodes by role: master
oc get machines \
    -n openshift-machine-api \
    -l machine.openshift.io/cluster-api-machine-role=master
NAME                     PHASE     TYPE              REGION   ZONE   AGE
mrbaz01-2754r-master-0   Running   Standard_D4s_v3   eastus   1      6h1m
mrbaz01-2754r-master-1   Running   Standard_D4s_v3   eastus   3      6h1m
mrbaz01-2754r-master-2   Running   Standard_D4s_v3   eastus   2      6h1m
Gather Machine Information
Gather Cloud provider information from Machine object.
Choose the Cloud Provider
AWS:
oc get machines \
    -n openshift-machine-api \
    -l machine.openshift.io/cluster-api-machine-role=master \
    -o json \
    | jq -r '.items[] | (
        "node_name: " + .status.nodeRef.name,
        "machine_name: " + .metadata.name,
        "instanceId: " + .status.providerStatus.instanceId,
        "instanceTypeSpec: " + .spec.providerSpec.value.instanceType,
        "instanceTypeMeta: " + .metadata.labels."machine.openshift.io/instance-type",
        "")'
Azure:
oc get machines \
    -n openshift-machine-api \
    -l machine.openshift.io/cluster-api-machine-role=master \
    -o json \
    | jq -r '.items[] | (
        "node_name: " + .status.nodeRef.name,
        "machine_name: " + .metadata.name,
        "instanceId: " + .status.providerStatus.vmId,
        "instanceTypeSpec: " + .spec.providerSpec.value.vmSize,
        "instanceTypeMeta: " + .metadata.labels."machine.openshift.io/instance-type",
        "")'
Sample output
node_name: mrbaz01-2754r-master-0
machine_name: mrbaz01-2754r-master-0
instanceId: /subscriptions/a-b-c-d-xyz/resourceGroups/mrbaz01-2754r-rg/providers/Microsoft.Compute/virtualMachines/mrbaz01-2754r-master-0
instanceTypeSpec: Standard_D4s_v3
instanceTypeMeta: Standard_D4s_v3
node_name: mrbaz01-2754r-master-1
machine_name: mrbaz01-2754r-master-1
instanceId: /subscriptions/a-b-c-d-xyz/resourceGroups/mrbaz01-2754r-rg/providers/Microsoft.Compute/virtualMachines/mrbaz01-2754r-master-1
instanceTypeSpec: Standard_D4s_v3
instanceTypeMeta: Standard_D4s_v3
node_name: mrbaz01-2754r-master-2
machine_name: mrbaz01-2754r-master-2
instanceId: /subscriptions/a-b-c-d-xyz/resourceGroups/mrbaz01-2754r-rg/providers/Microsoft.Compute/virtualMachines/mrbaz01-2754r-master-2
instanceTypeSpec: Standard_D4s_v3
instanceTypeMeta: Standard_D4s_v3
General steps to resize each machine
Tip
Repeat the steps below for each machine you want to resize.
Set the machine_name variable value.
Warning
The variable machine_name must be set specifically for your environment
and updated for each machine you resize.
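For illustration, the variable can be set as follows. The value is the first Machine name from the sample output earlier in this document and is only an example:

```shell
# Example value taken from the sample output; replace it with the
# Machine you are about to resize.
machine_name="mrbaz01-2754r-master-0"
echo "machine_name=${machine_name}"
```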
Set the new Machine size
Example by Cloud Provider
To check EC2 compatibility with OCP, please check this doc, then set:
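A minimal sketch of this step. The variable name `new_machine_type` is an assumption made for illustration, and both sizes are example values only, not recommendations:

```shell
# new_machine_type is an assumed variable name; the sizes are examples only.
# Keep exactly one assignment, matching your platform.
new_machine_type="m5.2xlarge"          # AWS example
# new_machine_type="Standard_D8s_v3"   # Azure example
echo "new_machine_type=${new_machine_type}"
```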
Collect Machine info
Attention
Do not change any of the steps described below; just run them as appropriate for your environment.
Discover variable values based on ${machine_name}
Choose the Cloud Provider
- Make sure all variables are set:
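A minimal sketch of the discovery, assuming `machine_name` is already set. The helper names and the variables `node_name` and `instance_id` are assumptions made for illustration; `resource_group` matches the name used by the Azure commands in this document:

```shell
# Query one field of the Machine object selected by ${machine_name}.
machine_field() {
    oc get machine "${machine_name}" -n openshift-machine-api -o jsonpath="{$1}"
}

# Set node_name plus the provider-specific variable, then print them.
# $1 is the platform: AWS or Azure.
discover_machine_vars() {
    node_name=$(machine_field '.status.nodeRef.name')
    case "$1" in
        AWS)   instance_id=$(machine_field '.status.providerStatus.instanceId') ;;
        Azure) resource_group=$(machine_field '.spec.providerSpec.value.resourceGroup') ;;
        *)     echo "unsupported platform: $1" >&2; return 1 ;;
    esac
    echo "machine_name=${machine_name} node_name=${node_name}" \
         "instance_id=${instance_id:-} resource_group=${resource_group:-}"
}
```

Call `discover_machine_vars AWS` or `discover_machine_vars Azure` and confirm that no printed value is empty for your platform.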
Graceful Power off
- Cordon the node
- Drain the node
- Shutdown
- Wait for the node to shut down
Attention
Wait until the node is in Status=NotReady
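The cordon, drain, shutdown, and wait steps can be sketched as one helper. This is an illustration under assumptions, not the exact commands of the original steps: `node_name` is assumed to hold the node backing the Machine, `graceful_poweroff` and `poll_interval` are hypothetical names, and the drain flags are a common choice that you may need to adjust:

```shell
# Cordon, drain, and shut down the node, then wait until it reports NotReady.
graceful_poweroff() {
    oc adm cordon "${node_name}" || return 1
    oc adm drain "${node_name}" \
        --ignore-daemonsets --delete-emptydir-data --force || return 1
    # Schedule the shutdown one minute ahead so the debug pod can exit cleanly.
    oc debug "node/${node_name}" -- chroot /host shutdown -h 1 || return 1
    # Poll the STATUS column until it starts with NotReady.
    until oc get node "${node_name}" --no-headers \
            | awk '{print $2}' | grep -q '^NotReady'; do
        sleep "${poll_interval:-15}"
    done
}
```

Run `graceful_poweroff` once per node; it returns when the node status becomes NotReady.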
- Wait until the Instance/VM is in stopped state (by Cloud provider)
Choose the Cloud Provider
Azure:
while true; do
    st=$(az vm get-instance-view \
        --resource-group "${resource_group}" \
        --name "${machine_name}" \
        --output json \
        | jq -r '.instanceView.statuses[]
            | select(.code | startswith("PowerState")).code')
    # Stop polling once the VM reports the stopped power state
    test "${st}" = "PowerState/stopped" && break
    # Any other state (running, stopping, ...): wait and poll again
    echo "state=${st}; sleeping 15s"
    sleep 15
done
echo "state=${st}"
- Make sure that the node is turned off
Choose the Cloud Provider
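A minimal sketch for checking the power state on either provider. The helper name `vm_power_state` and the variable `instance_id` are assumptions made for illustration; `resource_group` and `machine_name` are the variables used by the Azure commands in this document:

```shell
# Print the power state reported by the cloud provider.
# $1 is the platform: AWS or Azure.
vm_power_state() {
    case "$1" in
        AWS)
            aws ec2 describe-instances \
                --instance-ids "${instance_id}" \
                --query 'Reservations[0].Instances[0].State.Name' \
                --output text ;;
        Azure)
            az vm get-instance-view \
                --resource-group "${resource_group}" \
                --name "${machine_name}" \
                --query "instanceView.statuses[?starts_with(code, 'PowerState/')].code" \
                --output tsv ;;
        *)
            echo "unsupported platform: $1" >&2; return 1 ;;
    esac
}
```

The expected values before resizing are `stopped` on AWS and `PowerState/stopped` on Azure.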
Change instance Type
- Change the size
Choose the Cloud Provider
- Check the current [new] size
Choose the Cloud Provider
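The resize and the follow-up check can be sketched as below. The helper names and the variables `instance_id` and `new_machine_type` are assumptions made for illustration; the instance or VM must already be stopped:

```shell
# Change the instance/VM size. $1 is the platform: AWS or Azure.
resize_vm() {
    case "$1" in
        AWS)
            aws ec2 modify-instance-attribute \
                --instance-id "${instance_id}" \
                --instance-type "{\"Value\": \"${new_machine_type}\"}" ;;
        Azure)
            az vm resize \
                --resource-group "${resource_group}" \
                --name "${machine_name}" \
                --size "${new_machine_type}" ;;
        *)  echo "unsupported platform: $1" >&2; return 1 ;;
    esac
}

# Print the size currently configured on the cloud provider side.
current_vm_size() {
    case "$1" in
        AWS)
            aws ec2 describe-instances --instance-ids "${instance_id}" \
                --query 'Reservations[0].Instances[0].InstanceType' --output text ;;
        Azure)
            az vm show --resource-group "${resource_group}" --name "${machine_name}" \
                --query 'hardwareProfile.vmSize' --output tsv ;;
        *)  echo "unsupported platform: $1" >&2; return 1 ;;
    esac
}
```

After `resize_vm Azure` (or `AWS`), `current_vm_size` with the same argument should print the value of `new_machine_type`.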
Power on
- Power on the VM
Choose the Cloud Provider
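A minimal sketch of the power-on step. The helper name `power_on_vm` and the variable `instance_id` are assumptions made for illustration:

```shell
# Start the instance/VM. $1 is the platform: AWS or Azure.
power_on_vm() {
    case "$1" in
        AWS)
            aws ec2 start-instances --instance-ids "${instance_id}" ;;
        Azure)
            az vm start \
                --resource-group "${resource_group}" \
                --name "${machine_name}" ;;
        *)
            echo "unsupported platform: $1" >&2; return 1 ;;
    esac
}
```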
- Wait until the Instance/VM is in running state (by Cloud provider)
Choose the Cloud Provider
Azure:
while true; do
    st=$(az vm get-instance-view \
        --resource-group "${resource_group}" \
        --name "${machine_name}" \
        --output json \
        | jq -r '.instanceView.statuses[]
            | select(.code | startswith("PowerState")).code')
    # Stop polling once the VM reports the running power state
    test "${st}" = "PowerState/running" && break
    # Any other state (stopped, starting, ...): wait and poll again
    echo "state=${st}; sleeping 15s"
    sleep 15
done
echo "state=${st}"
- Wait for the node to be Ready (STATUS=Ready)
- Wait for MAPI to reconcile and update the new machine size (TYPE)
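Both waits can be sketched as follows, assuming `node_name` and `machine_name` are set. The helper name `wait_node_ready` and the default timeout are assumptions made for illustration:

```shell
# Block until the node reports Ready, then show the Machine row.
# $1 is an optional timeout (defaults to 15m).
wait_node_ready() {
    oc wait --for=condition=Ready "node/${node_name}" \
        --timeout="${1:-15m}" || return 1
    # The TYPE column should report the new size once the Machine API
    # has reconciled; re-run this command if it still shows the old one.
    oc get machine "${machine_name}" -n openshift-machine-api
}
```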
Sample output
- Make sure that no CSR is pending (there should not be any)
All certificates should already be issued and approved; this check only confirms that nothing went wrong in that step.
- Some operators may be degraded; review them:
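A minimal sketch for spotting pending requests, assuming the default `oc get csr` column layout, where the last column is CONDITION. The helper name `pending_csrs` is hypothetical:

```shell
# Print the names of CSRs whose CONDITION column is "Pending".
pending_csrs() {
    oc get csr --no-headers | awk '$NF == "Pending" {print $1}'
}
```

If any name is printed, review the request and approve it with `oc adm certificate approve <csr_name>`.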
- Uncordon the node
- Wait until all operators clear the degraded state
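The operator review, the uncordon, and the final wait can be sketched as below. The helper names and `poll_interval` are assumptions made for illustration, and the filter assumes the default column layout of `oc get clusteroperators` (NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE):

```shell
# Print the names of cluster operators whose DEGRADED column is True.
degraded_operators() {
    oc get clusteroperators --no-headers | awk '$5 == "True" {print $1}'
}

# Uncordon the node, then poll until no operator reports Degraded.
uncordon_and_wait() {
    oc adm uncordon "${node_name}" || return 1
    while [ -n "$(degraded_operators)" ]; do
        echo "degraded: $(degraded_operators | tr '\n' ' ')"
        sleep "${poll_interval:-15}"
    done
}
```

Run `degraded_operators` first to review the current state, then `uncordon_and_wait`.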
- Review the Machine object attributes
Choose the Cloud Provider
AWS:
oc get machine "${machine_name}" \
    -n openshift-machine-api \
    -o json \
    | jq -r '. | (
        "node_name: " + .status.nodeRef.name,
        "machine_name: " + .metadata.name,
        "instanceTypeSpec: " + .spec.providerSpec.value.instanceType,
        "instanceTypeMeta: " + .metadata.labels."machine.openshift.io/instance-type",
        "")'
Patch Machine API
Patch Machine Object:
Choose the Cloud Provider
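A minimal sketch of the patch, assuming `machine_name` is set and `new_machine_type` (an assumed variable name) holds the new size. The field names `instanceType` (AWS) and `vmSize` (Azure) are the ones read by the review commands in this document; the helper name `patch_machine_size` is hypothetical:

```shell
# Build and apply a merge patch that sets the size field in the Machine spec.
# $1 is the platform: AWS or Azure.
patch_machine_size() {
    case "$1" in
        AWS)   field="instanceType" ;;
        Azure) field="vmSize" ;;
        *)     echo "unsupported platform: $1" >&2; return 1 ;;
    esac
    patch=$(printf '{"spec":{"providerSpec":{"value":{"%s":"%s"}}}}' \
        "${field}" "${new_machine_type}")
    oc patch machine "${machine_name}" \
        -n openshift-machine-api \
        --type=merge \
        -p "${patch}"
}
```

For example, `patch_machine_size Azure` sets `.spec.providerSpec.value.vmSize` on the selected Machine.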
- Review if the Machine Type was changed:
Example output
AWS:
oc get machines "${machine_name}" \
    -n openshift-machine-api \
    -o json \
    | jq -r '. | (
        "node_name: " + .status.nodeRef.name,
        "machine_name: " + .metadata.name,
        "instanceTypeSpec: " + .spec.providerSpec.value.instanceType,
        "instanceTypeMeta: " + .metadata.labels."machine.openshift.io/instance-type",
        "")'
Sample output:
Azure:
oc get machines "${machine_name}" \
    -n openshift-machine-api \
    -o json \
    | jq -r '. | (
        "node_name: " + .status.nodeRef.name,
        "machine_name: " + .metadata.name,
        "instanceTypeSpec: " + .spec.providerSpec.value.vmSize,
        "instanceTypeMeta: " + .metadata.labels."machine.openshift.io/instance-type",
        "")'
Sample output:
Check services
- Check all cluster operators
- Review Kube apiservers
- Review etcd cluster
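The etcd checks can be run through any etcd pod. A minimal sketch, assuming the etcd pods live in the `openshift-etcd` namespace, carry the `k8s-app=etcd` label, and have a container named `etcdctl`; the helper name `etcdctl_in_cluster` is hypothetical:

```shell
# Run an etcdctl subcommand inside the first etcd pod found.
etcdctl_in_cluster() {
    etcd_pod=$(oc get pods -n openshift-etcd -l k8s-app=etcd -o name | head -n 1)
    [ -n "${etcd_pod}" ] || { echo "no etcd pod found" >&2; return 1; }
    oc exec -n openshift-etcd "${etcd_pod}" -c etcdctl -- etcdctl "$@"
}
```

`etcdctl_in_cluster member list -w table` lists the members, and `etcdctl_in_cluster endpoint health -w table` reports endpoint health. The cluster operators can be listed with `oc get clusteroperators`.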
Pods
Members
Example output
+------------------+---------+------------------------+-----------------------+-----------------------+------------+
|        ID        | STATUS  |          NAME          |      PEER ADDRS       |     CLIENT ADDRS      | IS LEARNER |
+------------------+---------+------------------------+-----------------------+-----------------------+------------+
| 612953730164bdff | started | mrbaz01-2754r-master-2 | https://10.0.0.6:2380 | https://10.0.0.6:2379 |      false |
| 8bf6319e4243538c | started | mrbaz01-2754r-master-0 | https://10.0.0.7:2380 | https://10.0.0.7:2379 |      false |
| de0c658dd1ee52b8 | started | mrbaz01-2754r-master-1 | https://10.0.0.8:2380 | https://10.0.0.8:2379 |      false |
+------------------+---------+------------------------+-----------------------+-----------------------+------------+
Endpoints healthy (HEALTH=true)
Example output
+-----------------------+--------+-------------+-------+
|       ENDPOINT        | HEALTH |    TOOK     | ERROR |
+-----------------------+--------+-------------+-------+
| https://10.0.0.8:2379 |   true | 16.361971ms |       |
| https://10.0.0.6:2379 |   true | 16.523072ms |       |
| https://10.0.0.7:2379 |   true | 15.879969ms |       |
+-----------------------+--------+-------------+-------+
Repeat the steps for each machine
Repeat the section "General steps to resize each machine" for each remaining machine to resize.
Review all changes
- Review Nodes
- Gather current Machine summary
oc get machines \
    -n openshift-machine-api \
    -l machine.openshift.io/cluster-api-machine-role=master
- Review Machines attributes from all machines
Choose the Cloud Provider
AWS:
oc get machines \
    -n openshift-machine-api \
    -l machine.openshift.io/cluster-api-machine-role=master \
    -o json \
    | jq -r '.items[] | (
        "node_name: " + .status.nodeRef.name,
        "machine_name: " + .metadata.name,
        "instanceTypeSpec: " + .spec.providerSpec.value.instanceType,
        "instanceTypeMeta: " + .metadata.labels."machine.openshift.io/instance-type",
        "")'
Azure:
oc get machines \
    -n openshift-machine-api \
    -l machine.openshift.io/cluster-api-machine-role=master \
    -o json \
    | jq -r '.items[] | (
        "node_name: " + .status.nodeRef.name,
        "machine_name: " + .metadata.name,
        "instanceTypeSpec: " + .spec.providerSpec.value.vmSize,
        "instanceTypeMeta: " + .metadata.labels."machine.openshift.io/instance-type",
        "")'
Suggested Next Steps
- Create a kubectl plugin to handle all the steps covered here