Resize Machines

Steps to resize a Machine in an OpenShift cluster.

Important Note

  • All steps described here follow a safe way to resize a Machine in OCP 4.x.
  • This is not official documentation; the steps were tested on versions 4.9 and 4.10.

Overview of steps:

Supported/documented platforms:

  • AWS
  • Azure

Gather cluster information

Check the provider

Make sure you are running the steps for the correct Cloud Provider:

oc get infrastructures \
    -o jsonpath='{.items[*].status.platformStatus.type}'

Example output

AWS
Azure

Check the cluster version

oc get clusterversion

Check all the nodes are Ready

Make sure that every node in the group to be resized has STATUS=Ready.

All steps described here were performed on master nodes.

oc get nodes \
    -l kubernetes.io/os=linux,node-role.kubernetes.io/master=

Sample output:

NAME                     STATUS   ROLES    AGE     VERSION
mrbaz01-2754r-master-0   Ready    master   5h57m   v1.22.0-rc.0+8719299
mrbaz01-2754r-master-1   Ready    master   5h57m   v1.22.0-rc.0+8719299
mrbaz01-2754r-master-2   Ready    master   5h56m   v1.22.0-rc.0+8719299

Check all the machines are Running

Make sure that every machine in the group to be resized has PHASE=Running.

Note: the sample steps filter the group of nodes: master

oc get machines \
    -n openshift-machine-api \
    -l machine.openshift.io/cluster-api-machine-role=master

Sample output:

NAME                     PHASE     TYPE              REGION   ZONE   AGE
mrbaz01-2754r-master-0   Running   Standard_D4s_v3   eastus   1      6h1m
mrbaz01-2754r-master-1   Running   Standard_D4s_v3   eastus   3      6h1m
mrbaz01-2754r-master-2   Running   Standard_D4s_v3   eastus   2      6h1m
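
The two readiness checks above (nodes Ready, machines Running) are read by eye. As a minimal sketch, the same check could be scripted by filtering the `--no-headers` output of `oc get`. The helper name `rows_not_matching` is hypothetical and not part of `oc`; it assumes the status is in the second column, as in the sample outputs above.

```shell
# rows_not_matching <column> <expected>
# Reads whitespace-separated rows on stdin (for example the output of
# `oc get nodes --no-headers`) and prints the first field of every row
# whose <column> differs from <expected>.
# Returns 0 when every row matches, 1 otherwise.
# Hypothetical helper for illustration; not part of oc.
rows_not_matching() {
    awk -v col="$1" -v want="$2" '
        $col != want { print $1; bad = 1 }
        END { exit (bad ? 1 : 0) }'
}

# Example usage (assumes oc is logged in to the cluster):
#   oc get nodes -l node-role.kubernetes.io/master= --no-headers \
#       | rows_not_matching 2 Ready
#   oc get machines -n openshift-machine-api --no-headers \
#       -l machine.openshift.io/cluster-api-machine-role=master \
#       | rows_not_matching 2 Running
```

A cordoned node reports `Ready,SchedulingDisabled`, so this sketch would list it as not matching; that is intentional for a pre-check run before any node has been cordoned.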

Gather Machine Information

Gather Cloud provider information from Machine object.

Choose the Cloud Provider

AWS:

oc get machines \
    -n openshift-machine-api \
    -l machine.openshift.io/cluster-api-machine-role=master \
    -o json \
    | jq -r '.items[] | (
        "node_name: " + .status.nodeRef.name,
        "machine_name: " + .metadata.name,
        "instanceId: " + .status.providerStatus.instanceId,
        "instanceTypeSpec: " + .spec.providerSpec.value.instanceType,
        "instanceTypeMeta: " + .metadata.labels."machine.openshift.io/instance-type",
        "")'

Azure:

oc get machines \
    -n openshift-machine-api \
    -l machine.openshift.io/cluster-api-machine-role=master \
    -o json \
    | jq -r '.items[] | (
        "node_name: " + .status.nodeRef.name,
        "machine_name: " + .metadata.name,
        "instanceId: " + .status.providerStatus.vmId,
        "instanceTypeSpec: " + .spec.providerSpec.value.vmSize,
        "instanceTypeMeta: " + .metadata.labels."machine.openshift.io/instance-type",
        "")'

Sample output

AWS: N/A

Azure:

node_name: mrbaz01-2754r-master-0
machine_name: mrbaz01-2754r-master-0
instanceId: /subscriptions/a-b-c-d-xyz/resourceGroups/mrbaz01-2754r-rg/providers/Microsoft.Compute/virtualMachines/mrbaz01-2754r-master-0
instanceTypeSpec: Standard_D4s_v3
instanceTypeMeta: Standard_D4s_v3

node_name: mrbaz01-2754r-master-1
machine_name: mrbaz01-2754r-master-1
instanceId: /subscriptions/a-b-c-d-xyz/resourceGroups/mrbaz01-2754r-rg/providers/Microsoft.Compute/virtualMachines/mrbaz01-2754r-master-1
instanceTypeSpec: Standard_D4s_v3
instanceTypeMeta: Standard_D4s_v3

node_name: mrbaz01-2754r-master-2
machine_name: mrbaz01-2754r-master-2
instanceId: /subscriptions/a-b-c-d-xyz/resourceGroups/mrbaz01-2754r-rg/providers/Microsoft.Compute/virtualMachines/mrbaz01-2754r-master-2
instanceTypeSpec: Standard_D4s_v3
instanceTypeMeta: Standard_D4s_v3

General steps to resize each machine

Tip

Repeat the steps below for each machine you want to resize.

Set the machine_name variable value.

Warning

Set the machine_name variable for your environment, and update it for each machine you resize.

machine_name=mrbaz01-2754r-master-0

Set the new Machine size

new_machine_type="<cloud_provider_size>"

Example by Cloud Provider

AWS: to check EC2 instance type compatibility with OCP, check this doc, then set:

new_machine_type="m5.xlarge"

Azure: to list the sizes available for a specific VM, run the command below (resource_group is set in the "Collect Machine info" step):

az vm list-vm-resize-options \
    --resource-group ${resource_group} \
    --name ${machine_name} \
    --output table

Then set the desired value:

new_machine_type="Standard_D8s_v3"

Collect Machine info

Attention

Do not change any of the steps described below; just run them in your environment.

Discover the variable values based on ${machine_name}

Choose the Cloud Provider

AWS:

instanceId=$(oc get machine ${machine_name} -n openshift-machine-api -o jsonpath='{.status.providerStatus.instanceId}')

node_name=$(oc get machine ${machine_name} -n openshift-machine-api -o jsonpath='{.status.nodeRef.name}')

Azure:

resource_group=$(oc get machine ${machine_name} -n openshift-machine-api -o jsonpath='{.spec.providerSpec.value.resourceGroup}')

instanceId=${machine_name}

node_name=$(oc get machine ${machine_name} -n openshift-machine-api -o jsonpath='{.status.nodeRef.name}')

  • Make sure all variables are set:
echo "[${instanceId}] [${node_name}] ${resource_group:-}"
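
The echo above relies on spotting an empty pair of brackets. As a hedged sketch, a small guard could fail loudly instead when a variable is empty. `require_vars` is a hypothetical helper name, and the variable list passed to it is an assumption based on the variables used in this guide.

```shell
# require_vars <name>...
# Prints every variable name that is unset or empty to stderr and
# returns 1 if any was missing, 0 otherwise.
# Hypothetical helper for illustration; POSIX sh compatible.
require_vars() {
    _rv_missing=0
    for _rv_name in "$@"; do
        # Indirect lookup of the variable named in $_rv_name.
        eval "_rv_value=\${$_rv_name:-}"
        if [ -z "$_rv_value" ]; then
            echo "variable not set: $_rv_name" >&2
            _rv_missing=1
        fi
    done
    return "$_rv_missing"
}

# Example usage:
#   AWS:   require_vars machine_name new_machine_type instanceId node_name
#   Azure: require_vars machine_name new_machine_type instanceId node_name resource_group
```

Running the guard before the power-off steps avoids draining a node and then discovering that a cloud CLI call has no instance to act on.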

Graceful Power off

  • Cordon the node
oc adm cordon ${node_name}
  • Drain the node
oc adm drain ${node_name} --ignore-daemonsets --grace-period=60
  • Shutdown
oc debug node/${node_name} -- chroot /host shutdown -h 1
  • Wait for the node to shut down

Attention

Wait until the node is Status=NotReady

oc get node ${node_name} -w
  • Wait until the Instance/VM is in stopped state (by Cloud provider)

Choose the Cloud Provider

AWS:

while true; do
    st=$(aws ec2 describe-instance-status \
        --instance-ids ${instanceId} \
        | jq -r '.InstanceStatuses[0].InstanceState.Name')
    echo "state=$st"
    # Only running instances are returned by default, so "null" means
    # the instance is no longer running.
    test "$st" = "null" && break
    sleep 15
done

Azure:

while true; do
    st=$(az vm get-instance-view \
        --resource-group ${resource_group} \
        --name ${machine_name} \
        --output json \
        | jq -r '.instanceView.statuses[]
            | select(.code | startswith("PowerState")).code')
    echo "state=$st"
    test "$st" = "PowerState/stopped" && break
    sleep 15
done
  • Make sure that the node is turned off

Choose the Cloud Provider

AWS:

aws ec2 describe-instance-status \
    --instance-ids ${instanceId}

Expected result:

{
    "InstanceStatuses": []
}

Azure:

az vm get-instance-view \
    --resource-group ${resource_group} \
    --name ${machine_name}  \
    --output table

Expected result:

Name                    ResourceGroup     Location    ProvisioningState    PowerState
----------------------  ----------------  ----------  -------------------  ------------
mrbaz01-2754r-master-0  mrbaz01-2754r-rg  eastus      Succeeded            VM stopped
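
The polling loops in this section run until the expected state appears, with no upper bound. As a minimal sketch, a provider-neutral wrapper could add a timeout. `wait_for_state` is a hypothetical helper; the command passed to it is assumed to print only the current state on stdout.

```shell
# wait_for_state <expected> <timeout_seconds> <interval_seconds> <command...>
# Runs <command...> every <interval_seconds> until its stdout equals
# <expected> (returns 0) or roughly <timeout_seconds> have elapsed (returns 1).
# Progress is written to stderr.
# Hypothetical helper for illustration; POSIX sh compatible.
wait_for_state() {
    _ws_expected=$1
    _ws_timeout=$2
    _ws_interval=$3
    shift 3
    _ws_elapsed=0
    while :; do
        _ws_state=$("$@" 2>/dev/null) || _ws_state=""
        echo "state=${_ws_state:-unknown} elapsed=${_ws_elapsed}s" >&2
        [ "$_ws_state" = "$_ws_expected" ] && return 0
        [ "$_ws_elapsed" -ge "$_ws_timeout" ] && return 1
        sleep "$_ws_interval"
        _ws_elapsed=$((_ws_elapsed + _ws_interval))
    done
}
```

For example, `wait_for_state PowerState/stopped 600 15 get_power_state` would poll for up to ten minutes, where `get_power_state` is a hypothetical shell function wrapping the `az ... | jq` pipeline shown above. A non-zero return is a signal to inspect the instance in the cloud console before continuing.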

Change instance Type

  • Change the size

Choose the Cloud Provider

AWS:

aws ec2 modify-instance-attribute \
    --instance-id ${instanceId} \
    --instance-type ${new_machine_type}

Azure:

az vm resize \
    --resource-group ${resource_group} \
    --name ${machine_name} \
    --size ${new_machine_type}
  • Check the current [new] size

Choose the Cloud Provider

AWS:

aws ec2 describe-instance-attribute \
    --instance-id ${instanceId} \
    --attribute instanceType

Azure:

az vm get-instance-view \
    --resource-group ${resource_group} \
    --name ${machine_name} \
    --output json \
    | jq -r '.hardwareProfile.vmSize'

Power on

  • Power on the VM

Choose the Cloud Provider

AWS:

aws ec2 start-instances \
    --instance-ids ${instanceId}

Azure:

az vm start \
    --resource-group ${resource_group} \
    --name ${machine_name} \
    --output table
  • Wait until the Instance is in running state from Cloud Provider

Choose the Cloud Provider

AWS:

while true; do
    st=$(aws ec2 describe-instance-status \
        --instance-ids ${instanceId} \
        | jq -r '.InstanceStatuses[0].InstanceState.Name')
    echo "state=$st"
    test "$st" = "running" && break
    sleep 15
done

Azure:

while true; do
    st=$(az vm get-instance-view \
        --resource-group ${resource_group} \
        --name ${machine_name} \
        --output json \
        | jq -r '.instanceView.statuses[]
            | select(.code | startswith("PowerState")).code')
    echo "state=$st"
    test "$st" = "PowerState/running" && break
    sleep 15
done
  • Wait for the node to become Ready (STATUS=Ready)
oc get node ${node_name} -w
  • Wait for the Machine API (MAPI) to reconcile and report the new machine size (TYPE)
oc get machine ${machine_name} \
    -n openshift-machine-api

Sample output

AWS:

NAME                   PHASE     TYPE        REGION      ZONE         AGE
mrbg3-4glln-master-0   Running   m5.xlarge   us-east-1   us-east-1a   48m

Azure:

NAME                     PHASE     TYPE              REGION   ZONE   AGE
mrbaz01-2754r-master-0   Running   Standard_D8s_v3   eastus   1      7h8m
  • Make sure that no CSR is pending

All certificates should already be issued and approved; this check only confirms that nothing went wrong in that step.

oc get csr
  • Some operators may be degraded; review them:
oc get co
  • Uncordon the node
oc adm uncordon ${node_name}
  • Wait until all operators clear the degraded state
oc get co -w
  • Review the Machine object attributes

Choose the Cloud Provider

AWS:

oc get machine ${machine_name} \
    -n openshift-machine-api \
    -o json \
    | jq -r '. | (
        "node_name: " + .status.nodeRef.name,
        "machine_name: " + .metadata.name,
        "instanceTypeSpec: " + .spec.providerSpec.value.instanceType,
        "instanceTypeMeta: " + .metadata.labels."machine.openshift.io/instance-type",
        "")'

Azure:

oc get machine ${machine_name} \
    -n openshift-machine-api \
    -o json \
    | jq -r '. | (
        "node_name: " + .status.nodeRef.name,
        "machine_name: " + .metadata.name,
        "instanceTypeSpec: " + .spec.providerSpec.value.vmSize,
        "instanceTypeMeta: " + .metadata.labels."machine.openshift.io/instance-type",
        "")'

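The CSR and cluster operator checks in the Power on steps can also be filtered instead of scanned by eye. The sketch below assumes the default column layout of `oc get csr --no-headers` (CONDITION as the last column) and `oc get co --no-headers` (DEGRADED as the fifth column); both helper names are hypothetical.

```shell
# pending_csrs: reads `oc get csr --no-headers` output on stdin and prints
# the names of requests whose last column (CONDITION) is exactly "Pending".
# Hypothetical helper for illustration.
pending_csrs() {
    awk '$NF == "Pending" { print $1 }'
}

# degraded_operators: reads `oc get co --no-headers` output on stdin and
# prints operators whose fifth column (DEGRADED) is "True".
# Assumes the columns NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE.
degraded_operators() {
    awk '$5 == "True" { print $1 }'
}

# Example usage (assumes oc is logged in to the cluster):
#   oc get csr --no-headers | pending_csrs
#   oc get co --no-headers | degraded_operators
```

Empty output from both commands matches the state the steps above wait for: no pending requests and no degraded operators.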
Patch Machine API

Patch Machine Object:

Choose the Cloud Provider

AWS:

oc patch machine ${machine_name} \
    -n openshift-machine-api \
    --type=merge \
    -p "{\"spec\":{\"providerSpec\":{\"value\":{\"instanceType\":\"${new_machine_type}\"}}}}"

Azure:

oc patch machine ${machine_name} \
    -n openshift-machine-api \
    --type=merge \
    -p "{\"spec\":{\"providerSpec\":{\"value\":{\"vmSize\":\"${new_machine_type}\"}}}}"
  • Check that the Machine type was changed:

AWS:

oc get machines ${machine_name} \
    -n openshift-machine-api \
    -o json \
    | jq -r '. | (
        "node_name: " + .status.nodeRef.name,
        "machine_name: " + .metadata.name,
        "instanceTypeSpec: " + .spec.providerSpec.value.instanceType,
        "instanceTypeMeta: " + .metadata.labels."machine.openshift.io/instance-type",
        "")'

Sample output:

node_name: ip-10-0-133-111.ec2.internal
machine_name: mrbg3-4glln-master-0
instanceTypeSpec: m5.xlarge
instanceTypeMeta: m5.xlarge

Azure:

oc get machines ${machine_name} \
    -n openshift-machine-api \
    -o json \
    | jq -r '. | (
        "node_name: " + .status.nodeRef.name,
        "machine_name: " + .metadata.name,
        "instanceTypeSpec: " + .spec.providerSpec.value.vmSize,
        "instanceTypeMeta: " + .metadata.labels."machine.openshift.io/instance-type",
        "")'

Sample output:

node_name: mrbaz01-2754r-master-1
machine_name: mrbaz01-2754r-master-1
instanceTypeSpec: Standard_D8s_v3
instanceTypeMeta: Standard_D8s_v3
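
The two patch commands in this section differ only in the providerSpec field name (instanceType on AWS, vmSize on Azure). As a sketch, the patch body could be generated from the platform type reported in "Check the provider". `machine_patch_body` is a hypothetical helper name.

```shell
# machine_patch_body <provider> <size>
# Prints the JSON merge patch for the Machine object, choosing the
# providerSpec field by provider: instanceType (AWS) or vmSize (Azure).
# Returns 1 for any other provider.
# Hypothetical helper for illustration; <size> is inserted as-is,
# so it must not contain double quotes or backslashes.
machine_patch_body() {
    case "$1" in
        AWS)   _mp_field=instanceType ;;
        Azure) _mp_field=vmSize ;;
        *)     echo "unsupported provider: $1" >&2; return 1 ;;
    esac
    printf '{"spec":{"providerSpec":{"value":{"%s":"%s"}}}}\n' "$_mp_field" "$2"
}

# Example usage (assumes oc is logged in and the variables are set):
#   provider=$(oc get infrastructures \
#       -o jsonpath='{.items[*].status.platformStatus.type}')
#   oc patch machine "${machine_name}" -n openshift-machine-api --type=merge \
#       -p "$(machine_patch_body "${provider}" "${new_machine_type}")"
```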

Check services

  • Check all cluster operators
oc get co
  • Review Kube apiservers
oc get pod kube-apiserver-${node_name} \
    -n openshift-kube-apiserver
  • Review etcd cluster

Pods

oc get pod etcd-${node_name} \
    -n openshift-etcd

Example output

NAME                          READY   STATUS    RESTARTS   AGE
etcd-mrbaz01-2754r-master-1   4/4     Running   4          7h12m

Members

oc exec \
    -n openshift-etcd \
    etcd-${node_name} -- etcdctl member list -w table 2>/dev/null

Example output

+------------------+---------+------------------------+-----------------------+-----------------------+------------+
|        ID        | STATUS  |          NAME          |      PEER ADDRS       |     CLIENT ADDRS      | IS LEARNER |
+------------------+---------+------------------------+-----------------------+-----------------------+------------+
| 612953730164bdff | started | mrbaz01-2754r-master-2 | https://10.0.0.6:2380 | https://10.0.0.6:2379 |      false |
| 8bf6319e4243538c | started | mrbaz01-2754r-master-0 | https://10.0.0.7:2380 | https://10.0.0.7:2379 |      false |
| de0c658dd1ee52b8 | started | mrbaz01-2754r-master-1 | https://10.0.0.8:2380 | https://10.0.0.8:2379 |      false |
+------------------+---------+------------------------+-----------------------+-----------------------+------------+

Endpoints healthy (HEALTH=true)

oc exec \
    -n openshift-etcd \
    etcd-${node_name} -- etcdctl endpoint health -w table 2>/dev/null

Example output

+-----------------------+--------+-------------+-------+
|       ENDPOINT        | HEALTH |    TOOK     | ERROR |
+-----------------------+--------+-------------+-------+
| https://10.0.0.8:2379 |   true | 16.361971ms |       |
| https://10.0.0.6:2379 |   true | 16.523072ms |       |
| https://10.0.0.7:2379 |   true | 15.879969ms |       |
+-----------------------+--------+-------------+-------+
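
The endpoint health table above can be checked mechanically before moving on to the next machine. The sketch below assumes the table layout shown in the example output (ENDPOINT in the first cell, HEALTH in the second); `unhealthy_etcd_endpoints` is a hypothetical helper name.

```shell
# unhealthy_etcd_endpoints: reads the output of
# `etcdctl endpoint health -w table` on stdin and prints every endpoint
# whose HEALTH cell is not "true".
# Returns 0 when all endpoints are healthy, 1 otherwise.
# Hypothetical helper for illustration.
unhealthy_etcd_endpoints() {
    awk -F'|' '
        $2 ~ /https?:/ {
            gsub(/ /, "", $2)   # endpoint cell
            gsub(/ /, "", $3)   # health cell
            if ($3 != "true") { print $2; bad = 1 }
        }
        END { exit (bad ? 1 : 0) }'
}

# Example usage (assumes oc is logged in and node_name is set):
#   oc exec -n openshift-etcd etcd-${node_name} -- \
#       etcdctl endpoint health -w table 2>/dev/null \
#       | unhealthy_etcd_endpoints
```

A non-zero exit status here is a reason to stop before resizing the next master, since taking down a second member while one is unhealthy would risk etcd quorum.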

Repeat the steps for each machine

Repeat the section "General steps to resize each machine" for each additional machine you want to resize.

Review all changes

  • Review Nodes
oc get nodes \
    -l kubernetes.io/os=linux,node-role.kubernetes.io/master=
  • Gather current Machine summary
oc get machines \
    -n openshift-machine-api \
    -l machine.openshift.io/cluster-api-machine-role=master
  • Review Machines attributes from all machines

Choose the Cloud Provider

AWS:

oc get machines \
    -n openshift-machine-api \
    -l machine.openshift.io/cluster-api-machine-role=master \
    -o json \
    | jq -r '.items[] | (
        "node_name: " + .status.nodeRef.name,
        "machine_name: " + .metadata.name,
        "instanceTypeSpec: " + .spec.providerSpec.value.instanceType,
        "instanceTypeMeta: " + .metadata.labels."machine.openshift.io/instance-type",
        "")'

Azure:

oc get machines \
    -n openshift-machine-api \
    -l machine.openshift.io/cluster-api-machine-role=master \
    -o json \
    | jq -r '.items[] | (
        "node_name: " + .status.nodeRef.name,
        "machine_name: " + .metadata.name,
        "instanceTypeSpec: " + .spec.providerSpec.value.vmSize,
        "instanceTypeMeta: " + .metadata.labels."machine.openshift.io/instance-type",
        "")'

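For the final review, the summary printed by the command above can be scanned for machines whose spec and label disagree, which would suggest a Machine object that was resized but not yet patched. This is a sketch that parses the text format produced by the jq filter in this guide; `mismatched_machine_types` is a hypothetical helper name.

```shell
# mismatched_machine_types: reads the "key: value" summary produced by the
# jq filter above on stdin and prints one line per machine whose
# instanceTypeSpec differs from its instanceTypeMeta label.
# Returns 0 when all machines are consistent, 1 otherwise.
# Hypothetical helper for illustration.
mismatched_machine_types() {
    awk '
        $1 == "machine_name:"     { name = $2 }
        $1 == "instanceTypeSpec:" { spec = $2 }
        $1 == "instanceTypeMeta:" {
            if ($2 != spec) {
                print name ": spec=" spec " label=" $2
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }'
}
```

Piping the review command into `mismatched_machine_types` prints nothing and returns 0 when every Machine object is consistent.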
Suggested Next Steps