
Install OpenShift at the cloud edge with AWS Local Zones

This article describes the steps to install an OpenShift cluster in an existing VPC with Local Zone subnets, extending the compute nodes to edge locations with MachineSets.


Reference Architecture

The following network assets will be created in this article:

  • 1 VPC with CIDR 10.0.0.0/16
  • 4 Public subnets on the zones: us-east-1a, us-east-1b, us-east-1c, us-east-1-nyc-1a
  • 3 Private subnets on the zones: us-east-1a, us-east-1b, us-east-1c
  • 3 NAT Gateways, one per private subnet
  • 1 Internet Gateway
  • 4 route tables: 3 for the private subnets and 1 shared by the public subnets

The following OpenShift cluster nodes will be created:

  • 3 Control Plane nodes running in the subnets on the "parent region" (us-east-1{a,b,c})
  • 3 Compute nodes (Machine Set) running in the subnets on the "parent region" (us-east-1{a,b,c})
  • 1 Compute node (Machine Set) running in the edge location us-east-1-nyc-1a (NYC Local Zone)

Requirements

  • OpenShift CLI (oc)
  • AWS CLI (aws)

Preparing the environment

  • Export the common environment variables (adjust the values to your environment)
export VERSION=4.11.0
export PULL_SECRET_FILE=${HOME}/.openshift/pull-secret-latest.json
export SSH_PUB_KEY_FILE="${HOME}/.ssh/id_rsa.pub"
  • Install the clients
oc adm release extract \
    --tools "quay.io/openshift-release-dev/ocp-release:${VERSION}-x86_64" \
    -a "${PULL_SECRET_FILE}"

tar xvfz openshift-client-linux-${VERSION}.tar.gz
tar xvfz openshift-install-linux-${VERSION}.tar.gz

Opt-in the Local Zone locations

Each Local Zone location must be opted in on the EC2 configuration of your AWS account; Local Zones are opted out by default.

You can use describe-availability-zones to check the locations available in the region where your cluster will run.

Export the region where your OpenShift cluster will be created:

export CLUSTER_REGION="us-east-1"
# Using NYC Local Zone (choose yours)
export ZONE_GROUP_NAME="${CLUSTER_REGION}-nyc-1"

Check the AZs available in your region:

aws ec2 describe-availability-zones \
    --filters Name=region-name,Values=${CLUSTER_REGION} \
    --query 'AvailabilityZones[].ZoneName' \
    --all-availability-zones

Depending on the region, that list can be long. Things you need to know:

  • ${REGION}[a-z] : Availability Zones available in the Region (parent)
  • ${REGION}-LID-N[a-z] : Local Zones available, where LID-N is the location identifier (for example, nyc-1) and [a-z] is the zone identifier.
  • ${REGION}-wl1-LID-wlz-[1-9] : Available Wavelength zones
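
To list only the Local Zones (and their opt-in status) instead of the full zone list, a filtered variant of the same call can be used; the zone-type filter is an optional convenience:

aws ec2 describe-availability-zones \
    --region ${CLUSTER_REGION} \
    --all-availability-zones \
    --filters Name=zone-type,Values=local-zone \
    --query 'AvailabilityZones[].[ZoneName,GroupName,OptInStatus]' \
    --output text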

Opt the location in to your AWS account - in this example US East (New York):

aws ec2 modify-availability-zone-group \
    --group-name "${ZONE_GROUP_NAME}" \
    --opt-in-status opted-in
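
The opt-in may take a short while to propagate. To confirm it before moving on, query the zone group again (a small verification sketch):

aws ec2 describe-availability-zones \
    --region ${CLUSTER_REGION} \
    --all-availability-zones \
    --query "AvailabilityZones[?GroupName=='${ZONE_GROUP_NAME}'].[ZoneName,OptInStatus]" \
    --output text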

Steps to create the Cluster

Create the network stack

The steps to create the network stack describe how to:

  • create the network (VPC, subnets, NAT Gateways) in the parent region zones
  • create the subnet in the Local Zone location

Create the network (VPC and dependencies)

The first step is to create the network resources in the zones located in the parent region. These steps reuse the VPC stack described in the documentation[1], adapting it to tag the subnets with the values[2] used by the Kubernetes Controller Manager to discover the subnets for the load balancer created for the default router (ingress).

[1] OpenShift documentation / CloudFormation template for the VPC

[2] AWS Load Balancer Controller / Subnet Auto Discovery

Steps to create the VPC stack:

  • Set the environment variables
export CLUSTER_NAME="lzdemo"
export VPC_CIDR="10.0.0.0/16"
  • Create the Template vars file
cat <<EOF | envsubst > ./stack-vpc-vars.json
[
  {
    "ParameterKey": "ClusterName",
    "ParameterValue": "${CLUSTER_NAME}"
  },
  {
    "ParameterKey": "VpcCidr",
    "ParameterValue": "${VPC_CIDR}"
  },
  {
    "ParameterKey": "AvailabilityZoneCount",
    "ParameterValue": "3"
  },
  {
    "ParameterKey": "SubnetBits",
    "ParameterValue": "12"
  }
]
EOF
STACK_VPC=${CLUSTER_NAME}-vpc
STACK_VPC_TPL="${PWD}/ocp-aws-local-zones-day-0_cfn-net-vpc.yaml"
STACK_VPC_VARS="${PWD}/stack-vpc-vars.json"
aws cloudformation create-stack --stack-name ${STACK_VPC} \
     --template-body file://${STACK_VPC_TPL} \
     --parameters file://${STACK_VPC_VARS}
  • Wait for the stack to be completed (StackStatus=CREATE_COMPLETE)
aws cloudformation describe-stacks --stack-name ${STACK_VPC}
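
Alternatively, the AWS CLI can block until the stack reaches CREATE_COMPLETE (the command exits with an error if the stack creation fails):

aws cloudformation wait stack-create-complete --stack-name ${STACK_VPC}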
  • (optional) Update the stack
aws cloudformation update-stack \
  --stack-name ${STACK_VPC} \
  --template-body file://${STACK_VPC_TPL} \
  --parameters file://${STACK_VPC_VARS}

Create the Local Zones subnet

  • Set the environment variables used to create the Local Zone subnet
export CLUSTER_REGION="us-east-1"
export LZ_ZONE_NAME="${CLUSTER_REGION}-nyc-1a"
export LZ_ZONE_SHORTNAME="nyc1"
export LZ_ZONE_CIDR="10.0.128.0/20"

export VPC_ID=$(aws cloudformation describe-stacks \
  --stack-name ${STACK_VPC} \
  | jq -r '.Stacks[0].Outputs[] | select(.OutputKey=="VpcId").OutputValue' )
export VPC_RTB_PUB=$(aws cloudformation describe-stacks \
  --stack-name ${STACK_VPC} \
  | jq -r '.Stacks[0].Outputs[] | select(.OutputKey=="PublicRouteTableId").OutputValue' )
  • Create the template vars file
cat <<EOF | envsubst > ./stack-lz-vars-${LZ_ZONE_SHORTNAME}.json
[
  {
    "ParameterKey": "ClusterName",
    "ParameterValue": "${CLUSTER_NAME}"
  },
  {
    "ParameterKey": "VpcId",
    "ParameterValue": "${VPC_ID}"
  },
  {
    "ParameterKey": "PublicRouteTableId",
    "ParameterValue": "${VPC_RTB_PUB}"
  },
  {
    "ParameterKey": "LocalZoneName",
    "ParameterValue": "${LZ_ZONE_NAME}"
  },
  {
    "ParameterKey": "LocalZoneNameShort",
    "ParameterValue": "${LZ_ZONE_SHORTNAME}"
  },
  {
    "ParameterKey": "PublicSubnetCidr",
    "ParameterValue": "${LZ_ZONE_CIDR}"
  }
]
EOF
STACK_LZ=${CLUSTER_NAME}-lz-${LZ_ZONE_SHORTNAME}
STACK_LZ_TPL="${PWD}/ocp-aws-local-zones-day-0_cfn-net-lz.yaml"
STACK_LZ_VARS="${PWD}/stack-lz-vars-${LZ_ZONE_SHORTNAME}.json"
aws cloudformation create-stack \
  --stack-name ${STACK_LZ} \
  --template-body file://${STACK_LZ_TPL} \
  --parameters file://${STACK_LZ_VARS}
  • Check the status (wait until the stack finishes)
aws cloudformation describe-stacks --stack-name ${STACK_LZ}
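
Optionally, wait for this stack as well and confirm the subnet was created in the Local Zone (this check assumes VPC_ID is still exported from the previous step):

aws cloudformation wait stack-create-complete --stack-name ${STACK_LZ}

aws ec2 describe-subnets \
  --filters Name=vpc-id,Values=${VPC_ID} Name=availability-zone,Values=${LZ_ZONE_NAME} \
  --query 'Subnets[].[SubnetId,CidrBlock,AvailabilityZone]' \
  --output text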

Repeat the steps above for each location.

Create the installer configuration

  • Set the variables used in the installer configuration
export BASE_DOMAIN="devcluster.openshift.com"

# Parent region (main) subnets only: Public and Private
mapfile -t SUBNETS < <(aws cloudformation describe-stacks \
  --stack-name "${STACK_VPC}" \
  | jq -r '.Stacks[0].Outputs[0].OutputValue' | tr ',' '\n')
mapfile -t -O "${#SUBNETS[@]}" SUBNETS < <(aws cloudformation describe-stacks \
  --stack-name "${STACK_VPC}" \
  | jq -r '.Stacks[0].Outputs[1].OutputValue' | tr ',' '\n')
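
At this point the SUBNETS array should hold the six parent-region subnet IDs (three public and three private), assuming the two stack outputs are the comma-separated public and private subnet lists; a quick sanity check:

echo "Parent region subnets (${#SUBNETS[@]}): ${SUBNETS[*]}"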
  • Create the install-config.yaml file, setting the subnets just created (parent region only)

Adapt it to your needs; the requirement is to set the field platform.aws.subnets to the subnet IDs just created.

cat <<EOF > ${PWD}/install-config.yaml
apiVersion: v1
publish: External
baseDomain: ${BASE_DOMAIN}
metadata:
  name: "${CLUSTER_NAME}"
platform:
  aws:
    region: ${CLUSTER_REGION}
    subnets:
$(for SB in ${SUBNETS[*]}; do echo "    - $SB"; done)
pullSecret: '$(cat ${PULL_SECRET_FILE} |awk -v ORS= -v OFS= '{$1=$1}1')'
sshKey: |
  $(cat ${SSH_PUB_KEY_FILE})
EOF
  • (Optional) Back up the install-config.yaml
cp -v ${PWD}/install-config.yaml \
    ${PWD}/install-config-bkp.yaml

Create the installer manifests

  • Create the manifests
./openshift-install create manifests
  • Patch the cluster network configuration to decrease the MTU to 1200.

The MTU between an EC2 instance in a Local Zone and an instance in the parent region is limited to 1300 bytes, so the cluster network (overlay) MTU must be lowered to leave room for the OVN-Kubernetes encapsulation overhead; this article uses 1200.

Steps to customize the SDN configuration are described on the OCP docs.

cat <<EOF > manifests/cluster-network-03-config.yml
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  defaultNetwork:
    ovnKubernetesConfig:
      mtu: 1200
EOF
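
After the cluster is installed, you can confirm that the MTU from this manifest was applied to the network operator configuration (a post-install check):

oc get networks.operator.openshift.io cluster \
  -o jsonpath='{.spec.defaultNetwork.ovnKubernetesConfig.mtu}{"\n"}'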


  • Get the InfraID (CLUSTER_ID), used in the next sections
export CLUSTER_ID="$(awk '/infrastructureName: / {print $2}' manifests/cluster-infrastructure-02-config.yml)"


Create the Machine Set manifest for the Local Zones pool

  • Set the variables used to create the Machine Set

Adapt the instance type as needed; it must be supported in the Local Zone.

export INSTANCE_TYPE="c5d.2xlarge"

export AMI_ID=$(grep ami \
  openshift/99_openshift-cluster-api_worker-machineset-0.yaml \
  | tail -n1 | awk '{print$2}')

export SUBNET_ID=$(aws cloudformation describe-stacks \
  --stack-name "${STACK_LZ}" \
  | jq -r .Stacks[0].Outputs[0].OutputValue)
  • Create the Machine Set for nyc1 nodes

publicIp: true must be set to deploy the node in the public subnet in the Local Zone.

The public IP mapping is used only to provide the required internet access. Alternatively, you can change the network topology to use a private subnet, associating the Local Zone subnet with a route table that has valid routes to the internet, or explore a disconnected installation. Neither option is covered in this article.

Only OCP 4.11+ is supported!

The value of spec.template.spec.providerSpec.value.apiVersion changed from awsproviderconfig.openshift.io/v1beta1 to machine.openshift.io/v1beta1 in 4.11.

cat <<EOF > openshift/99_openshift-cluster-api_worker-machineset-nyc1.yaml
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  labels:
    machine.openshift.io/cluster-api-cluster: ${CLUSTER_ID}
  name: ${CLUSTER_ID}-edge-${LZ_ZONE_NAME}
  namespace: openshift-machine-api
spec:
  replicas: 1
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: ${CLUSTER_ID}
      machine.openshift.io/cluster-api-machineset: ${CLUSTER_ID}-edge-${LZ_ZONE_NAME}
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: ${CLUSTER_ID}
        machine.openshift.io/cluster-api-machine-role: edge
        machine.openshift.io/cluster-api-machine-type: edge
        machine.openshift.io/cluster-api-machineset: ${CLUSTER_ID}-edge-${LZ_ZONE_NAME}
    spec:
      metadata:
        labels:
          location: local-zone
          zone_group: ${LZ_ZONE_NAME::-1}
          node-role.kubernetes.io/edge: ""
      taints:
        - key: node-role.kubernetes.io/edge
          effect: NoSchedule
      providerSpec:
        value:
          ami:
            id: ${AMI_ID}
          apiVersion: machine.openshift.io/v1beta1
          blockDevices:
          - ebs:
              volumeSize: 120
              volumeType: gp2
          credentialsSecret:
            name: aws-cloud-credentials
          deviceIndex: 0
          iamInstanceProfile:
            id: ${CLUSTER_ID}-worker-profile
          instanceType: ${INSTANCE_TYPE}
          kind: AWSMachineProviderConfig
          placement:
            availabilityZone: ${LZ_ZONE_NAME}
            region: ${CLUSTER_REGION}
          securityGroups:
          - filters:
            - name: tag:Name
              values:
              - ${CLUSTER_ID}-worker-sg
          subnet:
            id: ${SUBNET_ID}
          publicIp: true
          tags:
          - name: kubernetes.io/cluster/${CLUSTER_ID}
            value: owned
          userDataSecret:
            name: worker-user-data
EOF
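
Because of the NoSchedule taint above, regular workloads are not scheduled on the edge node: a pod must tolerate the taint and, optionally, select the edge role. The Deployment below is a minimal, hypothetical example (the name and image are placeholders) of how a workload can be pinned to the Local Zone node after the cluster is installed:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: edge-demo
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: edge-demo
  template:
    metadata:
      labels:
        app: edge-demo
    spec:
      # Pin the pod to the edge node and tolerate its NoSchedule taint
      nodeSelector:
        node-role.kubernetes.io/edge: ""
      tolerations:
      - key: node-role.kubernetes.io/edge
        operator: Exists
        effect: NoSchedule
      containers:
      - name: app
        image: registry.access.redhat.com/ubi8/ubi-minimal
        command: ["sleep", "infinity"]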

Create IngressController manifest to use NLB (optional)

The OCP version used in this article creates a Classic Load Balancer for the default router. This manifest forces the use of an NLB by default instead.

This section is based on the official documentation.

  • Create the IngressController manifest to use NLB by default
cat <<EOF > manifests/cluster-ingress-default-ingresscontroller.yaml
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  creationTimestamp: null
  name: default
  namespace: openshift-ingress-operator
spec:
  endpointPublishingStrategy:
    loadBalancer:
      scope: External
      providerParameters:
        type: AWS
        aws:
          type: NLB
    type: LoadBalancerService
EOF
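
Once the installation completes, you can check that the default router Service was provisioned with an NLB by inspecting its load balancer hostname (run after install):

oc -n openshift-ingress get service router-default \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}{"\n"}'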

Update the VPC tag with the InfraID

This step is required only when the ELB Operator (not covered) will be installed. It updates the VPC "cluster tag" with the InfraID value.

The following error is raised when installing the ELB Operator without setting the cluster tag: ERROR setup failed to get VPC ID {"error": "no VPC with tag \"kubernetes.io/cluster/<infra_id>\" found"}. This behavior is tracked in this Bug.

  1. Edit the CloudFormation Template var file of the VPC stack
cat <<EOF | envsubst > ./stack-vpc-vars.json
[
  {
    "ParameterKey": "ClusterName",
    "ParameterValue": "${CLUSTER_NAME}"
  },
  {
    "ParameterKey": "ClusterInfraId",
    "ParameterValue": "${CLUSTER_ID}"
  },
  {
    "ParameterKey": "VpcCidr",
    "ParameterValue": "${VPC_CIDR}"
  },
  {
    "ParameterKey": "AvailabilityZoneCount",
    "ParameterValue": "3"
  },
  {
    "ParameterKey": "SubnetBits",
    "ParameterValue": "12"
  }
]
EOF
  2. Update the stack
aws cloudformation update-stack \
  --stack-name ${STACK_VPC} \
  --template-body file://${STACK_VPC_TPL} \
  --parameters file://${STACK_VPC_VARS}
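
To confirm the cluster tag kubernetes.io/cluster/<infraID> was applied to the VPC (assuming VPC_ID is still exported), a quick check:

aws ec2 describe-vpcs --vpc-ids ${VPC_ID} \
  --query 'Vpcs[0].Tags' --output table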

Install the cluster

Now it's time to create the cluster and check the results.

  • Create the cluster
./openshift-install create cluster --log-level=debug
  • Install summary
DEBUG Time elapsed per stage:
DEBUG            cluster: 4m28s
DEBUG          bootstrap: 36s
DEBUG Bootstrap Complete: 10m30s
DEBUG                API: 2m18s
DEBUG  Bootstrap Destroy: 57s
DEBUG  Cluster Operators: 8m39s
INFO Time elapsed: 25m50s
  • Cluster Operators summary
$ oc get co -o json \
    | jq -r ".items[].status.conditions[] | select(.type==\"Available\").status" \
    | sort |uniq -c
     32 True

$ oc get co -o json \
    | jq -r ".items[].status.conditions[] | select(.type==\"Degraded\").status" \
    | sort |uniq -c
     32 False
  • Machines in Local Zones
$ oc get machines -n openshift-machine-api \
  -l machine.openshift.io/zone=us-east-1-nyc-1a
NAME                                       PHASE     TYPE          REGION      ZONE               AGE
lzdemo-ds2dn-edge-us-east-1-nyc-1a-6645q   Running   c5d.2xlarge   us-east-1   us-east-1-nyc-1a   12m
  • Nodes in Local Zones, filtered by the custom labels defined in the Machine Set (location, zone_group)
$ oc get nodes -l location=local-zone
NAME                           STATUS   ROLES         AGE   VERSION
ip-10-0-143-104.ec2.internal   Ready    edge,worker   11m   v1.24.0+beaaed6
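
The NoSchedule taint defined in the MachineSet should also be present on the edge node; a quick check:

oc get node -l node-role.kubernetes.io/edge \
  -o jsonpath='{.items[0].spec.taints}{"\n"}'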

Testing

Testing communication with services hosted on availability zones

Steps:

  • Check the API server LB (internal)
  • Check the API server endpoints (control plane nodes)
  • Test ICMP to the control plane nodes

NODE_NAME=$(oc get nodes -l node-role.kubernetes.io/edge -o jsonpath={.items[0].metadata.name})
mapfile -t MASTER_NODES < <(oc get node -l node-role.kubernetes.io/master -o json |jq -r '.items[].status.addresses[] | select(.type == "InternalIP") | .address')
KPASS=$(cat auth/kubeadmin-password)
API_INT=$(oc get infrastructures cluster -o jsonpath={.status.apiServerInternalURI})

oc debug node/${NODE_NAME} --  chroot /host /bin/bash -c "\
echo \"#> calling  ${API_INT}/healthz\";
curl -sw \"\\nhttp_code: %{http_code}\\n\" -k ${API_INT}/healthz; \
for ND in ${MASTER_NODES[*]}; do \
echo \"#> calling  https://\${ND}:6443/healthz\";
curl -sw \"\\nhttp_code: %{http_code}\\n\" -k https://\${ND}:6443/healthz; \
echo \"#> pinging \";
ping -c 4 \${ND};
done;\
"

Pulling images from local registry

These steps are described in the official documentation.

Steps:

  • Get the name of the Local Zone node
  • Export the kubeadmin password
  • Pull an image from the internal registry
NODE_NAME=$(oc get nodes -l node-role.kubernetes.io/edge -o jsonpath={.items[0].metadata.name})
KPASS=$(cat auth/kubeadmin-password)
API_INT=$(oc get infrastructures cluster -o jsonpath={.status.apiServerInternalURI})

oc debug node/${NODE_NAME} --  chroot /host /bin/bash -c "\
oc login --insecure-skip-tls-verify -u kubeadmin -p ${KPASS} ${API_INT}; \
podman login -u kubeadmin -p \$(oc whoami -t) image-registry.openshift-image-registry.svc:5000; \
podman pull image-registry.openshift-image-registry.svc:5000/openshift/tests"

Day-2 guide to change MTU on Existing cluster

Section moved to a dedicated guide.

Steps to Destroy the Cluster

To destroy the resources created, you need to first delete the cluster and then the CloudFormation stacks used to build the network.

  • Destroy the cluster
./openshift-install destroy cluster --log-level=debug
  • Destroy the Local Zone subnet(s) stack(s)
aws cloudformation delete-stack --stack-name ${STACK_LZ}
  • Destroy the Network Stack (VPC)
aws cloudformation delete-stack --stack-name ${STACK_VPC}
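
If you script the teardown, wait for the subnet stack deletion to finish before deleting the VPC stack, since the VPC cannot be removed while its subnets still exist; the CLI waiters below are optional:

aws cloudformation wait stack-delete-complete --stack-name ${STACK_LZ}
aws cloudformation wait stack-delete-complete --stack-name ${STACK_VPC}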

Final notes / Conclusion

The OpenShift cluster can be installed successfully in an existing VPC that has subnets in Local Zones when the tags are set correctly, and new MachineSets can then be added for any new location.

No technical blocker was found to installing an OpenShift cluster in an existing VPC with subnets in AWS Local Zones, although some configuration must be asserted to avoid issues with the default router and the ELB Operator.

As described in the steps section, the setup created one MachineSet, tainted so regular workloads are not scheduled on it and labeled with node-role.kubernetes.io/edge=''. The suggestion to create a custom MachineSet role named edge is to keep it easy to manage the resources operating in Local Zones, which are in general more expensive than the parent region (costs are almost 20% higher). This is a design pattern; the label topology.kubernetes.io/zone can be combined with taint rules when operating in many locations.

The installation process runs correctly as a Day-0 operation. The only limitation found was the ingress controller trying to discover all the public subnets on the VPC to create the service for the default router; the workaround was to tag the Local Zone subnets with kubernetes.io/cluster/unmanaged=true so the subnet auto-discovery does not include the Local Zone subnets in the default router.

Additionally, when installing the ALB Operator on Day 2 (available in 4.11), the operator requires the cluster tag kubernetes.io/cluster/<infraID>=.* to run successfully, although the installer does not require it when installing a cluster in an existing VPC[1]. The steps to use the ALB with services deployed in Local Zones, exploring the low-latency benefit, are not covered in this document; an experiment creating the operator from source can be found here.

Resources produced:

  • UPI CloudFormation template for the VPC reviewed and updated
  • New CloudFormation template to create the Local Zone subnets
  • Steps to install OpenShift 4.11 with support for compute nodes in Local Zones

Takeaways / Important notes:

  • The Local Zone subnets should have the tag kubernetes.io/cluster/unmanaged=true to prevent the load balancer controller's subnet auto-discovery from adding the Local Zone subnets to the default router.
  • The VPC should have the tag kubernetes.io/cluster/<infraID>=shared so the AWS ELB Operator (not covered in this post) installs correctly.
  • Local Zones do not support NAT Gateways, so there are two options for nodes in Local Zones to access the internet (a sketch of option 1 follows this list):

    1. Keep the Local Zone subnet private by associating it with one of the parent-region route tables, then create the machine in that subnet without mapping a public IP. Option 1 may be preferable in some scenarios, but it also implies extra data transfer fees from the instance in the Local Zone to the parent region, in addition to the standard internet costs.
    2. Use a public subnet in the Local Zone and map a public IP to the instance (Machine spec), as done in this article. There is no significant security exposure, since the Security Group rules block all access from outside the VPC in a default installation, although the NLB security group rules are less restrictive.
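
For option 1, the association between the Local Zone subnet and a parent-region private route table is a single call; the IDs below are placeholders:

# Hypothetical IDs: a parent-region private route table and the Local Zone subnet
aws ec2 associate-route-table \
  --route-table-id rtb-0123456789abcdef0 \
  --subnet-id subnet-0123456789abcdef0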

References