Create an Agent cluster
This document explains how to create HostedClusters and NodePools using the Agent platform.
The Agent platform uses the Infrastructure Operator (AKA Assisted Installer) to add worker nodes to a hosted cluster. For a primer on the Infrastructure Operator, see here.
Overview
When you create a HostedCluster with the Agent platform, HyperShift will install the Agent CAPI provider in the Hosted Control Plane (HCP) namespace.
Upon scaling up a NodePool, a Machine will be created, and the CAPI provider will find a suitable Agent to match this Machine.
Suitable means that the Agent is approved, is passing validations, is not currently bound (in use), and meets the requirements specified on the NodePool Spec (e.g., minimum CPU/RAM, labels matching the label selector). You may monitor the installation of an Agent by checking its Status and Conditions.
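As a rough sketch of the monitoring step, the function below prints every condition of a single Agent, one per line. It assumes the Agent exposes its conditions under .status.conditions; the function name agent_conditions and the OC override variable are illustrative and not part of any tool.

```shell
# OC can be overridden (for example with a stub) to preview the command.
OC=${OC:-oc}

# agent_conditions NAMESPACE AGENT_NAME
# Prints "<type>=<status> (<reason>)" for every condition on the Agent.
agent_conditions() {
  "$OC" -n "$1" get agent "$2" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status} ({.reason}){"\n"}{end}'
}
```

A typical call would be agent_conditions ${HOSTED_CONTROL_PLANE_NAMESPACE} followed by the Agent name.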
Upon scaling down a NodePool, Agents will be unbound from the corresponding cluster. However, you must boot them with the Discovery Image once again before reusing them.
Install HyperShift Operator
Before installing the HyperShift operator we need to get the HyperShift CLI. We have two methods for getting the CLI installed in our system.
Method 1 - Build the HyperShift CLI
Follow the instructions for building the HyperShift CLI in Getting Started.
Method 2 - Extract HyperShift CLI from the Operator Image
INFO: This example uses Podman; the same steps apply to Docker.
export HYPERSHIFT_RELEASE=4.11
podman cp $(podman create --name hypershift --rm --pull always quay.io/hypershift/hypershift-operator:${HYPERSHIFT_RELEASE}):/usr/bin/hypershift /tmp/hypershift && podman rm -f hypershift
sudo install -m 0755 -o root -g root /tmp/hypershift /usr/local/bin/hypershift
Deploy the HyperShift Operator
With the CLI deployed, we can go ahead and deploy the operator:
WARN: If we don't define the HyperShift image we want to use, the CLI will deploy latest by default. Usually you want to deploy the image matching the release of the OpenShift cluster where HyperShift will run.
# This installs latest
hypershift install
# You may want to run this instead
hypershift install --hypershift-image quay.io/hypershift/hypershift-operator:4.11
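One way to pick a matching image is to derive the tag from the management cluster's version. The sketch below assumes a plain x.y.z version string and that an operator image tagged x.y exists, as in the 4.11 example; hypershift_image_for is a hypothetical helper name.

```shell
# hypershift_image_for OCP_VERSION
# Maps a full OpenShift version (x.y.z) to an operator image tagged x.y.
hypershift_image_for() {
  version=$1
  echo "quay.io/hypershift/hypershift-operator:${version%.*}"
}

# Example (requires access to the management cluster):
#   MGMT_VERSION=$(oc get clusterversion version -o jsonpath='{.status.desired.version}')
#   hypershift install --hypershift-image "$(hypershift_image_for "$MGMT_VERSION")"
```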
You will see the operator running in the hypershift namespace:
oc -n hypershift get pods
NAME READY STATUS RESTARTS AGE
operator-55fffbd6-whkxs 1/1 Running 0 61s
Install Assisted Service and Hive Operators
NOTE: If Red Hat Advanced Cluster Management (RHACM) is already installed, this can be skipped as the Infrastructure Operator and Hive Operator are dependencies of RHACM.
We will leverage tasty to deploy the required operators easily.
Install tasty:
curl -s -L https://github.com/karmab/tasty/releases/download/v0.4.0/tasty-linux-amd64 > ./tasty
sudo install -m 0755 -o root -g root ./tasty /usr/local/bin/tasty
Install the operators
tasty install assisted-service-operator hive-operator
Configure Agent Service
Create the AgentServiceConfig resource:
export DB_VOLUME_SIZE="10Gi"
export FS_VOLUME_SIZE="10Gi"
export OCP_VERSION="4.11.5"
export OCP_MAJMIN=${OCP_VERSION%.*}
export ARCH="x86_64"
export OCP_RELEASE_VERSION=$(curl -s https://mirror.openshift.com/pub/openshift-v4/${ARCH}/clients/ocp/${OCP_VERSION}/release.txt | awk '/machine-os / { print $2 }')
export ISO_URL="https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/${OCP_MAJMIN}/${OCP_VERSION}/rhcos-${OCP_VERSION}-${ARCH}-live.${ARCH}.iso"
export ROOT_FS_URL="https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/${OCP_MAJMIN}/${OCP_VERSION}/rhcos-${OCP_VERSION}-${ARCH}-live-rootfs.${ARCH}.img"
envsubst <<"EOF" | oc apply -f -
apiVersion: agent-install.openshift.io/v1beta1
kind: AgentServiceConfig
metadata:
  name: agent
spec:
  databaseStorage:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: ${DB_VOLUME_SIZE}
  filesystemStorage:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: ${FS_VOLUME_SIZE}
  osImages:
    - openshiftVersion: "${OCP_VERSION}"
      version: "${OCP_RELEASE_VERSION}"
      url: "${ISO_URL}"
      rootFSUrl: "${ROOT_FS_URL}"
      cpuArchitecture: "${ARCH}"
EOF
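Because the heredoc delimiter is quoted, the shell leaves the ${...} references untouched and envsubst fills them in from the exported variables. To preview what the image URLs expand to before applying anything, a small sketch such as the following can help; rhcos_urls is a hypothetical helper that only mirrors how ISO_URL and ROOT_FS_URL are built above.

```shell
# rhcos_urls OCP_VERSION ARCH
# Prints the ISO and rootfs URLs built the same way as ISO_URL and ROOT_FS_URL.
rhcos_urls() {
  version=$1
  arch=$2
  majmin=${version%.*}   # strips the last component: 4.11.5 -> 4.11
  base="https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/${majmin}/${version}"
  echo "${base}/rhcos-${version}-${arch}-live.${arch}.iso"
  echo "${base}/rhcos-${version}-${arch}-live-rootfs.${arch}.img"
}

rhcos_urls 4.11.5 x86_64
```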
Configure DNS
The API Server for the Hosted Cluster is exposed as a Service of type NodePort.
A DNS entry must exist for api.${HOSTED_CLUSTER_NAME}.${BASEDOMAIN} pointing to the destination where the API Server can be reached.
This can be as simple as an A record pointing to one of the nodes in the management cluster (i.e. the cluster running the HCP). It can also point to a load balancer deployed to redirect incoming traffic to the ingress pods.
Example DNS Config
api.example.krnl.es. IN A 192.168.122.20
api.example.krnl.es. IN A 192.168.122.21
api.example.krnl.es. IN A 192.168.122.22
api-int.example.krnl.es. IN A 192.168.122.20
api-int.example.krnl.es. IN A 192.168.122.21
api-int.example.krnl.es. IN A 192.168.122.22
*.apps.example.krnl.es. IN A 192.168.122.23
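To keep the record names consistent with the cluster name and base domain, they can be generated rather than typed by hand. This is only a sketch: expected_dns_names is a hypothetical helper, and the api-int and *.apps names simply follow the example zone above.

```shell
# expected_dns_names CLUSTER_NAME BASE_DOMAIN
# Prints the record names used in the example DNS config, one per line.
expected_dns_names() {
  echo "api.$1.$2"
  echo "api-int.$1.$2"
  echo "*.apps.$1.$2"
}

expected_dns_names example krnl.es
```

Each non-wildcard name can then be checked with a resolver tool of your choice, for example getent hosts api.example.krnl.es.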
Create a Hosted Cluster
WARN: Make sure you have a default storage class configured for your cluster, otherwise you may end up with pending PVCs.
export CLUSTERS_NAMESPACE="clusters"
export HOSTED_CLUSTER_NAME="example"
export HOSTED_CONTROL_PLANE_NAMESPACE="${CLUSTERS_NAMESPACE}-${HOSTED_CLUSTER_NAME}"
export BASEDOMAIN="krnl.es"
export PULL_SECRET_FILE=$PWD/pull-secret
export OCP_RELEASE=4.11.5-x86_64
export MACHINE_CIDR=192.168.122.0/24
# Typically the namespace is created by the hypershift-operator
# but agent cluster creation generates a capi-provider role that
# needs the namespace to already exist
oc create ns ${HOSTED_CONTROL_PLANE_NAMESPACE}
hypershift create cluster agent \
--name=${HOSTED_CLUSTER_NAME} \
--pull-secret=${PULL_SECRET_FILE} \
--agent-namespace=${HOSTED_CONTROL_PLANE_NAMESPACE} \
--base-domain=${BASEDOMAIN} \
--api-server-address=api.${HOSTED_CLUSTER_NAME}.${BASEDOMAIN} \
--release-image=quay.io/openshift-release-dev/ocp-release:${OCP_RELEASE}
After a few moments we should see our hosted control plane pods up and running:
oc -n ${HOSTED_CONTROL_PLANE_NAMESPACE} get pods
NAME READY STATUS RESTARTS AGE
capi-provider-7dcf5fc4c4-nr9sq 1/1 Running 0 4m32s
catalog-operator-6cd867cc7-phb2q 2/2 Running 0 2m50s
certified-operators-catalog-884c756c4-zdt64 1/1 Running 0 2m51s
cluster-api-f75d86f8c-56wfz 1/1 Running 0 4m32s
cluster-autoscaler-7977864686-2rz4c 1/1 Running 0 4m13s
cluster-network-operator-754cf4ffd6-lwfm2 1/1 Running 0 2m51s
cluster-policy-controller-784f995d5-7cbrz 1/1 Running 0 2m51s
cluster-version-operator-5c68f7f4f8-lqzcm 1/1 Running 0 2m51s
community-operators-catalog-58599d96cd-vpj2v 1/1 Running 0 2m51s
control-plane-operator-f6b4c8465-4k5dh 1/1 Running 0 4m32s
etcd-0 1/1 Running 0 4m13s
hosted-cluster-config-operator-c4776f89f-dt46j 1/1 Running 0 2m51s
ignition-server-7cd8676fc5-hjx29 1/1 Running 0 4m22s
ingress-operator-75484cdc8c-zhdz5 1/2 Running 0 2m51s
konnectivity-agent-c5485c9df-jsm9s 1/1 Running 0 4m13s
konnectivity-server-85dc754888-7z8vm 1/1 Running 0 4m13s
kube-apiserver-db5fb5549-zlvpq 3/3 Running 0 4m13s
kube-controller-manager-5fbf7b7b7b-mrtjj 1/1 Running 0 90s
kube-scheduler-776c59d757-kfhv6 1/1 Running 0 3m12s
machine-approver-c6b947895-lkdbk 1/1 Running 0 4m13s
oauth-openshift-787b87cff6-trvd6 2/2 Running 0 87s
olm-operator-69c4657864-hxwzk 2/2 Running 0 2m50s
openshift-apiserver-67f9d9c5c7-c9bmv 2/2 Running 0 89s
openshift-controller-manager-5899fc8778-q89xh 1/1 Running 0 2m51s
openshift-oauth-apiserver-569c78c4d-568v8 1/1 Running 0 2m52s
packageserver-ddfffb8d7-wlz6l 2/2 Running 0 2m50s
redhat-marketplace-catalog-7dd77d896-jtxkd 1/1 Running 0 2m51s
redhat-operators-catalog-d66b5c965-qwhn7 1/1 Running 0 2m51s
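With this many pods, a partially ready one (such as ingress-operator at 1/2 in the listing above) is easy to miss. The filter below is a sketch that reads the table printed by oc get pods and lists pods whose READY column shows fewer ready containers than total; not_ready_pods is a hypothetical name.

```shell
# not_ready_pods
# Reads `oc get pods` table output on stdin and prints the name of every pod
# whose READY column (ready/total) has ready != total. The header line is
# skipped because its second column is not of the form N/M.
not_ready_pods() {
  awk '$2 ~ /^[0-9]+\/[0-9]+$/ { split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}

# Example:
#   oc -n ${HOSTED_CONTROL_PLANE_NAMESPACE} get pods | not_ready_pods
```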
Create an InfraEnv
An InfraEnv is an environment to which hosts booting the live ISO can join as Agents. In this case, the Agents will be created in the same namespace as our HostedControlPlane.
export SSH_PUB_KEY=$(cat $HOME/.ssh/id_rsa.pub)
envsubst <<"EOF" | oc apply -f -
apiVersion: agent-install.openshift.io/v1beta1
kind: InfraEnv
metadata:
  name: ${HOSTED_CLUSTER_NAME}
  namespace: ${HOSTED_CONTROL_PLANE_NAMESPACE}
spec:
  pullSecretRef:
    name: pull-secret
  sshAuthorizedKey: ${SSH_PUB_KEY}
EOF
This will generate a live ISO that allows machines (VMs or bare-metal) to join as Agents.
oc -n ${HOSTED_CONTROL_PLANE_NAMESPACE} get InfraEnv ${HOSTED_CLUSTER_NAME} -ojsonpath="{.status.isoDownloadURL}"
Adding Agents
You can add Agents by manually configuring the machine to boot with the live ISO or by using Metal3.
Manual
The live ISO may be downloaded and used to boot a node (bare-metal or VM).
On boot, the node will communicate with the assisted-service and register as an Agent in the same namespace as the InfraEnv.
Once each Agent is created, optionally set its installation_disk_id and hostname in the Spec. Then approve it to indicate that the Agent is ready for use.
oc -n ${HOSTED_CONTROL_PLANE_NAMESPACE} get agents
NAME CLUSTER APPROVED ROLE STAGE
86f7ac75-4fc4-4b36-8130-40fa12602218 auto-assign
e57a637f-745b-496e-971d-1abbf03341ba auto-assign
oc -n ${HOSTED_CONTROL_PLANE_NAMESPACE} patch agent 86f7ac75-4fc4-4b36-8130-40fa12602218 -p '{"spec":{"installation_disk_id":"/dev/sda","approved":true,"hostname":"worker-0.example.krnl.es"}}' --type merge
oc -n ${HOSTED_CONTROL_PLANE_NAMESPACE} patch agent e57a637f-745b-496e-971d-1abbf03341ba -p '{"spec":{"installation_disk_id":"/dev/sda","approved":true,"hostname":"worker-1.example.krnl.es"}}' --type merge
oc -n ${HOSTED_CONTROL_PLANE_NAMESPACE} get agents
NAME CLUSTER APPROVED ROLE STAGE
86f7ac75-4fc4-4b36-8130-40fa12602218 true auto-assign
e57a637f-745b-496e-971d-1abbf03341ba true auto-assign
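When several Agents need the same treatment, the merge patch can be generated instead of edited by hand. The sketch below only builds the JSON body used in the patch commands above; agent_patch is a hypothetical helper, and the /dev/sda default is taken from the example rather than from any recommendation.

```shell
# agent_patch HOSTNAME [DISK]
# Prints a merge patch that sets the hostname and installation disk and
# approves the Agent. DISK defaults to /dev/sda as in the example above.
agent_patch() {
  hostname=$1
  disk=${2:-/dev/sda}
  printf '{"spec":{"installation_disk_id":"%s","approved":true,"hostname":"%s"}}\n' "$disk" "$hostname"
}

# Example loop (requires cluster access):
#   i=0
#   for agent in $(oc -n ${HOSTED_CONTROL_PLANE_NAMESPACE} get agents -o name); do
#     oc -n ${HOSTED_CONTROL_PLANE_NAMESPACE} patch "$agent" --type merge \
#       -p "$(agent_patch "worker-$i.example.krnl.es")"
#     i=$((i + 1))
#   done
```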
Metal3
We will leverage the Assisted Service and Hive to create the custom ISO as well as the Baremetal Operator to perform the installation.
WARN: Since the BaremetalHost objects will be created outside the baremetal-operator namespace we need to configure the operator to watch all namespaces.
oc patch provisioning provisioning-configuration --type merge -p '{"spec":{"watchAllNamespaces": true }}'
INFO: This will trigger a restart of the metal3 pod in the openshift-machine-api namespace.
- Wait until the metal3 pod is ready again:
until oc wait -n openshift-machine-api $(oc get pods -n openshift-machine-api -l baremetal.openshift.io/cluster-baremetal-operator=metal3-state -o name) --for condition=containersready --timeout 10s >/dev/null 2>&1 ; do sleep 1 ; done
Now we can go ahead and create our BaremetalHost objects. We will need to configure some variables required to be able to boot our bare-metal nodes.
- BMC_USERNAME: Username to be used for connecting to the BMC.
- BMC_PASSWORD: Password to be used for connecting to the BMC.
- BMC_IP: IP used by Metal3 to connect to the BMC.
- WORKER_NAME: Name of the BaremetalHost object (this will be used as hostname as well).
- BOOT_MAC_ADDRESS: MAC address of the NIC connected to the MachineNetwork.
- UUID: Redfish UUID, this is usually 1. If using sushy-tools this will be a long UUID. If using iDrac this will be System.Embedded.1. You may need to check with the vendor.
- REDFISH_SCHEME: The Redfish provider to use. If using hardware that uses a standard Redfish implementation you can set this to redfish-virtualmedia. iDRAC will use idrac-virtualmedia. iLO5 will use ilo5-virtualmedia. You may need to check with the vendor.
- REDFISH: Redfish connection endpoint.
export BMC_USERNAME=$(echo -n "root" | base64 -w0)
export BMC_PASSWORD=$(echo -n "calvin" | base64 -w0)
export BMC_IP="192.168.124.228"
export WORKER_NAME="ocp-worker-0"
export BOOT_MAC_ADDRESS="aa:bb:cc:dd:ee:ff"
export UUID="1"
export REDFISH_SCHEME="redfish-virtualmedia"
export REDFISH="${REDFISH_SCHEME}://${BMC_IP}/redfish/v1/Systems/${UUID}"
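The same values can be assembled with two small helpers, which makes it easier to switch between vendors. This is a sketch only; redfish_address and b64 are hypothetical names, and the scheme and system ID pairs shown are the ones listed above.

```shell
# redfish_address SCHEME BMC_IP SYSTEM_ID
# Builds the BMC address in the same shape as the REDFISH variable.
redfish_address() {
  echo "$1://$2/redfish/v1/Systems/$3"
}

# b64 VALUE
# Base64-encodes VALUE without a trailing newline or line wrapping.
b64() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

redfish_address redfish-virtualmedia 192.168.124.228 1
redfish_address idrac-virtualmedia 192.168.124.228 System.Embedded.1
```

The b64 helper avoids the GNU-specific -w0 flag by stripping newlines with tr, which may be useful on systems where base64 does not accept that option.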
With the required information ready, let's create the BaremetalHost. First we will create the BMC Secret:
envsubst <<"EOF" | oc apply -f -
apiVersion: v1
data:
  password: ${BMC_PASSWORD}
  username: ${BMC_USERNAME}
kind: Secret
metadata:
  name: ${WORKER_NAME}-bmc-secret
  namespace: ${HOSTED_CONTROL_PLANE_NAMESPACE}
type: Opaque
EOF
Second, we will create the BMH:
INFO: The infraenvs.agent-install.openshift.io label is used to specify which InfraEnv is used to boot the BMH. The bmac.agent-install.openshift.io/hostname annotation is used to manually set a hostname.
In case you want to manually specify the installation disk you can make use of the rootDeviceHints in the BMH Spec. If rootDeviceHints are not provided, the agent will pick the installation disk that best suits the installation requirements.
envsubst <<"EOF" | oc apply -f -
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: ${WORKER_NAME}
  namespace: ${HOSTED_CONTROL_PLANE_NAMESPACE}
  labels:
    infraenvs.agent-install.openshift.io: ${HOSTED_CLUSTER_NAME}
  annotations:
    inspect.metal3.io: disabled
    bmac.agent-install.openshift.io/hostname: ${WORKER_NAME}
spec:
  automatedCleaningMode: disabled
  bmc:
    disableCertificateVerification: True
    address: ${REDFISH}
    credentialsName: ${WORKER_NAME}-bmc-secret
  bootMACAddress: ${BOOT_MAC_ADDRESS}
  online: true
EOF
The Agent should be approved automatically. If it is not, make sure the bootMACAddress is correct.
The BMH will be provisioned:
oc -n ${HOSTED_CONTROL_PLANE_NAMESPACE} get bmh
NAME STATE CONSUMER ONLINE ERROR AGE
ocp-worker-0 provisioning true 2m50s
The BMH will eventually reach the provisioned state.
oc -n ${HOSTED_CONTROL_PLANE_NAMESPACE} get bmh
NAME STATE CONSUMER ONLINE ERROR AGE
ocp-worker-0 provisioned true 72s
Provisioned means that the node was configured to boot from the virtualCD properly. It will take a few moments for the Agent to show up:
oc -n ${HOSTED_CONTROL_PLANE_NAMESPACE} get agent
NAME CLUSTER APPROVED ROLE STAGE
4dac1ab2-7dd5-4894-a220-6a3473b67ee6 true auto-assign
As you can see it was auto-approved. We will repeat this with another two nodes.
oc -n ${HOSTED_CONTROL_PLANE_NAMESPACE} get agent
NAME CLUSTER APPROVED ROLE STAGE
4dac1ab2-7dd5-4894-a220-6a3473b67ee6 true auto-assign
d9198891-39f4-4930-a679-65fb142b108b true auto-assign
da503cf1-a347-44f2-875c-4960ddb04091 true auto-assign
Accessing the HostedCluster
We have the HostedControlPlane running and the Agents ready to join the HostedCluster. Before we join the Agents let's access the HostedCluster.
First, we need to generate the kubeconfig:
hypershift create kubeconfig --namespace ${CLUSTERS_NAMESPACE} --name ${HOSTED_CLUSTER_NAME} > ${HOSTED_CLUSTER_NAME}.kubeconfig
If we access the cluster we will see that we don't have any nodes and that the ClusterVersion is trying to reconcile the OCP release:
oc --kubeconfig ${HOSTED_CLUSTER_NAME}.kubeconfig get clusterversion,nodes
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
clusterversion.config.openshift.io/version False True 8m6s Unable to apply 4.11.5: some cluster operators have not yet rolled out
In order to get the cluster in a running state we need to add some nodes to it. Let's do it.
Scale the NodePool
We add nodes to our HostedCluster by scaling the NodePool object. In this case we will start by scaling the NodePool object to two nodes:
oc -n ${CLUSTERS_NAMESPACE} scale nodepool ${HOSTED_CLUSTER_NAME} --replicas 2
The ClusterAPI Agent provider will pick two agents at random and assign them to the HostedCluster. These agents will go through several states and will finally join the HostedCluster as OpenShift nodes.
INFO: States will be binding -> discovering -> insufficient -> installing -> installing-in-progress -> added-to-existing-cluster
oc -n ${HOSTED_CONTROL_PLANE_NAMESPACE} get agent
NAME CLUSTER APPROVED ROLE STAGE
4dac1ab2-7dd5-4894-a220-6a3473b67ee6 hypercluster1 true auto-assign
d9198891-39f4-4930-a679-65fb142b108b true auto-assign
da503cf1-a347-44f2-875c-4960ddb04091 hypercluster1 true auto-assign
oc -n ${HOSTED_CONTROL_PLANE_NAMESPACE} get agent -o jsonpath='{range .items[*]}BMH: {@.metadata.labels.agent-install\.openshift\.io/bmh} Agent: {@.metadata.name} State: {@.status.debugInfo.state}{"\n"}{end}'
BMH: ocp-worker-2 Agent: 4dac1ab2-7dd5-4894-a220-6a3473b67ee6 State: binding
BMH: ocp-worker-0 Agent: d9198891-39f4-4930-a679-65fb142b108b State: known-unbound
BMH: ocp-worker-1 Agent: da503cf1-a347-44f2-875c-4960ddb04091 State: insufficient
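To follow the progress without reading every line, the output of the jsonpath query above can be summarized by state. The filter below is a sketch that assumes the exact "BMH: ... Agent: ... State: ..." line format shown here; count_agents_in_state is a hypothetical name.

```shell
# count_agents_in_state STATE
# Reads lines like "BMH: <bmh> Agent: <id> State: <state>" on stdin and
# prints how many of them end in STATE.
count_agents_in_state() {
  awk -v want="$1" '$NF == want { n++ } END { print n + 0 }'
}

# Example: poll until two agents have joined (requires cluster access).
#   until [ "$(oc -n ${HOSTED_CONTROL_PLANE_NAMESPACE} get agent -o jsonpath='{range .items[*]}BMH: {@.metadata.labels.agent-install\.openshift\.io/bmh} Agent: {@.metadata.name} State: {@.status.debugInfo.state}{"\n"}{end}' \
#             | count_agents_in_state added-to-existing-cluster)" -ge 2 ]; do
#     sleep 30
#   done
```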
Once the agents have reached the added-to-existing-cluster state, we should see the OpenShift nodes after a few moments:
oc --kubeconfig ${HOSTED_CLUSTER_NAME}.kubeconfig get nodes
NAME STATUS ROLES AGE VERSION
ocp-worker-1 Ready worker 5m41s v1.24.0+3882f8f
ocp-worker-2 Ready worker 6m3s v1.24.0+3882f8f
At this point some ClusterOperators will start to reconcile by adding workloads to the nodes.
We can also see that two Machines were created when we scaled up the NodePool:
oc -n ${HOSTED_CONTROL_PLANE_NAMESPACE} get machines
NAME CLUSTER NODENAME PROVIDERID PHASE AGE VERSION
hypercluster1-c96b6f675-m5vch hypercluster1-b2qhl ocp-worker-1 agent://da503cf1-a347-44f2-875c-4960ddb04091 Running 15m 4.11.5
hypercluster1-c96b6f675-tl42p hypercluster1-b2qhl ocp-worker-2 agent://4dac1ab2-7dd5-4894-a220-6a3473b67ee6 Running 15m 4.11.5
Eventually the ClusterVersion reconciliation will reach a point where only the Ingress and Console cluster operators are missing:
oc --kubeconfig ${HOSTED_CLUSTER_NAME}.kubeconfig get clusterversion,co
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
clusterversion.config.openshift.io/version False True 40m Unable to apply 4.11.5: the cluster operator console has not yet successfully rolled out
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
clusteroperator.config.openshift.io/console 4.11.5 False False False 11m RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.hypercluster1.domain.com): Get "https://console-openshift-console.apps.hypercluster1.domain.com": dial tcp 10.19.3.29:443: connect: connection refused
clusteroperator.config.openshift.io/csi-snapshot-controller 4.11.5 True False False 10m
clusteroperator.config.openshift.io/dns 4.11.5 True False False 9m16s
clusteroperator.config.openshift.io/image-registry 4.11.5 True False False 9m5s
clusteroperator.config.openshift.io/ingress 4.11.5 True False True 39m The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)
clusteroperator.config.openshift.io/insights 4.11.5 True False False 11m
clusteroperator.config.openshift.io/kube-apiserver 4.11.5 True False False 40m
clusteroperator.config.openshift.io/kube-controller-manager 4.11.5 True False False 40m
clusteroperator.config.openshift.io/kube-scheduler 4.11.5 True False False 40m
clusteroperator.config.openshift.io/kube-storage-version-migrator 4.11.5 True False False 10m
clusteroperator.config.openshift.io/monitoring 4.11.5 True False False 7m38s
clusteroperator.config.openshift.io/network 4.11.5 True False False 11m
clusteroperator.config.openshift.io/openshift-apiserver 4.11.5 True False False 40m
clusteroperator.config.openshift.io/openshift-controller-manager 4.11.5 True False False 40m
clusteroperator.config.openshift.io/openshift-samples 4.11.5 True False False 8m54s
clusteroperator.config.openshift.io/operator-lifecycle-manager 4.11.5 True False False 40m
clusteroperator.config.openshift.io/operator-lifecycle-manager-catalog 4.11.5 True False False 40m
clusteroperator.config.openshift.io/operator-lifecycle-manager-packageserver 4.11.5 True False False 40m
clusteroperator.config.openshift.io/service-ca 4.11.5 True False False 11m
clusteroperator.config.openshift.io/storage 4.11.5 True False False 11m
Let's fix the Ingress.
Handling Ingress
Every OpenShift cluster comes set up with a default application ingress controller, which is expected to have an external DNS record associated with it.
For example, if a HyperShift cluster named example with the base domain krnl.es is created, then the wildcard domain *.apps.example.krnl.es is expected to be routable.
Set up a LoadBalancer and wildcard DNS record for *.apps.
This option requires deploying MetalLB, configuring a new LoadBalancer service that routes to the ingress deployment, as well as assigning a wildcard DNS entry to the LoadBalancer's IP address.
Step 1 - Get the MetalLB Operator Deployed
Set up MetalLB so that when you create a service of type LoadBalancer, MetalLB will add an external IP address for the service.
cat <<"EOF" | oc --kubeconfig ${HOSTED_CLUSTER_NAME}.kubeconfig apply -f -
---
apiVersion: v1
kind: Namespace
metadata:
  name: metallb
  labels:
    openshift.io/cluster-monitoring: "true"
  annotations:
    workload.openshift.io/allowed: management
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: metallb-operator-operatorgroup
  namespace: metallb
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: metallb-operator
  namespace: metallb
spec:
  channel: "stable"
  name: metallb-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
Once the operator is up and running, create the MetalLB instance:
cat <<"EOF" | oc --kubeconfig ${HOSTED_CLUSTER_NAME}.kubeconfig apply -f -
apiVersion: metallb.io/v1beta1
kind: MetalLB
metadata:
  name: metallb
  namespace: metallb
EOF
Step 2 - Get the MetalLB Operator Configured
We will create an IPAddressPool with a single IP address and an L2Advertisement to advertise the LoadBalancer IPs provided by the IPAddressPool via L2.
Since layer 2 mode relies on ARP and NDP, the IP address must be on the same subnet as the network used by the cluster nodes for MetalLB to work.
More information about MetalLB configuration options is available here.
WARN: Change the INGRESS_IP env var to match your environment's addressing.
export INGRESS_IP=192.168.122.23
envsubst <<"EOF" | oc --kubeconfig ${HOSTED_CLUSTER_NAME}.kubeconfig apply -f -
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: ingress-public-ip
  namespace: metallb
spec:
  protocol: layer2
  autoAssign: false
  addresses:
    - ${INGRESS_IP}-${INGRESS_IP}
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: ingress-public-ip
  namespace: metallb
EOF
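Since layer 2 mode needs the ingress address to sit on the node subnet, it can be worth checking INGRESS_IP against MACHINE_CIDR before relying on it. The following is a minimal IPv4-only sketch using shell arithmetic; ip_to_int and in_cidr are hypothetical helper names, and a POSIX-style shell such as bash or dash is assumed.

```shell
# ip_to_int A.B.C.D
# Converts a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address on dots into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# in_cidr IP NETWORK/PREFIX
# Succeeds when IP falls inside the given IPv4 network.
in_cidr() {
  ip=$1
  net=${2%/*}
  bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# Values taken from the examples in this document.
if in_cidr 192.168.122.23 192.168.122.0/24; then
  echo "ingress IP is on the machine network"
else
  echo "ingress IP is outside the machine network"
fi
```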
Step 3 - Get the OpenShift Router exposed via MetalLB
Set up the LoadBalancer Service that routes ingress traffic to the ingress deployment.
cat <<"EOF" | oc --kubeconfig ${HOSTED_CLUSTER_NAME}.kubeconfig apply -f -
kind: Service
apiVersion: v1
metadata:
  annotations:
    metallb.universe.tf/address-pool: ingress-public-ip
  name: metallb-ingress
  namespace: openshift-ingress
spec:
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: 80
    - name: https
      protocol: TCP
      port: 443
      targetPort: 443
  selector:
    ingresscontroller.operator.openshift.io/deployment-ingresscontroller: default
  type: LoadBalancer
EOF
We already configured the wildcard record in our example DNS config:
*.apps.example.krnl.es. IN A 192.168.122.23
So we should be able to reach the OCP Console now:
curl -kI https://console-openshift-console.apps.example.krnl.es
HTTP/1.1 200 OK
And if we check the clusterversion and clusteroperator we should have everything up and running now:
oc --kubeconfig ${HOSTED_CLUSTER_NAME}.kubeconfig get clusterversion,co
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
clusterversion.config.openshift.io/version 4.11.5 True False 3m32s Cluster version is 4.11.5
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
clusteroperator.config.openshift.io/console 4.11.5 True False False 3m50s
clusteroperator.config.openshift.io/csi-snapshot-controller 4.11.5 True False False 25m
clusteroperator.config.openshift.io/dns 4.11.5 True False False 23m
clusteroperator.config.openshift.io/image-registry 4.11.5 True False False 23m
clusteroperator.config.openshift.io/ingress 4.11.5 True False False 53m
clusteroperator.config.openshift.io/insights 4.11.5 True False False 25m
clusteroperator.config.openshift.io/kube-apiserver 4.11.5 True False False 54m
clusteroperator.config.openshift.io/kube-controller-manager 4.11.5 True False False 54m
clusteroperator.config.openshift.io/kube-scheduler 4.11.5 True False False 54m
clusteroperator.config.openshift.io/kube-storage-version-migrator 4.11.5 True False False 25m
clusteroperator.config.openshift.io/monitoring 4.11.5 True False False 21m
clusteroperator.config.openshift.io/network 4.11.5 True False False 25m
clusteroperator.config.openshift.io/openshift-apiserver 4.11.5 True False False 54m
clusteroperator.config.openshift.io/openshift-controller-manager 4.11.5 True False False 54m
clusteroperator.config.openshift.io/openshift-samples 4.11.5 True False False 23m
clusteroperator.config.openshift.io/operator-lifecycle-manager 4.11.5 True False False 54m
clusteroperator.config.openshift.io/operator-lifecycle-manager-catalog 4.11.5 True False False 54m
clusteroperator.config.openshift.io/operator-lifecycle-manager-packageserver 4.11.5 True False False 54m
clusteroperator.config.openshift.io/service-ca 4.11.5 True False False 25m
clusteroperator.config.openshift.io/storage 4.11.5 True False False 25m
Enabling Node Auto-Scaling for the Hosted Cluster
Auto-scaling can be enabled for the Hosted Cluster. When it is enabled and more capacity is required in the Hosted Cluster, a new Agent will be installed (provided that spare Agents are available). To enable auto-scaling, run the following command:
INFO: In this case the minimum nodes will be 2 and the maximum 5.
oc -n ${CLUSTERS_NAMESPACE} patch nodepool ${HOSTED_CLUSTER_NAME} --type=json -p '[{"op": "remove", "path": "/spec/replicas"},{"op":"add", "path": "/spec/autoScaling", "value": { "max": 5, "min": 2 }}]'
If 10 minutes pass without the additional capacity being required, the Agent will be decommissioned and placed in the spare queue again.
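If the bounds need to change, the JSON patch can be generated from the two numbers instead of edited inline. This sketch only builds the patch body used in the command above and rejects a minimum larger than the maximum; autoscaling_patch is a hypothetical helper name.

```shell
# autoscaling_patch MIN MAX
# Prints a JSON patch that removes /spec/replicas and adds /spec/autoScaling.
autoscaling_patch() {
  min=$1
  max=$2
  if [ "$min" -gt "$max" ]; then
    echo "min ($min) must not exceed max ($max)" >&2
    return 1
  fi
  printf '[{"op":"remove","path":"/spec/replicas"},{"op":"add","path":"/spec/autoScaling","value":{"max":%d,"min":%d}}]\n' "$max" "$min"
}

# Example (requires cluster access):
#   oc -n ${CLUSTERS_NAMESPACE} patch nodepool ${HOSTED_CLUSTER_NAME} --type=json \
#     -p "$(autoscaling_patch 2 5)"
```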
- Let's create a workload that requires a new node.
cat <<EOF | oc --kubeconfig ${HOSTED_CLUSTER_NAME}.kubeconfig apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    app: reversewords
  name: reversewords
  namespace: default
spec:
  replicas: 40
  selector:
    matchLabels:
      app: reversewords
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: reversewords
    spec:
      containers:
      - image: quay.io/mavazque/reversewords:latest
        name: reversewords
        resources:
          requests:
            memory: 2Gi
status: {}
EOF
- We will see the remaining agent start getting deployed.
INFO: The spare agent d9198891-39f4-4930-a679-65fb142b108b started getting provisioned.
oc -n ${HOSTED_CONTROL_PLANE_NAMESPACE} get agent -o jsonpath='{range .items[*]}BMH: {@.metadata.labels.agent-install\.openshift\.io/bmh} Agent: {@.metadata.name} State: {@.status.debugInfo.state}{"\n"}{end}'
BMH: ocp-worker-2 Agent: 4dac1ab2-7dd5-4894-a220-6a3473b67ee6 State: added-to-existing-cluster
BMH: ocp-worker-0 Agent: d9198891-39f4-4930-a679-65fb142b108b State: installing-in-progress
BMH: ocp-worker-1 Agent: da503cf1-a347-44f2-875c-4960ddb04091 State: added-to-existing-cluster
- If we check the nodes we will see a new one joined the cluster.
INFO: We got ocp-worker-0 added to the cluster.
oc --kubeconfig ${HOSTED_CLUSTER_NAME}.kubeconfig get nodes
NAME STATUS ROLES AGE VERSION
ocp-worker-0 Ready worker 35s v1.24.0+3882f8f
ocp-worker-1 Ready worker 40m v1.24.0+3882f8f
ocp-worker-2 Ready worker 41m v1.24.0+3882f8f
- If we delete the workload and wait 10 minutes the node will be removed.
oc --kubeconfig ${HOSTED_CLUSTER_NAME}.kubeconfig -n default delete deployment reversewords
- After 10 minutes.
oc --kubeconfig ${HOSTED_CLUSTER_NAME}.kubeconfig get nodes
NAME STATUS ROLES AGE VERSION
ocp-worker-1 Ready worker 51m v1.24.0+3882f8f
ocp-worker-2 Ready worker 52m v1.24.0+3882f8f
oc -n ${HOSTED_CONTROL_PLANE_NAMESPACE} get agent -o jsonpath='{range .items[*]}BMH: {@.metadata.labels.agent-install\.openshift\.io/bmh} Agent: {@.metadata.name} State: {@.status.debugInfo.state}{"\n"}{end}'
BMH: ocp-worker-2 Agent: 4dac1ab2-7dd5-4894-a220-6a3473b67ee6 State: added-to-existing-cluster
BMH: ocp-worker-0 Agent: d9198891-39f4-4930-a679-65fb142b108b State: known-unbound
BMH: ocp-worker-1 Agent: da503cf1-a347-44f2-875c-4960ddb04091 State: added-to-existing-cluster