Controller architecture
High Level Overview
Physical layout and operating model
Legend:
- Yellow box: namespace
- Rounded box: processes
- Rectangle: CR instances
flowchart LR
subgraph hypershift
cluster-operator([HyperShift Operator])
end
subgraph user-clusters
HostedClusterA
NodePoolA
end
subgraph cluster-a
control-plane-operator([Control Plane Operator])
capi-manager([CAPI Manager])
capi-provider([CAPI Provider])
HostedControlPlane
ExternalInfraCluster
cp-components([Control Plane Components])
capi-cluster[CAPICluster]
capi-machine-template[CAPIInfrastructureMachineTemplate]
capi-machineset[CAPI MachineSet]
capi-machine[CAPI Machine]
capi-provider-machine[CAPIInfrastructureMachine]
end
cluster-operator-->|reconciles|HostedClusterA
cluster-operator-->|operates|control-plane-operator
cluster-operator-->|operates|capi-manager
cluster-operator-->|operates|capi-provider
cluster-operator-->|creates|HostedControlPlane
cluster-operator-->|creates|capi-cluster
cluster-operator-->|creates|ExternalInfraCluster
cluster-operator-->|reconciles|NodePoolA
cluster-operator-->|creates|capi-machine-template
cluster-operator-->|creates|capi-machineset
control-plane-operator-->|operates|cp-components
control-plane-operator-->|reconciles|HostedControlPlane
capi-manager-->|reconciles|capi-cluster
capi-manager-->|reconciles|capi-machineset
capi-manager-->|creates|capi-machine
capi-provider-->|reconciles|capi-machine
capi-provider-->|creates|capi-provider-machine
TODO:
1. How do we (or should we) represent an input/output or "consumes" relationship? For example, the HyperShift Operator creates and syncs machine templates and the CAPI provider reads them, but nothing actively watches templates and does work in reaction to them directly.
Major Components
HyperShift Operator
The HyperShift Operator is a singleton within the management cluster that manages the lifecycle of hosted clusters represented by HostedCluster resources.
A single version of the HyperShift Operator knows how to manage multiple hosted OCP versions.
The HyperShift Operator is responsible for:
- Processing HostedCluster and NodePool resources and managing Control Plane Operator and Cluster API (CAPI) deployments which do the actual work of installing a control plane.
- Managing the lifecycle of the hosted cluster by handling rollouts of new Control Plane Operator and CAPI deployments based on version changes to HostedCluster and NodePool resources.
- Aggregating and surfacing information about clusters.
HostedCluster Controller
graph TD
hosted-cluster-controller[HostedCluster Controller] --> reconcile([Reconcile HostedCluster])
reconcile --> is-deleted{{Deleted?}}
is-deleted -->|Yes| teardown([Teardown])
is-deleted -->|No| sync([Sync])
teardown -->teardown-complete{{Teardown complete?}}
teardown-complete -->|Yes| return
teardown-complete -->|No| reconcile
sync --> create-namespace([Create Namespace])
create-namespace --> deploy-cp-operator([Deploy Control Plane Operator])
deploy-cp-operator --> deploy-capi-manager([Deploy CAPI Manager])
deploy-capi-manager --> deploy-capi-provider([Deploy CAPI Provider])
deploy-capi-provider --> create-capi-cluster([Create CAPICluster])
create-capi-cluster --> create-hosted-control-plane([Create HostedControlPlane])
create-hosted-control-plane --> create-external-infra-cluster([Create ExternalInfraCluster])
create-external-infra-cluster -->has-initial-nodes{{HostedCluster has initial nodes?}}
has-initial-nodes -->|Yes| create-node-pool([Create NodePool])
has-initial-nodes -->|No| return
create-node-pool --> return
return([End])
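The flow above can be read as an ordered, idempotent sync plus a teardown path that requeues until it completes. The following is a minimal Python sketch of that control flow only; it is not the operator's actual implementation. The `HostedCluster` dataclass, its `initial_nodes` and `resources` fields, and `reconcile_hosted_cluster` are hypothetical stand-ins for the boxes in the diagram.

```python
from dataclasses import dataclass, field

# Ordered sync steps, mirroring the boxes in the diagram above.
SYNC_STEPS = [
    "Namespace",
    "ControlPlaneOperator",
    "CAPIManager",
    "CAPIProvider",
    "CAPICluster",
    "HostedControlPlane",
    "ExternalInfraCluster",
]


@dataclass
class HostedCluster:
    name: str
    deleted: bool = False
    initial_nodes: int = 0  # hypothetical field
    # Hypothetical in-memory stand-in for objects in the management cluster.
    resources: set = field(default_factory=set)


def reconcile_hosted_cluster(hc: HostedCluster) -> bool:
    """Run one reconcile pass. Returns True if the request should be requeued."""
    if hc.deleted:
        # Teardown: remove one resource per pass and requeue until none remain.
        if hc.resources:
            hc.resources.pop()
        return bool(hc.resources)
    # Sync: adding to a set is a no-op for existing entries, so reruns are harmless.
    for step in SYNC_STEPS:
        hc.resources.add(step)
    if hc.initial_nodes > 0:
        hc.resources.add("NodePool")
    return False
```

Calling `reconcile_hosted_cluster` repeatedly on a cluster that is not deleted leaves `resources` unchanged after the first pass, which is the property the "Sync" branch relies on.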
NodePool Controller
graph TD
nodepool-controller[NodePool Controller] --> reconcile([Reconcile NodePool])
reconcile --> is-deleted{{Deleted?}}
is-deleted -->|Yes| teardown([Teardown])
is-deleted -->|No| sync([Sync])
sync --> create-capi-machineset([Create CAPIMachineSet])
create-capi-machineset --> create-capi-infra-machine-template([Create CAPIInfrastructureMachineTemplate])
create-capi-infra-machine-template --> return
teardown -->teardown-complete{{Teardown complete?}}
teardown-complete -->|Yes| return
teardown-complete -->|No| reconcile
return([End])
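The "Sync" branch above derives two CAPI objects from a NodePool. A common way to express this is a create-or-update step that writes only when the stored object is missing or has drifted. The sketch below illustrates that pattern under stated assumptions: the `replicas` and `instance_type` fields, the `store` dictionary standing in for the API server, and `sync_node_pool` are all hypothetical and not taken from the HyperShift code.

```python
from dataclasses import dataclass


@dataclass
class NodePool:
    name: str
    replicas: int        # hypothetical field
    instance_type: str   # hypothetical field


def sync_node_pool(pool: NodePool, store: dict) -> list:
    """Create or update the CAPI objects derived from a NodePool.

    `store` maps (kind, name) to a spec dict and stands in for the API server.
    Returns the keys that were actually written during this pass.
    """
    desired = {
        ("CAPIMachineSet", pool.name): {
            "replicas": pool.replicas,
            "templateRef": pool.name,
        },
        ("CAPIInfrastructureMachineTemplate", pool.name): {
            "instanceType": pool.instance_type,
        },
    }
    written = []
    for key, spec in desired.items():
        if store.get(key) != spec:  # write only when missing or drifted
            store[key] = spec
            written.append(key)
    return written
```

A second call with an unchanged NodePool writes nothing, and changing only `replicas` rewrites only the MachineSet entry.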
ExternalInfraCluster Controller
graph TD
external-infra-cluster-controller[ExternalInfraCluster Controller] --> reconcile([Reconcile ExternalInfraCluster])
reconcile --> is-deleted{{Deleted?}}
is-deleted -->|Yes| teardown([Teardown])
is-deleted -->|No| sync([Sync])
teardown -->teardown-complete{{Teardown complete?}}
teardown-complete -->|Yes| return
teardown-complete -->|No| reconcile
sync --> get-hosted-control-plane([Get HostedControlPlane])
get-hosted-control-plane -->is-hcp-ready{{Is HostedControlPlane ready?}}
is-hcp-ready -->|No| reconcile
is-hcp-ready -->|Yes| update-infra-status([Update ExternalInfraCluster status])
update-infra-status --> return
return([End])
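The key idea in the flow above is readiness gating: the controller changes nothing until the HostedControlPlane reports ready, and requeues in the meantime. A minimal sketch of that gate follows. The `ready` fields and `sync_external_infra_cluster` are hypothetical; the diagram does not say which status fields are actually updated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HostedControlPlane:
    ready: bool = False  # hypothetical status field


@dataclass
class ExternalInfraCluster:
    ready: bool = False  # hypothetical status field


def sync_external_infra_cluster(
    infra: ExternalInfraCluster, hcp: Optional[HostedControlPlane]
) -> bool:
    """Return True to requeue, False once the status has been updated."""
    if hcp is None or not hcp.ready:
        # Dependency not ready yet: change nothing and try again later.
        return True
    infra.ready = True
    return False
```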
Control Plane Operator
The Control Plane Operator is deployed by the HyperShift Operator into a hosted control plane namespace and manages the rollout of a single version of the hosted cluster's control plane.
The Control Plane Operator is versioned in lockstep with a specific OCP version and is decoupled from the management cluster's version.
The Control Plane Operator is responsible for:
- Provisioning all the infrastructure required to host a control plane (whether this means creating or adopting existing infrastructure). This infrastructure may be management cluster resources, external cloud provider resources, etc.
- Deploying an OCP control plane configured to run in the context of the provisioned infrastructure.
- Implementing any versioned behavior necessary to roll out the new version (e.g. version-specific changes at layers above OCP itself, like configuration or infrastructure changes).
HostedControlPlane Controller
graph TD
hosted-control-plane-controller[HostedControlPlane Controller] --> reconcile([Reconcile HostedControlPlane])
reconcile --> is-deleted{{Deleted?}}
is-deleted -->|Yes| teardown([Teardown])
is-deleted -->|No| sync([Sync])
teardown -->teardown-complete{{Teardown complete?}}
teardown-complete -->|Yes| return
teardown-complete -->|No| reconcile
sync --> create-infra([Deploy Control Plane Components])
create-infra --> create-config-operator([Deploy Hosted Cluster Config Operator])
create-config-operator -->is-infra-ready{{Infra ready?}}
is-infra-ready -->|Yes| update-hosted-controlplane-ready([Update HostedControlPlane status])
is-infra-ready -->|No| reconcile
update-hosted-controlplane-ready --> return
return([End])
Hosted Cluster Config Operator
The Hosted Cluster Config Operator is a control plane component maintained by HyperShift that's a peer to other control plane components (e.g., etcd, apiserver, controller-manager), and is managed by the Control Plane Operator in the same way as those other control plane components.
The Hosted Cluster Config Operator is versioned in lockstep with a specific OCP version and is decoupled from the management cluster's version.
The Hosted Cluster Config Operator is responsible for:
- Reading CAs from the hosted cluster to configure the kube controller manager CA bundle running in the hosted control plane
- Reconciling resources that live on the hosted cluster:
  - CRDs created by operators that are absent from the hosted cluster (RequestCount CRD created by cluster-kube-apiserver-operator)
  - Clearing any user changes to the ClusterVersion resource (all updates should be driven via HostedCluster API)
  - ClusterOperator stubs for control plane components that run outside the hosted cluster
  - Global Configuration that is managed via the HostedCluster API
  - Namespaces that are normally created by operators that are absent from the cluster
  - RBAC that is normally created by operators that are absent from the cluster
  - Registry configuration
  - Default ingress controller
  - Control Plane PKI (kubelet serving CA, control plane signer CA)
  - Konnectivity Agent
  - OpenShift APIServer resources (APIServices, Service, Endpoints)
  - OpenShift OAuth APIServer resources (APIServices, Service, Endpoints)
  - Monitoring Configuration (set node selector to non-master nodes)
  - Pull Secret
  - OAuth serving cert CA
  - OAuthClients required by the console
  - Cloud Credential Secrets (contain STS role for components that need cloud access)
  - OLM CatalogSources
  - OLM PackageServer resources (APIService, Service, Endpoints)
Resource dependency diagram
- Dotted lines are dependencies (ownerRefs)
- Solid lines are associations (e.g. infrastructureRefs or controlPlaneRefs on specs)
classDiagram
HostedCluster
HostedControlPlane ..> CAPICluster
ExternalInfraCluster ..> CAPICluster
CAPICluster ..> HostedCluster
CAPICluster --> HostedControlPlane
CAPICluster --> ExternalInfraCluster
CAPIMachineSet ..> CAPICluster
CAPIMachineSet --> CAPIInfrastructureMachineTemplate
CAPIMachine ..>CAPIMachineSet
CAPIMachine -->CAPIInfrastructureMachine
CAPIInfrastructureMachine ..>CAPIMachine
CAPIInfrastructureMachineTemplate ..>CAPICluster
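One practical consequence of the ownerRef (dotted) edges is that deleting an owner makes Kubernetes garbage collection remove its dependents transitively. The sketch below encodes the dotted edges from the diagram above as a child-to-owner map and computes what would be collected. Reading `A ..> B` as "A carries an ownerRef to B" is an interpretation of the legend, and `OWNER_REFS` and `dependents_of` are hypothetical names used only for illustration.

```python
# Child kind -> owner kind, one entry per dotted edge in the diagram above.
OWNER_REFS = {
    "HostedControlPlane": "CAPICluster",
    "ExternalInfraCluster": "CAPICluster",
    "CAPICluster": "HostedCluster",
    "CAPIMachineSet": "CAPICluster",
    "CAPIMachine": "CAPIMachineSet",
    "CAPIInfrastructureMachine": "CAPIMachine",
    "CAPIInfrastructureMachineTemplate": "CAPICluster",
}


def dependents_of(owner: str) -> set:
    """Return every kind that transitively depends on `owner`."""
    found = set()
    frontier = [owner]
    while frontier:
        current = frontier.pop()
        for child, parent in OWNER_REFS.items():
            if parent == current and child not in found:
                found.add(child)
                frontier.append(child)
    return found
```

Under this reading, `dependents_of("HostedCluster")` returns every other kind in the diagram, while `dependents_of("CAPIMachineSet")` returns only the Machine and its infrastructure machine.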
The following diagrams show how certain important resources are derived from others. These are resources created by our operators, not by CAPI.
classDiagram
CAPICluster ..> HostedControlPlane
CAPICluster ..> ExternalInfraCluster
HostedControlPlane ..> HostedCluster
ExternalInfraCluster ..> HostedCluster
classDiagram
CAPIInfrastructureMachineTemplate ..> NodePool
CAPIInfrastructureMachineTemplate ..> HostedCluster
CAPIMachineSet ..> NodePool
CAPIMachineSet ..> HostedCluster
CAPIMachineSet ..> CAPIInfrastructureMachineTemplate