Configure Metrics Sets
HyperShift creates ServiceMonitor resources in each control plane namespace that allow a Prometheus stack to scrape metrics from the control planes. ServiceMonitors use metrics relabelings to define which metrics are included or excluded for a particular component (etcd, Kube API server, etc.). The number of metrics produced by control planes has a direct impact on the resource requirements of the monitoring stack scraping them.
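As a rough illustration of the relabeling mechanism described above, the fragment below shows how a Prometheus Operator ServiceMonitor can restrict the metrics kept from one endpoint. It is a sketch rather than a copy of what HyperShift generates: the resource name, namespace, port name, and metric names are hypothetical.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-component          # hypothetical name
  namespace: example-control-plane # hypothetical control plane namespace
spec:
  endpoints:
    - port: metrics                # hypothetical port name
      metricRelabelings:
        # Keep only series whose metric name matches the regex;
        # every other series from this endpoint is dropped.
        - action: keep
          sourceLabels: ["__name__"]
          regex: "(apiserver_request_total|apiserver_current_inflight_requests)"
```

A `keep` rule acts as an allow list, while a `drop` rule removes the matching series and leaves everything else in place.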
Instead of producing a fixed set of metrics for all situations, HyperShift allows configuration of a "metrics set" that identifies the metrics to produce per control plane.
The following metrics sets are supported:
- Telemetry: metrics needed for telemetry. This is the default and the smallest set of metrics.
- SRE: configurable metrics set, intended to include the metrics necessary to produce alerts and allow troubleshooting of control plane components.
- All: all the metrics produced by standalone OCP control plane components.
The metrics set is configured by setting the METRICS_SET environment variable in the HyperShift operator deployment:
oc set env -n hypershift deployment/operator METRICS_SET=All
Configuring the SRE Metrics Set
When the SRE metrics set is specified, the HyperShift operator looks for a ConfigMap named sre-metric-set with a single key: config. The value of the config key should contain a set of RelabelConfigs organized by control plane component. An example of this configuration can be found in support/metrics/testdata/sreconfig.yaml in this repository.
The following components can be specified:
- etcd
- kubeAPIServer
- kubeControllerManager
- openshiftAPIServer
- openshiftControllerManager
- openshiftRouteControllerManager
- cvo
- olm
- catalogOperator
- registryOperator
- nodeTuningOperator
- controlPlaneOperator
- hostedClusterConfigOperator
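
The following is a minimal sketch of what such a ConfigMap might look like, using two of the component keys listed above. Several details are assumptions rather than facts from this page: the namespace (taken to be the same hypershift namespace as the operator deployment), the layout of the config value (component keys at the top level, each mapping to a list of relabel configs), and the regular expressions, which are purely illustrative. The sreconfig.yaml file in the repository remains the authoritative reference for the format.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: sre-metric-set
  namespace: hypershift   # assumption: same namespace as the operator deployment
data:
  config: |
    # Assumed layout: one top-level key per control plane component,
    # each holding a list of relabel configs.
    etcd:
      - action: "drop"
        regex: "etcd_debugging_.*"      # illustrative pattern
        sourceLabels: ["__name__"]
    kubeAPIServer:
      - action: "drop"
        regex: "apiserver_admission_.*" # illustrative pattern
        sourceLabels: ["__name__"]
```

Under these assumptions, components that are left out of the config value would simply have no SRE-specific relabelings; how the operator treats them in practice should be checked against the repository.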