-
Notifications
You must be signed in to change notification settings - Fork 212
OCPBUGS-66898: Implement mTLS authentication and authorization for CVO metrics endpoint #1271
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Skipping CI for Draft Pull Request. |
WalkthroughSwitched metrics auth from bearer-token to TLS client certificates with CN-based authorization, introduced dynamic certificate controllers and a MetricsOptions API, removed fsnotify-based reloads and token-review code, updated ServiceMonitor and go.mod, and adjusted tests and startup wiring. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes ✨ Finishing touches
📜 Recent review detailsConfiguration used: Organization UI Review profile: CHILL Plan: Pro Cache: Disabled due to data retention organization setting Knowledge base: Disabled due to 📒 Files selected for processing (2)
🧰 Additional context used📓 Path-based instructions (1)**⚙️ CodeRabbit configuration file
Files:
🪛 ast-grep (0.40.3)pkg/cvo/metrics.go[warning] 270-270: MinVersion (missing-ssl-minversion-go) [warning] 304-307: MinVersion (missing-ssl-minversion-go) 🔇 Additional comments (10)
Comment |
|
/test ? |
|
@DavidHurta: The following commands are available to trigger required jobs: The following commands are available to trigger optional jobs: Use DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: DavidHurta The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/test e2e-agnostic-ovn |
1 similar comment
|
/test e2e-agnostic-ovn |
cb7083d to
e7705e2
Compare
|
/test all |
e6cc3ae to
920ad71
Compare
|
/test all |
|
Note: PromeCIeus and the |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/cvo/metrics.go (1)
201-299: Harden the listener TLS config: ensure MinVersion/ciphers apply even ifGetConfigForClientreturns nil.The base TLS config is secured via
crypto.SecureTLSConfig, but the listener uses a freshtls.Configliteral containing onlyGetConfigForClient. If the callback returns nil or fails, the listener operates with default TLS settings lacking explicit cipher suite and minimum version constraints. Wrap the listener config withcrypto.SecureTLSConfigto guarantee hardened settings at the listener level.- tlsConfig := &tls.Config{GetConfigForClient: servingCertController.GetConfigForClient} + tlsConfig := crypto.SecureTLSConfig(&tls.Config{GetConfigForClient: servingCertController.GetConfigForClient}) go startListening(server, tlsConfig, listenAddress, resultChannel)Also verify the process has RBAC to read
kube-system/extension-apiserver-authenticationand thatclient-ca-fileis the intended trust root for Prometheus' client cert chain in all target environments.
🧹 Nitpick comments (3)
pkg/cvo/metrics_test.go (1)
4-6: CN-based TLS auth tests are aligned with the new implementation; consider usinghttptest.NewRequestfor a fully-formed request.
Currenthttp.NewRequest("GET", "url-not-important", nil)is probably fine for this handler, buthttptest.NewRequest("GET", "https://example.invalid/metrics", nil)is more idiomatic and avoids edge cases if the handler later reads URL/Host.Also applies to: 1030-1088
pkg/cvo/metrics.go (2)
126-143: RenamedisableAuthtodisableAuthorizationto avoid confusion (authn vs authz).
Right nowcreateHttpServer(disableAuth bool)is called withmetricsOptions.DisableAuthorization, but the parameter name reads like “disable authentication”.-func createHttpServer(disableAuth bool) *http.Server { - if disableAuth { +func createHttpServer(disableAuthorization bool) *http.Server { + if disableAuthorization { handler := http.NewServeMux() handler.Handle("/metrics", promhttp.Handler()) server := &http.Server{ Handler: handler, } return server }
149-166: CN authorization is very strict; confirm the allowed identity set is complete for all supported deployments.
Hardcoding onlysystem:serviceaccount:openshift-monitoring:prometheus-k8smay be correct, but it will break scraping if the client cert CN differs in hosted control planes / alternate Prometheus setups. Consider allowing a small allowlist (still static) to reduce fragility.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
⛔ Files ignored due to path filters (69)
go.sumis excluded by!**/*.sumvendor/github.com/fsnotify/fsnotify/.cirrus.ymlis excluded by!vendor/**,!**/vendor/**vendor/github.com/fsnotify/fsnotify/.gitignoreis excluded by!vendor/**,!**/vendor/**vendor/github.com/fsnotify/fsnotify/.mailmapis excluded by!vendor/**,!**/vendor/**vendor/github.com/fsnotify/fsnotify/CHANGELOG.mdis excluded by!vendor/**,!**/vendor/**vendor/github.com/fsnotify/fsnotify/CONTRIBUTING.mdis excluded by!vendor/**,!**/vendor/**vendor/github.com/fsnotify/fsnotify/LICENSEis excluded by!vendor/**,!**/vendor/**vendor/github.com/fsnotify/fsnotify/README.mdis excluded by!vendor/**,!**/vendor/**vendor/github.com/fsnotify/fsnotify/backend_fen.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/fsnotify/fsnotify/backend_inotify.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/fsnotify/fsnotify/backend_kqueue.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/fsnotify/fsnotify/backend_other.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/fsnotify/fsnotify/backend_windows.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/fsnotify/fsnotify/fsnotify.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/fsnotify/fsnotify/internal/darwin.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/fsnotify/fsnotify/internal/debug_darwin.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/fsnotify/fsnotify/internal/debug_dragonfly.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/fsnotify/fsnotify/internal/debug_freebsd.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/fsnotify/fsnotify/internal/debug_kqueue.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/fsnotify/fsnotify/internal/debug_linux.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/fsnotify/fsnotify/internal/debug_netbsd.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/fsnotify/fsnotify/internal/debug_openbsd.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/fsnotify/fsnotify/internal/debug_solaris.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/fsnotify/fsnotify/internal/debug_windows.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/fsnotify/fsnotify/internal/freebsd.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/fsnotify/fsnotify/internal/internal.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/fsnotify/fsnotify/internal/unix.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/fsnotify/fsnotify/internal/unix2.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/fsnotify/fsnotify/internal/windows.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/fsnotify/fsnotify/shared.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/fsnotify/fsnotify/staticcheck.confis excluded by!vendor/**,!**/vendor/**vendor/github.com/fsnotify/fsnotify/system_bsd.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/fsnotify/fsnotify/system_darwin.gois excluded by!vendor/**,!**/vendor/**vendor/gopkg.in/fsnotify.v1/.editorconfigis excluded by!vendor/**,!**/vendor/**vendor/gopkg.in/fsnotify.v1/.gitignoreis excluded by!vendor/**,!**/vendor/**vendor/gopkg.in/fsnotify.v1/.travis.ymlis excluded by!vendor/**,!**/vendor/**vendor/gopkg.in/fsnotify.v1/AUTHORSis excluded by!vendor/**,!**/vendor/**vendor/gopkg.in/fsnotify.v1/CHANGELOG.mdis excluded by!vendor/**,!**/vendor/**vendor/gopkg.in/fsnotify.v1/CONTRIBUTING.mdis excluded by!vendor/**,!**/vendor/**vendor/gopkg.in/fsnotify.v1/LICENSEis excluded by!vendor/**,!**/vendor/**vendor/gopkg.in/fsnotify.v1/README.mdis excluded by!vendor/**,!**/vendor/**vendor/gopkg.in/fsnotify.v1/fen.gois excluded by!vendor/**,!**/vendor/**vendor/gopkg.in/fsnotify.v1/fsnotify.gois excluded by!vendor/**,!**/vendor/**vendor/gopkg.in/fsnotify.v1/inotify.gois excluded by!vendor/**,!**/vendor/**vendor/gopkg.in/fsnotify.v1/inotify_poller.gois excluded by!vendor/**,!**/vendor/**vendor/gopkg.in/fsnotify.v1/kqueue.gois excluded by!vendor/**,!**/vendor/**vendor/gopkg.in/fsnotify.v1/open_mode_bsd.gois excluded by!vendor/**,!**/vendor/**vendor/gopkg.in/fsnotify.v1/open_mode_darwin.gois excluded by!vendor/**,!**/vendor/**vendor/gopkg.in/fsnotify.v1/windows.gois excluded by!vendor/**,!**/vendor/**vendor/k8s.io/apiserver/pkg/server/dynamiccertificates/cert_key.gois excluded by!vendor/**,!**/vendor/**vendor/k8s.io/apiserver/pkg/server/dynamiccertificates/client_ca.gois excluded by!vendor/**,!**/vendor/**vendor/k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.gois excluded by!vendor/**,!**/vendor/**vendor/k8s.io/apiserver/pkg/server/dynamiccertificates/dynamic_cafile_content.gois excluded by!vendor/**,!**/vendor/**vendor/k8s.io/apiserver/pkg/server/dynamiccertificates/dynamic_serving_content.gois excluded by!vendor/**,!**/vendor/**vendor/k8s.io/apiserver/pkg/server/dynamiccertificates/dynamic_sni_content.gois excluded by!vendor/**,!**/vendor/**vendor/k8s.io/apiserver/pkg/server/dynamiccertificates/interfaces.gois excluded by!vendor/**,!**/vendor/**vendor/k8s.io/apiserver/pkg/server/dynamiccertificates/named_certificates.gois excluded by!vendor/**,!**/vendor/**vendor/k8s.io/apiserver/pkg/server/dynamiccertificates/static_content.gois excluded by!vendor/**,!**/vendor/**vendor/k8s.io/apiserver/pkg/server/dynamiccertificates/tlsconfig.gois excluded by!vendor/**,!**/vendor/**vendor/k8s.io/apiserver/pkg/server/dynamiccertificates/union_content.gois excluded by!vendor/**,!**/vendor/**vendor/k8s.io/apiserver/pkg/server/dynamiccertificates/util.gois excluded by!vendor/**,!**/vendor/**vendor/k8s.io/client-go/tools/events/OWNERSis excluded by!vendor/**,!**/vendor/**vendor/k8s.io/client-go/tools/events/doc.gois excluded by!vendor/**,!**/vendor/**vendor/k8s.io/client-go/tools/events/event_broadcaster.gois excluded by!vendor/**,!**/vendor/**vendor/k8s.io/client-go/tools/events/event_recorder.gois excluded by!vendor/**,!**/vendor/**vendor/k8s.io/client-go/tools/events/fake.gois excluded by!vendor/**,!**/vendor/**vendor/k8s.io/client-go/tools/events/helper.gois excluded by!vendor/**,!**/vendor/**vendor/k8s.io/client-go/tools/events/interfaces.gois excluded by!vendor/**,!**/vendor/**vendor/modules.txtis excluded by!vendor/**,!**/vendor/**
📒 Files selected for processing (5)
go.mod(2 hunks)install/0000_90_cluster-version-operator_02_servicemonitor.yaml(1 hunks)pkg/cvo/metrics.go(5 hunks)pkg/cvo/metrics_test.go(2 hunks)pkg/start/start.go(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
install/0000_90_cluster-version-operator_02_servicemonitor.yamlpkg/cvo/metrics_test.gopkg/start/start.gopkg/cvo/metrics.gogo.mod
🧬 Code graph analysis (1)
pkg/start/start.go (1)
pkg/cvo/metrics.go (2)
MetricsOptions(196-199)RunMetrics(208-299)
🪛 ast-grep (0.40.0)
pkg/cvo/metrics.go
[warning] 245-245: MinVersionis missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. AddMinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures [REFERENCES]
https://owasp.org/Top10/A02_2021-Cryptographic_Failures
(missing-ssl-minversion-go)
[warning] 247-247: MinVersionis missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. AddMinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{ClientAuth: tls.RequireAndVerifyClientCert}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures [REFERENCES]
https://owasp.org/Top10/A02_2021-Cryptographic_Failures
(missing-ssl-minversion-go)
[warning] 273-273: MinVersionis missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. AddMinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{GetConfigForClient: servingCertController.GetConfigForClient}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures [REFERENCES]
https://owasp.org/Top10/A02_2021-Cryptographic_Failures
(missing-ssl-minversion-go)
🔇 Additional comments (5)
install/0000_90_cluster-version-operator_02_servicemonitor.yaml (1)
14-21: Verify Prometheus mountsmetrics-client-certsat the referenced file paths (otherwise scraping will fail).
ServiceMonitor now expects client cert/key at/etc/prometheus/secrets/metrics-client-certs/{tls.crt,tls.key}; ensure the cluster-monitoring-managed Prometheus pod has that secret mounted and populated in all target topologies (self-managed + hosted).go.mod (1)
29-29: Dependency changes look consistent with dynamiccertificates adoption; please verify module graph is tidy and conflict-free.
Run ago mod tidy-equivalent in CI and ensure no remaining imports ofgopkg.in/fsnotify.v1, and nok8s.io/*version skew introduced by the new directk8s.io/apiserverrequirement.Also applies to: 42-42
pkg/start/start.go (1)
353-358: Double-check HyperShift semantics: this disables both client-cert auth and CN authorization.
Wiring too.HyperShiftis clean, but please confirm the hosted-control-plane scraping model really expects unauthenticated metrics over TLS (vs. mTLS with different client identity).pkg/cvo/metrics.go (2)
18-20: Good move switching todynamiccertificates; verifyk8s.io/apiserverintegration matches the vendored K8s minor version.
This import-level change is appropriate for hot-reloading certs/CA; just ensure nok8s.io/*skew and thatdynamiccertificatesAPIs used here match v0.34.1 expectations.
196-199:MetricsOptionsis a clean API shape for callers.
|
/test e2e-hypershift |
920ad71 to
ff6d96e
Compare
|
/test all |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
pkg/cvo/metrics.go (1)
149-166: Certificate-based authorization correctly implemented.The CN validation logic is sound: it checks for the presence of peer certificates and validates that the Common Name matches the expected Prometheus service account. The HTTP status codes (401 for missing certificate, 403 for unauthorized CN) are appropriate.
Note: The CN
"system:serviceaccount:openshift-monitoring:prometheus-k8s"is hardcoded. Consider whether this should be configurable viaMetricsOptionsif other clients might need access in the future.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (4)
install/0000_90_cluster-version-operator_02_servicemonitor.yaml(1 hunks)pkg/cvo/metrics.go(5 hunks)pkg/cvo/metrics_test.go(2 hunks)pkg/start/start.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/cvo/metrics_test.go
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
install/0000_90_cluster-version-operator_02_servicemonitor.yamlpkg/start/start.gopkg/cvo/metrics.go
🧬 Code graph analysis (1)
pkg/start/start.go (1)
pkg/cvo/metrics.go (2)
MetricsOptions(196-199)RunMetrics(208-312)
🪛 ast-grep (0.40.0)
pkg/cvo/metrics.go
[warning] 256-256: MinVersionis missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. AddMinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{ClientAuth: clientAuth}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures [REFERENCES]
https://owasp.org/Top10/A02_2021-Cryptographic_Failures
(missing-ssl-minversion-go)
[warning] 283-286: MinVersionis missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. AddMinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{
ClientAuth: clientAuth,
GetConfigForClient: servingCertController.GetConfigForClient,
}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures [REFERENCES]
https://owasp.org/Top10/A02_2021-Cryptographic_Failures
(missing-ssl-minversion-go)
🔇 Additional comments (10)
install/0000_90_cluster-version-operator_02_servicemonitor.yaml (1)
14-21: LGTM! TLS client certificate authentication correctly configured.The ServiceMonitor now uses TLS client certificates (
certFileandkeyFile) for authentication instead of bearer tokens, aligning with the mTLS implementation inpkg/cvo/metrics.gothat validates the client certificate's CN.pkg/cvo/metrics.go (8)
18-19: LGTM! Imports correctly updated for certificate-based approach.The new imports for
dynamiccertificatesandkubernetesclient replace the previous token-based authentication dependencies and support the dynamic certificate loading implementation.
126-143: LGTM! Simplified server creation aligns with TLS-based authentication.The function signature correctly removes the authentication client dependency since authentication is now handled at the TLS layer via client certificates, with authorization delegated to the
authHandler.
196-199: LGTM! Clean options struct for metrics configuration.The
MetricsOptionsstruct provides a clear, type-safe way to configure authentication and authorization behavior for the metrics server.
208-223: LGTM! Serving certificate controller correctly initialized.The serving certificate controller is properly set up to watch and automatically reload the server's TLS certificate and key files. The
RunOnceinitialization before starting the background watcher ensures the files are loaded before the server starts.
225-249: LGTM! Client CA controller correctly set up when authentication is enabled.The conditional creation and initialization of the client CA controller is appropriate—it's only needed when client certificate authentication is enabled. The controller watches the
extension-apiserver-authenticationConfigMap inkube-systemfor CA changes, which aligns with Kubernetes' standard approach for extension API server authentication.
276-288: LGTM! Server and TLS configuration properly wired.The HTTP server creation respects the
DisableAuthorizationsetting, and the TLS configuration correctly usesGetConfigForClientto enable dynamic certificate updates. TheClientAuthsetting in the final config ensures proper fallback behavior if the callback fails.
290-311: LGTM! Server lifecycle properly managed with graceful shutdown.The implementation correctly handles the run/shutdown contexts, waits for server termination, performs graceful shutdown on context cancellation, and properly collects results from the server goroutine. Error handling and logging are appropriate.
252-267: This concern is not valid.crypto.SecureTLSConfigfromgithub.com/openshift/library-go/pkg/cryptoautomatically setsMinVersionto the cluster's default TLS version (viaDefaultTLSVersion()) whenMinVersionis not explicitly set in the passedtls.Config. The code in metrics.go correctly relies on this behavior and does not need explicitMinVersionconfiguration.Likely an incorrect or invalid review comment.
pkg/start/start.go (1)
353-357: LGTM! MetricsOptions correctly configured for HyperShift mode.The creation of the
MetricsOptionsstruct with bothDisableAuthenticationandDisableAuthorizationset too.HyperShiftis appropriate. This ensures that in HyperShift deployments, both authentication and authorization are disabled together, while in standard deployments, both are enabled.
|
/retest |
1 similar comment
|
/retest |
|
/test e2e-hypershift-conformance |
|
/test all |
1db20de to
b0454ca
Compare
|
/test all |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
The previous approach of checking the client certificate on every HTTP request was redundant. The certificate remains constant for the entire session duration, which can span days with TLS session tickets and resumptions. Optimize authorization by verifying the CN once during the TLS handshake instead. This moves the authorization from the application onto the TLS layer. This can result in harder debugging due to the TLS layer failing without providing detailed information. For example: ``` curl: (56) OpenSSL SSL_read: error:0A000412:SSL routines::sslv3 alert bad certificate, errno 0 command terminated with exit code 56 ``` However, it also makes the current authorization mechanism more explicit. The previous mechanism could infer that every HTTP request was being authorized, but that was not the case in reality. Some alternatives may be: - Disable session tickets and the resumption support to force more frequent authorizations. - Verify the client's certificate in the application layer, but add useful logic, such as checking for the certificate expiration date per every HTTP request. Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
7bbe4ec to
290ee8b
Compare
|
@DavidHurta: This pull request references Jira Issue OCPBUGS-66898, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/jira refresh |
|
@DavidHurta: This pull request references Jira Issue OCPBUGS-66898, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
TestingStandalone OpenShiftPrometheus can scrape using mTLSClient needs to provide its certificateThe CN is verified correctlyThe metrics-server Pod has its client certificate signed by the same issuer as does Prometheus. Let's try to scrape the CVO using the metric-server certs. I am using Checking the CVO logs: HyperShiftPromeCleus can be used in combination with the Given that the OCP monitoring is able to scrape hosted CVO, it is safe to assume that a custom monitoring stack can as well. The hosted CVO does not even have access to the required CA bundle. Prometheus output using PromeCIeus from a The E2E functionality requires a follow-up with the HyperShift team regarding the E2E testing to not potentially introduce a regression with this PR. |
|
/hold I am checking with the HyperShift team. However, the PR is ready for a review. |
|
/test e2e-agnostic-ovn-upgrade-into-change |
|
@DavidHurta: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
| // RunMetrics launches a server bound to listenAddress serving | ||
| // Prometheus metrics at /metrics over HTTPS. Continues serving | ||
| // Prometheus metrics at /metrics over HTTPS. Continues serving | ||
| // until runContext.Done() and then attempts a clean shutdown | ||
| // limited by shutdownContext.Done(). Assumes runContext.Done() | ||
| // limited by shutdownContext.Done(). Assumes runContext.Done() | ||
| // occurs before or simultaneously with shutdownContext.Done(). | ||
| // Also detects changes to metrics certificate files upon which | ||
| // the metrics HTTP server is shutdown and recreated with a new | ||
| // TLS configuration. | ||
| func RunMetrics(runContext context.Context, shutdownContext context.Context, listenAddress, certFile, keyFile string, restConfig *rest.Config, disableMetricsAuth bool) error { | ||
| var tlsConfig *tls.Config | ||
| if listenAddress != "" { | ||
| var err error | ||
| tlsConfig, err = makeTLSConfig(certFile, keyFile) | ||
| if err != nil { | ||
| return fmt.Errorf("Failed to create TLS config: %w", err) | ||
| } | ||
| } else { | ||
| // The TLS configuration automatically reloads certificates when | ||
| // they change on disk using dynamiccertificates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: Update the description.
|
@DavidHurta: This pull request references Jira Issue OCPBUGS-66898, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
| cn := verifiedChains[0][0].Subject.CommonName | ||
| if cn != metricsAllowedClientCN { | ||
| klog.V(4).Infof("Access denied for CN: %s", cn) | ||
| return errors.New("unauthorized CN") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I don't mind as much for logs, but for error messages, which might eventually reach users, can we avoid accronyms with unauthorized common name or similar instead of using CN?
Also, not clear to me why we'd log the rejected CN in the previous Infof but not include it in the error message here. The client has the certificate, and they can go look at their CN there, but if we include the CN in the error message, maybe that saves them a bit of work?
| // they change on disk using dynamiccertificates. | ||
| func RunMetrics(runContext context.Context, shutdownContext context.Context, listenAddress, certFile, keyFile string, restConfig *rest.Config, metricsOptions MetricsOptions) error { | ||
| if listenAddress == "" { | ||
| return errors.New("TLS configuration is required to serve metrics") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably worth updating this error message to point out that listenAddress is required. Looks like it's been stale since 0e6136c 😅
| origKeyChecksum, err := checksumFile(keyFile) | ||
| if err != nil { | ||
| return fmt.Errorf("Failed to initialize key file checksum: %w", err) | ||
| if err := servingContentController.RunOnce(metricsContext); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect that the RunOnce approach is intended to catch issues early and allow this function to hard fail, before launching the Run method. But there's more going on in Run than in RunOnce, and having no way to get information back out of the Run goroutine to tell you how it's going seems like it could be problematic. What if we have some kind of issue that would need a container restart to recover? But 🤷, I guess if I was particularly motivated, I could try to push for a clearer feedback mechanism in the upstream library. And ClusterVersionOperatorDown gives us a chance at calling in an admin to manually recover things if something in the metrics-serving pipe breaks. So not a blocker, just something that I'm grumbling about, in the hopes that it might motivate someone else to chase upstream about moving to an API with better feeback mechanisms ;)
| } | ||
|
|
||
| clientAuth := tls.NoClientCert | ||
| if !metricsOptions.DisableAuthentication { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this seems like it would be slightly simpler if you shifted the clientAuth := tls.NoClientCert default up to the var clientCA ... level above, and merged the clientAuth = tls.RequireAndVerifyClientCert line into the previous !metricsOptions.DisableAuthentication block, instead of having two of those blocks in such close proximity.
Same for baseTlSConfig too, I think collecting all the insecure defaults in one place, and then overriding all of them in the secure section. Or maybe for the cheap ones we can initialize to the secure settings, like:
clientAuth := tls.RequireAndVerifyClientCert
baseTlSConfig := crypto.SecureTLSConfig(&tls.Config{ClientAuth: clientAuth})
baseTlSConfig.VerifyPeerCertificate = verifyPeerCertificate
var clientCA dynamiccertificates.CAContentProvider
var clientCAController *dynamiccertificates.ConfigMapCAController
if metricsOptions.DisableAuthentication { // weakening
clientAuth = tls.NoClientCert
baseTlSConfig.VerifyPeerCertificate = someDefault
} else { // expensive strengthening
kubeClient, err := kubernetes.NewForConfig(restConfig)
...
}|
|
||
| resultChannelCount++ | ||
| go func() { | ||
| servingCertController.Run(1, metricsContext.Done()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering about that workers=1 argument, 😂:
doesn't matter what workers say, only start one.
| continue | ||
| } | ||
| case <-metricsContext.Done(): | ||
| klog.Infof("Clean shutdown requested: %v", metricsContext.Err()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the klog line will include metrics.go, but maybe mention metrics here, just to be extra clear? Clean metrics shutdown requested: ...?
| if loopError == nil { | ||
| loopError = shutdownError | ||
| } else if shutdownError != nil { // log the error we are discarding | ||
| klog.Errorf("Failed to gracefully shut down metrics server: %v", shutdownError) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you want to keep the logging down here, instead of shifting it to log unconditionally, otherwise you'll double up on the loopError message (which is getting logged below at the Infof level, and then again (presumably at the Errorf level) in whatever is at the top of the bubbled-back return loopError pipeline.

This pull request proposes to migrate the CVO metrics endpoint from bearer token authentication to mutual TLS (mTLS) to comply with OpenShift metrics conventions.
Key changes:
k8s.io/apiserver/pkg/server/dynamiccertificatespackage to simplify logicBreaking Changes:
Trade-offs: