-
Notifications
You must be signed in to change notification settings - Fork 151
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A reconciler should not require the DSCI to perform cleanup action to avoid deadlock #1428
Conversation
* cluster_config: init cluster variables on startup (opendatahub-io#1059) * cluster_config: move type definitions to the beginning of the file Just a bit more clarity Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com> * cluster_config: init cluster variables on startup Cluster configuration is supposed to be the same during operator pod lifetime. There is no point to detect it on every reconcile cycle keeping in mind that it can causes many api requests. Do the initialization on startup. Keep GetOperatorNamespace() returning error since it defines some logic in the DSCInitialization reconciler. Automatically called init() won't work here due to need to check error of the initialization. Wrap logger into context and use it in the Init() like controller-runtime does [1][2]. Save root context without the logger for mgr.Start since "setup" logger does not fit normal manager's work. Leave GetDomain() as is since it uses OpenshiftIngress configuration which is created when DSCInitialization instantiated. Log cluster configuration on success from Init, so remove platform logging from main. [1] https://github.com/kubernetes-sigs/controller-runtime/blob/38546806f2faf5973e3321a7bd5bb3afdbb5767d/pkg/internal/controller/controller.go#L297 [2] https://github.com/kubernetes-sigs/controller-runtime/blob/38546806f2faf5973e3321a7bd5bb3afdbb5767d/pkg/internal/controller/controller.go#L111 Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com> * cluster: do not return error from GetRelease GetRelease return values are defined at startup, the error checked in main, so no point to return error anymore. Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com> --------- Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com> * components: move params.env image updating to Init stage (opendatahub-io#1191) * components, main: add Component Init method Add Init() method to the Component interface and call it from main on startup. Will be used to move startup-time code from ReconcileComponent (like adjusting params.env). Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com> * components: move params.env image updating to Init stage Jira: https://issues.redhat.com/browse/RHOAIENG-11592 Image names in environment are not supposed to be changed during runtime of the operator, so it makes sense to update them only on startup. If manifests are overriden by DevFlags, the DevFlags' version will be used. The change is straight forward for most of the components where only images are updated and params.env is located in the kustomize root directory, but some components (dashboard, ray, codeflare, modelregistry) also update some extra parameters. For them image part only is moved to Init since other updates require runtime DSCI information. The patch also changes logic for ray, codeflare, and modelregistry in this regard to update non-image parameters regardless of DevFlags like it was changed in dashboard recently. The DevFlags functionality brings some concerns: - For most components the code is written such a way that as soon as DevFlags supplied the global path variables are changed and never reverted back to the defaults. For some (dashboard, trustyai) there is (still global) OverridePath/entryPath pair and manifests reverted to the default, BUT there is no logic of transition. - codeflare: when manifests are overridden namespace param is updated in the hardcoded (stock) path; This logic is preserved. Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com> --------- Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com> * update: remove webhook service from bundle (opendatahub-io#1275) - we do not need it in bundle, CSV auto generate one during installation - if we install operator via OLM, webhook service still get used from config/webhook/service.yaml Signed-off-by: Wen Zhou <wenzhou@redhat.com> * update: add validation on application and monitoring namespace in DSCI (opendatahub-io#1263) Signed-off-by: Wen Zhou <wenzhou@redhat.com> * logger: add zap command line switches (opendatahub-io#1280) Allow to tune preconfigured by --log-mode zap backend with standard zap command line switches from controller-runtime (zap-devel, zap-encoder, zap-log-level, zap-stacktrace-level, zap-time-encoding)[1]. This brings flexibility in logger setup for development environments first of all. The patch does not change default logger setup and does not change DSCI override functionality. [1] https://sdk.operatorframework.io/docs/building-operators/golang/references/logging Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com> * Modify Unit Tests GitHub Actions workflow to get code coverage test reports (opendatahub-io#1279) * Create codecov.yml * Added to run test coverage also on PRs * Removing trailing ] * Update .github/workflows/codecov.yml Co-authored-by: Wen Zhou <wenzhou@redhat.com> * Removed go mod install dependency * Consolidated codecov workflow into unit tests workflow --------- Co-authored-by: Wen Zhou <wenzhou@redhat.com> * webhook: move initialization inside of the module (opendatahub-io#1284) Add webhook.Init() function and hide webhook setup inside of the module. It will make it possible to replace Init with a NOP (no operation) with conditional compilation for no-webhook build. Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com> * feat: pass platform from env variable or fall back to use old logic (opendatahub-io#1252) * feat: pass platform from env variables or fall back to use old logic - introduce new env var ODH_PLATFORM_TYPE, value set during build time - if value not match, fall back to detect managed then self-managed - introduce new env var OPERATOR_RELEASE_VERSION, value also set during build time - if value is empty, fall back to use old way from CSV to read version or use 0.0.0 - add support from makefile - use envstubst to replace version Signed-off-by: Wen Zhou <wenzhou@redhat.com> * update: remove release version in the change Signed-off-by: Wen Zhou <wenzhou@redhat.com> --------- Signed-off-by: Wen Zhou <wenzhou@redhat.com> * fix: update release version in DSCI and DSC .status for upgrade case (opendatahub-io#1287) - DSCI: if current version is not matching, update it - DSC: in both reconcile pass and fail case, update it Signed-off-by: Wen Zhou <wenzhou@redhat.com> * Update version to 2.19.0 (opendatahub-io#1291) Co-authored-by: VaishnaviHire <17230536+VaishnaviHire@users.noreply.github.com> --------- Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com> Signed-off-by: Wen Zhou <wenzhou@redhat.com> Co-authored-by: Yauheni Kaliuta <ykaliuta@redhat.com> Co-authored-by: Wen Zhou <wenzhou@redhat.com> Co-authored-by: Adrián Sanz Gómiz <100594859+asanzgom@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: VaishnaviHire <17230536+VaishnaviHire@users.noreply.github.com>
* Initialize Component APIs * Add spec for Dashboard API * Add Dashboard Controller * Add status sub resource to component's APIs * Add component CRDs as internal objects * shared reconciler (cherry picked from commit 086fe25) * Update DataScienceCLuster controller to only deploy Dashboard * Update implementation of base reconciler * feat: create odh client * feat: rename BaseReconciler to ComponentReconciler * feat: add common status update action * feat: improve common conditions * feat: add basic test for delete resource action * feat: add basic status management * feat: add basic test for update status action * feat: move common code to pkg/controller to avoid circular dependecies * feat: copy DSCDashboard from DataScienceCluster * feat: add components.opendatahub.io/name label * Add TODO and refactor component reconciler Co-authored-by: Wen Zhou <wenzhou@redhat.com> * Fix linters * Remove e2e tests that are not needed * Makefile, webhook: add run-nowebhook (opendatahub-io#1286) Make it possible to compile operator without webhook enabled (with -tags nowebhook). Create a stub webhook.Init() function for that. Add run-nowebhook target to run webhook locally. It requires `make install` to be executed on the cluster beforehand. Since it repeats `make run`, move the command to a variable. Also use variable RUN_ARGS for operator arguments. It makes it possible to override them from the command line. In VSCode it is possible to debug it with the following example launch.json configuration: ``` { "name": "Debug Operator No Webhook", "type": "go", "request": "launch", "mode": "debug", "program": "main.go", "buildFlags": [ "-tags", "nowebhook" ], "env": { "OPERATOR_NAMESPACE": "opendatahub-operator-system", "DEFAULT_MANIFESTS_PATH": "./opt/manifests" }, "args": [ "--log-mode=devel" ], "cwd": "${workspaceFolder}", } ``` Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com> * fix status apply (cherry picked from commit 4d5874d) * feat: add print columns and validation to components.opendatahub.io/v1:Dashboard (cherry picked from commit f3738fd) * Fix linter issues * Update imports for status_update * Add e2e tests * Add dedicated packages for component controllers * Fix linting errors * Delete component CRs on Removed * Add managementState annotation * component(dashboard): initialize images at startup, remove devflags logging setup (cherry picked from commit 3a8f265) * Adress comments feat(dashboard): configure watched resources event filter (cherry picked from commit c9ef475) Update predicate for create event Fix: Update manifests map in initialization Update controllers/components/dashboard/dashboard_controller.go Co-authored-by: Wen Zhou <wenzhou@redhat.com> Update controllers/components/dashboard/dashboard_controller.go Co-authored-by: Wen Zhou <wenzhou@redhat.com> Update pkg/metadata/annotations/annotations.go Co-authored-by: Wen Zhou <wenzhou@redhat.com> Update controllers/datasciencecluster/kubebuilder_rbac.go Co-authored-by: Wen Zhou <wenzhou@redhat.com> Update apis/components/component.go Co-authored-by: Wen Zhou <wenzhou@redhat.com> Update controllers/components/dashboard/dashboard_controller.go Co-authored-by: Wen Zhou <wenzhou@redhat.com> Update bundle * Move rbac to dashboard_controller * fix linters * Update apis/components/v1/dashboard_types.go Co-authored-by: Wen Zhou <wenzhou@redhat.com> --------- Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com> Co-authored-by: Luca Burgazzoli <lburgazzoli@gmail.com> Co-authored-by: Wen Zhou <wenzhou@redhat.com> Co-authored-by: Yauheni Kaliuta <ykaliuta@redhat.com>
…SV (opendatahub-io#1314) - builder marker only work with type struct definition either we move all these back to the rbac file with DSC or add into API for each Signed-off-by: Wen Zhou <wenzhou@redhat.com>
…opendatahub-io#1316) * Avoid storing management state n the component CR as it is not needed Signed-off-by: Luca Burgazzoli <lburgazzoli@gmail.com> * Update apis/components/component.go Co-authored-by: Wen Zhou <wenzhou@redhat.com> --------- Signed-off-by: Luca Burgazzoli <lburgazzoli@gmail.com> Co-authored-by: Wen Zhou <wenzhou@redhat.com>
* feat: add component reconciliation pipeline framework Signed-off-by: Luca Burgazzoli <lburgazzoli@gmail.com> * Fix dashboard internal api tests --------- Signed-off-by: Luca Burgazzoli <lburgazzoli@gmail.com>
* feat: add support for Ray - cleanup old component files: ray and dashboard - rename function to be more generic - status update on component for installedcomponent - add e2e test - move ireturn into golangci config Signed-off-by: Wen Zhou <wenzhou@redhat.com> --------- Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
I haven't used this yet, but I _think_ it's more correct this way?
- remove duplicated - move monitoring's to DSCI for now - move component dedicated to their block - move DSC as const in e2e: fix lint Signed-off-by: Wen Zhou <wenzhou@redhat.com>
* Create ModelRegistry component API and reconciler * Create ModelRegistry component API and reconciler (ssa) * Create ModelRegistry component API and reconciler (findings) * Update modelregistry_controller_actions.go Co-authored-by: Gerard Ryan <git@grdryn.xyz> * Create ModelRegistry component API and reconciler (findings) --------- Co-authored-by: Gerard Ryan <git@grdryn.xyz>
…hub.io/part-of (opendatahub-io#1343) This commit replaces then label components.opendatahub.io/managed-by with components.opendatahub.io/part-of as managed-by was misleading since the label is mainly used for watching and request mapping
…r operator (opendatahub-io#1351) * Allow to run the e2e test suite locally, against an out of the cluster operator * Update readme
This commit is meant to cleanup the kustomize manifest redering by: - removing a useless parameter related to the caching mechanism - moving the code to the kustomize package to accomodate other rendering engines - introduce metrics tro track how many resources have been rendered and to perform blackbox testing against the action
…rviceaccounts (opendatahub-io#1358) Signed-off-by: Wen Zhou <wenzhou@redhat.com>
…PI server and improve efficiency and responsiveness (opendatahub-io#1328) * Add caching suport to deployment action to minimize pressure on the API server and improve efficiency and responsiveness * Add action_deploy_resources_total metric to track the number of resources depoloyed by a controller/action * Enable caching on additional components * Fix findings
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
* Implement template rendering action * Update controllers to use template rendering action * Fix findings * Fix findings (2) * Rename Instance to Component for clarity * Fix findings (3) * Fix linter
…verb watch on dashboard CR (opendatahub-io#1383) Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Decouple components and main code of operator with ComponentHandler interface. It allows to hanle all common operations in a unique way and add components without changing main/dsc code. At the moment they are initialization, controller creation and fetching some component specific values (name, management status and corresponding CR). For the latter the code basically copy'n'paste unfortunately. Use init() functionality to register components only. It requires linter silencing and explicit importing them in main.go otherwise init() is not called. Non-recommended approach (and linter silencing) of returning interface is used to return component specific CR as common client.Object to simplify the code in DSC reconciler. Otherwise one more level of inderection is needed with passing required functionality (closure) to the interface function. Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
…nly after a component has been enabled. (opendatahub-io#1385) Signed-off-by: Luca Burgazzoli <lburgazzoli@gmail.com>
Signed-off-by: Luca Burgazzoli <lburgazzoli@gmail.com>
If the GC service is not authorixed to list some particular types, it emits a message "cannot list resource" that may be misleading as it does not impact the behavior of the GC service. This commit set the log verbosity to 3 so it is not shown by default, but can be enabled eventually for troubleshooting
* Reduce component reconciler's action verbosity * Use kind instead of resource name for selectors * Use handlers.AnnotationToName to map from an event to a resource * Fix TrustyAI e2e
…naged (opendatahub-io#1390) * update: DSC status and remove component CR's case - delete component CR when it is either set to Removed or not set in DSC - remove old status updates, only update when component CR create successful or failed Signed-off-by: Wen Zhou <wenzhou@redhat.com> * update: update GetManagementState return type and logic - change return as string, used by NewCRObject() - move logic check on Managed, Removed, not set component and others into GetManagementState() Signed-off-by: Wen Zhou <wenzhou@redhat.com> * fix: wrong component is set in DSPO Signed-off-by: Wen Zhou <wenzhou@redhat.com> * update: revert back to return operatorv1.ManagementState for GetManagementState() - return Managed if component is managed, other case, return Removed Signed-off-by: Wen Zhou <wenzhou@redhat.com> --------- Signed-off-by: Wen Zhou <wenzhou@redhat.com>
* refactor: codeflare component
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
* feat: initial move of kserve to new structure JIRA: https://issues.redhat.com/browse/RHOAIENG-13179 * chore: templatize provider in AuthorizationPolicy With the old value, opendatahub-odh-auth-provider, the value would be set, then almost immediately be reset to the value of {{.AuthExtensionName}} (which appears to be equivalent to <dsci.ApplicationNamespace>-auth-provider), via the separate manifest specified in the following file, which gets applied by the same list of resources in the feature: z-migrations/kserve-predictor-authorizationpolicy.patch.tmpl.yaml This causes a complication when watching the AuthorizationPolicy, because it causes it to continuously update, flip-flopping between the two values. It seems to make more sense to just set the value to the expected value from the outset, so that's what this change attempts to do. * kserve: add custom FeatureTracker eventHandler
…ources (opendatahub-io#1401) * Add reusable functions to add/remove/update ReconciliationRequest.Resources * Update workbenches controller
…pendatahub-io#1410) * Make it possible to define predicates along with a dynamic watcher In some case, a dynamic watcher should be conditionally registered, as an example, when a component depends on a specific operator to be installed, then it should be possible to register a dynamic watcher with a predicate, so the actual watch is registered only if the predicate succeed. Signed-off-by: Luca Burgazzoli <lburgazzoli@gmail.com> * Fix findings * Fix linter finding * Fix finding --------- Signed-off-by: Luca Burgazzoli <lburgazzoli@gmail.com>
Signed-off-by: Luca Burgazzoli <lburgazzoli@gmail.com>
* Add monitoring api * Create services structs * Initialize service reconciler * Fix kustomize errors * Update to sync with latest changes * Converge reconcilers for components and services * Update label to platform.opendatahub.io * sync with latest changes * Update api docs * Update bundle * Add common types * Remove dscmonitoring role
…1414) * Add an option to set the deployment action cache TTL * Fix findings
) * feat: add support for modelmeshserving as component - add new modelcontroller component: do not show in DSC.status.installedComponents - e2e on modelcontroller wont run in parallel but after components are done - if modelmesh is enabled and/or kserve is enabled, modelcontroller should be created - devFlag for modelcontroller from kserve (high) or modelmesh (low) priority - reuse the same clusterrole from kserve on aggregated rules - removed duplicated check in dsc for "enabled" which is in the GetManagementStatus - remove/update odh-model-controller related part from kserve including test - temp disable watch on CRDs coming from kserve and modelmesh, due to a finding that both ship the same CRDs and causing Operator reconcile. following PR will address a fix --------- Signed-off-by: Wen Zhou <wenzhou@redhat.com>
* Update api version for internal apis to v1alpha1 * Update interface name * Update apigroup to *.platform.opendatahub.io * Fix api docs * Update version in gvk
* feat: add NIM flag in Operator - API change - add function in modelcontroller - only when kserve is managed and its nim is managed, we set nim-state to managed - use newer branch of serving which has the new params.env --------- Signed-off-by: Wen Zhou <wenzhou@redhat.com>
* cluster_config: init cluster variables on startup (opendatahub-io#1059) * cluster_config: move type definitions to the beginning of the file Just a bit more clarity Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com> * cluster_config: init cluster variables on startup Cluster configuration is supposed to be the same during operator pod lifetime. There is no point to detect it on every reconcile cycle keeping in mind that it can causes many api requests. Do the initialization on startup. Keep GetOperatorNamespace() returning error since it defines some logic in the DSCInitialization reconciler. Automatically called init() won't work here due to need to check error of the initialization. Wrap logger into context and use it in the Init() like controller-runtime does [1][2]. Save root context without the logger for mgr.Start since "setup" logger does not fit normal manager's work. Leave GetDomain() as is since it uses OpenshiftIngress configuration which is created when DSCInitialization instantiated. Log cluster configuration on success from Init, so remove platform logging from main. [1] https://github.com/kubernetes-sigs/controller-runtime/blob/38546806f2faf5973e3321a7bd5bb3afdbb5767d/pkg/internal/controller/controller.go#L297 [2] https://github.com/kubernetes-sigs/controller-runtime/blob/38546806f2faf5973e3321a7bd5bb3afdbb5767d/pkg/internal/controller/controller.go#L111 Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com> * cluster: do not return error from GetRelease GetRelease return values are defined at startup, the error checked in main, so no point to return error anymore. Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com> --------- Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com> * components: move params.env image updating to Init stage (opendatahub-io#1191) * components, main: add Component Init method Add Init() method to the Component interface and call it from main on startup. Will be used to move startup-time code from ReconcileComponent (like adjusting params.env). Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com> * components: move params.env image updating to Init stage Jira: https://issues.redhat.com/browse/RHOAIENG-11592 Image names in environment are not supposed to be changed during runtime of the operator, so it makes sense to update them only on startup. If manifests are overriden by DevFlags, the DevFlags' version will be used. The change is straight forward for most of the components where only images are updated and params.env is located in the kustomize root directory, but some components (dashboard, ray, codeflare, modelregistry) also update some extra parameters. For them image part only is moved to Init since other updates require runtime DSCI information. The patch also changes logic for ray, codeflare, and modelregistry in this regard to update non-image parameters regardless of DevFlags like it was changed in dashboard recently. The DevFlags functionality brings some concerns: - For most components the code is written such a way that as soon as DevFlags supplied the global path variables are changed and never reverted back to the defaults. For some (dashboard, trustyai) there is (still global) OverridePath/entryPath pair and manifests reverted to the default, BUT there is no logic of transition. - codeflare: when manifests are overridden namespace param is updated in the hardcoded (stock) path; This logic is preserved. Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com> --------- Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com> * update: remove webhook service from bundle (opendatahub-io#1275) - we do not need it in bundle, CSV auto generate one during installation - if we install operator via OLM, webhook service still get used from config/webhook/service.yaml Signed-off-by: Wen Zhou <wenzhou@redhat.com> * update: add validation on application and monitoring namespace in DSCI (opendatahub-io#1263) Signed-off-by: Wen Zhou <wenzhou@redhat.com> * logger: add zap command line switches (opendatahub-io#1280) Allow to tune preconfigured by --log-mode zap backend with standard zap command line switches from controller-runtime (zap-devel, zap-encoder, zap-log-level, zap-stacktrace-level, zap-time-encoding)[1]. This brings flexibility in logger setup for development environments first of all. The patch does not change default logger setup and does not change DSCI override functionality. [1] https://sdk.operatorframework.io/docs/building-operators/golang/references/logging Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com> * Modify Unit Tests GitHub Actions workflow to get code coverage test reports (opendatahub-io#1279) * Create codecov.yml * Added to run test coverage also on PRs * Removing trailing ] * Update .github/workflows/codecov.yml Co-authored-by: Wen Zhou <wenzhou@redhat.com> * Removed go mod install dependency * Consolidated codecov workflow into unit tests workflow --------- Co-authored-by: Wen Zhou <wenzhou@redhat.com> * webhook: move initialization inside of the module (opendatahub-io#1284) Add webhook.Init() function and hide webhook setup inside of the module. It will make it possible to replace Init with a NOP (no operation) with conditional compilation for no-webhook build. Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com> * feat: pass platform from env variable or fall back to use old logic (opendatahub-io#1252) * feat: pass platform from env variables or fall back to use old logic - introduce new env var ODH_PLATFORM_TYPE, value set during build time - if value not match, fall back to detect managed then self-managed - introduce new env var OPERATOR_RELEASE_VERSION, value also set during build time - if value is empty, fall back to use old way from CSV to read version or use 0.0.0 - add support from makefile - use envstubst to replace version Signed-off-by: Wen Zhou <wenzhou@redhat.com> * update: remove release version in the change Signed-off-by: Wen Zhou <wenzhou@redhat.com> --------- Signed-off-by: Wen Zhou <wenzhou@redhat.com> * fix: update release version in DSCI and DSC .status for upgrade case (opendatahub-io#1287) - DSCI: if current version is not matching, update it - DSC: in both reconcile pass and fail case, update it Signed-off-by: Wen Zhou <wenzhou@redhat.com> * Update version to 2.19.0 (opendatahub-io#1291) Co-authored-by: VaishnaviHire <17230536+VaishnaviHire@users.noreply.github.com> * Makefile, webhook: add run-nowebhook (opendatahub-io#1286) Make it possible to compile operator without webhook enabled (with -tags nowebhook). Create a stub webhook.Init() function for that. Add run-nowebhook target to run webhook locally. It requires `make install` to be executed on the cluster beforehand. Since it repeats `make run`, move the command to a variable. Also use variable RUN_ARGS for operator arguments. It makes it possible to override them from the command line. In VSCode it is possible to debug it with the following example launch.json configuration: ``` { "name": "Debug Operator No Webhook", "type": "go", "request": "launch", "mode": "debug", "program": "main.go", "buildFlags": [ "-tags", "nowebhook" ], "env": { "OPERATOR_NAMESPACE": "opendatahub-operator-system", "DEFAULT_MANIFESTS_PATH": "./opt/manifests" }, "args": [ "--log-mode=devel" ], "cwd": "${workspaceFolder}", } ``` Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com> * update: use namespace dyniamically from operator env than hardcode value (opendatahub-io#1298) - thses are only needed when it is downstream speicific cases Signed-off-by: Wen Zhou <wenzhou@redhat.com> * webhook: switch to contextual logger (opendatahub-io#1295) Use k8s contextual logger instead of own. Comparing to other parts of the operator webhook is a bit special since it serves own endpoints and has context independent from controllers. Include more info (operation), just to get more details. Do not add kind since it's clear from "resource" field. Since controller-framework adds "admission" to the name, use own LogConstructor with own base logger for the framework's DefaultLogConstructor. Add name locally to distinguish from framework's messages. Add Name field to the structures to avoid copying string literal for both WithName() and WithValues(). The output changes and it looks like ``` {"level":"info","ts":"2024-10-11T05:17:20Z","logger":"ValidatingWebhook","msg":"Validation request","object":{"name":"default-dsci"},"namespace":"","name":"default-dsci","resource":{"group":"dscinitialization.opendatahub.io","version":"v1","resource":"dscinitializations"},"user":"kube:admin","requestID":"e5bf3768-6faa-4e14-9004-e54ee84ad8b7","webhook":"ValidatingWebhook","operation":"CREATE"} ``` or for the defaulter: ``` {"level":"info","ts":"2024-10-11T04:50:48Z","logger":"DefaultingWebhook","msg":"Defaulting DSC","object":{"name":"default-dsc"},"namespace":"","name":"default-dsc","resource":{"group":"datasciencecluster.opendatahub.io","version":"v1","resource":"datascienceclusters"},"user":"kube:admin","requestID":"c9213ff3-80ee-40c0-9f15-12188dece68e","webhook":"DefaultingWebhook"} ``` (the messages are not from the current codebase, was added for demo only) Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com> * chore: use client.OjectKeyFromObject in client.Get() (opendatahub-io#1301) Signed-off-by: Wen Zhou <wenzhou@redhat.com> * (fix): Change on trigger event to `pull_request_target` in the "Check config and readme updates / Ensure generated files are included (pull_request)" action to fix "Resource not accessible by integration" error while running the action (opendatahub-io#1296) Signed-off-by: AJAY JAGANATHAN <ajaganat@redhat.com> * feat: Operator disable create usergroup if detect user enabled external auth (opendatahub-io#1297) * feat: Operator disable create usergroup if detect users enabled external auth - use internal Authentication CR Type "" or IntegratedOAuth to indicate if Operator should create usergroup or not CRD has validation to only allow "IntegratedOAuth", "", "None" or "OIDC" - only grant "get, watch , list" as least permission - remove duplicated rbac for "ingress", has been defined in other places - add object into cache - add CRD into unit-test - add unit-test: since we dont have auth CR, it should not create usergroup Signed-off-by: Wen Zhou <wenzhou@redhat.com> * update: review comments Signed-off-by: Wen Zhou <wenzhou@redhat.com> * update: review comments use const Signed-off-by: Wen Zhou <wenzhou@redhat.com> --------- Signed-off-by: Wen Zhou <wenzhou@redhat.com> * components, logger: use contextual logging approach (opendatahub-io#1253) Switch ReconcileComponent from passing logger explicitly to wrapping it into context[1][2] Makes one parameter less to pass and will allow called utilities to report component context where they are called from. No user or logging format impact until utilities takes contextual logging in use. [1] https://www.kubernetes.dev/blog/2022/05/25/contextual-logging/ [2] https://github.com/kubernetes/enhancements/blob/master/keps/sig-instrumentation/3077-contextual-logging/README.md Credits-To: Bartosz Majsak bartosz.majsak@gmail.com Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com> * logger, controllers: use common logging level, INFO by default (opendatahub-io#1289) Remove increasing logging level for controllers (it was also passed to components if not overridden from DSCI) since: - it made logging inconsistent. The base contoller runtime logger is set up with INFO level for all log modes, so when controllers are configured for Error level, user sees INFO messages from both controller-runtime and other parts of operator which use controller-runtime's logger directly; - since the base logger is configured for INFO, there is no difference in levels between "default" and "devel". Having levels 1 and 2 there is misleading. Update documentation. This patch changes default logging, former filtered Info messages are displayed now. There is no _big_ difference in practice since currently the log is anyway full of info messages from parts which do not use reconciler's logger, like: {"level":"info","ts":"2024-10-09T13:23:11Z","msg":"waiting for 1 deployment to be ready for dashboard"} Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com> * removed lint and typo fixes (opendatahub-io#1302) * fix: to make the unset env variable in CSV work with fallback (opendatahub-io#1306) - previous code, will run into opendathaub case if env is not set Signed-off-by: Wen Zhou <wenzhou@redhat.com> * controllers: switch to k8s contextual logger (opendatahub-io#1308) Jira: https://issues.redhat.com/browse/RHOAIENG-14096 This is a squashed commit of the following patches: d012b67 ("controllers: switch to k8s contextual logger") 9880530 ("logger: blindly convert ctrl.Log users to contextual") - controllers: switch to k8s contextual logger Remove own logger from controllers' reconcilers and switch to k8s contextual logger instead [1]. Use contextual logger for SecretGeneratorReconciler and CertConfigmapGeneratorReconciler setups as well. Add name to the logger coming from the framework. It will contains "controller" field already, and like in webhook with the name it's easy to distinguish framework and operator messages. - logger: blindly convert ctrl.Log users to contextual All the users should have proper context now. The log level changes will affect it as well. [1] https://www.kubernetes.dev/blog/2022/05/25/contextual-logging/ Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com> * Tidy up run-nowebhook recipe & make clean PHONY (opendatahub-io#1310) * chore: Tidy run-nowebhook recipe The suggestion to tidy up the run-nowebhook recipe comes from this conversation: https://github.com/opendatahub-io/opendatahub-operator/pull/1304/files#r1806731373 * chore: Make `clean` a PHONY target I believe `clean` should be a PHONY target, since it doesn't create a file called `clean` * refactor the secret generator controller (opendatahub-io#1311) The commit is meant to: - make the code easier to understand, reducing complexity caused by nested if/else and error conditions (align happy path to the left) - remove shadowed error vars to avoid reporting misleading errors - add some basic unit test for the reconcile loop * update: rename variables rhods to rhoai (opendatahub-io#1313) Signed-off-by: Wen Zhou <wenzhou@redhat.com> Co-authored-by: ajaypratap003 <ajay.pratap233@gmail.com> * logger: add logLevel to DSCI devFlags (opendatahub-io#1307) Jira: https://issues.redhat.com/browse/RHOAIENG-14713 This is a squashed commit of the patchset (original ids): 959f84d ("logger: import controller-runtime zap as ctrlzap") b8d9cde ("logger: add logLevel to DSCI devFlags") 2d3efe2 ("components, logger: always use controller's logger") 1ef9266 ("components, logger: use one function for ctx creation") - logger: import controller-runtime zap as ctrlzap To avoid patch polluting with the next changes where uber's zap is imported as `zap`. - logger: add logLevel to DSCI devFlags Allow to override global zap log level from DSCI devFlags. Accepts the same values as to `--zap-log-level` command line switch. Use zap.AtomicLevel type to store loglevel. Locally use atomic to store its value. We must store the structure itsel since command line flags parser (ctrlzap.(*levelFlag).Set()) stores the structure there. In its order ctrlzap.addDefault() stores pointers, newOptions() always sets the level to avoid setting it by addDefaults(). Otherwise SetLevel should handle both pointer and the structure as logLevel atomic.Value. It is ok to store structure of zap.AtomicLevel since it itself contains a pointer to the atomic.Int32 and common level value is changed then. The parsing code is modified version of the one from controller-runtime/pkg/log/zap/flag.go. Deprecate DSCI.DevFlags.logmode. - components, logger: always use controller's logger Since the log level is overridable with its own field of devFlags, do not use logmode anymore. It was used to create own logger with own zap backend in case if devFlags exist. Just add name and value to the existing logger instead. Rename NewLoggerWithOptions back to NewLogger since former NewLogger is removed. Change component logger name. "DSC.Component" is redundant. It was usuful when component's logger was created from scratch, but now when it is always based on the reconciler's one, it's clear that it is a part of DSC. - components, logger: use one function for ctx creation Move both logger and component creation to one function. Just a cleanup. Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com> * secret-generator controller: avoid reporting 'Secret not found' error in reconcile (opendatahub-io#1312) * update: remove dsp with v1(tekton)backend related code (opendatahub-io#1281) * update: remove dsp with v1(tekton)backend related code - images - tekton rbac - descriptions Signed-off-by: Wen Zhou <wenzhou@redhat.com> Co-authored-by: Humair Khan <HumairAK@users.noreply.github.com> --------- Signed-off-by: Wen Zhou <wenzhou@redhat.com> Co-authored-by: Humair Khan <HumairAK@users.noreply.github.com> * update: remove two SA which does not seem valid (opendatahub-io#1254) Signed-off-by: Wen Zhou <wenzhou@redhat.com> * Update version to 2.20.0 (opendatahub-io#1325) Co-authored-by: VaishnaviHire <17230536+VaishnaviHire@users.noreply.github.com> * fix: wrong type (opendatahub-io#1322) - we should watch IngressController from operator.openshift.io, not the k8s ingress - this matched the cache we added in main Signed-off-by: Wen Zhou <wenzhou@redhat.com> * update golangci-lint to v1.61.0 (opendatahub-io#1327) * chore(linter): update golangci-lint to v1.61.0 * chore(linter): fix findings * chore: remove duplicated image in modelmesh (opendatahub-io#1336) - this only apply for deplenet odh-model-controller - no need to set it in the default map, a waste Signed-off-by: Wen Zhou <wenzhou@redhat.com> * fix: ensure input CAbundle always end with a newline (opendatahub-io#1339) * fix: ensure input CAbundle always end with a newline - missing newline will cause problem when inject data into configmap - this happens when user use kustomize with their own CA for DSCI, it set to use |- not | - fix lint Signed-off-by: Wen Zhou <wenzhou@redhat.com> * update: better way to add newline Signed-off-by: Wen Zhou <wenzhou@redhat.com> --------- Signed-off-by: Wen Zhou <wenzhou@redhat.com> * fix: add check to remove old kube-rbac-proxy container in modelregistry operator, fixes RHOAIENG-15328 (opendatahub-io#1341) * fix: add check to remove old kube-rbac-proxy container in modelregistry operator, fixes RHOAIENG-15328 Signed-off-by: Dhiraj Bokde <dhirajsb@users.noreply.github.com> * update: remove check in ModelReg code, add it to upgrade logic Signed-off-by: Wen Zhou <wenzhou@redhat.com> --------- Signed-off-by: Dhiraj Bokde <dhirajsb@users.noreply.github.com> Signed-off-by: Wen Zhou <wenzhou@redhat.com> Co-authored-by: Wen Zhou <wenzhou@redhat.com> * fix: wrong name of variable and func lead to always return self-managed (opendatahub-io#1362) Signed-off-by: Wen Zhou <wenzhou@redhat.com> * fix: add extra check on servicemesh CRD and svc before proceeding create SMCP CR (opendatahub-io#1359) * feat: add extra check on servicemesh CRD and service before proceeding create SMCP CR - when we install operator we might have a race condition: - ossm is not ready to server - dsci already created smcp CR - since we do not know the version of smcp to use - ossm webhook can run into conversion problem since no version specify in smcp CR - test: add more negative test - when CRD does not exist - when service does not exist Signed-off-by: Wen Zhou <wenzhou@redhat.com> --------- Signed-off-by: Wen Zhou <wenzhou@redhat.com> * logger: set loglevel from ZAP_LOG_LEVEL environment variable (opendatahub-io#1344) Add possibility to set loglevel with the environment. The name is used used already. Command line switch has priority over environment. [1] https://github.com/opendatahub-io/data-science-pipelines-operator/blob/main/config/base/params.env#L13 Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com> * fix: added a filter to evicted pods when checking for ready status (opendatahub-io#1334) * fix: added a filter to evicted pods Signed-off-by: Alessio Pragliola <seth.pro@gmail.com> --------- Signed-off-by: Alessio Pragliola <seth.pro@gmail.com> * Task/NVPE-32: Cleanup NIM integration tech preview resources (opendatahub-io#1369) * chore: cleanup nim integration tech preview resources Signed-off-by: Tomer Figenblat <tfigenbl@redhat.com> * chore: apply suggestions from code review Co-authored-by: Wen Zhou <wenzhou@redhat.com> * chore: addressed pr reviews, better logging Signed-off-by: Tomer Figenblat <tfigenbl@redhat.com> * chore: set nim cleanup for minors 14 and 15 Signed-off-by: Tomer Figenblat <tfigenbl@redhat.com> * chore: nim tp cleanup should remove api key secret Signed-off-by: Tomer Figenblat <tfigenbl@redhat.com> * chore: addresed pr reviews, rewrite cleanup func Signed-off-by: Tomer Figenblat <tfigenbl@redhat.com> --------- Signed-off-by: Tomer Figenblat <tfigenbl@redhat.com> Co-authored-by: Wen Zhou <wenzhou@redhat.com> * update: owners (opendatahub-io#1376) Signed-off-by: Wen Zhou <wenzhou@redhat.com> * Updated codecov version to 4.6.0 (opendatahub-io#1378) * Update version to 2.21.0 (opendatahub-io#1379) Co-authored-by: VaishnaviHire <17230536+VaishnaviHire@users.noreply.github.com> * Dockerfile, Makefile: make CGO_ENABLED configurable (opendatahub-io#1382) The commit a107703 ("feat(fips): enable GO_ENABLED in build (opendatahub-io#1001)") enabled CGO which makes problems for builders on non-x86 platforms. Make it as an in the Dockerfile keeping default the same (enabled), but make it possible to override with either environment (`export CGO_ENABLED=0`) or make (`make CGO_ENABLED=0 image-build`) Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com> * Makefile: make USE_LOCAL overridable (opendatahub-io#1384) Get USE_LOCAL image build flag from makefile variable to make it overridable with `make USE_LOCAL=true image` Do not allow to get its value for environment be default due to pretty generic name. This is shorter than the old recommendation of overriding IMAGE_BUILD_FLAGS. And since now CGO_ENABLED is also a flag, does not mess up with it. * USE_LOCAL=true uses existing manifests from opt/manifests for the produced image without downloading them with get_all_manifests.sh making it possible to both save time and make local amendments. Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com> * Dockerfile: merges manifests builder stages to one (opendatahub-io#1381) We can combine two build stages into one, as there is no need to always build both images (not done by podman) to only then decide from which one we want to copy manifests to the target image. Instead manifests stage will either copy local manifests or fetches using the script based on USE_LOCAL argument. Move USE_LOCAL and OVERWIRTE_MANIFESTS args under FROM since args have scope of the FROM they are declared in. It requires opt/manifests directory to exist, but since it's a part of git repo, it's fine. Original patch from: Bartosz Majsak <bartosz.majsak@gmail.com> [1] [1] opendatahub-io#773 Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com> * Use incubating for odh-model-controller manifests (opendatahub-io#1393) * Use incubating for odh-model-controller manifests * Use kserve v0.14 branch for manifests * add enablement flag for nim (opendatahub-io#1330) * add enablement flag for nim * add generated files * add api generated file * revert image for csv * update NIM with its own spec * create nim struct remove from serverless * updated generated files * update nim to default removed * update image to correct var * update to managed as default * add nim-state to params.env * fix NVIDIA typo * resolve conflict * add NIM flag check to model mesh * fix condtional for standard practice nim flag * fix to conditional for standards * update typo for error output nim flag kserve * Fix deploy to dependent path * update doc/manifests references for nim flag * fix image typo csv * update if on nim-state * Remove adelton from the platform owners list. (opendatahub-io#1402) * Typo fixes "manifests" (opendatahub-io#1409) * doc: remove -e from make invocation examples (opendatahub-io#1408) The intention in the examples is to override make variables which is done with setting the variable in the command line, while -e switch does not get argument and changes make to a mode when environment takes precedence over makefile settings[1][2] Short examples: ``` % cat Makefile VAR = var DEFVAR ?= defvar all: @echo "VAR $(VAR), origin $(origin VAR)" @echo "DEFVAR $(DEFVAR), origin $(origin DEFVAR)" % make VAR var, origin file DEFVAR defvar, origin file % make VAR=cmd DEFVAR=cmd VAR cmd, origin command line DEFVAR cmd, origin command line % VAR=env DEFVAR=env make VAR var, origin file DEFVAR env, origin environment % VAR=env DEFVAR=env make -e VAR env, origin environment override DEFVAR env, origin environment % VAR=env DEFVAR=env make -e VAR=cmd DEFVAR=cmd VAR cmd, origin command line DEFVAR cmd, origin command line ``` [1] https://www.gnu.org/software/make/manual/html_node/Values.html [2] https://www.gnu.org/software/make/manual/html_node/Environment.html Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com> * (fix): Avoid using pull_request_target event trigger. (opendatahub-io#1417) Using pull_request_target event trigger results in security vulnerabilities explained here: https://securitylab.github.com/resources/github-actions-preventing-pwn-requests/ Signed-off-by: AJAY JAGANATHAN <ajaganat@redhat.com> * fix: - missing authentications rbca - merge problem - duplicated API definition - lint Signed-off-by: Wen Zhou <wenzhou@redhat.com> --------- Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com> Signed-off-by: Wen Zhou <wenzhou@redhat.com> Signed-off-by: AJAY JAGANATHAN <ajaganat@redhat.com> Signed-off-by: Dhiraj Bokde <dhirajsb@users.noreply.github.com> Signed-off-by: Alessio Pragliola <seth.pro@gmail.com> Signed-off-by: Tomer Figenblat <tfigenbl@redhat.com> Co-authored-by: Yauheni Kaliuta <ykaliuta@redhat.com> Co-authored-by: Wen Zhou <wenzhou@redhat.com> Co-authored-by: Adrián Sanz Gómiz <100594859+asanzgom@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: VaishnaviHire <17230536+VaishnaviHire@users.noreply.github.com> Co-authored-by: Ajay Jaganathan <36824134+AjayJagan@users.noreply.github.com> Co-authored-by: Gerard Ryan <git@grdryn.xyz> Co-authored-by: Luca Burgazzoli <lburgazzoli@users.noreply.github.com> Co-authored-by: ajaypratap003 <ajay.pratap233@gmail.com> Co-authored-by: Marek Laššák <73712209+mlassak@users.noreply.github.com> Co-authored-by: Humair Khan <HumairAK@users.noreply.github.com> Co-authored-by: Dhiraj Bokde <dhirajsb@users.noreply.github.com> Co-authored-by: Alessio Pragliola <83355398+Al-Pragliola@users.noreply.github.com> Co-authored-by: Tomer Figenblat <tomer.figenblat@gmail.com> Co-authored-by: Hannah DeFazio <h2defazio@gmail.com> Co-authored-by: Marcus Trujillo <42344046+trujillm@users.noreply.github.com> Co-authored-by: Jan Pazdziora <jan.pazdziora@code.adelton.com> Co-authored-by: Christian Zaccaria <73656840+ChristianZaccaria@users.noreply.github.com>
* Add custom CRD deployment logic The CRDs are meant to be left in the cluster, so they are not tight to components' lifecycle hence, CRDs requires some custom handling so the metadata is not cluttered by useless component/platform annotations and labels. As a conseguence, the each component has to set-up a dedicated handler and predicate in order to mape and filter out relevant CRDs events. * Fix tests * Fix tests * Fix findings
…b-io#1426) Signed-off-by: Wen Zhou <wenzhou@redhat.com>
… avoid deadlock When the common reconciler detect that a resource is amrked for deletion, it shoudl not try to lookup the DSCI as it should be possible to remove a resource even if the DSCI is not present to avoid deadlocks. To make the code more clear, the Reconcile method has been split in: - delete, which triggers the execution of the finalizer actions - apply, which triggers the execution od the reconcile actions Signed-off-by: Luca Burgazzoli <lburgazzoli@gmail.com>
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Description
How Has This Been Tested?
Screenshot or short clip
Merge criteria