doc: add design doc for the ceph-csi-operator #1

Merged: 1 commit, merged on May 29, 2024

Conversation

@Madhu-1 (Collaborator) commented May 21, 2024

Adding initial design doc for the ceph-csi-operator

@Madhu-1 (Collaborator Author) commented May 21, 2024

@nb-ohad @travisn @BlaineEXE PTAL

@BlaineEXE left a comment:

I haven't gone through this fully yet. I think the biggest question I have that we haven't talked about is what to do with the node fencing controller. IMO, this operator would be the best home for it for users.

```
- Ensure high availability and scalability of the operator.
- Support multiple deployment methods for the operator (single-namespace,
  multi-namespace, all-namespaces).
- Support multiple operator deployment in a single cluster when running in a
```
Member commented:

  1. What is the difference in behavior between the single-namespace and multi-namespace modes? (Maybe it's listed later in the doc?)
  2. Will the default/recommended mode be single-namespaced? The single-namespaced mode seems the same pattern we have by default in Rook today.

Collaborator Author commented:

As the names suggest, in single-namespace and multi-namespace modes the operator watches for changes in one or several namespaces, respectively.
The recommended mode depends entirely on how the admin wants to install and configure the CSI drivers: how many drivers they want to run and how they want to manage them.

```yaml
apiVersion: csi.ceph.io/v1alpha1
metadata:
  name: csioperatorconfig
  namespace: operator-namespace
```
Member commented:

How about using the namespace ceph-csi in the examples?

Suggested change:

```diff
- namespace: operator-namespace
+ namespace: ceph-csi
```

Collaborator Author commented:

I used operator-namespace as an example to make it clear that the CRs can be created in the operator's namespace. Let me know if you still suggest changing it to ceph-csi.

Member commented:

I would suggest picking an example namespace that will be used throughout all the examples in this repo. Whatever namespace appears in the examples is what users will most often use in production. In the Rook project we documented rook-ceph as the example namespace, and now almost everyone uses it. If you set operator-namespace, many users will deploy it exactly like that, so I'd suggest a more commonly useful namespace. The namespace that makes the most sense to me is ceph-csi.

Collaborator Author commented:

@travisn I chose to keep operator-namespace in the design doc, which focuses on the design considerations. As you suggested, a concrete namespace is something we will add in the deployment section (in the documentation) or in the deployment YAMLs.

```yaml
spec:
  cephClusterRef:
    name: ceph-cluster-1
  subvolumeGroup: csi
```
Member commented:

Several questions:

  • What if you have multiple subvolumegroups or rados namespaces?
  • Why are subvolume groups and rados namespaces both defined together? They are very different resources for different drivers.
  • This CRD seems too generic and doesn't have a specific purpose. Even the CRD name "Config" indicates maybe it's just miscellaneous settings?

@Madhu-1 (Collaborator Author) commented May 23, 2024:

> What if you have multiple subvolumegroups or rados namespaces?

The user needs to create multiple CRs. Each CR represents either a subvolumegroup, a radosnamespace, or both. This lets us keep using a single clusterID for both resources, just as we do today, even though they are different values.

> Why are subvolume groups and rados namespaces both defined together? They are very different resources for different drivers.

This is a generic CRD. Even though the settings are CSI-driver specific, we keep the CRD generic because the subvolume group applies to CephFS and NFS but not to RBD.

> This CRD seems too generic and doesn't have a specific purpose. Even the CRD name "Config" indicates maybe it's just miscellaneous settings?

Keeping it generic helps us extend it in the future instead of adding new CRDs. "Config" indicates that this is configuration for the Ceph-CSI driver; it can be generic or driver-specific.
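To illustrate (a sketch only, based on the CephCSIConfig example shown later in this doc; the names and values here are made up), an admin who needs two subvolume groups against the same cluster simply creates two CRs, each of which gets its own clusterID:

```yaml
# Sketch only: two CephCSIConfig CRs referencing the same CephCSICephCluster,
# each representing a different subvolume group (and optionally a rados namespace).
---
kind: CephCSIConfig
apiVersion: csi.ceph.io/v1alpha1
metadata:
  name: storage-group-a            # illustrative name
  namespace: <operator-namespace>
spec:
  cephClusterRef:
    name: ceph-cluster-1
  cephFS:
    subvolumeGroup: csi-group-a
---
kind: CephCSIConfig
apiVersion: csi.ceph.io/v1alpha1
metadata:
  name: storage-group-b            # illustrative name
  namespace: <operator-namespace>
spec:
  cephClusterRef:
    name: ceph-cluster-1
  cephFS:
    subvolumeGroup: csi-group-b
  rbd:
    radosNamespace: rados-test     # a single CR may carry both cephFS and rbd settings
```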

Member commented:

I don't understand why this CRD needs to be generic. What about creating two separate CRDs such as CephCSISubvolumeGroup and CephCSIRadosNamespace? If they have common properties or structs, they can still share the golang structs or implementation in the underlying code, but this will help the CRDs as an API to have more clarity.
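A rough sketch of what that split could look like (illustrative only; these CRDs are a suggestion in this thread, not part of the design):

```yaml
# Sketch of the suggested per-resource CRDs (not adopted by the design).
---
kind: CephCSISubvolumeGroup
apiVersion: csi.ceph.io/v1alpha1
metadata:
  name: csi
  namespace: <operator-namespace>
spec:
  cephClusterRef:
    name: ceph-cluster-1
  subvolumeGroup: csi
---
kind: CephCSIRadosNamespace
apiVersion: csi.ceph.io/v1alpha1
metadata:
  name: rados-test
  namespace: <operator-namespace>
spec:
  cephClusterRef:
    name: ceph-cluster-1
  radosNamespace: rados-test
```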

Collaborator commented:

@travisn a single CSI cluster configuration supports all driver types; this is the way Ceph-CSI was built. This new operator is designed to work on top of the established way the underlying CSI driver is configured, and as such each CSI cluster config needs to be able to specify options for all driver types simultaneously.

Collaborator Author commented:

Adding to the above: SubvolumeGroup and RadosNamespace are Ceph terminology, and for us this CR is really an abstraction over where the storage is consumed from. Later we can extend it to new objects if Ceph adds anything new, instead of adding more CRDs.

@Madhu-1 Madhu-1 force-pushed the design-doc branch 2 times, most recently from 3fe44ce to 8cb3152 Compare May 23, 2024 12:12
```yaml
metricsPort: 8000
deployCSIAddons: false
cephFsKernelClient: true
cephFsKernelMountOptions: ms_mode=secure
```
Collaborator commented:

Each driver is of a single type, so we don't need the cephFS prefix for mount options.

Suggested change:

```diff
- cephFsKernelMountOptions: ms_mode=secure
+ kernelMountOptions: ms_mode=secure
```

Collaborator Author commented:

I'm still not fine with this one; we need to bring this under a cephfs or filesystem section. But for now, let's get this merged and raise that as a new issue later on.

Collaborator commented:

ok


@Madhu-1 Madhu-1 force-pushed the design-doc branch 2 times, most recently from ca00915 to 2e0e90c Compare May 28, 2024 09:30
@Madhu-1 Madhu-1 requested a review from nb-ohad May 28, 2024 09:30
@nb-ohad (Collaborator) commented May 28, 2024

LGTM

```yaml
renewDeadline: 100
retryPeriod: 10
deployCSIAddons: true
cephFsKernelClient: true
```
Member commented:

Ok, understood now why these settings are needed in this CRD. What about a section for the cephfs type, just to organize the settings?

Suggested change:

```diff
- cephFsKernelClient: true
+ cephfs:
+   kernelClient: true
+   kernelMountOptions: ms_mode=secure
```

```yaml
logLevel: 5
maxfiles: 5
maxLogSize: 10M
clusterName: 5c63ad7e-74fe-4724-a511-4ccdc560da56
```
Member commented:

What if this clusterName is not set? Since the setting today is not set by default for Rook upstream users, I assume it's not required. It's only needed for some scenarios, right? Are those features disabled if it's not set?


```yaml
---
kind: CephCSIConfigMapping
```
Member commented:

I think I commented on this previously, but can't find it now or if there was a response... "Config" in the CRD name is redundant IMO. How about a name without "Config"? For example, CephCSIPeerMapping, CephCSIPoolMapping, CephCSIClusterMapping, or even just CephCSIMapping

Suggested change:

```diff
- kind: CephCSIConfigMapping
+ kind: CephCSIPeerMapping
```

@nb-ohad (Collaborator) commented May 28, 2024:

Here config is a subject, not an adjective. So it makes sense to keep it.
(The word config here defines the entity and is not a description of it)

Member commented:

My point is that everything in every CRD is config, why does it need to be part of the CRD name?


I can see Travis' point here. It's obvious that it's a subject, but it seems redundant in the sense that all k8s resources contain config (or spec, as k8s would call it). Is there a more clear name for this/these?

CephCSIPeerMapping does seem to be more descriptive and explanatory to me, based on the description:

> The CephCSIConfigMapping CR contains a mapping between local and remote Ceph
> cluster configurations. The information in this CR helps Ceph-CSI identify
> peered blockpools. This can help with the correct management and consumption of
> volumes in backup and DR scenarios

IMO, even "mapping" is a bit confusing here. I feel that "mapping" may be too unclear, or just redundant.

IMO, CephCSIPeer is a sufficient name to indicate that

  1. This configures Ceph-CSI
  2. This configures Ceph-CSI with "peer" information

Collaborator Author commented:

IMO CephCSIConfigMapping is the right name for it, because this CR collates the relationship between two things: the Ceph pool mapping and the CSI configuration name (cephCSIConfigName). It is named CephCSIConfigMapping because it is a mapping between CephCSIConfig entries.

The use case is this: PVCs exist that reference pool A, but pool A had to be recovered and got a new pool ID. In that case the user can create this CR to allow Ceph-CSI to keep working with the existing PVCs (this is a low-level detail of how Ceph-CSI works and how we can recover a few things). Currently we don't have support for this in Rook, but we do have support for it in Ceph-CSI. This CR really targets multiple use cases.

> CephCSIPeerMapping does seem to be more descriptive and explanatory to me, based on the description:
>
> The CephCSIConfigMapping CR contains a mapping between local and remote Ceph
> cluster configurations. The information in this CR helps Ceph-CSI identify
> peered blockpools. This can help with the correct management and consumption of
> volumes in backup and DR scenarios

This was one more detail missing from the description, but it was present in the CR field. I added more details to it; I hope it makes sense now.
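For illustration only, a hypothetical CephCSIConfigMapping CR for the pool-recovery case described above (the field names here are assumptions made for this sketch, not the schema from the design doc):

```yaml
# Hypothetical sketch: field names (mappings, blockPoolIdMapping, local/remote)
# are assumptions; the design doc defines the actual schema.
---
kind: CephCSIConfigMapping
apiVersion: csi.ceph.io/v1alpha1
metadata:
  name: storage-mapping
  namespace: <operator-namespace>
spec:
  mappings:
    - cephCSIConfigName:
        local: storage             # CephCSIConfig on this cluster
        remote: storage-remote     # peered (or recovered) CephCSIConfig
      blockPoolIdMapping:
        - local: "1"               # pool ID recorded in existing PVs
          remote: "2"              # pool ID after recovery / on the peer cluster
```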

Collaborator commented:

The information inside this CR maps between entities of type CephCSIConfig. This is why it is natural to call it CephCSIConfig + Mapping = CephCSIConfigMapping.

The reason we call the original entity CephCSIConfig is that it represents a configuration entry for CSI; we cannot just drop the Config part. If we did, it would become CephCSI, which does not make sense.

Regarding the CephCSIOperatorConfig name: a config map with a similar name (<operator-name>-operator-config) is common practice for operators, and is even provided by default. We just elevated the concept into a CR.

Member commented:

I still am of the opinion that all of these would be better named without "Config", but let's take it as a separate discussion point and not block the design PR. Naming always takes time to settle. How about we open a separate issue for the discussion?


```yaml
---
kind: CephCSIOperatorConfig
```
Member commented:

Suggestion to remove "Config" from all CRD names. Almost all the CRD names in the design include it and it just is redundant IMO. All CRDs contain configuration settings.

@BlaineEXE left a comment:

As a general statement, I don't see any details in the design that explain how things work from an admin- or user-based view. I think admin/user workflows would help understand how things fit together, rather than just viewing individual pieces of the design.

At minimum, I think it would be very helpful to explain some details about fields that are important for cross-referencing between admin and user resources. And also, what are the limitations, and what are important field validations that must be done to prevent multi-user collisions.

```yaml
kind: CephCSIDriver
apiVersion: csi.ceph.io/v1alpha1
metadata:
  name: "<prefix>.<driver_type>.csi.ceph.com"
```


I have some concerns about using the CR name field to encode config options like this.

On one hand, I understand that this avoids certain kinds of complexity.

On the other hand, this feels like it would be hard for users to understand.
Do I have to provide a prefix? Does it have to be unique? Can I not provide a prefix if I don't want to?

What are the available driver types I can use? This is to say that the only indication I had from this example that it was a cephfs-specific driver was from labels.app=cephfs-plugin.

Considering how users will view this and how this will be documented, I think this will be less clear and that this is the kind of thing I see users get wrong often.

I would suggest something like the following as an alternative (a sketch follows the list):

  1. metadata.name can be whatever the user wants, with a reasonable suggestion being something like, "platform-cephfs-driver"
  2. prefix becomes a required spec field
  3. driverType becomes a required spec field with an enum set of options -- this makes it more clear (without having to have inside-knowledge, what the driver type is)
  4. the design doc should specify what validations the operator will take to ensure that the resultant driver name is unique in the k8s cluster
  5. to support usability/user feedback, add status.driverName = <prefix>.<driverType>.csi.ceph.com so that users have feedback to understand what driver name to use in StorageClasses.
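A minimal sketch of that alternative shape (illustrative only; this is not what the design adopted):

```yaml
# Sketch of the suggested alternative: free-form CR name, explicit prefix and
# driverType spec fields, and the computed driver name surfaced in status.
---
kind: CephCSIDriver
apiVersion: csi.ceph.io/v1alpha1
metadata:
  name: platform-cephfs-driver     # free-form, chosen by the user
  namespace: <operator-namespace>
spec:
  prefix: platform                 # suggested required field
  driverType: cephfs               # suggested enum: rbd | cephfs | nfs
status:
  driverName: platform.cephfs.csi.ceph.com   # what users would put in a StorageClass
```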

Collaborator Author commented:

> Do I have to provide a prefix? Does it have to be unique? Can I not provide a prefix if I don't want to?

This is not always required; a prefix can be set, but the format needs to be followed.

> What are the available driver types I can use? This is to say that the only indication I had from this example that it was a cephfs-specific driver was from labels.app=cephfs-plugin.

This is just an example in the design doc; a more specific CR will be provided when we have a proper deployment guide.

> Considering how users will view this and how this will be documented, I think this will be less clear and that this is the kind of thing I see users get wrong often.

```
// The name MUST follow domain name notation format
// (https://tools.ietf.org/html/rfc1035#section-2.3.1). It SHOULD
// include the plugin's host company name and the plugin name,
// to minimize the possibility of collisions. It MUST be 63
// characters or less, beginning and ending with an alphanumeric
// character ([a-z0-9A-Z]) with dashes (-), dots (.), and
// alphanumerics between. This field is REQUIRED.
```

We already went through a couple of options before putting up the design doc, including the same idea you have presented. We wanted to keep this CR the way the CSIDriver object looks today: the name of the CR should be the unique name the user wants the driver to have, and the CSI driver has the hard naming constraints quoted above.

Introducing separate prefix and type fields in the spec makes it easy for a user to accidentally create duplicate CSI drivers, which is exactly what we are trying to avoid, to an extent, by moving this into the name: the name of the CR is the name of the CSI driver.

@nb-ohad anything to add?

Collaborator commented:

@BlaineEXE we do understand this is not a perfect solution, but a decision based on a "lesser of two evils" approach. The requirements are as follows:

All driver names have to be unique, and conflicts need to be caught at creation time, not at reconciliation time. The name will be carried over to the actual driver name, as we want a 1:1 match between the client-exposed name at the operator layer and the name in the CSI implementation layer.

We dropped the type as a field in the spec for 2 reasons:

  • I can create a CR that is called my-rbd-driver and have it of type cephfs.
  • Even if we agree that this is a user mistake, we cannot force the CSI driver implementation name to match my-rbd-driver because of CSI name restrictions.

On the other hand, if we encode the type in the name, we can solve all of these issues with a few simple CEL rules, which gives the user a pretty good experience when it comes to explaining the limitation when creation fails.
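As an illustration only (this exact rule is an assumption, not something specified in the design doc), such a check could be expressed as a CEL validation at the root of the CephCSIDriver CRD schema, where metadata.name is accessible:

```yaml
# Illustrative sketch: a CEL rule in the CRD's OpenAPI schema that rejects names
# not ending in <rbd|cephfs|nfs>.csi.ceph.com at creation time. Real rules would
# also need to validate the optional prefix and the 63-character limit.
openAPIV3Schema:
  type: object
  x-kubernetes-validations:
    - rule: "['rbd', 'cephfs', 'nfs'].exists(t, self.metadata.name.endsWith(t + '.csi.ceph.com'))"
      message: "name must be of the form [<prefix>.]<rbd|cephfs|nfs>.csi.ceph.com"
```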

Comment on lines 296 to 303
### CephCSICephConfig CRD

Stores connection and configuration details for a single Ceph cluster and
provide the information to be used by multiple CSI drivers.

```yaml
---
kind: CephCSICephCluster
```


CephCSICephConfig and CephCSICephCluster don't match. Seems like something was updated and one was forgotten -- probably CephCSICephCluster is the correct one?

Collaborator Author commented:

Yes, that is the correct one; this represents the Ceph cluster details, which is why it's named CephCSICephCluster.

Collaborator Author commented:

Now I got it; it was an oversight on my side. Fixed it.

Comment on lines +328 to +353
### CephCSIConfig CRD

Contains details about CephFS, RBD, and NFS configuration to be used when
communicating with Ceph. Include a reference to a CephCSICephCluster holding
the connection information for the target Ceph cluster.

```yaml
---
kind: CephCSIConfig
apiVersion: csi.ceph.io/v1alpha1
metadata:
  name: storage
  namespace: <operator-namespace>
spec:
  cephClusterRef:
    name: ceph-cluster-1
  cephFS:
    subvolumeGroup: csi
    kernelMountOptions: readdir_max_bytes=1048576,norbytes
    fuseMountOptions: debug
  rbd:
    radosNamespace: rados-test
status:
  phase: Succeeded
  reason: successfully linked to CephClusterRef
```


Why is this not part of CephCSIDriver CRD?

Collaborator Author commented:

That's because a single Ceph-CSI driver can have multiple Ceph clusters configured, and this information is not required when starting a Ceph-CSI driver; these details can be added later at any point in time.

```
  namespace where operator is deployed.
- Multi-namespace: the operator manages Ceph-CSI drivers in user-specified
  namespaces.
- All-namespaces: the operator manages Ceph-CSI drivers in all namespaces.
```
@BlaineEXE commented May 28, 2024:

Does this mode auto-deploy drivers in all namespaces, or does it merely allow them to be deployed in all namespaces? Please clarify in the doc as well I think.

Collaborator Author commented:

The CRs are namespace-scoped resources, and if the operator is deployed in this mode it will manage all the CSI drivers that are created in their respective namespaces. This is the pattern followed by many operators. Do you think we need to add more details here?

Collaborator commented:

All-namespaces deploys the operator in a single dedicated namespace while allowing it to reconcile CRs across the entire cluster. To be more exact, it allows the operator to be deployed as part of an operator group marked as an all-namespaces operator group.
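For context, a minimal OLM sketch of that mode (illustrative; the names are assumptions): an OperatorGroup with no targetNamespaces selects all namespaces, so an operator installed into it reconciles CRs cluster-wide.

```yaml
# Illustrative only: an all-namespaces OperatorGroup (empty spec / no
# targetNamespaces), with the operator itself running in its own namespace.
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: ceph-csi-operator-group   # assumed name
  namespace: ceph-csi-operator    # assumed namespace where the operator runs
spec: {}
```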

@BlaineEXE left a comment:

When I review this design doc from the standpoint of an admin who is allowing multi-tenant ability to deploy Ceph-CSI drivers, I start to become confused about how things fit together.

I think it will be helpful to explicitly explain how much multi-tenancy is intended to be designed into the architecture. What should be allowed? What should be explicitly disallowed? Will those details change in the future?

```
Use kubebuilder framework which is highly used in operator development,
following Kubernetes best practices for controllers and operators

## Architecture
```


I feel like it would be best to include node fencing here, instead of treating it as a follow-on design: #3 (comment)

The node fencing addition changes the architecture significantly by making the operator no longer be a one-way graph. Instead, the system must also respond to other system changes.

Collaborator Author commented:

@BlaineEXE At least I don't have plans to include the fencing operation as part of the initial design, and we are still not sure about tying the CephCSI operator with the CSI-addons functionality as of today. Yes, as you mentioned, things might require changes based on the new features, and this architecture diagram needs to be updated based on the new additions.

Comment on lines +32 to +37
```
- Support multiple deployment modes for the operator:
  - Own-namespace: the operator manages a single Ceph-CSI driver in the current
    namespace where operator is deployed.
  - Multi-namespace: the operator manages Ceph-CSI drivers in user-specified
    namespaces.
  - All-namespaces: the operator manages Ceph-CSI drivers in all namespaces.
```


Where is the API control for this? I can't find it in the doc.

Collaborator Author commented:

This doc doesn't talk about the API control for this; that is going to come up in follow-up PRs.

Comment on lines +85 to +87
```yaml
spec:
  logLevel: 1
  driverSpecDefaults:
```


IMO, the deployment mode (the operator's 'watch' scope) control makes the most sense here

something like...

```yaml
spec:
  watchNamespaces: # if empty, only watch current ns, ['*'] means watch all
    - ns1
    - ns2
    - etc
```

Collaborator Author commented:

Sounds good, I'm opening a new issue for it: #7

Collaborator commented:

No, we cannot do that. The deployment control is part of the OLM spec and needs to be specified there in order for this to work. I don't want us to create an operator that has internal code to reconcile OLM entities.

Collaborator Author commented:

@nb-ohad can we continue the discussion on the open issue? We need to consider both with OLM and without OLM. Once we have the outcome of that discussion we can have follow-up PRs to add/fix the wording.

Comment on lines 355 to 365
### CephCSIConfigMapping

The CephCSIConfigMapping CR contains a mapping between local and remote Ceph
cluster configurations. The information in this CR helps Ceph-CSI identify
peered blockpools. This can help with the correct management and consumption of
volumes in backup and DR scenarios

```yaml
---
kind: CephCSIConfigMapping
```


Different question related to peer mapping: why is this a separate CRD and not part of CephCSIDriver? Shouldn't each driver know what its peers are? What is the user story for this being separate versus included in CephCSIDriver?

I feel like this CRD seems strange to be a separate one in the multi-tenant case where a cluster admin wants to allow other users (or sub-admins) to spin up their own CSI drivers.

Collaborator Author commented:

This is the peer information related to the configuration that Rook generates here: https://github.com/rook/rook/blob/f5ca09e38582d8658491e620e302c174ea11020f/pkg/operator/ceph/csi/peermap/config.go#L51-L110. It isn't required when the Ceph-CSI driver is installed, or in most normal cases; it is required only for recovery cases, and that's why it's a separate CRD.

Collaborator commented:

  1. This CRD covers cross-driver concerns.
  2. It is a solution to a problem that is not inherent in CSI or its implementation. The problem only manifests itself when we add a layer of cross-cluster replication/mapping of volumes, which today only shows up as a DR-related concern.

The above two points, and most specifically the second point, highlight that this is not a core concern or a core feature. With that in mind, it made sense to separate this concern into its own CRD that can evolve on its own and does not impact the evolution of the core CRDs as this operator matures.

Adding initial design doc for the ceph-csi-operator

Co-authored-by: Madhu Rajanna <madhupr007@gmail.com>
Co-authored-by: nb-ohad <mitrani.ohad@gmail.com>
```yaml
renewDeadline: 100
retryPeriod: 10
deployCSIAddons: true
cephFsKernelClient: true
```
Member commented:

It's a small detail, it's fine either way



@Madhu-1 merged commit 53aff29 into ceph:main on May 29, 2024
iamniting pushed a commit to iamniting/ceph-csi-operator that referenced this pull request Jul 12, 2024
bundle: Implement changes for bundle generation