DFBUGS-1528: [release-4.18] Fix CephFS volumes failing to mount after upgrade to 4.18 #3007

@malayparida2000 malayparida2000 commented Feb 4, 2025

Manual Backport of #3006

Until 4.18, Provider mode used the v1 port (6789) by default, so the rook-ceph-mon-endpoints CM contained v1 ports. Rook doesn’t update this CM to the v2 port (3300) until the mons are failed over, even after requireMsgr2 is set to true. The provider sends the mon endpoints from that same rook-ceph-mon-endpoints CM to the client, so the client uses the v1 (6789) port address it received in its ceph-csi-config CM.

However, because requireMsgr2 is true, the client receives ‘prefer-crc’ from the provider as the CephFS kernel mount option. When mounting a new CephFS volume, the client therefore tries to use the v1 port 6789 together with the ‘prefer-crc’ kernel mount option. That combination can't work, so CephFS volumes fail to mount.

Since 4.18 always uses the v2 port, the provider should send the v2 port address to the client by modifying the mon endpoint addresses. A similar implementation can be seen in Rook.
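
For illustration only, the port rewrite described above could look roughly like the sketch below. It is not the code from this PR or from Rook: the function names `toMsgr2Endpoint` and `toMsgr2Endpoints` and the sample addresses are hypothetical, and the sketch assumes each mon endpoint is a plain `host:port` string.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

const (
	msgr1Port = 6789 // Ceph messenger v1 port mentioned in the description
	msgr2Port = 3300 // Ceph messenger v2 port mentioned in the description
)

// toMsgr2Endpoint (hypothetical helper) rewrites a "host:6789" mon endpoint
// to "host:3300". Endpoints on any other port are returned unchanged.
func toMsgr2Endpoint(endpoint string) (string, error) {
	host, port, err := net.SplitHostPort(endpoint)
	if err != nil {
		return "", fmt.Errorf("invalid mon endpoint %q: %w", endpoint, err)
	}
	if port == strconv.Itoa(msgr1Port) {
		return net.JoinHostPort(host, strconv.Itoa(msgr2Port)), nil
	}
	return endpoint, nil
}

// toMsgr2Endpoints (hypothetical helper) applies toMsgr2Endpoint to every
// endpoint and stops at the first malformed entry.
func toMsgr2Endpoints(endpoints []string) ([]string, error) {
	out := make([]string, 0, len(endpoints))
	for _, ep := range endpoints {
		converted, err := toMsgr2Endpoint(ep)
		if err != nil {
			return nil, err
		}
		out = append(out, converted)
	}
	return out, nil
}

func main() {
	// Sample addresses are made up for the example.
	eps, err := toMsgr2Endpoints([]string{"10.0.0.1:6789", "10.0.0.2:3300", "[fd00::1]:6789"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(eps)
}
```

Using `net.SplitHostPort` and `net.JoinHostPort` rather than a string replace keeps IPv6 addresses bracketed correctly and avoids touching a `6789` that happens to appear inside a host name.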

Signed-off-by: Malay Kumar Parida <mparida@redhat.com>
@malayparida2000 malayparida2000 changed the title from "[release-4.18] Fix CephFS volumes failing to mount after upgrade to 4.18" to "DFBUGS-1528: [release-4.18] Fix CephFS volumes failing to mount after upgrade to 4.18" Feb 5, 2025
@openshift-ci-robot openshift-ci-robot added the labels jira/severity-critical (Referenced Jira bug's severity is critical for the branch this PR is targeting), jira/valid-reference (Indicates that this PR references a valid jira ticket of any type), and jira/invalid-bug (Indicates that the referenced jira bug is invalid for the branch this PR is targeting) Feb 5, 2025
openshift-ci-robot commented Feb 5, 2025

@malayparida2000: This pull request references [Jira Issue DFBUGS-1528](https://issues.redhat.com//browse/DFBUGS-1528), which is invalid:

  • expected the bug to target the "odf-4.18" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.


@malayparida2000
Contributor Author

/retest

@malayparida2000
Contributor Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added the label jira/valid-bug (Indicates that the referenced jira bug is valid for the branch this PR is targeting) and removed the label jira/invalid-bug (Indicates that the referenced jira bug is invalid for the branch this PR is targeting) Feb 5, 2025
openshift-ci-robot commented Feb 5, 2025

@malayparida2000: This pull request references [Jira Issue DFBUGS-1528](https://issues.redhat.com//browse/DFBUGS-1528), which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (odf-4.18) matches configured target version for branch (odf-4.18)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (ebenahar@redhat.com), skipping review request.



@travisn travisn left a comment

/lgtm

@openshift-ci openshift-ci bot added the label lgtm (Indicates that a PR is ready to be merged) Feb 5, 2025
openshift-ci bot commented Feb 5, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: malayparida2000, travisn

@openshift-ci openshift-ci bot added the label approved (Indicates a PR has been approved by an approver from all required OWNERS files) Feb 5, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit 0212528 into red-hat-storage:release-4.18 Feb 5, 2025
11 checks passed
openshift-ci-robot commented Feb 5, 2025

@malayparida2000: [Jira Issue DFBUGS-1528](https://issues.redhat.com//browse/DFBUGS-1528): All pull requests linked via external trackers have merged:

[Jira Issue DFBUGS-1528](https://issues.redhat.com//browse/DFBUGS-1528) has been moved to the MODIFIED state.

