diff --git a/docs/content/installation/install_compatibility_requirements.md b/docs/content/installation/install_compatibility_requirements.md index 7c2198211..7d95320af 100644 --- a/docs/content/installation/install_compatibility_requirements.md +++ b/docs/content/installation/install_compatibility_requirements.md @@ -98,7 +98,7 @@ Complete these steps to prepare your environment for installing the CSI (Contain 2. Enable policy-based replication on volume groups, see the following section within your IBM Storage Virtualize product documentation on [IBM Documentation](https://www.ibm.com/docs/): **Administering** > **Managing policy-based replication** > **Assigning replication policies to volume groups**. -5. (Optional) If planning on using volume replication (remote copy function), enable support on your orchestration platform cluster and storage system. +6. (Optional) If planning on using volume replication (remote copy function), enable support on your orchestration platform cluster and storage system. 1. To enable support on your Kubernetes cluster, install the following volume group CRDs once per cluster. @@ -112,13 +112,13 @@ Complete these steps to prepare your environment for installing the CSI (Contain 2. To enable support on your storage system, see the following section within your IBM Storage Virtualize product documentation on [IBM Documentation](https://www.ibm.com/docs/en/): **Administering** > **Managing Copy Services** > **Managing remote-copy partnerships**. -6. (Optional) To use CSI Topology, at least one node in the cluster must have the label-prefix of `topology.block.csi.ibm.com` to introduce topology awareness. +7. (Optional) To use CSI Topology, at least one node in the cluster must have the label-prefix of `topology.block.csi.ibm.com` to introduce topology awareness. **Important:** This label-prefix must be found on the nodes in the cluster **before** installing the IBM® block storage CSI driver. 
If the nodes do not have the proper label-prefix before installation, CSI Topology cannot be used with the CSI driver. For more information, see [Configuring for CSI Topology](../configuration/configuring_topology.md). -7. (Optional) If planning on using a high availability (HA) feature (either HyperSwap or stretched topology) on your storage system, see the appropriate sections within your IBM Storage Virtualize product documentation on [IBM Documentation](https://www.ibm.com/docs/en/): +8. (Optional) If planning on using a high availability (HA) feature (either HyperSwap or stretched topology) on your storage system, see the appropriate sections within your IBM Storage Virtualize product documentation on [IBM Documentation](https://www.ibm.com/docs/en/): - HyperSwap topology planning and configuration - **Planning** > **Planning for high availability** > **Planning for a HyperSwap topology system** - **Configuring** > **Configuration details** > **HyperSwap system configuration details** @@ -126,4 +126,4 @@ Complete these steps to prepare your environment for installing the CSI (Contain - **Planning** > **Planning for high availability** > **Planning for a stretched topology system** - **Configuring** > **Configuration details** > **Stretched system configuration details** -8. (Optional) If planning on using policy-based replication with your IBM Storage Virtualize storage system, verify that the correct replication policy is in place. This can be done either through the IBM Storage Virtualize user interface (go to **Policies** > **Replication policies**) or through the CLI (`lsreplicationpolicy`). If a replication policy is not in place create one before replicating a volume through the CSI driver. +9. (Optional) If planning on using policy-based replication with your IBM Storage Virtualize storage system, verify that the correct replication policy is in place. 
This can be done either through the IBM Storage Virtualize user interface (go to **Policies** > **Replication policies**) or through the CLI (`lsreplicationpolicy`). If a replication policy is not in place, create one before replicating a volume through the CSI driver. diff --git a/docs/content/release_notes/known_issues.md index ca35dfbfd..3e7a11a00 100644 --- a/docs/content/release_notes/known_issues.md +++ b/docs/content/release_notes/known_issues.md @@ -23,5 +23,5 @@ The following severity levels apply to known issues: |**CSI-3382**|Service|After CSI Topology label deletion, volume provisioning does not work, even when not using any topology-aware YAML files.
**Workaround:** To allow volume provisioning through the CSI driver, delete the operator pod.
After the deletion, a new operator pod is created and the controller pod is automatically restarted, allowing for volume provisioning.| |**CSI-2157**|Service|In extremely rare cases, too many Fibre Channel worker node connections may result in a failure when the CSI driver attempts to attach a pod. As a result, the `Host for node: {0} was not found, ensure all host ports are configured on storage` error message may be found in the IBM block storage CSI driver controller logs.
**Workaround:** Ensure that all host ports are properly configured on the storage system. If the issue continues and the CSI driver can still not attach a pod, contact IBM Support.| |**CSI-5722**|Service|In rare cases, when recreating a pod with a previously used PVC, volume attachment may be stuck and needs to be manually released
**Workaround:** get the list of volume attachments and find the one that is stuck, then release the volume attachment by deleting any finalizers. Then recreate the pod with the previously used PVC.| -|**CSI-5769**|Service|PVC resizing doesn't work in K8S 1.29 on RHEL 9.x nodes| +|**CSI-5769**|Service|If the NVMe CLI package is installed on a Kubernetes cluster node but the NVMe kernel modules are not loaded, PVC resizing/expansion does not work.
**Workaround:** Load the `nvme` and `nvme_core` kernel modules, and then retry resizing/expanding the PVC. For more details, see [Miscellaneous troubleshooting](../troubleshooting/troubleshooting_misc.md).| diff --git a/docs/content/troubleshooting/troubleshooting_misc.md b/docs/content/troubleshooting/troubleshooting_misc.md index 4c237347d..6f9eae363 100644 --- a/docs/content/troubleshooting/troubleshooting_misc.md +++ b/docs/content/troubleshooting/troubleshooting_misc.md @@ -24,13 +24,15 @@ If the following error occurs during stateful application pod creation (the pod /dev/mapper/mpathym: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY. (i.e., without -a or -p options) ``` + + 1. Log in to the relevant worker node and run the `fsck` command to repair the filesystem manually. `fsck /dev/dm-` The pod should come up immediately. If the pod is still in a _ContainerCreating_ state, continue to the next step. -2. Run the `# multipath -ll` command to see if there are faulty multipath devices. +2. Run the `multipath -ll` command to see if there are faulty multipath devices. If there are faulty multipath devices: @@ -38,4 +40,36 @@ If the following error occurs during stateful application pod creation (the pod 2. Rescan any iSCSI devices, using the `rescan-scsi-bus.sh` command. 3. Restart the multipath daemon again, using the `systemctl restart multipathd` command. - The multipath devices should be running properly and the pod should come up immediately. \ No newline at end of file + The multipath devices should be running properly and the pod should come up immediately. 
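The recovery sequence above can be sketched as a small helper script. This is a sketch only: `recover_multipath` is a hypothetical name, and it assumes root access on the affected worker node with `multipathd` and `rescan-scsi-bus.sh` available.

```shell
#!/bin/bash
# Hypothetical helper consolidating the multipath recovery steps above.
# Run as root on the affected worker node.
recover_multipath() {
    systemctl restart multipathd   # 1. Restart the multipath daemon
    rescan-scsi-bus.sh             # 2. Rescan any iSCSI devices
    systemctl restart multipathd   # 3. Restart the multipath daemon again
    multipath -ll                  # Verify that no faulty devices remain
}

# Uncomment to run on the affected node:
# recover_multipath
```

The function is deliberately not invoked here; run it manually only after confirming with `multipath -ll` which devices are faulty.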
+ +## Error during PVC expansion +**Note:** This troubleshooting procedure is a workaround for issue **CSI-5769** (see [Known issues](../release_notes/known_issues.md)). + +If the PVC expansion fails with the following event: + +``` screen + Type Status LastProbeTime LastTransitionTime Reason Message + ---- ------ ----------------- ------------------ ------ ------- + FileSystemResizePending True Mon, 01 Jan 0001 00:00:00 +0000 Tue, 14 Jan 2025 15:36:16 +0000 Waiting for user to (re-)start a pod to finish file system resize of volume on node. +``` + +1. Examine the logs of the IBM block storage CSI driver node pods. Check whether there are log lines similar to the following: + +``` screen + 2025-01-20 15:43:49,1203 DEBUG [639] [SVC:100;6005076810830237180000000000038F] (node.go:732) - Discovered device : {/dev/dm-2} + 2025-01-20 15:43:49,1203 DEBUG [639] [SVC:100;6005076810830237180000000000038F] (node_utils.go:168) - GetSysDevicesFromMpath with param : {dm-2} + 2025-01-20 15:43:49,1203 DEBUG [639] [SVC:100;6005076810830237180000000000038F] (node_utils.go:170) - looking in path : {/sys/block/dm-2/slaves} + 2025-01-20 15:43:49,1203 DEBUG [639] [SVC:100;6005076810830237180000000000038F] (node_utils.go:177) - found slaves : {[0xc000586340 0xc000586410]} + 2025-01-20 15:43:49,1203 DEBUG [639] [SVC:100;6005076810830237180000000000038F] (executer.go:75) - Executing command : {nvme} with args : {[list]}. 
and timeout : {10000} mseconds + 2025-01-20 15:43:49,1203 DEBUG [639] [SVC:100;6005076810830237180000000000038F] (executer.go:69) - Non-zero exit code: exit status 1 + 2025-01-20 15:43:49,1203 DEBUG [639] [SVC:100;6005076810830237180000000000038F] (executer.go:86) - Finished executing command (no output) + 2025-01-20 15:43:49,1203 ERROR [639] [SVC:100;6005076810830237180000000000038F] (node.go:744) - Error while trying to check if sys devices are nvme devices : {exit status 1} + 2025-01-20 15:43:49,1203 DEBUG [639] [SVC:100;6005076810830237180000000000038F] (sync_lock.go:62) - Lock for action NodeExpandVolume, release lock for volume + 2025-01-20 15:43:49,1203 DEBUG [639] [SVC:100;6005076810830237180000000000038F] (node.go:745) - <<<< NodeExpandVolume + 2025-01-20 15:43:49,1203 ERROR [639] [-] (driver.go:85) - GRPC error: rpc error: code = Internal desc = exit status 1 +``` + + +2. Check whether the required NVMe kernel modules are loaded, using the `lsmod | grep nvme` command. The `nvme` and `nvme_core` kernel modules are required. + +3. If the `nvme` and `nvme_core` kernel modules are not loaded, load them manually with the `modprobe nvme; modprobe nvme_core` commands, and then retry expanding the PVC.
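Steps 2 and 3 above can be combined into a small check script. This is a sketch: the output messages are illustrative, and actually loading the modules requires root.

```shell
#!/bin/bash
# Sketch of the CSI-5769 workaround check: report which of the required
# NVMe kernel modules are missing before retrying the PVC expansion.
missing=""
for mod in nvme nvme_core; do
    # /proc/modules lists one loaded module per line, name first
    grep -qw "^$mod" /proc/modules 2>/dev/null || missing="$missing $mod"
done

if [ -n "$missing" ]; then
    echo "Missing kernel modules:$missing"
    echo "Load them (as root) with: modprobe nvme; modprobe nvme_core"
else
    echo "NVMe kernel modules are loaded; retry the PVC expansion"
fi
```

To keep the modules loaded across reboots, they can also be listed in a file under `/etc/modules-load.d/`.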