Add architecture docs
Praveenrajmani committed Jun 14, 2022
1 parent 7391bb2 commit 7e633cf
Showing 4 changed files with 118 additions and 11 deletions.
121 changes: 110 additions & 11 deletions docs/architecture.md
@@ -7,21 +7,22 @@ Architecture

### Components

DirectCSI is made up of 5 components:

| Component | Description |
|-------------------|---------------------------------------------------------------------------------------|
| CSI Driver | Performs mounting, unmounting of provisioned volumes |
| CSI Controller | Responsible for scheduling and detaching volumes on the nodes |
| Drive Controller | Formats and manages drive lifecycle |
| Volume Controller | Manages volume lifecycle |
| Drive Discovery | Discovers and monitors the drives and their states on the nodes |

The 5 components run as two different pods.

| Name | Components | Description |
|-------------------------------|-----------------------------------------------------------------------|------------------------------------|
| DirectCSI Node Driver         | CSI Driver, Drive Controller, Volume Controller, Drive Discovery      | runs on every node as a DaemonSet  |
| DirectCSI Central Controller | CSI Controller | runs as a deployment |


### Scalability
@@ -37,3 +38,101 @@ If the node driver is down, then volume mounting, unmounting, formatting and cleanup will not proceed for volumes on that node.
If the central controller is down, then volume scheduling and deletion will not proceed for any volumes and drives in the direct-csi cluster. To restore operations, bring the central controller back to running status.

Security is covered [here](./security.md)

### Node Driver

This runs on every node as a DaemonSet in the 'direct-csi-min-io' namespace. Each pod consists of four containers.

#### Node driver registrar

This is a Kubernetes CSI sidecar container which registers the `direct-csi-min-io` CSI driver with kubelet. This registration is necessary for kubelet to issue CSI RPC calls like `NodeGetInfo`, `NodeStageVolume` and `NodePublishVolume` to the corresponding nodes.

For more details, please refer to [node-driver-registrar](https://github.com/kubernetes-csi/node-driver-registrar).

#### Livenessprobe

This is a Kubernetes CSI sidecar container which exposes an HTTP `/healthz` endpoint as a liveness hook. This endpoint is used by Kubernetes for CSI driver liveness checks.

For more details, please refer to [livenessprobe](https://github.com/kubernetes-csi/livenessprobe).

#### Dynamic drive discovery

This container runs the `directpv` binary with the `--dynamic-drive-handler` flag enabled. It is responsible for discovering and managing the drives on the node.

Devices are discovered from the `/run/data/udev/` directory, and the container dynamically listens for udev add, change and remove uevents. In addition to event listening, a periodic 30-second sync checks and reconciles the drive states.

For any change, the directcsidrive object is synced to match the local state. A new directcsidrive object is created when a new device is detected during sync or when an "Add" uevent occurs.

#### Direct CSI

This container acts as a node plugin and implements the following node service RPCs.

- [NodeGetInfo](https://github.com/container-storage-interface/spec/blob/master/spec.md#nodegetinfo)
- [NodeGetCapabilities](https://github.com/container-storage-interface/spec/blob/master/spec.md#nodegetcapabilities)
- [NodeGetVolumeStats](https://github.com/container-storage-interface/spec/blob/master/spec.md#nodegetvolumestats)
- [NodeStageVolume](https://github.com/container-storage-interface/spec/blob/master/spec.md#nodestagevolume)
- [NodePublishVolume](https://github.com/container-storage-interface/spec/blob/master/spec.md#nodepublishvolume)
- [NodeUnstageVolume](https://github.com/container-storage-interface/spec/blob/master/spec.md#nodeunstagevolume)
- [NodeUnpublishVolume](https://github.com/container-storage-interface/spec/blob/master/spec.md#nodeunpublishvolume)

This container is responsible for bind-mounting and unmounting volumes on the corresponding nodes. Monitoring volumes is a WIP and will be added soon. Please refer to the [csi spec](https://github.com/container-storage-interface/spec) for more details on the CSI volume lifecycle.

Apart from this, there are also drive and volume controllers in place.

##### Drive Controller

Drive controller manages the directcsidrive object lifecycle. It actively listens for drive object (post-hook) events like Add, Update and Delete. The drive controller is responsible for the following:

- Formatting a drive

If `.Spec.RequestedFormat` is set on the drive object, it indicates that `kubectl directpv drives format` was called on it, and the drive will be formatted.

- Releasing a drive

`kubectl directpv drives release` is a special command to release a "Ready" drive in the directpv cluster by unmounting the drive and making it "Available". If `.Status.DriveStatus` is set to "Released", it indicates that `kubectl directpv drives release` was called on the drive, and it will be released.

- Checking the primary mount of the drive

Drive controller also checks the primary drive mounts. If an "InUse" or "Ready" drive is not mounted, or if it has unexpected mount options set, the drive will be remounted with the correct mountpoint and mount options.

- Tagging the lost drives

If a drive is not found on the host, it will be tagged as "lost" with an error message attached to the drive object and its respective volume objects.

Overall, the drive controller validates and tries to sync the host state of the drive to match its expected state. For example, it mounts "Ready" and "InUse" drives if their primary mount is not present on the host.
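The reconciliation rules above can be condensed into a single pure function. This is a simplified, hypothetical model: the field names and returned action strings are invented for illustration, and the real controller performs host operations and Kubernetes API updates rather than mutating an in-memory struct.

```go
package main

import "fmt"

// Drive is a simplified in-memory view of a directcsidrive object.
type Drive struct {
	Name            string
	Status          string // "Ready", "InUse", "Available", "Released"
	Mounted         bool
	FoundOnHost     bool
	RequestedFormat bool
	Condition       string
}

// reconcile applies the drive-controller rules in priority order and
// returns the action taken: tag lost drives, honor format requests,
// release "Released" drives, and remount unmounted Ready/InUse drives.
func reconcile(d *Drive) string {
	switch {
	case !d.FoundOnHost:
		d.Condition = "lost"
		return "tagged lost"
	case d.RequestedFormat:
		return "format"
	case d.Status == "Released":
		d.Mounted = false
		d.Status = "Available"
		return "released"
	case (d.Status == "Ready" || d.Status == "InUse") && !d.Mounted:
		d.Mounted = true
		return "remounted"
	}
	return "in sync"
}

func main() {
	d := Drive{Name: "sda", Status: "Ready", FoundOnHost: true}
	fmt.Println(reconcile(&d)) // remounted
}
```

Writing the rules as one idempotent function is what lets the controller rerun them safely on every Add/Update event.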

For more details on the drive states, please refer [Drive States](./drive-states.md).

##### Volume Controller

Volume controller manages the directcsivolume object lifecycle. It actively listens for volume object (post-hook) events like Add, Update and Delete. The volume controller is responsible for the following:

- Releasing/Purging deleted volumes and freeing up their space on the drive

When a volume is deleted (PVC deletion) or purged (using the `kubectl directpv drives purge` command), the corresponding volume object enters a terminating state (with a deletion timestamp set on it). The volume controller looks for such deleted volume objects and releases them by freeing up the disk space and removing the finalizers.
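The purge pass can be sketched as follows. This is an in-memory model with invented field names: a boolean stands in for the Kubernetes deletion timestamp, and "freeing" space is simulated rather than performed on a drive.

```go
package main

import "fmt"

// Volume is a simplified stand-in for a directcsivolume object.
type Volume struct {
	Name              string
	Capacity          int64
	DeletionTimestamp bool // true when the object is terminating
	Finalizers        []string
}

// purgeTerminating releases every terminating volume: its capacity is
// returned to the drive and its finalizers are cleared, which is what
// allows the API server to actually remove the object.
func purgeTerminating(vols []*Volume) (freed int64) {
	for _, v := range vols {
		if !v.DeletionTimestamp {
			continue
		}
		freed += v.Capacity
		v.Finalizers = nil
	}
	return freed
}

func main() {
	vols := []*Volume{
		{Name: "pvc-1", Capacity: 10, DeletionTimestamp: true,
			Finalizers: []string{"directcsi.min.io/purge-protection"}},
		{Name: "pvc-2", Capacity: 20},
	}
	fmt.Println(purgeTerminating(vols)) // 10
}
```

Because deletion is driven by the timestamp rather than the event itself, a controller restart cannot miss a pending purge.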


### Central Controller

This runs as a deployment in the 'direct-csi-min-io' namespace with a default replica count of 3.

(Note: the central controller does not perform any device-level interactions on the host)

Each pod consists of two containers.

#### CSI Provisioner

This is a Kubernetes CSI sidecar container which is responsible for sending volume provisioning (CreateVolume) and volume deletion (DeleteVolume) requests to CSI drivers.

For more details, please refer to [external-provisioner](https://github.com/kubernetes-csi/external-provisioner).

#### Direct CSI

This container acts as the central controller and implements the following RPCs:

- [CreateVolume](https://github.com/container-storage-interface/spec/blob/master/spec.md#createvolume)
- [DeleteVolume](https://github.com/container-storage-interface/spec/blob/master/spec.md#deletevolume)

This container is responsible for selecting a suitable drive for a volume scheduling request. The selection algorithm honors the capacity range and topology specifications provided in the CreateVolume request and selects a drive based on its free capacity.

(Note: kube-scheduler is responsible for selecting a node for a pod; the central controller just selects a suitable drive on the requested node based on the specifications provided in the CreateVolume request)
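A capacity-based selection like the one described above can be sketched as follows. This is a simplified, hypothetical version: the struct fields and the most-free-space tie-break are assumptions, and the real algorithm also evaluates topology segments and access tiers from the CreateVolume request.

```go
package main

import "fmt"

// Drive is a minimal view of a directcsidrive for scheduling purposes.
type Drive struct {
	Name   string
	Node   string
	Free   int64
	Status string
}

// selectDrive picks a formatted ("Ready" or "InUse") drive on the requested
// node with enough free capacity, preferring the drive with the most free
// space; ok is false when no drive fits.
func selectDrive(drives []Drive, node string, size int64) (best Drive, ok bool) {
	for _, d := range drives {
		if d.Node != node || d.Free < size {
			continue
		}
		if d.Status != "Ready" && d.Status != "InUse" {
			continue
		}
		if !ok || d.Free > best.Free {
			best, ok = d, true
		}
	}
	return best, ok
}

func main() {
	drives := []Drive{
		{"sda", "node-1", 50, "Ready"},
		{"sdb", "node-1", 100, "InUse"},
		{"sdc", "node-2", 500, "Ready"},
	}
	d, ok := selectDrive(drives, "node-1", 60)
	fmt.Println(d.Name, ok) // sdb true
}
```

Note how the node filter comes first: the controller never overrides kube-scheduler's node choice, it only ranks drives within that node.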
4 changes: 4 additions & 0 deletions pkg/node/node.go
@@ -112,6 +112,7 @@ func NewNodeServer(ctx context.Context,
}

// NodeGetInfo gets node information.
// reference: https://github.com/container-storage-interface/spec/blob/master/spec.md#nodegetinfo
func (ns *NodeServer) NodeGetInfo(ctx context.Context, req *csi.NodeGetInfoRequest) (*csi.NodeGetInfoResponse, error) {
topology := &csi.Topology{
Segments: map[string]string{
Expand All @@ -131,6 +132,7 @@ func (ns *NodeServer) NodeGetInfo(ctx context.Context, req *csi.NodeGetInfoReque
}

// NodeGetCapabilities gets node capabilities.
// reference: https://github.com/container-storage-interface/spec/blob/master/spec.md#nodegetcapabilities
func (ns *NodeServer) NodeGetCapabilities(ctx context.Context, req *csi.NodeGetCapabilitiesRequest) (*csi.NodeGetCapabilitiesResponse, error) {
nodeCap := func(cap csi.NodeServiceCapability_RPC_Type) *csi.NodeServiceCapability {
klog.V(5).Infof("Using node capability %v", cap)
Expand All @@ -153,6 +155,7 @@ func (ns *NodeServer) NodeGetCapabilities(ctx context.Context, req *csi.NodeGetC
}

// NodeGetVolumeStats gets node volume stats.
// reference: https://github.com/container-storage-interface/spec/blob/master/spec.md#nodegetvolumestats
func (ns *NodeServer) NodeGetVolumeStats(ctx context.Context, req *csi.NodeGetVolumeStatsRequest) (*csi.NodeGetVolumeStatsResponse, error) {
vID := req.GetVolumeId()
volumePath := req.GetVolumePath()
@@ -206,6 +209,7 @@ }
}

// NodeExpandVolume returns unimplemented error.
// reference: https://github.com/container-storage-interface/spec/blob/master/spec.md#nodeexpandvolume
func (ns *NodeServer) NodeExpandVolume(ctx context.Context, in *csi.NodeExpandVolumeRequest) (*csi.NodeExpandVolumeResponse, error) {
return nil, status.Error(codes.Unimplemented, "unimplemented")
}
2 changes: 2 additions & 0 deletions pkg/node/publish_unpublish.go
@@ -80,6 +80,7 @@ func getPodInfo(ctx context.Context, req *csi.NodePublishVolumeRequest) (podName
}

// NodePublishVolume is node publish volume request handler.
// reference: https://github.com/container-storage-interface/spec/blob/master/spec.md#nodepublishvolume
func (n *NodeServer) NodePublishVolume(ctx context.Context, req *csi.NodePublishVolumeRequest) (*csi.NodePublishVolumeResponse, error) {
klog.V(3).InfoS("NodePublishVolumeRequest",
"volumeID", req.GetVolumeId(),
@@ -156,6 +157,7 @@ }
}

// NodeUnpublishVolume is node unpublish volume handler.
// reference: https://github.com/container-storage-interface/spec/blob/master/spec.md#nodeunpublishvolume
func (n *NodeServer) NodeUnpublishVolume(ctx context.Context, req *csi.NodeUnpublishVolumeRequest) (*csi.NodeUnpublishVolumeResponse, error) {
klog.V(3).InfoS("NodeUnPublishVolumeRequest",
"volumeID", req.GetVolumeId(),
2 changes: 2 additions & 0 deletions pkg/node/stage_unstage.go
@@ -35,6 +35,7 @@ import (
)

// NodeStageVolume is node stage volume request handler.
// reference: https://github.com/container-storage-interface/spec/blob/master/spec.md#nodestagevolume
func (n *NodeServer) NodeStageVolume(ctx context.Context, req *csi.NodeStageVolumeRequest) (*csi.NodeStageVolumeResponse, error) {
klog.V(3).InfoS("NodeStageVolumeRequest",
"volumeID", req.GetVolumeId(),
@@ -124,6 +125,7 @@ }
}

// NodeUnstageVolume is node unstage volume request handler.
// reference: https://github.com/container-storage-interface/spec/blob/master/spec.md#nodeunstagevolume
func (n *NodeServer) NodeUnstageVolume(ctx context.Context, req *csi.NodeUnstageVolumeRequest) (*csi.NodeUnstageVolumeResponse, error) {
klog.V(3).InfoS("NodeUnstageVolumeRequest",
"volumeID", req.GetVolumeId(),
