This repository has been archived by the owner on Jan 23, 2020. It is now read-only.

Cloudstor volume plugin does not support NVMe block devices on Nitro-based instances #184

Open
kinghuang opened this issue Nov 21, 2018 · 18 comments

Comments

@kinghuang

Summary

On current generation EC2 instances, EBS volumes are exposed as NVMe block devices. Devices are named /dev/nvme0n1, /dev/nvme1n1, ….
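
To see which EBS volume sits behind each NVMe device name, the NVMe serial number can be read from sysfs. This is a minimal sketch, assuming that EBS reports the volume ID as the serial without the hyphen (for example `vol0123456789abcdef0`) and that `/sys/block/<device>/device/serial` is readable on the host. The function names `serial_to_volume_id` and `list_nvme_ebs_volumes` are made up for illustration and are not part of Cloudstor.

```shell
#!/usr/bin/env bash
# Sketch: print each NVMe block device together with the EBS volume ID
# derived from its NVMe serial number.
# Assumption: EBS exposes the volume ID as the serial, minus the hyphen.

# Convert a serial such as "vol0123456789abcdef0" to "vol-0123456789abcdef0".
# Returns non-zero for serials that do not look like EBS volumes.
serial_to_volume_id() {
  local serial
  # The serial field may be padded with whitespace, so strip it first.
  serial="$(printf '%s' "$1" | tr -d '[:space:]')"
  case "$serial" in
    vol-*) printf '%s\n' "$serial" ;;
    vol*)  printf 'vol-%s\n' "${serial#vol}" ;;
    *)     return 1 ;;
  esac
}

# Walk the NVMe namespaces known to the kernel and print "device volume-id".
list_nvme_ebs_volumes() {
  local dev serial_file volume_id
  for dev in /sys/block/nvme*n*; do
    serial_file="$dev/device/serial"
    # Skips the unexpanded glob on hosts without NVMe devices.
    [ -r "$serial_file" ] || continue
    if volume_id="$(serial_to_volume_id "$(cat "$serial_file")")"; then
      printf '/dev/%s %s\n' "$(basename "$dev")" "$volume_id"
    fi
  done
}

list_nvme_ebs_volumes
```

On a host without NVMe devices the script prints nothing. Devices whose serial does not start with `vol` are skipped.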

The Cloudstor volume plugin doesn't appear to work correctly with EBS volumes exposed as NVMe block devices.

Expected behaviour

Cloudstor should be able to handle EBS volumes exposed as NVMe block devices.

Actual behaviour

An error occurs mounting Docker volumes backed by EBS volumes exposed as NVMe block devices.

Information

OK hostname=ip-172-31-20-204-us-west-2-compute-internal session=1542824424-wnwHekROgUvnZcZ4yFEjSf6gCYX7tSbq
Done requesting diagnostics.
Your diagnostics session ID is 1542824424-wnwHekROgUvnZcZ4yFEjSf6gCYX7tSbq
Please provide this session ID to the maintainer debugging your issue.

Cloudstor correctly creates and attaches EBS volumes, but it cannot then mount them in containers.

Steps to reproduce the behavior

  1. Create a Docker for AWS 18.06.1 cluster using CloudFormation. Use current generation instance types such as m5.large.
  2. Make sure the cloudstor:aws plugin is installed and active.
docker plugin inspect --format '{{ .Enabled }}' cloudstor:aws
  3. Create a relocatable volume backed by EBS.
docker volume create \
--driver cloudstor:aws \
--opt backing=relocatable \
--opt size=1 \
ebs-volume
  4. Run a container, mounting the volume created in the previous step. An error will occur mounting the volume.
docker container run -it --rm \
-v ebs-volume:/ebs \
alpine:3.8 ash

Unable to find image 'alpine:3.8' locally
3.8: Pulling from library/alpine
4fe2ade4980c: Pull complete 
Digest: sha256:621c2f39f8133acb8e64023a94dbdf0d5ca81896102b9e57c0dc184cadaf5528
Status: Downloaded newer image for alpine:3.8
docker: Error response from daemon: error while mounting volume '': Post http://%2Frun%2Fdocker%2Fplugins%2F642ecd8348cf8fd93b18fad4fbd2be66ce9720bedaeac70f26d13181f38a35e3%2Fcloudstor.sock/VolumeDriver.Mount: context deadline exceeded.
@FrenchBen
Contributor

/cc @ddebroy

@dodgemich

This should follow the work done in RexRay - any ideas when this might get implemented? Without this change, it holds up use of the new instance types, which puts this plugin on a dead-end path in terms of usage...

@stevehumer

@kinghuang appears to have documented this very well, and we are experiencing the same issue here for the past month. Would like to get on the latest nitro-based instances (C5/M5), as it's not sustainable to stay on 4-series instances much further into 2019.

This hurts us on performance, which we've proven out on a few clusters that do not have storage requirements, and has caused us to delay reserving 5-series instance types for the year ahead for much of our workload. Hopeful to get some traction here.

@dodgemich

Bump to top...any news on this front?

@akumadare

Hi,
Experienced the very same issue attempting to move a workload onto the new r5 instance family yesterday.
Using the previous generation is a workaround for now but an update on this would help to decide whether to look for alternatives.

@kinghuang
Author

@joeabbey Any chance we can get a comment from Docker on this? Will Cloudstor be updated to handle NVMe block devices on current generation EC2 instances?

@iget-master

Just reserved a few M5 instances, but noticed this issue. Any workarounds for this bug?

@akomlik

akomlik commented Mar 19, 2019

We had to downgrade our m5 to m4 and t3 to t2 to make this work ;-(

@kinghuang
Author

I've moved to REX-Ray EBS, but it doesn't handle copying volumes across zones like Cloudstor.

@dodgemich

Same - have to choose between using older instances (m4/t2) and getting cross-AZ replication with Cloudstor, or using newer instances (m5/t3) and losing cross-AZ replication with RexRay.

Would be good to hear if Docker is planning to support Cloudstor here, otherwise it's on a dead-end path...

@iget-master

I'll migrate to REX-Ray instead, since we can't downgrade to M4, having just reserved a few M5 instances for 3 years. 👎 Fortunately, the lack of cross-zone volume copying doesn't affect us.

@porshkevich

Hi, I found a temporary solution:
https://github.com/oogali/ebs-automatic-nvme-mapping
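
For readers unfamiliar with that kind of workaround, the general idea can be sketched as follows: read the device name that was requested when the EBS volume was attached (for example `xvdf`) from the NVMe controller's vendor-specific data, so that a udev rule can recreate the old-style name as a symlink to the `nvme*` device. This is a hedged sketch, not the linked repository's code. It assumes the `nvme` CLI (nvme-cli) is installed, that EBS stores the requested name in the first 32 bytes of the vendor-specific area at byte offset 3072 of the identify-controller data, and that Cloudstor waits for the device name it requested, which this thread does not confirm. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: print the attach-time device name of an EBS NVMe device.
# Assumptions (not confirmed in this thread): nvme-cli is installed, and the
# requested name lives in bytes 3073-3104 of the identify-controller data.

# Normalize the raw name: drop padding and a leading "/dev/" if present.
normalize_bdev() {
  local name
  name="$(printf '%s' "$1" | tr -d '[:space:]')"
  printf '%s\n' "${name#/dev/}"
}

# Print the requested device name for an NVMe device such as /dev/nvme1n1.
requested_name() {
  local raw
  # Take 32 bytes starting at byte 3073 and strip NUL padding.
  raw="$(nvme id-ctrl --raw-binary "$1" | tail -c +3073 | head -c 32 | tr -d '\0')"
  [ -n "$raw" ] || return 1
  normalize_bdev "$raw"
}

# Usage: script.sh /dev/nvme1n1   (typically requires root)
if [ "$#" -eq 1 ]; then
  requested_name "$1"
fi
```

A udev rule could call a script like this through `PROGRAM` and use its output in `SYMLINK`, so that `/dev/xvdf` points at the matching NVMe device. Whether that is enough for Cloudstor to complete the mount would need to be tested.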

@dodwmd

dodwmd commented Nov 6, 2019

Is cloudstor no longer being developed?

@dodgemich

dodgemich commented Nov 6, 2019

We gave up, migrated to Rexray and accepted the lack of multi-az support.

Very unfortunate that Docker-Inc didn't at least OSS the plugin if not carrying it forward, as it had some nice features.

@respectTheCode

We gave up and moved to Rexray as well. It has been a much better experience even though it took longer to get up and running.

@enbohm

enbohm commented Nov 8, 2019

@dodwmd @dodgemich @respectTheCode too bad that no one from Docker can assist. I've also tried getting in touch with @joeabbey et al. for help updating the Docker version, but got no reply whatsoever. That doesn't feel very professional on Docker's side, IMO.

@scottbuckel

scottbuckel commented Dec 20, 2019

@enbohm did you ever get in contact with anyone? We've looked into using Rexray, but it also looks like it's not actively maintained anymore.

We just upgraded from t2 to t3 instances and prepaid for the t3s for the next year, but have now run into this issue and I'm running out of options.

@scottbuckel

@porshkevich Care to explain how you implemented this workaround? Thanks!

