Implement security option in VM cluster managers #222

Merged (8 commits) on Jan 27, 2021

@jacobtomlinson commented Dec 15, 2020
Member

This PR implements the security keyword argument for VMCluster based cluster managers.

This depends on dask/distributed#4364: credentials are distributed via the Dask config, and there appear to be a couple of bugs in how that currently works.

Examples

Temporary credentials

Setting security=True will generate temporary credentials which will be distributed to the scheduler and workers at creation time. This is also the new default option.

>>> from dask_cloudprovider.gcp import GCPCluster
>>> from dask.distributed import Client
>>> cluster = GCPCluster(n_workers=1, security=True)
>>> client = Client(cluster)
>>> client
<Client: 'tls://10.142.0.29:8786' processes=0 threads=0, memory=0 B>

Note the tls:// scheme in the connection URL. Only clients with the credentials will be able to connect. In this example the Client class retrieves the credentials from the cluster object.
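A quick way to confirm that a cluster came up encrypted is to check the scheme of the scheduler address. The helper below is a hypothetical sketch: `is_tls_address` is not part of dask-cloudprovider or distributed, and it only parses an address string like the one shown in the client repr.

```python
from urllib.parse import urlparse


def is_tls_address(address: str) -> bool:
    """Return True if a scheduler address uses the tls:// scheme.

    Hypothetical helper for illustration only; it inspects the string
    and does not open a connection.
    """
    return urlparse(address).scheme == "tls"


# Addresses copied from the examples in this PR description.
secure = is_tls_address("tls://10.142.0.29:8786")
insecure = is_tls_address("tcp://10.142.0.29:8786")
```

This only tells you what the address advertises. Whether a given client can actually connect still depends on it holding the right credentials.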

Custom certificates

You can also set security to a custom Security object with your own generated certificates. The certificates need to be accessible to the scheduler and workers, so they will likely need to be included in the Docker image.

>>> from dask_cloudprovider.gcp import GCPCluster
>>> from dask.distributed import Client
>>> from distributed.security import Security
>>> sec = Security(tls_ca_file='cluster_ca.pem',
...                tls_client_cert='cli_cert.pem',
...                tls_client_key='cli_key.pem',
...                require_encryption=True)
>>> cluster = GCPCluster(n_workers=1, security=sec)
>>> client = Client(cluster)
>>> client
<Client: 'tls://10.142.0.29:8786' processes=0 threads=0, memory=0 B>

With this approach clients from other processes can connect in the same way.

>>> from dask.distributed import Client
>>> from distributed.security import Security
>>> sec = Security(tls_ca_file='cluster_ca.pem',
...                tls_client_cert='cli_cert.pem',
...                tls_client_key='cli_key.pem',
...                require_encryption=True)
>>> client = Client('tls://10.142.0.29:8786', security=sec)
>>> client
<Client: 'tls://10.142.0.29:8786' processes=0 threads=0, memory=0 B>
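Since the certificate paths have to resolve on whichever machine builds the Security object, a small preflight check can give a clearer error than a failed TLS handshake. This is a minimal sketch using only the standard library; `missing_cert_files` is a hypothetical name, and the file names are the ones used in the examples in this description.

```python
from pathlib import Path


def missing_cert_files(*paths):
    """Return the subset of ``paths`` that are not existing regular files.

    Hypothetical helper for illustration; it checks the local filesystem
    only and says nothing about whether the certificates are valid.
    """
    return [str(p) for p in paths if not Path(p).is_file()]


# File names taken from the Security(...) examples in this PR description.
missing = missing_cert_files("cluster_ca.pem", "cli_cert.pem", "cli_key.pem")
if missing:
    print("Certificate files not found:", ", ".join(missing))
```

Running the same check inside the Docker image used by the scheduler and workers would cover the other half of the requirement.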

Disabling security

This change makes secure connections the default option. You can also disable SSL/TLS by setting security to False or None.

>>> from dask_cloudprovider.gcp import GCPCluster
>>> from dask.distributed import Client
>>> cluster = GCPCluster(n_workers=1, security=False)
>>> client = Client(cluster)
>>> client
<Client: 'tcp://10.142.0.29:8786' processes=0 threads=0, memory=0 B>
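To summarise the three accepted values, here is a rough sketch of the mapping implied by the examples in this description. It is not the code in this PR: `describe_security` is a hypothetical function, and it assumes any non-False, non-None object is a user-supplied Security instance.

```python
def describe_security(security=True):
    """Map a ``security`` argument to (protocol, credential source).

    Illustrative only, based on the behaviour shown in the examples:
    - True (the new default): TLS with temporary generated credentials
    - False or None: plain TCP, no credentials
    - anything else: TLS with the caller's own Security object
    """
    if security is None or security is False:
        return ("tcp", None)
    if security is True:
        return ("tls", "temporary")
    return ("tls", "custom")


default_mode = describe_security()        # security=True is the default
disabled_mode = describe_security(False)  # matches the tcp:// example
```

The checks use `is True` and `is False` rather than truthiness so that a Security object is never mistaken for the boolean case.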

@jacobtomlinson added labels on Dec 15, 2020: provider/gcp/vm (Cluster provider for GCP Instances), provider/aws/ec2 (Cluster provider for AWS EC2 Instances), provider/digitalocean/droplet (Cluster provider for Digital Ocean Droplets), provider/azure/vm (Cluster provider for Azure Virtual Machines)

@quasiben
Member

This is really great! +1 to making this the default option. Do you want to update the docs in this PR? I'm also happy to help write docs for this.

@jacobtomlinson
Member Author

Yeah, a docs update would be good. The docstrings already contain this option, incorrectly (I got a bit overexcited with my copy pasta when writing them). Can you suggest anywhere else it would be useful to document this?

@quasiben
Member

Maybe the advanced section on RTD?

@jacobtomlinson
Member Author

I've added a documentation page; a review would be appreciated.

I'm going to move this into draft until dask/distributed#4364 is merged and a release happens that we can pin to. But other than that this is ready.

@jacobtomlinson marked this pull request as draft December 16, 2020 17:34

@quasiben
Member

@jacobtomlinson That's a really great doc write-up. I especially appreciate the details on why decisions were made, and even on when security might be disabled for troubleshooting.

@jacobtomlinson marked this pull request as ready for review January 27, 2021 16:26

@jacobtomlinson
Member Author

Minimum version bumped in #243
