Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GCP - Do not require service account key export for google-lifesciences #1709

Closed
mozack opened this issue Aug 17, 2020 · 12 comments
Closed

GCP - Do not require service account key export for google-lifesciences #1709

mozack opened this issue Aug 17, 2020 · 12 comments
Milestone

Comments

@mozack
Copy link
Contributor

mozack commented Aug 17, 2020

New feature

Hi, it would be very helpful to use the currently authenticated credentials instead of export service account key for the google-lifesciences executor. In fact, we cannot use the life-sciences API for most of our use cases without this.

This was done previously for the google-pipelines executor:
#1068

Usage scenario

There is data available to the public on GCP that requires security access. Providers of this data do not allow export of security keys because it may circumvent their security measures. For example, the TCGA and TARGET data stored here: https://isb-cgc.appspot.com/ cannot be accessed from a project with an exported security key.

Suggest implementation

It should be possible to just used the currently authenticated user's credentials. This code snippet from @kdbinder may help:

kdbinder@8b6c049

Thanks!

@pditommaso
Copy link
Member

Not sure to understand how the auth credentials are taken then?

@kdbinder
Copy link

If nextflow is executed from an existing GCP instance, the service account is baked into that VM

@pditommaso
Copy link
Member

I think then we are in the scenario at point 2:

If the environment variable GOOGLE_APPLICATION_CREDENTIALS isn't set, ADC uses the default service account that Compute Engine, Google Kubernetes Engine, App Engine, Cloud Run, and Cloud Functions provide.

There's no way if the service account info is accessible? It would be preferable to check to give a more consistent error message if missing.

@snamburi3
Copy link

I was able to bypass the service account key export and use the user credentials via gcloud auth application-default login.

  1. gcloud auth application-default login

  2. Add Project Name to the json config file ~/.config/gcloud/application_default_credentials.json
    project_id": "PROJECT NAME",

  3. export values

export NXF_VER="20.07.1"
export NXF_MODE=google
export NXF_DEBUG=3
export PROJECT="PROJECT NAME""
export GOOGLE_APPLICATION_CREDENTIALS=~/.config/gcloud/application_default_credentials.json

@mbookman
Copy link
Contributor

Reviewing the description of how Application Default Credentials work, as described here:

https://github.com/googleapis/google-auth-library-java#google-auth-library-oauth2-http

The key bit is:

The following are searched (in order) to find the Application Default Credentials:

  1. Credentials file pointed to by the GOOGLE_APPLICATION_CREDENTIALS environment variable
  2. Credentials provided by the Google Cloud SDK gcloud auth application-default login command
  3. Google App Engine built-in credentials
  4. Google Cloud Shell built-in credentials
  5. Google Compute Engine built-in credentials
    • Skip this check by setting the environment variable NO_GCE_CHECK=true
    • Customize the GCE metadata server address by setting the environment variable GCE_METADATA_HOST=

For submitting requests to the Pipelines API, it would be preferable for Nextflow to just pick up the credentials that are available in the runtime environment (the default credentials). This is the way that most other tools work. So the net of it is:

  • To get Application Default Credentials use GoogleCredentials.getApplicationDefault() or GoogleCredentials.getApplicationDefault(HttpTransportFactory). Presumably this is already done, but...
  • Don't explicitly look for the environment GOOGLE_APPLICATION_CREDENTIALS; let the GoogleCredentials look and emit a message if it can't find credentials. You could enhance what is communicated if the message is insufficient.

Having users explicitly set GOOGLE_APPLICATION_CREDENTIALS as suggested by @snamburi3 is an okay workaround for picking up end user credentials, although it is forcing users to set an environment variable that they could otherwise omit. It would be picked up automatically as check #2 ("Credentials provided by the Google Cloud SDK gcloud auth application-default login command").

But it isn't clear how any other credentials are going to get picked up. The most common one (I think) is #5 ("Google Compute Engine built-in credentials"). The explicit check for GOOGLE_APPLICATION_CREDENTIALS prevents that from happening. It is better to pick up those credentials than encourage users to drop explicit credentials on a GCE VM.

@mbookman
Copy link
Contributor

FWIW, I commented out a bit of code in GoogleLifeSciencesExecutor.groovy, rebuilt, and everything then "just worked". Nextflow then started submitting process execution requests to the life sciences API:

diff --git a/modules/nf-google/src/main/nextflow/cloud/google/lifesciences/GoogleLifeSciencesExecutor.groovy b/modules/nf-google/src/main/nextflow/cloud/google/lifesciences/GoogleLifeSciencesExecutor.groovy
index dcda4501..00877462 100644
--- a/modules/nf-google/src/main/nextflow/cloud/google/lifesciences/GoogleLifeSciencesExecutor.groovy
+++ b/modules/nf-google/src/main/nextflow/cloud/google/lifesciences/GoogleLifeSciencesExecutor.groovy
@@ -70,14 +70,14 @@ class GoogleLifeSciencesExecutor extends Executor {
             throw new AbortOperationException("Executor `google-lifesciences` requires a Google Storage bucket to be specified as a working directory -- Add the option `-w gs://<your-bucket/path>` to your run command line or specify a workDir in your config file")
         }
 
-        def credsFile = env.get('GOOGLE_APPLICATION_CREDENTIALS')
-        final projectId = GoogleLifeSciencesConfig.getProjectIdFromCreds(credsFile)
+        // def credsFile = env.get('GOOGLE_APPLICATION_CREDENTIALS')
+        // final projectId = GoogleLifeSciencesConfig.getProjectIdFromCreds(credsFile)
 
         config = GoogleLifeSciencesConfig.fromSession(session)
-        if( !config.project )
-            config.project = projectId
-        else if( config.project != projectId )
-            throw new AbortOperationException("Project Id `$config.project` declared in the nextflow config file does not match the one expected by credentials file: $credsFile")
+        // if( !config.project )
+        //     config.project = projectId
+        // else if( config.project != projectId )
+        //     throw new AbortOperationException("Project Id `$config.project` declared in the nextflow config file does not match the one expected by credentials file: $credsFile")
 
         if( session.binDir && !config.disableBinDir ) {
             final cloudPath = getTempDir()

In my test case, I was running nextflow from a GCE VM so I wanted nextflow to pick up the service account credentials on my VM. However, with this change, I could also execute nextflow locally and it would pick up my end user credentials that I created with gcloud auth application-default login.

@pditommaso
Copy link
Member

Thanks for this detailed report, this is indeed very useful. A quick question related to Google LS execution, in my tests I notice the GOOGLE_APPLICATION_CREDENTIALS env variable. How the auth is handled in this specific case?

@mbookman
Copy link
Contributor

If GOOGLE_APPLICATION_CREDENTIALS is set, then it is taken to be the intent of the caller that those are the credentials to use. The environment variable is assumed to point to a valid credentials file and is loaded. If you want to follow the code, take a look at:

https://github.com/googleapis/google-auth-library-java/blob/master/oauth2_http/java/com/google/auth/oauth2/DefaultCredentialsProvider.java#L144

    // First try the environment variable
    GoogleCredentials credentials = null;
    String credentialsPath = getEnv(CREDENTIAL_ENV_VAR);
    if (credentialsPath != null && credentialsPath.length() > 0) {
      LOGGER.log(
          Level.FINE,
          String.format("Attempting to load credentials from file: %s", credentialsPath));
      InputStream credentialsStream = null;
      try {
        File credentialsFile = new File(credentialsPath);
        if (!isFile(credentialsFile)) {
          // Path will be put in the message from the catch block below
          throw new IOException("File does not exist.");
        }
        credentialsStream = readStream(credentialsFile);
        credentials = GoogleCredentials.fromStream(credentialsStream, transportFactory);
      } catch (IOException e) {
        // Although it is also the cause, the message of the caught exception can have very
        // important information for diagnosing errors, so include its message in the
        // outer exception message also.
        throw new IOException(
            String.format(
                "Error reading credential file from environment variable %s, value '%s': %s",
                CREDENTIAL_ENV_VAR, credentialsPath, e.getMessage()),
            e);
      } catch (AccessControlException expected) {
        // Exception querying file system is expected on App-Engine
      } finally {
        if (credentialsStream != null) {
          credentialsStream.close();
        }
      }
    }
   <etc>

Does this answer your question or were you looking for something else?

@pditommaso
Copy link
Member

Oops.. I've in my previous comment the crucial point was missing. I meant that I've tested in Google removing the check for the GOOGLE_APPLICATION_CREDENTIALS env var from NF and it failed to submit jobs because it was not enabled to authenticate.

Therefore I was wondering how you made it work?

@mbookman
Copy link
Contributor

Which code did you take out, what was the error message, and on which API call?

As long as you continue to call GoogleCredentials.getApplicationDefault() and pass the returned credentials to API calls, things should work.

@mbookman
Copy link
Contributor

Hi @mozack , @kdbinder , @snamburi3!

If you get the chance, please take a look at #1776 to see if the implementation there (and corresponding documentation) matches up with your expectation for how Nextflow should best handle credentials for the Life Sciences API.

Thanks!

@pditommaso
Copy link
Member

Done

@pditommaso pditommaso added this to the v21.01.0 milestone Nov 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants