Vault ignores the tls_ca_file value inside of the storage/consul config block #6602

Closed
karl-tpio opened this issue Apr 17, 2019 · 5 comments · Fixed by #6689

karl-tpio commented Apr 17, 2019

I have two bugs, I think. I discovered the second one while trying to resolve the first. The second bug seems to be the smaller one, so I'll lead with it.

Bug the second:

The documentation for the storage/consul section says that tls_skip_verify is of type BOOL, but this does not appear to be the case.
See the vault.hcl file that configures the vault server for the details.

The problem

When I follow the docs and use a BOOL value for tls_skip_verify, I get an "unknown type for string" error.

root@ip-172-25-50-90:/etc/vault.d# cat vault.hcl | grep tls_skip_verify
  tls_skip_verify = true
  #tls_skip_verify = "true"
root@ip-172-25-50-90:/etc/vault.d# service vault start; journalctl -xfe -u vault.service
<snip>
Apr 17 20:22:02 ip-172-25-50-90 vault[9140]: Error loading configuration from /etc/vault.d/vault.hcl: error parsing 'storage': storage.consul: At 16:21: root.tls_skip_verify: unknown type for string *ast.LiteralType
Apr 17 20:22:02 ip-172-25-50-90 systemd[1]: vault.service: Main process exited, code=exited, status=1/FAILURE
Apr 17 20:22:02 ip-172-25-50-90 systemd[1]: vault.service: Failed with result 'exit-code'.

And when you take the error literally and treat tls_skip_verify as a STRING, the parse error no longer appears on startup:

root@ip-172-25-50-90:/etc/vault.d# cat vault.hcl | grep tls_skip_verify
  #tls_skip_verify = true
  tls_skip_verify = "true"
root@ip-172-25-50-90:/etc/vault.d# service vault start; journalctl -xfe -u vault.service
<snip>
Apr 17 20:23:47 ip-172-25-50-90 vault[9404]: 2019-04-17T20:23:47.095Z [WARN]  storage migration check error: error="Get https://bootstrap.my-corp.tld:8501/v1/kv/vault/core/migration: x509: certificate signed by unknown authority"
Apr 17 20:23:49 ip-172-25-50-90 vault[9404]: 2019-04-17T20:23:49.099Z [WARN]  storage migration check error: error="Get https://bootstrap.my-corp.tld:8501/v1/kv/vault/core/migration: x509: certificate signed by unknown authority"

It would seem that #1559 is related.

The certificate signed by unknown authority error is the "first" bug that got me started on this whole journey.

Bug the first

I am using the pki backend in vault to generate certificates for hosts internally.
The consul server that is hosted at bootstrap.my-corp.tld uses one of these certificates.

I am using consul-template to fetch the CA from vault and the vault.hcl file does have tls_ca_file pointing to the correct ca.pem on disk.

I can verify that the /etc/vault.d/tls/ca.pem file works with curl:

root@ip-172-25-50-90:/etc/vault.d# curl --cacert ./tls/ca.pem -vvv https://bootstrap.my-corp.tld:8501/v1/kv/vault/core/migration
*   Trying 172.25.50.90...
* TCP_NODELAY set
* Connected to bootstrap.my-corp.tld (172.25.50.90) port 8501 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: ./tls/ca.pem
  CApath: /etc/ssl/certs
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Client hello (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: [NONE]
*  start date: Apr 17 19:42:09 2019 GMT
*  expire date: May  1 19:42:39 2019 GMT
*  subjectAltName: host "bootstrap.my-corp.tld" matched cert's "bootstrap.my-corp.tld"
*  issuer: C=US; O=my-corp; OU=Karl's Lab; CN=Karls Lab POC ROOT
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x558a7e149900)
> GET /v1/kv/vault/core/migration HTTP/2
> Host: bootstrap.my-corp.tld:8501
> User-Agent: curl/7.58.0
> Accept: */*
>
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
< HTTP/2 404
< vary: Accept-Encoding
< x-consul-index: 565
< x-consul-knownleader: true
< x-consul-lastcontact: 0
< content-type: text/plain; charset=utf-8
< content-length: 0
< date: Wed, 17 Apr 2019 20:30:27 GMT
<
* Connection #0 to host bootstrap.my-corp.tld left intact

And when I omit the --cacert ./tls/ca.pem argument:

root@ip-172-25-50-90:/etc/vault.d# curl -vvv https://bootstrap.my-corp.tld:8501/v1/kv/vault/core/migration
*   Trying 172.25.50.90...
* TCP_NODELAY set
* Connected to bootstrap.my-corp.tld (172.25.50.90) port 8501 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (OUT), TLS alert, Server hello (2):
* SSL certificate problem: unable to get local issuer certificate
* stopped the pause stream!
* Closing connection 0
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.haxx.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.

Clearly, the certificate offered by https://bootstrap.my-corp.tld:8501/v1/kv/vault/core/migration is trusted when the tool accessing the URL is told to use the /etc/vault.d/tls/ca.pem file.
When using the default system CAs, the certificate offered by that URL is not trusted.

The problem

The problem is that Vault does not respect the tls_ca_file setting inside of the storage "consul" block.
When I use the tls_ca_file = "/etc/vault.d/tls/ca.pem" setting, I still get the "storage migration check error: x509: certificate signed by unknown authority" error.

When I point the tls_ca_file directive to a file that is not present on disk (bonkers.pem), I get no startup error saying that the file could not be found. This leads me to believe that the value is ignored.

Here's my vault.hcl file:

root@ip-172-25-50-90:/etc/vault.d# cat vault.hcl
##
# Vault's defaults are pretty sane, a few things we must change:
#
# See: https://www.consul.io/docs/agent/options.html
##
# TODO: remove after debugging
log_level = "TRACE"

# See https://www.vaultproject.io/docs/configuration/ui/index.html
ui = true

storage "consul" {
  address = "https://bootstrap.my-corp.tld:8501"

 
  # Note: There is no bonkers.pem on disk in this location. If this value is being parsed properly, a "no file found" error should be thrown on start up, no?
  tls_ca_file = "/etc/vault.d/tls/bonkers.pem"

  #tls_ca_file = "/etc/vault.d/tls/ca.pem"
  
  # Docs say this is a BOOL, but i get error when i don't use quotes
  # tls_skip_verify = true

  # This does not error, but does not seem to take effect, either...
  tls_skip_verify = "true"

}

seal "awskms" {
   region     = "us-west-1"
   kms_key_id = "<lol no>"
}

I can share the vault.service systemd file if that's something you'd want to see.

And my system info, too:

root@ip-172-25-50-90:/etc/vault.d# vault -version
Vault v1.1.0 ('36aa8c8dd1936e10ebd7a4c1d412ae0e6f7900bd')
root@ip-172-25-50-90:/etc/vault.d# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.2 LTS"

karl-tpio (Author) commented

For anybody who hits a similar snag, adding the CA's cert to the system CA store seems to work.

It's not the best idea to mingle the CA's cert with the system CAs (depending on your security model/posture...), but I was able to unblock myself using this quick guide.

@briankassouf briankassouf added this to the 1.1.3 milestone Apr 29, 2019
@mgritter mgritter self-assigned this May 3, 2019
mgritter (Contributor) commented May 3, 2019

@karl-tpio,

Thanks for submitting the issue!

I’ve looked at the code and I believe I’ve found a way to get your configuration working correctly. The consul stanza in the configuration requires “scheme” to be set to “https” before it will look at the TLS settings. See

https://www.vaultproject.io/docs/configuration/storage/consul.html#scheme

If you change your configuration to

   scheme = "https"
   address = "bootstrap.my-corp.tld:8501"

you should see better behavior; the TLS options will be applied.

I’ll look into whether we should make a change to parse the address for a protocol type; it seems like the Consul client library accepts a whole URL, even though our configuration handling did not expect it. Please follow up with me if you have any further concerns.

Mark
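
For readers skimming the thread, here is a minimal sketch of the storage stanza with the change suggested above applied. The host, port, and CA path are taken from the original report; it assumes the behaviour described in this comment for the Vault version in the report (v1.1.0), and all other options are left at their defaults.

```hcl
storage "consul" {
  # The protocol goes in scheme, not in address (per the suggestion above).
  scheme  = "https"
  address = "bootstrap.my-corp.tld:8501"

  # CA bundle from the original report. Per the comment above, the TLS
  # settings are only looked at once scheme is set to "https".
  tls_ca_file = "/etc/vault.d/tls/ca.pem"
}
```

With the scheme left inside address instead, the thread shows the TLS options being silently skipped, which is why pointing tls_ca_file at a nonexistent file produced no error.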

karl-tpio (Author) commented May 4, 2019

@mgritter thanks for getting back to me! I'll spin my lab back up and give it a shot...

If my vote counts for anything, I'd strongly encourage parsing the address to determine the protocol, because:

  1. That's intuitive.
  2. It would seem to be more consistent. I can't recall off the top of my head where in the consul or vault config the full URI is required, but I do recall getting a warning/error to the effect of "received https reply, but https not configured". This was on a field that took protocol+host. I'm sorry I can't tell you precisely where, but I do remember encountering this message while messing about on this vault/consul/consul-template PoC.

But I would totally settle for a "hey... you specified a protocol where only a host should go. Here's a hint: see the docs pertaining to the scheme directive" message in the logs. Or, if neither of those two options is preferable, I don't mind adding a quick "NOTE: Protocol is set in the scheme ..." line to the docs.

As for the unknown type for string error in bug the second, is that a documentation issue or a side-effect of my misconfiguration?

mgritter (Contributor) commented May 4, 2019

Thanks for the suggestions, @karl-tpio.

The tls_skip_verify parameter does have to be a string in the HCL file, as per the comment in #1559 (comment). This was not a side-effect of the lack of scheme. The parameter's value is parsed later, to accommodate values such as "0" or "1". I'll look into updating the documentation.
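
As a small illustration of the point above, assuming the behaviour described in this comment: the value is written as a quoted string, and forms such as "0" or "1" are accepted alongside "true". Which other spellings are accepted is not stated in this thread.

```hcl
storage "consul" {
  # Quoted string, not a bare boolean: the value is parsed after HCL decoding.
  tls_skip_verify = "true"

  # Per the comment above, numeric forms are also accommodated, e.g.:
  # tls_skip_verify = "1"
  # tls_skip_verify = "0"

  # Unquoted, as in the original report, fails at load time with
  # "unknown type for string *ast.LiteralType":
  # tls_skip_verify = true
}
```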

NFarrington commented May 7, 2019

@mgritter Thanks for your message about the scheme configuration - I had the same issue, and this has fixed it.

I agree with @karl-tpio that parsing the address for a scheme would be intuitive, especially because this is the route I took while configuring and debugging Vault:

  1. Start Vault with address set to vault.my.tld.
  2. Error occurs: [WARN] storage migration check error: error="Get http://vault.my.tld:8501/v1/kv/vault/core/migration: net/http: HTTP/1.x transport connection broken: malformed HTTP resp...03\x01\x00\x02\x02""
  3. Reconfigure address to https://vault.my.tld
  4. Different error occurs: [WARN] storage migration check error: error="Get https://vault.my.tld:8501/v1/kv/vault/core/migration: x509: certificate signed by unknown authority"

The main confusion is that adding the scheme to the address results in Vault attempting to reach the correct URI, i.e. it doesn't start using http://https://vault.my.tld/. So, to some degree, Vault understands that the address has a scheme and doesn't change it. It would have been much clearer if Vault had started using http://https://vault.my.tld/, because that would clearly demonstrate a configuration error.

I think the logical behaviour would be to either:

  1. Throw an error if the scheme is included in the address option; or
  2. Parse the scheme in address (if there is one), and favour that over the scheme option.
