Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

auth: add client-only ACL #18730

Merged
merged 4 commits into from
Oct 12, 2023
Merged

auth: add client-only ACL #18730

merged 4 commits into from
Oct 12, 2023

Conversation

tgross
Copy link
Member

@tgross tgross commented Oct 11, 2023

The RPC handlers expect to see nil ACL objects whenever ACLs are disabled. By using nil as a sentinel value, we have the risk of nil pointer exceptions and improper handling of nil when returned from our various auth methods that can lead to privilege escalation bugs. This is the third in a series to eliminate the use of nil ACLs as a sentinel value for when ACLs are disabled.

This patch involves creating a new "virtual" ACL object for checking permissions on client operations and a matching AuthenticateClientOnly method for client-only RPCs that can produce that ACL.

Unlike the server ACLs PR, this also includes a special case for "legacy" client RPCs where the client was not previously sending the secret as it should (leaning on mTLS only). Those client RPCs were fixed in Nomad 1.6.0, but it'll take a while before we can guarantee they'll be present during upgrades.

Ref: https://github.com/hashicorp/nomad-enterprise/pull/1218
Ref: #18703
Ref: #18715
Ref: #16799

The RPC handlers expect to see `nil` ACL objects whenever ACLs are disabled. By
using `nil` as a sentinel value, we have the risk of nil pointer exceptions and
improper handling of `nil` when returned from our various auth methods that can
lead to privilege escalation bugs. This is the third in a series to eliminate
the use of `nil` ACLs as a sentinel value for when ACLs are disabled.

This patch involves creating a new "virtual" ACL object for checking permissions
on client operations and a matching `AuthenticateClientOnly` method for
client-only RPCs that can produce that ACL.

Unlike the server ACLs PR, this also includes a special case for "legacy" client
RPCs where the client was not previously sending the secret as it
should (leaning on mTLS only). Those client RPCs were fixed in Nomad 1.6.0, but
it'll take a while before we can guarantee they'll be present during upgrades.

Ref: hashicorp/nomad-enterprise#1218
Ref: #18703
Ref: #18715
Ref: #16799
Comment on lines +264 to +266
if err != nil {
s.logger.Error("could not determine remote address", "error", err)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just making sure we do not want to bail here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. I can't think of any case where we'd actually get an error when ctx != nil (and if it is we return no error), but I also didn't want to introduce a more serious bug if I was missing something, just for the sake of metrics.

Comment on lines 281 to 289
expected := fmt.Sprintf("client.%s.nomad", s.region)
_, err := validateCertificateForName(tlsCert, expected)
if err != nil {
expected := fmt.Sprintf("server.%s.nomad", s.region)
_, err := validateCertificateForName(tlsCert, expected)
if err != nil {
return nil, err
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense to have validateCertificateForName accept a slice of expected values of which at least one must match?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That works. Will do.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in b07a322. While doing that I noticed we had a straggler use of the cert check and after removing that we can remove the old cert check code in nomad/util.go entirely. Done in the same commit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And then that revealed that there was a bunch of places where we weren't authenticating with the node secret in test code, even though it's required in the RPC. Fixed that.

nomad/service_registration_endpoint.go Outdated Show resolved Hide resolved
@tgross tgross merged commit 3633ca0 into main Oct 12, 2023
25 checks passed
@tgross tgross deleted the no-nil-acls-client-auth branch October 12, 2023 16:21
tgross added a commit that referenced this pull request Oct 12, 2023
The RPC handlers expect to see `nil` ACL objects whenever ACLs are disabled. By
using `nil` as a sentinel value, we have the risk of nil pointer exceptions and
improper handling of `nil` when returned from our various auth methods that can
lead to privilege escalation bugs. This is the third in a series to eliminate
the use of `nil` ACLs as a sentinel value for when ACLs are disabled.

This patch involves leveraging the refactored `auth` package to remove the weird
"mixed auth" helper functions that only support the Variables read/list RPC
handlers. Instead, pass the ACL object and claim together into the
`AllowVariableOperations` method in the usual `acl` package.

Ref: hashicorp/nomad-enterprise#1218
Ref: #18703
Ref: #18715
Ref: #16799
Ref: #18730

Fixes: #15875
tgross added a commit that referenced this pull request Oct 12, 2023
The RPC handlers expect to see `nil` ACL objects whenever ACLs are disabled. By
using `nil` as a sentinel value, we have the risk of nil pointer exceptions and
improper handling of `nil` when returned from our various auth methods that can
lead to privilege escalation bugs. This is the third in a series to eliminate
the use of `nil` ACLs as a sentinel value for when ACLs are disabled.

This patch involves leveraging the refactored `auth` package to remove the weird
"mixed auth" helper functions that only support the Variables read/list RPC
handlers. Instead, pass the ACL object and claim together into the
`AllowVariableOperations` method in the usual `acl` package.

Ref: hashicorp/nomad-enterprise#1218
Ref: #18703
Ref: #18715
Ref: #16799
Ref: #18730

Fixes: #15875
tgross added a commit that referenced this pull request Oct 13, 2023
The RPC handlers expect to see `nil` ACL objects whenever ACLs are disabled. By
using `nil` as a sentinel value, we have the risk of nil pointer exceptions and
improper handling of `nil` when returned from our various auth methods that can
lead to privilege escalation bugs. This is the final patch in a series to
eliminate the use of `nil` ACLs as a sentinel value for when ACLs are disabled.

This patch adds a new virtual ACL policy field for when ACLs are disabled and
updates our authentication logic to use it. Included:

* Extends auth package tests to demonstrate that nil ACLs are treated as failed
  auth and disabled ACLs succeed auth.
* Adds a new `AllowDebug` ACL check for the weird special casing we have for
  pprof debugging when ACLs are disabled.
* Removes the remaining unexported methods (and repeated tests) from the
  `nomad/acl.go` file.
* Update the semgrep rules to detect improper nil ACL checking and remove the
  old invalid ACL checks.
* Update the contributing guide for RPC authentication.

Ref: hashicorp/nomad-enterprise#1218
Ref: #18703
Ref: #18715
Ref: #16799
Ref: #18730
Ref: #18744
tgross added a commit that referenced this pull request Oct 13, 2023
The RPC handlers expect to see `nil` ACL objects whenever ACLs are disabled. By
using `nil` as a sentinel value, we have the risk of nil pointer exceptions and
improper handling of `nil` when returned from our various auth methods that can
lead to privilege escalation bugs. This is the final patch in a series to
eliminate the use of `nil` ACLs as a sentinel value for when ACLs are disabled.

This patch adds a new virtual ACL policy field for when ACLs are disabled and
updates our authentication logic to use it. Included:

* Extends auth package tests to demonstrate that nil ACLs are treated as failed
  auth and disabled ACLs succeed auth.
* Adds a new `AllowDebug` ACL check for the weird special casing we have for
  pprof debugging when ACLs are disabled.
* Removes the remaining unexported methods (and repeated tests) from the
  `nomad/acl.go` file.
* Update the semgrep rules to detect improper nil ACL checking and remove the
  old invalid ACL checks.
* Update the contributing guide for RPC authentication.

Ref: hashicorp/nomad-enterprise#1218
Ref: #18703
Ref: #18715
Ref: #16799
Ref: #18730
Ref: #18744
tgross added a commit that referenced this pull request Oct 13, 2023
The RPC handlers expect to see `nil` ACL objects whenever ACLs are disabled. By
using `nil` as a sentinel value, we have the risk of nil pointer exceptions and
improper handling of `nil` when returned from our various auth methods that can
lead to privilege escalation bugs. This is the final patch in a series to
eliminate the use of `nil` ACLs as a sentinel value for when ACLs are disabled.

This patch adds a new virtual ACL policy field for when ACLs are disabled and
updates our authentication logic to use it. Included:

* Extends auth package tests to demonstrate that nil ACLs are treated as failed
  auth and disabled ACLs succeed auth.
* Adds a new `AllowDebug` ACL check for the weird special casing we have for
  pprof debugging when ACLs are disabled.
* Removes the remaining unexported methods (and repeated tests) from the
  `nomad/acl.go` file.
* Update the semgrep rules to detect improper nil ACL checking and remove the
  old invalid ACL checks.
* Update the contributing guide for RPC authentication.

Ref: hashicorp/nomad-enterprise#1218
Ref: #18703
Ref: #18715
Ref: #16799
Ref: #18730
Ref: #18744
tgross added a commit that referenced this pull request Oct 13, 2023
The RPC handlers expect to see `nil` ACL objects whenever ACLs are disabled. By
using `nil` as a sentinel value, we have the risk of nil pointer exceptions and
improper handling of `nil` when returned from our various auth methods that can
lead to privilege escalation bugs. This is the final patch in a series to
eliminate the use of `nil` ACLs as a sentinel value for when ACLs are disabled.

This patch adds a new virtual ACL policy field for when ACLs are disabled and
updates our authentication logic to use it. Included:

* Extends auth package tests to demonstrate that nil ACLs are treated as failed
  auth and disabled ACLs succeed auth.
* Adds a new `AllowDebug` ACL check for the weird special casing we have for
  pprof debugging when ACLs are disabled.
* Removes the remaining unexported methods (and repeated tests) from the
  `nomad/acl.go` file.
* Update the semgrep rules to detect improper nil ACL checking and remove the
  old invalid ACL checks.
* Update the contributing guide for RPC authentication.

Ref: hashicorp/nomad-enterprise#1218
Ref: #18703
Ref: #18715
Ref: #16799
Ref: #18730
Ref: #18744
tgross added a commit that referenced this pull request Oct 16, 2023
The RPC handlers expect to see `nil` ACL objects whenever ACLs are disabled. By
using `nil` as a sentinel value, we have the risk of nil pointer exceptions and
improper handling of `nil` when returned from our various auth methods that can
lead to privilege escalation bugs. This is the final patch in a series to
eliminate the use of `nil` ACLs as a sentinel value for when ACLs are disabled.

This patch adds a new virtual ACL policy field for when ACLs are disabled and
updates our authentication logic to use it. Included:

* Extends auth package tests to demonstrate that nil ACLs are treated as failed
  auth and disabled ACLs succeed auth.
* Adds a new `AllowDebug` ACL check for the weird special casing we have for
  pprof debugging when ACLs are disabled.
* Removes the remaining unexported methods (and repeated tests) from the
  `nomad/acl.go` file.
* Update the semgrep rules to detect improper nil ACL checking and remove the
  old invalid ACL checks.
* Update the contributing guide for RPC authentication.

Ref: hashicorp/nomad-enterprise#1218
Ref: #18703
Ref: #18715
Ref: #16799
Ref: #18730
Ref: #18744
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants