Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Server-side key refresh and update registered public keys from origin server #1748

Merged
merged 8 commits into from
Jan 22, 2025

Conversation

h2zh
Copy link
Collaborator

@h2zh h2zh commented Nov 15, 2024

Overview

This PR introduces 4 major features:

  1. Support multiple private key files in a new directory specified by a new config param IssuerKeysDirectory, while also incorporating the legacy key saved at IssuerKey
  2. Periodically refresh above locations to maintain an up-to-date in-memory private keys map AllKeys. A token or payload signature is considered valid if any of these keys could have produced it. The key with lowest lexicographical filename is set as the current issuer key CurrentKey to sign tokens and payload signatures. They live in a struct IssuerKeys protected by Go's atomic pointer from race condition.
  3. If the server is an origin and keys change after the refresh, the registered public keys update for the corresponding namespace will be triggered. Use the public keys generated from the latest issuerKeys.AllKeys to overwrite the previous registered public keys on registry database.
  4. Server's public keys are generated from issuerKeys.AllKeys, which means multiple public keys could be obtained at <server_url>/.well-known/issuer.jwks.

New Key

In this PR, admin can create new private key by simply dropping the private key into into the new directory as a .pem file (the location where config param IssuerKeysDirectory points to, which by default is /etc/pelican/issuer-keys). Pelican program will pick up all .pem files (including the legacy one at IssuerKey if it exists) through a goroutine running every 5 mins (see details in 1-3 above).

Security

How do we know the origin who updates the namespace's public key is the same origin registered this namespace? We authorize the origin through handshakes with nonce signatures:

  1. At the start of handshake, client (origin, same below) send nonce to the registry to start a handshakes authentication. We use extra fields in these handshakes to verify the client possess the rights to update the namespace's public key.
  2. The registry sends all known public keys for the namespace registered in the database.
  3. The client receives the list of public keys and selects a corresponding private key from that list (if one is available) to sign the challenge.
  4. The client sends the server the updated set of public keys plus the proof of possession from one of the known keys.
  5. The registry verify the signature using the corresponding previous registered public key. If it passes, overwrite the public keys of the registered namespace with the latest set from the client

During the handshakes, all the namespaces managed by the sender origin are sent in a list, and then their corresponding registered public keys get update at the same time. This feature reduces the number of HTTP requests. It builds on the fact that currently, registered public keys are managed "per origin" rather than "per namespace." In other words, all namespaces registered by a single origin share the same set of public keys. (The existing namespace registration process does the same thing too)

Design thinking

Existing file hierarchy:

├── whatever-parent-dir/ (by default /etc/pelican)                             
│   ├── issuer.jwk            # Existing key location, specified by config param “IssuerKey”

To ensure backward compatibility, the new directory that stores the private keys (saved as .pem files) is mounted to a new config param IssuerKeysDirectory. User doesn’t need to change the value of IssuerKey, because we still incorporate the that file in the key refresh, if it exists. The new file hierarchy looks like this:

├── whatever-parent-dir/ (by default /etc/pelican)                             
│   ├── issuer.jwk            # (optional) Legacy key location, specified by config param “IssuerKey”
│   └── issuer-keys/          # The new directory storing multiple private keys where “IssuerKeysDirectory” refers to
│       ├── pelican_generated_<timestamp_1>_<randomChars>.pem    # If no private key is provided, new .pem file will be created
│       ├── pelican_generated_<timestamp_2>_<randomChars>.pem         
│       ├── ...                 
│       └── <admin's fav name>.pem

Algorithm efficiency tradeoff

For a map using atomic.pointer, it requires to copy the entire map before update any key-value pair. In the loadPEMFiles func, because it runs every 5 minutes to update the in-memory key map from the file, which is not frequent, we think the memory use by a map using atomic.pointer is still acceptable.

Future work

At this moment, we have gone far away from the initial ticket and are pretty close to the finishing line. #561

The original final goal is: when multiple origins have the same namespace, the admin of these origins can append a common private key to all. Then these origin servers will register the corresponding shared public key in the registry. Currently this operation needs to manually done by OSDF admin.

If the goal hasn't changed, it's DONE!😃

@h2zh h2zh requested a review from matyasselmeci November 19, 2024 14:40
@h2zh h2zh linked an issue Nov 19, 2024 that may be closed by this pull request
@h2zh h2zh added enhancement New feature or request origin Issue relating to the origin component registry Issue relating to the registry component go Pull requests that update Go code security labels Nov 26, 2024
@h2zh h2zh requested a review from turetske December 10, 2024 21:10
Copy link
Collaborator

@turetske turetske left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to do another run through of the server_creds and the actual checking and updating of the keys in the namespace to ensure security, but I have enough non-trivial changes requested that I'm giving this review now.

Major points:

1.) I'm worried about backwards compatibility. What happens when you run with an old origin and a new registry (what about the reverse)?
2.) Updating the private keys and the periodic origin registration are two different tasks. The periodic updating of private keys doesn't need to do a re-register, we already have that occurring. This is redundant.
3.) This PR removes the checks which determine whether an origin is already registered under a different key. This is important because we want to be able to properly communicate this and stop origin startup if the some other user is trying to use the same namespace prefix. This removal is a bad idea.
4.) This PRs tests checks that things work properly, but unit tests should also ensure that failures occur properly and give the expected error messages.

origin/origin_ui.go Outdated Show resolved Hide resolved
config/config.go Show resolved Hide resolved
config/init_server_creds.go Outdated Show resolved Hide resolved
registry/client_commands.go Outdated Show resolved Hide resolved
launcher_utils/register_namespace.go Outdated Show resolved Hide resolved
launcher_utils/register_namespace_test.go Show resolved Hide resolved
config/init_server_creds_test.go Outdated Show resolved Hide resolved
newKey, err := loadIssuerPrivateJWK(issuerKeyFile)
issuerKeysDir := param.IssuerKeysDirectory.GetString()
currentIssuerKeysDir := getCurrentIssuerKeysDir()
// Handles runtime changes to the issuer keys directory (configured via "IssuerKeysDirectory" parameter).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would runtime changes occur?

The parameter is assigned/parsed on startup. If the server settings are changed in the web-ui it forces a restart. Can this actually occur?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently runtime changes occur happen in two test scenarios:
"invalid-token-sig-key" in pelican/web_ui/prometheus_test.go, "diff-secrets-yield-diff-result" in pelican/config/encrypted_test.go

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the tests should take a different approach: reset the config state between the calls. Otherwise, it clutters up the runtime code.

config/config.go Show resolved Hide resolved
// This new key will be detected and loaded later by the ongoing goroutine `LaunchIssuerKeysDirRefresh`,
// which periodically refreshes the keys in the issuer keys directory
func createNewIssuerKey(ctx *gin.Context) {
issuerKeysDir := param.IssuerKeysDirectory.GetString()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My on the return still stands, but looking at the original issue, generating new private keys is specifically not in scope got this issue.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the resolution of this question / comment? If it was resolved in-person, please record the knowledge in the PR as well.

@h2zh
Copy link
Collaborator Author

h2zh commented Dec 13, 2024

Updated on Dec 17

1.) I'm worried about backwards compatibility. What happens when you run with an old origin and a new registry (what about the reverse)?

For private key generation and retrieval, the new and old version will use param.IssuerKey and param.IssuerKeysDirectory respectively so there's no conflicts.

Old origin, new registry: The periodic goroutine to refresh the key directory and update its public key in registry doesn't exist in old origin, so theoretically nothing happens. Old origin will keep using the IssuerKey config param to manage the private key generation and usage. It registers its public key on registry through the namespace registration workflow, which is untouched in this PR. For testing, I run registry, director with the latest code and a v7.9.9 origin. They work well.

New origin, old registry: This is highly unlikely to happen since we keep the central services up-to-date, including registry. Having said that, if it actually happens, new origin will get a 404 error when it can't hit the public key update endpoint on old registry. Around line 330 and 414 in registry/client_commands.go, the new origin will handle 404 error gracefully with a log and just silently give up the update.

@jhiemstrawisc jhiemstrawisc added this to the v7.12.0 milestone Dec 17, 2024
@h2zh
Copy link
Collaborator Author

h2zh commented Dec 17, 2024

Update: this is outdated. Refer to the writeup on the top for the latest.
This is the security checking workflow when updating the keys. Hope this could facilitate your reviews : ) @turetske
Multi-Private-Key-Page-2 drawio

config/config.go Show resolved Hide resolved
origin/origin_ui.go Outdated Show resolved Hide resolved
docs/parameters.yaml Outdated Show resolved Hide resolved
origin/origin_ui.go Outdated Show resolved Hide resolved
launcher_utils/update_namespace_pubkey.go Outdated Show resolved Hide resolved
origin/origin_ui_test.go Outdated Show resolved Hide resolved
registry/client_commands.go Outdated Show resolved Hide resolved
registry/client_commands.go Outdated Show resolved Hide resolved
registry/client_commands_test.go Outdated Show resolved Hide resolved
registry/registry_pubkey_update.go Outdated Show resolved Hide resolved
docs/parameters.yaml Outdated Show resolved Hide resolved
@jhiemstrawisc
Copy link
Member

One other quick comment, now that I think we've settled on an answer -- wherever there's some JSON that gets passed back and forth via API calls, we should stick to camel case:
https://github.com/orgs/PelicanPlatform/discussions/1734#discussioncomment-11621001

For example, the new RegisteredPrefixUpdate struct from registry/registry_pubkey_update.go uses snake case when it defines JSON tags like json:"client_nonce". Unless we're worried about a specific backwards compatibility issue here, these should be defined like json:"clientNonce"

@h2zh h2zh force-pushed the multiple-private-keys branch from 19ce7bb to 85bc504 Compare December 20, 2024 19:30
Comment on lines 303 to 372
func() { assert.NoError(t, egrp.Wait()) }()
cancel()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is clearly the deadlock and looks like some bad copy/pasting without understanding what the original code does.

The original was:

	defer func() { require.NoError(t, egrp.Wait()) }()
	defer cancel()

Since defer is executed LIFO, this will invoke cancel() followed by the Wait() (to wait on the canceled functions to clean up).

The revised code is effectively this:

         defer func() {
	        func() { require.NoError(t, egrp.Wait()) }()
	        cancel()
          }

Which waits on the group to stop, then shuts it down, a clear deadlock.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out and fixing this lingering problem! Having said that, this is just an intermediate problem in the tests - the root problem is topology mock server doesn't response in TestRegistryKeyChainingOSDF and test timed out in TestRegistryKeyChaining (see the error logs in the tests result). These two errors were suppressed if the other two tests in client_commands_test.go are commented out. Justin figured out there is a test state leak because commenting out the other tests allowed both TestRegistryKeyChaining and TestRegistryKeyChainingOSDF to pass. I'm still looking for the leak and hoping fixing the problems mentioned in your new PR comments could potentially fix the leak.

Copy link
Collaborator Author

@h2zh h2zh Dec 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  Test result Test result
Comment Out ⬇️ TestRegistryKeyChainingOSDF TestRegistryKeyChaining
TestServeNamespaceRegistry
TestMultiPubKeysRegisteredOnNamespace
Both test funcs above ✔️ ✔️

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: all failed tests on macOS env are fixed.

Copy link
Collaborator

@bbockelm bbockelm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's still a lot to do here. Some minor stylistic / convention comments in the review -- plus a few major requests:

  1. Redo the handshake between origin and registry to allow arbitrary replacement of the public keys.
  2. Move the key refresh logic into the config module and add unit tests. Make the updates atomic and have things in lexicographical order.
  3. Do not deprecate the existing issuer key (at least not in this version!) to allow external folks to adapt to the new setup. Do not delete a key that an admin has provided. Instead, behave as if the issuer key was found within the key directory.

@@ -39,7 +39,7 @@ func keygenMain(cmd *cobra.Command, args []string) error {
return errors.Wrap(err, "failed to get the current working directory")
}
if privateKeyPath == "" {
privateKeyPath = filepath.Join(wd, "issuer.jwk")
privateKeyPath = filepath.Join(wd, "issuer-keys")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this work? Before, the variable was a file and now it's a directory. All the code below operates on this as if it was a directory.

On first glance, appears that the tool is broken, no?

Copy link
Collaborator Author

@h2zh h2zh Dec 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the private/public generation logic lies in config.GetIssuerPublicJWKS(), which was already updated to incorporate the new keys in IssuerKeysDirectory instead of only one key file at IssuerKey. The pelican generate keygen command works as expected (creating a key in ./issuer-keys).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow. This line:

		privateKeyPath = filepath.Join(wd, "issuer-keys")

suggests things are a directory. However, 4 lines later:

	if err = os.MkdirAll(filepath.Dir(privateKeyPath), 0755); err != nil {
		return errors.Wrapf(err, "failed to create directory for private key at %s", filepath.Dir(privateKeyPath))
	}

filepath.Dir(privateKeyPath) returns the parent directory, meaning os.MkdirAll doesn't create the full directory path. The code needs to be reviewed to ensure the assumptions haven't changed.

config/init_server_creds.go Outdated Show resolved Hide resolved
}

// Rename the existing private key file and set destination path
fileName := fmt.Sprintf("migrated_%d_%s.pem",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the appropriate mkstemp to create a unique file. This is not sufficient.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I also use the mkstemp logic in another similar function GeneratePEM

newKey, err := loadIssuerPrivateJWK(issuerKeyFile)
issuerKeysDir := param.IssuerKeysDirectory.GetString()
currentIssuerKeysDir := getCurrentIssuerKeysDir()
// Handles runtime changes to the issuer keys directory (configured via "IssuerKeysDirectory" parameter).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the tests should take a different approach: reset the config state between the calls. Otherwise, it clutters up the runtime code.

config/init_server_creds.go Outdated Show resolved Hide resolved
// Check the origin is authorized to update (possessing the public key used for prefix initial registration)
// Parse all public keys of the sender into a JWKS
var clientKeySet jwk.Set
if data.AllPubkeys == nil { // backward compatibility - AllPubkeys only exists in the payload in Pelican 7.12 or later
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by this.

data is from the client -- why is there backward compatibility code in an API call introduced in 7.12?

Copy link
Collaborator Author

@h2zh h2zh Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the confusion, I greatly revamp this part of code. There should be no backward compatibility issue here, and data.AllPubkeys is never going to be nil. Only the origin with the updated version could send registered public key update request to the registry. And there's a check on the origin side to make sure data.AllPubkeys is not nil. In client_commands.go:

privateKeys := config.GetIssuerPrivateKeys()
if len(privateKeys) == 0 {
 return errors.Errorf("The server doesn't have any private key in memory")
}
issuerPubKeysSet := jwk.NewSet()
for _, privKey := range privateKeys {
  pubKey, err := privKey.PublicKey()
  if err != nil {
	  return errors.Wrapf(err, "failed to get the public key of a private key")
  }
  if err = issuerPubKeysSet.AddKey(pubKey); err != nil {
	  return errors.Wrap(err, "failed to add public key to all keys JWKS")
  }
}

continue
}

if registryDbKid == clientKid {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems unnecessary.

The client should prove it possess one of the existing keys.

However, this additionally forces the client to keep the key it used for the challenge in the updated key set. That seems unnecessary. We should simply see if the key used to sign is one of the existing known keys.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To reduce the for-loop and avoid verify the signature with each public key in the registry database, I attach the MatchedKeyId in the payload sent back from the client. Then the registry can verify the signature using the right key (if it exists) in one step.

// Look for the matched key sent back from the client in `registryDbKeySet`
matchedKey, found := registryDbKeySet.LookupKeyID(data.MatchedKeyId)
if !found {
  return nil, permissionDeniedError{
	  Message: fmt.Sprintf("The client that tries to update prefix '%s' cannot be authorized because it doesn't contain any public key matching the existing namespace's public key in db", prefix),
  }
}
rawkey := &ecdsa.PublicKey{}
if err := matchedKey.Raw(rawkey); err != nil {
  return nil, errors.Wrap(err, "failed to generate the raw key from the matched key")
}
keyUpdateAuthzVerified := verifySignature(clientPayload, keyUpdateAuthzSignature, rawkey)

}

// Check if any key in `clientKeySet` matches a key in `registryDbKeySet`
registryDbKeysIter := registryDbKeySet.Keys(ctx)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These loops could be replaced by the LookupKeyID method.

Copy link
Collaborator Author

@h2zh h2zh Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Also apply this nice function in other places previously using for loop.


// Update the public key of registered prefix(es) if the http request passed client and server verification for nonce.
// It returns the response data, and an error if any
func updateNsKeySignChallengeCommit(ctx *gin.Context, data *RegisteredPrefixUpdate) (map[string]interface{}, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the design here is still a bit confused. For example, it removes the previous key -- not clear that is matching the intent here.

Let me suggest something cleaner:

  1. At the start of handshake, the registry sends all known public keys for the namespace.
  2. The client receives the list of public keys and selects a corresponding private key from that list (if one is available) to sign the challenge.
  3. The client sends the server the updated set of public keys plus the proof of possession from one of the known keys.

Thus, at the end of the handshake:

  • The client has demonstrated it owns a private key from the existing known public keys.
  • The client's desired set of keys is synchronized with the registry; the client can remove or add any arbitrary keys in the call.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. For now, client's entire on-disk key set is synchronized with the registry at the end the handshakes.

registry/client_commands_test.go Outdated Show resolved Hide resolved
@bbockelm
Copy link
Collaborator

Oh! One final item from the review -- please rebase and squash down the commits to a clean list.

Each commit should be self-contained (compiles, passes tests), covering a single piece of functionality (such as adding the key directory logic; or adding the API to update the keys), and have a detailed commit message. Please refer to https://osg-htc.org/technology/software/git-software-development/#making-good-pull-requests-the-art-of-good-commits for some best practices. Given the scope of the changes, I'd expect 4-5 commits to come out on this branch -- not 60.

I typically like keeping the git history in response to reviewer comments to help the reader understand the evolution of ideas. However, at this point, the history of the PR is so messy that it'd be impossible to glean that knowledge -- more value is in a clean branch.

@h2zh h2zh changed the title Allow multiple private keys but use the latest one Server-side key refresh and update registered public keys for origin server Jan 7, 2025
@h2zh h2zh changed the title Server-side key refresh and update registered public keys for origin server Server-side key refresh and update registered public keys from origin server Jan 7, 2025
@h2zh h2zh requested a review from bbockelm January 7, 2025 14:48
@h2zh h2zh modified the milestones: v7.12.0, v7.13.0 Jan 8, 2025
@h2zh h2zh mentioned this pull request Jan 9, 2025
Copy link
Collaborator

@bbockelm bbockelm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting quite close here.

I think the biggest issue is something that got dropped along the way -- when is it safe to use the latest key?

Right now, the latest key is used immediately upon discovery -- even though the key takes time to propagate through the system. A prior attempt kept a copy of the "previous key" and used that for some undefined period (basically, until the timer on the key update loop fired again). Further, on this review, I noticed that we also are rotating the keys for the encrypted config file logic which also has issues (see inline comments).

I think we will need follow-up work (I'll writeup an issue) to handle when it is safe to use the newest key. However, in the meantime, I think we can move ahead with this PR with a small change: always use the "oldest" (where "old" is actually based on the filename sorting you currently have) key. This means that, as new keys get added, what we use for signing stays stable, avoiding both issues above (for now). That will allow us to get the bulk of the PR in without having to make it even larger.

@@ -39,7 +39,7 @@ func keygenMain(cmd *cobra.Command, args []string) error {
return errors.Wrap(err, "failed to get the current working directory")
}
if privateKeyPath == "" {
privateKeyPath = filepath.Join(wd, "issuer.jwk")
privateKeyPath = filepath.Join(wd, "issuer-keys")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow. This line:

		privateKeyPath = filepath.Join(wd, "issuer-keys")

suggests things are a directory. However, 4 lines later:

	if err = os.MkdirAll(filepath.Dir(privateKeyPath), 0755); err != nil {
		return errors.Wrapf(err, "failed to create directory for private key at %s", filepath.Dir(privateKeyPath))
	}

filepath.Dir(privateKeyPath) returns the parent directory, meaning os.MkdirAll doesn't create the full directory path. The code needs to be reviewed to ensure the assumptions haven't changed.

@@ -42,7 +42,7 @@ jobs:
- name: Test
run: |
make web-build
go test -tags=${{ inputs.tags }} -timeout 15m -coverpkg=./... -coverprofile=${{ inputs.coverprofile }} -covermode=count ./...
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary? Seems unrelated - I'd prefer to avoid unnecessary code changes.

@@ -380,17 +377,18 @@ func SaveConfigContents_internal(config *OSDFConfig, forcePassword bool) error {
// Take a hash, and use the hash's bytes as the secret.
func GetSecret() (string, error) {
// Use issuer private key as the source to generate the secret
issuerKeyFile := param.IssuerKey.GetString()
err := GeneratePrivateKey(issuerKeyFile, elliptic.P256(), false)
privateKey, err := GetIssuerPrivateJWK()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm - this is going to be a problem.

The secret is used to encrypt the private parts of the config then again to decrypt when reading. This means that we can't rotate out the key.

Any thoughts on how we could fix this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thought after some reflection --

The encrypted config is used almost exclusively on the client side for now. Thus, this becomes a blocker for removing old keys but not adding new keys. If the config module had an API to get the oldest key in addition to the current issuer, the issue is avoided until we want to tackle automatic key rotation.

So, one way forward:

  • Add this new API to config to get the "oldest" signing key.
  • Update this call to use the new API.
  • File an issue to introduce a stable secret key to be used for encrypting client configuration. This way, we don't have to do the full solution now.

func createDirForKeys(dir string) error {
gid, err := GetDaemonGID()
if err != nil {
return errors.Wrap(err, "failed to get deamon gid")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return errors.Wrap(err, "failed to get deamon gid")
return errors.Wrap(err, "failed to get daemon gid")

return err
}

if !d.IsDir() && filepath.Ext(d.Name()) == ".pem" {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two items:

  • I want to avoid single-character variable names. Humans struggle with understanding them from context. something like dirEntry is much better than d.
  • Use SkipDir to avoid recursing into a subdirectory.

@@ -196,6 +196,13 @@ func LaunchModules(ctx context.Context, modules server_structs.ServerType) (serv
lc.Register(ctx, rootGroup)
}

// Start a routine to periodically refresh the private key directory
// This ensures that new or updated private keys are automatically loaded and registered
if modules.IsEnabled(server_structs.RegistryType) || modules.IsEnabled(server_structs.OriginType) ||
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can drop the conditional -- for servers, we always want to keep the private keys refreshed.

case <-ctx.Done():
log.Debugln("Stopping periodic check for private keys directory.")
return nil
case <-ticker.C:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switch this routine to using server_utils.LaunchWatcherMaintenance instead. The server_utils.LaunchWatcherMaintenance does effectively the same thing except it also sets up a watcher on the directory and will get notified immediately about changes.

@@ -281,3 +283,165 @@ func NamespaceDelete(endpoint string, prefix string) error {
fmt.Println(string(respData))
return nil
}

func NamespacesPubKeyUpdate(privateKey jwk.Key, prefixes []string, siteName string, namespacePubKeyEndUpdatepoint string) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add documentation string to exported function.

@@ -382,7 +382,6 @@ func AddNamespace(ns *server_structs.Namespace) error {
if ns.AdminMetadata.Status == "" {
ns.AdminMetadata.Status = server_structs.RegPending
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future, please reduce unnecessary whitespace churn.

if err != nil {
return nil, errors.Wrapf(err, "Failed to convert the latest public key(s) from json to string format for the prefix %s", prefix)
}
clientPubkeyString := string(allPubkeyData)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relying on the string representation seems dicey here. Do a logical comparison of the public keys, as you do in the config module.

This permits multiple private keys to be associated with a Pelican
service and synchronize its keys with the registry.
The next version of XRootD will respond with a 201 (Created) on successful
upload (which is the correct status code).  Unfortunately, `pelican`
had a hard check for 200 (OK) instead.

Accept either.  Additionally, use the symbolic name for these so we
can better `grep` through the source code to find use of particular
status codes.
In the branch for multi-key support, the keygen tool was changed
to create keys in the key directory instead of the working directory.

While that might have been a better original design, I don't want to
switch the semantic (and particularly the documentation!) at this point.

Restoring the original behavior.
There remain too many open questions for when we can safely use
the newest private key for signing operations.  Until then, be
conservative and use the oldest one.

Update the unit tests along with the new behavior.
@bbockelm bbockelm force-pushed the multiple-private-keys branch from 0edb6b5 to 060c7ab Compare January 22, 2025 20:59
@bbockelm bbockelm dismissed turetske’s stale review January 22, 2025 21:00

Dismissing review as it was mostly addressed in subsequent work.

@bbockelm bbockelm self-requested a review January 22, 2025 21:01
Copy link
Collaborator

@bbockelm bbockelm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're at the "good enough" point that we can get this into a release without making the PR larger.

Follow-up PRs to finish this work include:

@bbockelm bbockelm merged commit 09929fc into PelicanPlatform:main Jan 22, 2025
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request go Pull requests that update Go code origin Issue relating to the origin component registry Issue relating to the registry component security
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow origin issuer key to be rotated
4 participants