[CLI] DKG ceremony crash upon existing validator_keys folder #2881

Closed
crisog opened this issue Feb 13, 2024 · 1 comment
Labels: protocol (Protocol Team tickets), V1


crisog commented Feb 13, 2024

🐞 Bug Report

Description

This shouldn't have happened. Apparently, every other node in the cluster received its key shares except mine. There doesn't seem to be an obvious recovery path; the only ideas that come to mind are:

  1. Everyone deletes their validator_keys folder, which is not practical when dealing with users who run Dappnode and have no direct access to the OS. (This assumes the DKG can be repeated.)
  2. A new cluster is created.

In my opinion, what should happen is: if the validator_keys folder exists, a new directory gets created with a random prefix, e.g. xyz_validator_keys, and the keys are written there. This way the key shares wouldn't have been lost after a successful DKG ceremony.

🔬 Minimal Reproduction

docker run -u $(id -u):$(id -g) --rm -v "$(pwd)/:/opt/charon" obolnetwork/charon:v0.18.0 dkg --definition-file="https://api.obol.tech/dv/<DV_CLUSTER>" --publish

🔥 Error


15:28:54.256 INFO dkg        Connected to peer 2 of 3                 {"peer": "nervous-painter"}
15:48:55.335 INFO dkg        Connected to peer 2 of 3                 {"peer": "strange-jewelry"}
16:28:54.630 INFO dkg        Connected to peer 2 of 3                 {"peer": "nervous-painter"}
16:31:05.277 INFO dkg        Connected to peer 3 of 3                 {"peer": "shy-computer"}
16:31:06.453 INFO dkg        All peers connected, starting DKG ceremony
16:31:11.644 ERRO cmd        Fatal error: mkdir /validator_keys: mkdir .charon/validator_keys: file exists
    dkg/disk.go:128 .writeKeysToDisk
    dkg/dkg.go:315 .Run
    cmd/dkg.go:35 .func1
    cmd/cmd.go:80 .func1
    main.go:19 .main

🌍 Your Environment

Operating System:

Docker Image: obolnetwork/charon:v0.18.0

What version of Charon are you running? (Which release)

v0.18.0

Anything else relevant (validator index / public key)?

github-actions bot added the protocol (Protocol Team tickets) label Feb 13, 2024
gsora (Collaborator) commented Feb 15, 2024

Hi @crisog!

First and foremost, could you try reproducing this issue with our latest release, v0.19.1?

> Apparently, all the other nodes in the cluster received their key shares except me.

Since DKG is a peer-to-peer process, this can unfortunately happen: a peer may have closed its terminal, crashed, or lost its internet connection.

While your terminal showed the validator_keys-related error, the other peers should have shown messages stating that the DKG process failed and must be re-executed. One can always re-run the DKG process with the same definition file after cleaning the .charon directory, as explained here.

The keys that the other peers received should be ignored, since the DKG process did not complete in an orderly fashion for all peers.

> In my opinion, what should happen is: if the validator_keys folder exists, a new directory gets created with a random prefix, e.g. xyz_validator_keys, and the keys are written there. This way the key shares wouldn't have been lost after a successful DKG ceremony.

As you pointed out, in the context of, for example, a Dappnode deployment, even if the DKG created a different validator_keys directory, one would still have to dig into the Dappnode configuration files and figure out a way to point the validator stack to this new directory.


Thank you for pointing out this UX issue with Dappnode and similar tools, food for thought for our development team!

boulder225 added the V1 label Mar 1, 2024
pinebit assigned pinebit and unassigned gsora Mar 13, 2024
obol-bulldozer bot pushed a commit that referenced this issue Mar 14, 2024
In accordance with the description in issue #2881, this is a UX improvement: if the target validator_keys folder exists, the implementation checks whether the folder is empty. If the existing directory is not empty, or in case of any I/O error, the process fails.

category: feature
ticket: #2881
pinebit closed this as completed Mar 14, 2024