Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate mnemonic internally in entropy-tss #1128

Merged
merged 15 commits into from
Dec 18, 2024
Merged

Conversation

ameba23
Copy link
Contributor

@ameba23 ameba23 commented Oct 22, 2024

Addresses #1015 by not allowing mnemonics to be passed in (except in development / testing) via the command line or an environment variable. Instead they are always randomly generated internally.

However i think this does not completely resolve the issue - see #1127

@ameba23
Copy link
Contributor Author

ameba23 commented Oct 22, 2024

Im pretty sure failing CI is unrelated to this PR - its the good ol' reshare test

Copy link
Member

@JesseAbram JesseAbram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also tagging @vitropy cuz will have an effect on devops flow

@vitropy
Copy link
Contributor

vitropy commented Oct 22, 2024

Also tagging @vitropy cuz will have an effect on devops flow

@JesseAbram, please tag teams, not individuals: @entropyxyz/system-reliability-engineers, not me. Thanks!

Copy link
Contributor

@vitropy vitropy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, this will not just "have an impact on devops flow," it will break the possibility of deterministically bringing up a specific TSS server with a prior knowledge of a given private key. This is not really feasible from a devops perpsective at all because we will not be able to replace a broken/failed/otherwise unusable TSS instance without replacing the TSS instance with a new key. If that key or account needs to be communicated to any other part of the system, we can not make use of this change, period.

If this understand is correct, then this will simply break automation, full stop. Please either correct my understanding or do not merge this.

@JesseAbram
Copy link
Member

If I understand correctly, this will not just "have an impact on devops flow," it will break the possibility of deterministically bringing up a specific TSS server with a prior knowledge of a given private key. This is not really feasible from a devops perpsective at all because we will not be able to replace a broken/failed/otherwise unusable TSS instance without replacing the TSS instance with a new key. If that key or account needs to be communicated to any other part of the system, we can not make use of this change, period.

If this understand is correct, then this will simply break automation, full stop. Please either correct my understanding or do not merge this.

yes it will but also using TDX would break this flow anyways, as well we can automate around this, by having the last step of a spin up process run a script to fund the TSS and inform the chain what the TSS is. Giving access to the kvdb or the TSS accounts outside of TDX removes the security of running TDX

@vitropy
Copy link
Contributor

vitropy commented Oct 22, 2024

yes it will but also using TDX would break this flow anyways, as well we can automate around this, by having the last step of a spin up process run a script to fund the TSS and inform the chain what the TSS is.

Okay, but that then sounds like it has nothing to do with devops because it's something the TSS will do for itself by itself and speak to the chain on its own behalf, in which case I have no objections.

@JesseAbram
Copy link
Member

yes it will but also using TDX would break this flow anyways, as well we can automate around this, by having the last step of a spin up process run a script to fund the TSS and inform the chain what the TSS is.

Okay, but that then sounds like it has nothing to do with devops because it's something the TSS will do for itself by itself and speak to the chain on its own behalf, in which case I have no objections.

mmmm will probably have to run a script and need access to a secure funded account but other than that ya

@ameba23
Copy link
Contributor Author

ameba23 commented Oct 23, 2024

So maybe this is not useful without the other part where the TSS informs the chain of its public keys. With things as they currently are this would need to happen before the validate extrinsic gets submitted by the node operator, as the node operator needs to know these public keys for that extrinsic.

The obvious place to do this would be the attestation pallet's request_attestation extrinsic, as this needs to anyway be called by the TSS on launch and before validate, as validate needs to include a quote. The problem is it doesn't give us the x25519 public key, and i think @HCastano would object to us putting it in there as he's quite keen on not mixing up the logic of the attestation pallet and staking extension pallet. Also, currently the node operator has no way of getting the quote from the TSS node which they will need for the validate extrinsic (which is a separate issue i guess).

So my proposal would be that the TS server has a GET route which returns its account ID and x25519 public keys as JSON. Not sure if we could just put this in the existing /healthz route or we need a new one. The operator can call this route, then fund the TSS account. Then we need yet another route where they tell the TSS that it has been funded and should request an attestation from the chain, get the nonce, and respond the the HTTP request with a quote that can be used in the validate extrinsic. This would all be much simpler if we had free transactions.

@entropyxyz/system-reliability-engineers - i feel like it is tricky to design TS launch/attestation process accommodating for both TSS nodes which are hard-coded at genesis as well as dynamically joining ones. Ideally i would like that we remove TS server details from the testnet chainspec and have them all join after launching the chain.

@ameba23
Copy link
Contributor Author

ameba23 commented Oct 23, 2024

I have added a GET route for retrieving the account ID and x25519 public key.

@ameba23
Copy link
Contributor Author

ameba23 commented Nov 5, 2024

@vitropy i am re-requesting a review as this is currently blocked by requested changes. I know this is going to break deployment scripts, but if the plan is to have the next release run on TDX hardware, then the deployment scripts are anyway going to need some changes.

@ameba23 ameba23 requested a review from vitropy November 5, 2024 10:45
@ameba23 ameba23 added this to the v0.4.0 milestone Nov 13, 2024
@ameba23 ameba23 self-assigned this Nov 13, 2024
@HCastano
Copy link
Collaborator

HCastano commented Dec 4, 2024

This has been open for a while, we should really try and resolve it soon.

At the moment, having deterministic accounts makes the following easier:

  1. Pre-funding TSS servers as appropriate (currently being done via the chainspec
    balances but could be done "manually" ahead of time too)
  2. Associating TSS accounts to particular a particular node/stash account (done through
    the chainspec's thresholdServers field atm)
  3. Possibly associate them at the infrastructure level (e.g what account should go on
    what machine and who shold be allowed to talk to whom), but correct me if I'm wrong
    here

For (1), I'm thinking that we can work around this by exploring free transactions again.
There's quite a lot to think about here in order to do this safely (i.e not opening up
DoS vectors for the chain) and it's not as simple as slapping Pays::No to some
extrinsics. I still need to do more reading, but maybe having a SignedExtension to
faciliate this would be the way.

There's only three extrinsics I found that are called by the TSS (confirm_jump_start,
confirm_key_reshare, and request_attestation) which would make the scope of this
pretty small.

For (2), as Peg suggested we can move away from doing this in the chainspec and instead
having the node operator do this after genesis (and FWIW we can do this for (1) as well).
From an automation point of view this is doable but a bit annoying.

The flow would be:

  • spin up chain nodes
  • spin up TSS nodes
  • query TSS info via GET /info
  • have TSS request an attestation on chain (pallet_attestation::request_attestation)
  • have operator register their TSS on chain (pallet_staking_extention::validate)

Before commenting on (3) I'd need to know a bit more about how the infrastructure is
being spun up atm and what assumptions are being used.

cc @entropyxyz/core-developers @entropyxyz/system-reliability-engineers

@ameba23
Copy link
Contributor Author

ameba23 commented Dec 5, 2024

@vitropy i can't merge this until you approve 🙏

@ameba23
Copy link
Contributor Author

ameba23 commented Dec 6, 2024

Related to this PR: #1203

@vitropy
Copy link
Contributor

vitropy commented Dec 6, 2024

Currently, the chain spec file is generated entirely locally first, before literally anything happens on any remote machine. This means the TSS server's secrets are stored in our secrets management system.

Next, the TSS servers are coming online independently of any other piece of the network infrastructure. It happens after its associated Substrate node on the same host machine is launched, but that isn't strictly required.

query TSS info via GET /info

If I understand correctly, this route is simply providing the same information that the TSS server spits out in its .entropy/account_id and .entropy/public_key files, no? I would rather continue to use the filesystem as a communication mechanism for this information, not an HTTP request, if it's all the same to you. But I also have no objection of an REST-ful route being added to access the same information, of course. That being said, if this information is not required by pieces of the network that are not filesystem-local, then there should not be a network-accessible way to get this information in the first place.

have operator register their TSS on chain (pallet_staking_extention::validate)

I confess I do not actually understand what this means, mechanically speaking. What command do I run for this, or what HTTP endpoint do I need to request to make this happen?

@ameba23
Copy link
Contributor Author

ameba23 commented Dec 6, 2024

If I understand correctly, this route is simply providing the same information that the TSS server spits out in its .entropy/account_id and .entropy/public_key files, no? I would rather continue to use the filesystem as a communication mechanism for this information, not an HTTP request, if it's all the same to you.

When entropy-tss is running in a confidential virtual machine, you wont have ssh access to it. So the only way to read those files from the host would be to read them out of the virtual machine image file. Currently you can do this, but ideally it will be encrypted and there will be no way for the host operator to access it (although we haven't figured out how to do that bit yet).

have operator register their TSS on chain (pallet_staking_extention::validate)

In production, the network is going to start small and grow. When a new validator joins, they need to submit an extrinsic (a transaction to the chain), which contains the details (account ID, http endpoint, etc) of their TSS server. pallet_staking_extention::validate is that extrinsic. Currently with our testnet setup we don't need to use it because all nodes on the network are pre-configured at genesis.

* master:
  Add TDX test network chainspec (#1204)
  Test CLI command to retrieve quote and change endpoint / TSS account in one command (#1198)
  Bump the patch-dependencies group with 2 updates (#1212)
  Bump thiserror from 2.0.4 to 2.0.6 in the patch-dependencies group (#1206)
  Downgrade parity-scale-codec as version we currently use has been yanked (#1205)
  Bump clap from 4.5.22 to 4.5.23 in the patch-dependencies group (#1202)
@HCastano HCastano dismissed vitropy’s stale review December 16, 2024 14:53

We discussed the DevOps implications and can work around them.

Copy link
Collaborator

@HCastano HCastano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Danke

CHANGELOG.md Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
* master:
  Bump colored from 2.1.0 to 2.2.0 (#1218)
  Bump thiserror from 2.0.6 to 2.0.7 in the patch-dependencies group (#1217)
@ameba23 ameba23 merged commit 6fa0bef into master Dec 18, 2024
8 checks passed
@ameba23 ameba23 deleted the peg/generate-mnemonic branch December 18, 2024 07:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants