Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add liveness endpoint for doppleganger protection #131

Merged
merged 11 commits into from
Oct 25, 2022

Conversation

paulhauner
Copy link
Contributor

@paulhauner paulhauner commented Apr 18, 2021

What

Following from #64, this PR adds a /eth/v1/validator/liveness endpoint which indicates if a validator has been observed to be "live" in some epoch. The idea is that doppleganger protection should only start if some validator has is_live: false for some number of epochs.

This PR differs from @dankrad's proposal in that it doesn't ask "give me the latest attestation", instead it asks the broader question "has the validator been observed to be live?". This has the following advantages:

  • It's an easier question to answer. Clients already need to store caches for observed attesters/aggregators/proposers that answer the boolean question "has this validator already been observed". Keeping a cache of old attestations for validators is more of a burden.
  • It's a less specific question so clients are welcome to include more "liveness" sources if they see fit. It's also more future-proof since it might also evolve to encompass future message types.

Why

This provides doppleganger protection as a first-class citizen. Given that clients already need to answer the "has this validator been observed" question for gossip message verification, this seems cheap to implement. If we add this endpoint then things like Infura can also support doppleganger protection.

I understand that Nimbus has already implemented doppleganger protection (cc @arnetheduck) and Lighthouse is in the later stages of implementing it (see #2230).

Example

Request

POST /eth/v1/validator/liveness

{
  "epoch": "1",
  "indices": [
    "1"
  ]
}

Response

{
  "data": [
    {
      "index": "1",
      "epoch": "1",
      "is_live": true
    }
  ]
}

@paulhauner
Copy link
Contributor Author

paulhauner commented Apr 18, 2021

I'm not sure why CI is failing, I had a look but it's not clear to me. Perhaps someone with more CI skills could nudge me in the right direction? 🙏

type: object
properties:
epoch:
$ref: "../../../beacon-node-oapi.yaml#/components/schemas/Uint64"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you'll find references in this file are only ../../beacon-node-oapi.yaml and that's probably the main build issue...

content:
application/json:
schema:
title: GetAttesterDutiesBody
Copy link
Contributor

@mpetrunic mpetrunic Apr 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should probably change this title. If somebody generates client code from this spec, this title will affect generated type/class name.

content:
application/json:
schema:
title: GetAttesterDutiesBody
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

@dankrad
Copy link

dankrad commented Apr 20, 2021

Quick question on how is this intended to be used? It only queries for a single epoch, so I assume a validator client would have to poll a few (2-5) epochs to be certain?

Maybe it's worth adding a short sentence about this intended use.

Co-authored-by: realbigsean <sean@sigmaprime.io>
@ahadda5
Copy link

ahadda5 commented May 12, 2021

@paulhauner Shouldn't this be part of the eth/v1/beacon. Ultimately you are asking the question "Who are the validators being observed to be live at this (end of) epoch?" no?
Which the beacon-chain will answer similar to other requests like ListValidators /eth/v1/beacon/states/{state_id}/validators or attestations etc..

@paulhauner
Copy link
Contributor Author

@paulhauner Shouldn't this be part of the eth/v1/beacon. Ultimately you are asking the question "Who are the validators being observed to be live at this (end of) epoch?" no?
Which the beacon-chain will answer similar to other requests like ListValidators /eth/v1/beacon/states/{state_id}/validators or attestations etc..

I'm trying to determine if the validator has been seen to sign any message during the given epoch to try and make a guess about whether or not there's some other VC running the same pubkey. This is different to asking if it exists in the state, has an attestation included in a block or if it is included in the attestation pool.

@ahadda5
Copy link

ahadda5 commented May 12, 2021

I'm trying to determine if the validator has been seen to sign any message during the given epoch to try and make a guess about whether or not there's some other VC running the same pubkey. This is different to asking if it exists in the state, has an attestation included in a block or if it is included in the attestation pool.

Got it. I looked at the other requests and saw their folder structure. I was confused as to where this get belongs. Thanks

bors bot pushed a commit to sigp/lighthouse that referenced this pull request Sep 20, 2021
…#2607)

## Issue Addressed

N/A

## Proposed Changes

Add a note to the Doppelganger Protection docs about how it is not interoperable until an endpoint facilitating it is standardized (ethereum/beacon-APIs#131).

## Additional Info

N/A


Co-authored-by: realbigsean <seananderson33@gmail.com>
@mcdee
Copy link
Contributor

mcdee commented Feb 21, 2022

There's no information in here on how far back the "epoch" parameter can/should go. Is it expected that this could be called for any epoch, or for only the most recent epochs? And if the latter, what would a reasonable number of epochs be? Perhaps back to the last finalized epoch?

@mpetrunic
Copy link
Contributor

@paulhauner could you check "Allow edits from maintainers" or update yourself? :)

@arnetheduck
Copy link
Contributor

Thanks for raising this PR - we're in the process of adding doppelganger support to our VC, so getting this settled would be nice.

There's no information in here on how far back the "epoch" parameter can/should go.

Indeed -an easy solution here is to simply remove it - in Nimbus, we only keep track of a single "last-seen" data point, which is sufficient for doppelganger, or "liveness" in general.

is_live

This flag seems redundant - I think we can capture the essence of this API by simply returning the last epoch (or indeed slot) and let consumers decide on their liveness policy.

epoch

Possibly, we could increase granularity here to slot.

@jclapis
Copy link

jclapis commented Sep 19, 2022

I'll throw my support behind this one. Post-merge, Rocket Pool has seen several people shuttling nodes around to migrate to different clients with better performance, and in some situations exporting & importing the slashing DB isn't easy to do. Doppelganger detection is a common way to miss attesting on purpose in cases like this, but we had one instance where someone tried to attach a Lighthouse VC to a Nimbus BN and it failed because the routes aren't standardized.

Having a standardized liveness route so any VC with doppelganger could work with any BN would be very well received by our node operators.

@paulhauner paulhauner dismissed stale reviews from ghost via a522c0a October 19, 2022 22:22
@paulhauner
Copy link
Contributor Author

Alrighty, I have updated this PR. My apologies for letting it linger. Thanks for @rolfyone for prodding me (several times) 🙏

There's no information in here on how far back the "epoch" parameter can/should go.

@mcdee, good point. I've added that the BN MUST support the current and previous epoch and MAY support previous epochs. Based on our implementation in Lighthouse, this seems to be the minimally useful set of information.

Indeed -an easy solution here is to simply remove it

@arnetheduck, removing epoch is a reasonable request. What it would remove is the ability to know that a validator was live in the current epoch but not the previous one. It's been a while since we implemented DG, so I'm not sure if this is an issue or not. Removing the epoch parameter makes the API slightly simpler, keeping it makes it slightly more powerful. I'm ambivalent and therefore tempted to leave it as is. I'm open to arguments.

Possibly, we could increase granularity here to slot.

The reason that I have specified epoch is since P2P validation conditions already require nodes to track which epochs they've seen attestations, however it doesn't require tracking specific slots. One of the goals of this PR was to be minimally intrusive. I'm open to switching to slots, but I think the fact that clients already have the groundwork for epochs is an appealing reason to leave it at that.

@arnetheduck
Copy link
Contributor

I'm open to arguments.

I think without epoch, the query poses a different question: "when was the validator last active" vs "was the validator active in epoch X" - answering the latter requires book-keeping of historical "activeness" which complicates the implementation

epoch in both request and response

as specified, it's somewhat ambiguous what the epoch parameter in the answer should be - if a validator was seen to sign something in epoch 5 and a query is done for epoch 6, should the validator be considered active or not (is_live)? if not, why is the epoch both in request and response? under what circumstances would epoch in the response differ from that of the request?

current and previous epoch

under what circumstances do we need to know two epochs for doppelganger, and why two specifically? it seems sufficient to know when the validator last published something - this allows configuring different paranoia policies (number of inactive epochs) for liveness without having to do multiple (potentially racy) queries. limiting it to "previous" and "current" constrains the implementation to that number of "inactive" epochs (one-and-a-few-slots)

@paulhauner
Copy link
Contributor Author

paulhauner commented Oct 20, 2022

@arnetheduck, I can see two points in your previous post:

  1. Why should we track liveness per epoch rather than "highest live epoch".
  2. Why is the epoch returned in the response.

1. Why track liveness per epoch

Firstly, we already need to track per-epoch liveness to follow the P2P spec. For instance, if you've seen an attestation in epoch n but not n - 1 you still need to propagate the epoch n - 1 attestation. Tracking just the highest seen epoch would not permit the client to follow the P2P spec.

Secondly, we should note that the P2P spec only requires the node to track the current + previous epochs (potentially the next epoch too, if you consider clock disparity).

Given those two points, let's assume that all nodes have at least this abstract data structure in order to meet the P2P propagation conditions:

struct AttestationSeen {
  previous_epoch: Set<ValidatorIndex>,
  current_epoch: Set<ValidatorIndex>,
}

If we accept that the above struct is the minimum required struct to meet the P2P propagation conditions, then I think it's clear that the best way to query it is by epoch. I argue that querying this struct for "highest seen epoch" is strictly worse for the following reasons:

  1. It's impossible to know the liveness status of the previous epoch if the validator was live in the current epoch.
  2. There is no way for the server to communicate that the requested epoch is out of range. If the epoch is provided in the request then the server can say "that epoch is out of range", this reduces edge-cases when the client and server have clock sync issues and also allows the server to communicate the lowest epoch it supports. If the client asks for the highest epoch then it gets no information back from the server about their respective clock sync or which epochs the server actually tracks.

So, I argue that "highest live epoch" is:

  • A complication when we consider what clients are already required to track.
  • Provides strictly less information.

The upside of "highest live epoch" is that it can smoosh together information about the current and previous epoch to reduce the number of calls to the BN. You made a point about "racy" queries, I reject that claim because the liveness of epoch n - 1 and n are independent values. I also argue that doppelganger queries are light enough and infrequent enough that the gains of optimising functionality away from them is premature.

2. Why is the epoch returned in the response.

The epoch in the request is always the same as the response. I don't recall why this was added (we implemented this over a year ago). Perhaps we can remove it, I don't really mind.

Summary

Unless there's a glaring hole in my logic here, I have a rather small appetite for arguing this further. It matters little to me about what we do here, I'm sure we can just shim your approach into ours if need be. If my reasoning is sound, but you still want to optimise this API call then feel free to make a new PR for it. If that gets merged rather than this then Lighthouse will implement it.

@rolfyone
Copy link
Contributor

I'm in line with your logic @paulhauner , thanks for the detailed description and your hackmd - I find it all really logical...

To me, there's been no discussion for a long time, so I think that we were happy previously with the logic also. Will flag this to make sure ppl read it on discord, but I'm inclined to approve and merge unless theres strong objections raised...

@arnetheduck
Copy link
Contributor

  1. SGTM, thanks for the rationale
  2. Tendentially it seems better to avoid redundant data as it opens up for ambiguity and implementation bugs/complexity (ie implementations must now check that the epoch is indeed correct) for no gain - annoying, but not a big deal

@paulhauner
Copy link
Contributor Author

Tendentially it seems better to avoid redundant data as it opens up for ambiguity and implementation bugs/complexity (ie implementations must now check that the epoch is indeed correct) for no gain - annoying, but not a big deal

I had a look around the other endpoints and it seems that not including the epoch in the response would be most consistent with the rest of the repo. I also think your argument makes sense. I've removed the epoch in 4bbefdb.

@rkapka
Copy link
Contributor

rkapka commented Oct 24, 2022

Nitpick:

  1. Why track liveness per endpoint

It should say epoch, not endpoint.

Copy link
Member

@dapplion dapplion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@rolfyone rolfyone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. last call for any strong objections...

@rolfyone rolfyone merged commit 8a96147 into ethereum:master Oct 25, 2022
arnetheduck added a commit to arnetheduck/eth2.0-APIs that referenced this pull request Oct 27, 2022
ethereum#131 introduced a new
liveness endpoint - the request is similar to attester duties in that we
query a list of attesters for data pertaining to a particular epoch - as
such, it seems reasonable to keep the two requests similar in terms of
their URL/postdata structure.
rolfyone added a commit that referenced this pull request Nov 16, 2022
#131 introduced a new
liveness endpoint - the request is similar to attester duties in that we
query a list of attesters for data pertaining to a particular epoch - as
such, it seems reasonable to keep the two requests similar in terms of
their URL/postdata structure.

Co-authored-by: Paul Harris <paul.harris@consensys.net>
Ingridmichelledev added a commit to Ingridmichelledev/API-beacons that referenced this pull request Mar 3, 2023
ethereum/beacon-APIs#131 introduced a new
liveness endpoint - the request is similar to attester duties in that we
query a list of attesters for data pertaining to a particular epoch - as
such, it seems reasonable to keep the two requests similar in terms of
their URL/postdata structure.

Co-authored-by: Paul Harris <paul.harris@consensys.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.