Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add explore state machine to expand population of routing table #934

Merged
merged 17 commits into from
Sep 25, 2023

Conversation

iand
Copy link

@iand iand commented Sep 21, 2023

No description provided.

@iand iand added the v2 All issues related to the v2 rewrite label Sep 21, 2023
@iand iand changed the base branch from v2-routing-getvalue2 to v2-develop September 21, 2023 14:05
@iand iand marked this pull request as ready for review September 22, 2023 08:23
@iand iand marked this pull request as draft September 22, 2023 08:26
@iand iand marked this pull request as ready for review September 22, 2023 14:45
exploreCfg.Clock = cfg.Clock
exploreCfg.Timeout = cfg.QueryTimeout

schedule, err := routing.NewDynamicExploreSchedule(14, cfg.Clock.Now(), time.Hour, 1, 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also: // TODO: expose more config

// A RoutingBehaviour provices the behaviours for bootstrapping and maintaining a DHT's routing table.
const (
// IncludeQueryID is the id for connectivity checks performed by the include state machine.
// This identifier used for routing network responses to the state machine.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// This identifier used for routing network responses to the state machine.
// This identifier is used for routing network responses to the state machine.

IncludeQueryID = coordt.QueryID("include")

// ProbeQueryID is the id for connectivity checks performed by the probe state machine
// This identifier used for routing network responses to the state machine.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// This identifier used for routing network responses to the state machine.
// This identifier is used for routing network responses to the state machine.

@@ -28,6 +38,9 @@ type RoutingBehaviour struct {
// probe is the node probing state machine, responsible for periodically checking connectivity of nodes in the routing table
probe coordt.StateMachine[routing.ProbeEvent, routing.ProbeState]

// probe is the routing table explore state machine, responsible for increasing the occupanct of the routing table
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// probe is the routing table explore state machine, responsible for increasing the occupanct of the routing table
// explore is the routing table explore state machine, responsible for increasing the occupancy of the routing table

@@ -112,7 +127,7 @@ func (r *RoutingBehaviour) notify(ctx context.Context, ev BehaviourEvent) {
case *EventGetCloserNodesSuccess:
span.SetAttributes(attribute.String("event", "EventGetCloserNodesSuccess"), attribute.String("queryid", string(ev.QueryID)), attribute.String("nodeid", ev.To.String()))
switch ev.QueryID {
case "bootstrap":
case routing.BootstrapQueryID:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

possible to also lift this constant out of the routing package or move all constants into that? just to be consistent

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends where the query or request is initiated. The connectivity checks performed by probe and include are initiated by the routing behaviour so the constants belong there. The bootstrap and explore queries are initiated by the state machines.

I'd like to move the behaviours into the packages with the state machines at some point so the constants would all end up there too.

queryID := coordt.QueryID("bootstrap")

qry, err := query.NewFindCloserQuery[K, N, any](b.self, queryID, b.self.Key(), iter, tev.KnownClosestNodes, qryCfg)
qry, err := query.NewFindCloserQuery[K, N, any](b.self, BootstrapQueryID, b.self.Key(), iter, tev.KnownClosestNodes, qryCfg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a great benefit now but there's a coordt.Message type

Suggested change
qry, err := query.NewFindCloserQuery[K, N, any](b.self, BootstrapQueryID, b.self.Key(), iter, tev.KnownClosestNodes, qryCfg)
qry, err := query.NewFindCloserQuery[K, N, coordt.Message](b.self, BootstrapQueryID, b.self.Key(), iter, tev.KnownClosestNodes, qryCfg)

// before proceeding to the next bucket.
//
// The frequency of running an explore varies by bucket distance, such that closer buckets are processed more frequently.
type Explore[K kad.Key[K], N kad.NodeID[K]] struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will an expore operation also be triggered when we are low on peers in the routing table?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not automatically. But it can be added. I will create an issue for it

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Target: st.Target,
}
case *query.StateQueryFinished[K, N]:
span.SetAttributes(attribute.String("out_state", "StateExploreFinished"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in other places I used tele.AttrOutEvent(...). I also used the defer to capture the event that's passed out of this function.

Comment on lines 445 to 460
func (s *DynamicExploreSchedule) NextCpl(ts time.Time) (int, bool) {
// is an explore due yet?
next := (*s.cpls)[0]
if !next.Due.After(ts) {
// update its schedule

interval := float64(s.interval) + float64(s.interval)*float64(s.maxCpl-next.Cpl)*s.multiplier
interval *= 1 + rand.Float64()*s.jitter

next.Due = ts.Add(s.cplInterval(next.Cpl))
heap.Fix(s.cpls, 0) // update the heap
return next.Cpl, true
}

return -1, false
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be more go-like to unnest the happy-path:

func (s *DynamicExploreSchedule) NextCpl(ts time.Time) (int, bool) {
	// is an explore due yet?
	next := (*s.cpls)[0]
	if next.Due.After(ts) {
	  return -1, false
	}
	
	...
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, will make that change

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 451 to 453
interval := float64(s.interval) + float64(s.interval)*float64(s.maxCpl-next.Cpl)*s.multiplier
interval *= 1 + rand.Float64()*s.jitter

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to cplInterval

Suggested change
interval := float64(s.interval) + float64(s.interval)*float64(s.maxCpl-next.Cpl)*s.multiplier
interval *= 1 + rand.Float64()*s.jitter

@dennis-tra
Copy link
Contributor

dennis-tra commented Sep 22, 2023

Two general comments:

  • we can only have one explore query at a time, right?
  • the v1 logic actually explores from lowest CPL (bucket 0) to highest CPL (bucket 15). The description of the NewDynamicExploreSchedule and the code suggest that the logic in here does it the other way around. I think it's more reasonable to do it the way v1 does it because virtually half of the queries to the routing table will hit bucket 0 (assuming uniform keys we want to query) so it's important to have that bucket filled early.
    Given the ExploreSchedule interface, it would just be a matter of adding a new implementation so we could also add it in a separate PR.

@iand
Copy link
Author

iand commented Sep 25, 2023

Two general comments:

  • we can only have one explore query at a time, right?

Yes, by design. Do you think that is an issue?

  • the v1 logic actually explores from lowest CPL (bucket 0) to highest CPL (bucket 15). The description of the NewDynamicExploreSchedule and the code suggest that the logic in here does it the other way around. I think it's more reasonable to do it the way v1 does it because virtually half of the queries to the routing table will hit bucket 0 (assuming uniform keys we want to query) so it's important to have that bucket filled early.

We had some discussion on this in the design issue and @guillaumemichel seemed to be of the opinion that exploring closer buckets more frequently was useful (one comment here).

My reasoning is that since every peer discovered is a candidate for inclusion in the routing table, querying for nearby peers will almost certainly discover peers that populate further buckets first. Bucket 0 is likely to fill up with every explore we do.

Given the ExploreSchedule interface, it would just be a matter of adding a new implementation so we could also add it in a separate PR.

Yes, this would be quite simple to add.

@dennis-tra
Copy link
Contributor

Cool, thanks for the clarification! Makes sense 👍

Yes, by design. Do you think that is an issue?

Nope, all good! Just wanted to verify my understanding of what's going on.

@iand iand merged commit ae5a094 into v2-develop Sep 25, 2023
11 checks passed
@iand iand deleted the v2-sm-explore branch September 25, 2023 12:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
v2 All issues related to the v2 rewrite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants