
Disassociate RT membership from connectivity #50

Merged — 5 commits merged on Feb 26, 2020

Conversation

aarshkshah1992 (Contributor) commented Feb 14, 2020

@raulk @Stebalien @aschmahmann @dirkmc

For libp2p/go-libp2p-kad-dht#283

1. Address management is not part of this PR; it will come next.
2. DHT changes will be made once this goes in.
3. IMO, moving the RT inside go-libp2p-kad-dht is a bigger refactor that will involve multiple discussions and should be done as a separate stream of work.

table.go (outdated):

 }

 // NewRoutingTable creates a new routing table with a given bucketsize, local ID, and latency tolerance.
-func NewRoutingTable(bucketsize int, localID ID, latency time.Duration, m peerstore.Metrics) *RoutingTable {
+// Passing a nil PeerValidationFnc disables periodic table cleanup.
+func NewRoutingTable(ctx context.Context, bucketsize int, localID ID, latency time.Duration, m peerstore.Metrics,
Contributor:

Just curious: why not make the peerValidationFnc one of the Options? We could potentially convert the other args (latency, etc.) into Options as well, as long as we're changing the signature.
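For reference, the functional-options shape being suggested would look roughly like the minimal sketch below; every name in it (config, Option, PeerValidationFunc, WithLatencyTolerance, WithPeerValidation) is illustrative, not the actual go-libp2p-kbucket API.

```go
// Minimal sketch of a functional-options constructor, as suggested here.
// Every name below is illustrative, not the actual go-libp2p-kbucket API.
package example

import (
	"context"
	"time"
)

// PeerValidationFunc reports whether a peer is still usable; passing nil
// would disable periodic table cleanup, mirroring the behaviour in the diff.
type PeerValidationFunc func(ctx context.Context, p string) bool

type config struct {
	bucketSize     int
	latency        time.Duration
	peerValidation PeerValidationFunc
}

// Option mutates the constructor configuration.
type Option func(*config) error

func WithLatencyTolerance(d time.Duration) Option {
	return func(c *config) error {
		c.latency = d
		return nil
	}
}

func WithPeerValidation(f PeerValidationFunc) Option {
	return func(c *config) error {
		c.peerValidation = f
		return nil
	}
}

// RoutingTable stands in for the real table type in this sketch.
type RoutingTable struct{ cfg config }

// NewRoutingTable applies the options over defaults, then builds the table.
func NewRoutingTable(bucketSize int, opts ...Option) (*RoutingTable, error) {
	cfg := config{bucketSize: bucketSize, latency: time.Minute} // illustrative default
	for _, o := range opts {
		if err := o(&cfg); err != nil {
			return nil, err
		}
	}
	return &RoutingTable{cfg: cfg}, nil
}
```

A call site would then read something like NewRoutingTable(20, WithLatencyTolerance(10*time.Second), WithPeerValidation(fn)).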

Member:

Using contexts to stop services is really an anti-pattern (yeah, I know we use it everywhere...). We should provide a close function.

Otherwise, the routing table could stop before the DHT (context cancellation isn't ordered).

Contributor (author):

@Stebalien Could you elaborate a bit on this? Why is it an anti-pattern? Also, could you point me to some code that uses a Close function like you mention?

aarshkshah1992 (Contributor, author) commented Feb 17, 2020

@yusefnapora I've made the change.

Member:

Context cancellation is designed for requests: https://golang.org/pkg/context/.

Basically, if we use contexts, everything will shut down at the same time, in a random order, without taking dependencies into account. If we use explicit Close methods, we can close a service and then have it close the sub-components it owns, and so on.

We've had problems in the past where we'd, e.g., close the datastore before we're done using it.
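For illustration, the ordered-shutdown shape being described might look like this minimal sketch; the DHT and routing-table fields here are hypothetical, not the actual go-libp2p code.

```go
// Sketch of the ordered-shutdown idea described above: the DHT owns the
// routing table, so its Close tears down its own work first and only then
// closes the table. Types and fields here are illustrative, not the actual
// go-libp2p-kad-dht code.
package example

import "io"

type DHT struct {
	rt   io.Closer     // routing table owned by the DHT
	done chan struct{} // signals the DHT's own background goroutines to stop
}

// Close shuts down the DHT before the components it owns, which a shared
// cancelled context cannot guarantee (everything would stop concurrently).
func (d *DHT) Close() error {
	close(d.done)       // stop the DHT's own work first
	return d.rt.Close() // then release the routing table it depends on
}
```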

Contributor (author):

@Stebalien Thanks for this. I've made the change. I also dug through the goprocess library for this and now have a pretty good idea of how it works internally.

bonedaddy commented Mar 4, 2020

Personally speaking, I don't think using context this way is bad. goprocess is the anti-pattern, not context + cancellation. Additionally, goprocess turns non-trivial goroutines into relatively expensive operations.

(Resolved review threads on bucket.go, options.go, table.go, and cpl_replacement_cache.go.)
aarshkshah1992 (Contributor, author):
@yusefnapora @Stebalien I have made the changes, save the one where I'd like some more information on why using contexts for service cancellation is an anti-pattern. Please take a look when you can.

aarshkshah1992 (Contributor, author) commented Feb 18, 2020

@Stebalien I believe that, other than the one comment where we are waiting for @raulk, all your comments have been addressed. Let me know if you have any additional concerns.

Stebalien (Member) left a comment

This mostly looks right but I'm not awake enough at the moment to give a proper review. However, I do have a question.

(Resolved review thread on bucket.go.)
aarshkshah1992 (Contributor, author) commented Feb 20, 2020

@Stebalien

  1. Should we dial missing peers concurrently, albeit with some rate limiting, rather than serially as we do now, in order to speed up the cleanup?

  2. Do you see any value in also testing with a lower cleanup interval of, say, 10 seconds, with just one peer validation each time? The peer to validate would be the one we have least recently heard from (or have never heard from), with a bias towards higher CPLs. The traffic would be less bursty and more predictable, and RT repair would be spread out over time, giving a shorter feedback cycle versus a big-bang repair every x minutes.

aarshkshah1992 (Contributor, author):
ping @Stebalien

Stebalien (Member):

> Should we dial missing peers concurrently, albeit with some rate limiting, rather than serially as we do now, in order to speed up the cleanup?

Eh, that's probably not an issue for now. We're not going for speed here, we're trying to do a background cleanup.

I guess we could be too slow, but that's probably not the end of the world. Regardless, we should set a pretty short timeout.

> Do you see any value in also testing with a lower cleanup interval of, say, 10 seconds, with just one peer validation each time? The peer to validate would be the one we have least recently heard from (or have never heard from), with a bias towards higher CPLs. The traffic would be less bursty and more predictable, and RT repair would be spread out over time, giving a shorter feedback cycle versus a big-bang repair every x minutes.

Yes, but let's do that in a followup patch. The current code should be "good enough", and there are some tricky edge cases here (e.g., don't reconnect to the same overloaded peer every 10 seconds).
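A rough sketch of the kind of trickle cleanup being discussed, assuming a per-interval ticker and a hypothetical leastRecentlyHeard helper (none of this is the PR's actual code):

```go
// Rough sketch of the trickle-style cleanup discussed above: validate one
// stale peer every short interval instead of a whole batch every few
// minutes. All names here are hypothetical, not the PR's implementation.
package example

import (
	"context"
	"time"
)

type routingTable struct {
	validate           func(ctx context.Context, p string) bool // peer validation fnc
	leastRecentlyHeard func() (string, bool)                     // biased towards higher CPLs
	handlePeerDead     func(p string)                            // evicts the peer and consults the replacement cache
	done               chan struct{}                             // closed by Close()
}

func (rt *routingTable) trickleCleanup(interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		select {
		case <-ticker.C:
			p, ok := rt.leastRecentlyHeard()
			if !ok {
				continue
			}
			if !rt.validate(context.Background(), p) {
				rt.handlePeerDead(p)
			}
		case <-rt.done:
			return
		}
	}
}
```

As noted above, a real version would also need to avoid re-validating the same overloaded peer every interval.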

Stebalien (Member) left a comment

LGTM!

(Resolved review thread on bucket.go.)
}
log.Infof("failed to validated candidate=%s", c)
// remove candidate
rt.HandlePeerDead(c)
Member:

Is this actually necessary? We've already removed it from the queue and it shouldn't be in the table. I guess it can't hurt.

Contributor (author):

It's necessary, as we remove it from the replacement cache as part of HandlePeerDead.
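To see why that matters, here is a minimal, hypothetical sketch of the shape being described, where removing a dead peer also drops it from the replacement cache; these are not the library's actual internals.

```go
// Hypothetical sketch of why the HandlePeerDead call above still matters:
// besides evicting a peer from its bucket, it also drops the peer from the
// CPL replacement cache, so a failed candidate cannot be handed out again.
// None of these names are the library's actual internals.
package example

type routingTable struct {
	buckets          map[string]bool // peer -> present in some bucket
	replacementCache map[string]bool // candidate peers kept per CPL
}

func (rt *routingTable) handlePeerDead(p string) {
	delete(rt.buckets, p)          // no-op if the peer was never in the table
	delete(rt.replacementCache, p) // the part the failed-candidate path relies on
}
```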

Stebalien (Member):

(merge when ready)
