
Dialer v2: modular and composable dialer #122

Open · raulk wants to merge 7 commits into master

Conversation


@raulk raulk commented May 7, 2019

This PR deprecates #88.

Description

Here we introduce a Pipeline construction for the dialer. Copied from the godoc:

A Pipeline assembles a set of modular components, each with a clearly-delimited responsibility, into a dialing engine. Consumers can replace, delete, or add components to customize the dialing engine's behavior.

The Pipeline comprises five components, described briefly below; a rough sketch of how they fit together follows the list. For more detail, refer to the respective godocs of each interface:

  • Preparer: runs preparatory actions prior to the dial execution. Actions may include: deduplicating, populating timeouts, circuit breaking, validating the peer ID, etc.
  • AddressResolver: populates the set of addresses for a peer, from the peerstore and/or other sources, such as a discovery process.
  • Planner: schedules dials in time, responding to events from the environment, such as new addresses discovered, or dial jobs completing.
  • Throttler: throttles dials based on resource usage or other factors.
  • Executor: actually carries out the network dials.
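
For orientation, here is a rough sketch of how these components could fit together. The AddressResolver, Planner, and Throttler signatures are taken from the interface excerpts quoted later in this thread; the Preparer and Executor signatures, and the stub types, are assumptions for illustration only.

```go
package dial

import (
	"io"

	ma "github.com/multiformats/go-multiaddr"
)

// Minimal stand-ins for the entities defined in dial/request.go and dial/job.go.
type (
	Request struct{} // a single dial request for a peer
	Job     struct{} // one dial attempt to one address
	Plan    interface{}
)

// Preparer runs preparatory actions prior to dial execution (hypothetical signature).
type Preparer interface {
	Prepare(req *Request) error
}

// AddressResolver populates the set of addresses for a peer.
type AddressResolver interface {
	Resolve(req *Request) (known []ma.Multiaddr, more <-chan []ma.Multiaddr, err error)
}

// Planner schedules dials in time, emitting jobs on out.
type Planner interface {
	NewPlan(req *Request, initial []ma.Multiaddr, out chan<- []*Job) (Plan, error)
}

// Throttler gates planned jobs based on resource usage; Close shuts it down.
type Throttler interface {
	Run(incoming <-chan *Job, released chan<- *Job)
	io.Closer
}

// Executor actually carries out the network dials (hypothetical signature).
type Executor interface {
	Execute(job *Job)
}

// Pipeline assembles the five components; consumers can swap any of them.
type Pipeline struct {
	preparer  Preparer
	resolver  AddressResolver
	planner   Planner
	throttler Throttler
	executor  Executor
}
```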

Recommendations for review

Here's the recommended walkthrough to review this PR:

  1. Read the package-level godocs on dial/doc.go for an overview of the solution.
  2. Go through the godocs of the interfaces on dial/interfaces.go.
  3. Explore the Request and Job entities in dial/request.go and dial/job.go.
  4. Review the godocs on the Pipeline, as well as the pipeline logic itself in dial/pipeline.go.
  5. Review the implementations of the interfaces in the rest of the files.

@ghost ghost assigned raulk May 7, 2019
@ghost ghost added the status/in-progress In progress label May 7, 2019
@magik6k magik6k self-requested a review May 8, 2019 11:40
@@ -12,14 +12,14 @@ import (
const maxDialDialErrors = 16

// DialError is the error type returned when dialing.
-type DialError struct {
+type Error struct {
Member Author:

Sidenote: we could borrow the technique that multierror uses to defer allocs until/if an error is recorded.

var err *dial.Error
err = err.recordErr(addr, cause) // allocates a new dial.Error if the receiver is nil
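
A minimal sketch of that technique, in the style of hashicorp/go-multierror's Append: the method tolerates a nil receiver and allocates on first use, returning the (possibly new) pointer. The names here are illustrative, not the PR's actual fields.

```go
package dial

import (
	"fmt"

	ma "github.com/multiformats/go-multiaddr"
)

type addrError struct {
	addr  ma.Multiaddr
	cause error
}

type Error struct {
	errs []addrError
}

// recordErr appends a per-address failure, deferring allocation of the
// Error until the first failure is actually recorded.
func (e *Error) recordErr(addr ma.Multiaddr, cause error) *Error {
	if e == nil {
		e = new(Error) // first error: allocate now, not at dial start
	}
	e.errs = append(e.errs, addrError{addr, cause})
	return e
}

func (e *Error) Error() string {
	return fmt.Sprintf("dial failed for %d address(es)", len(e.errs))
}
```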

@magik6k magik6k (Contributor) left a comment

Digging into this a bit, I don't see anything I'd do differently; this looks like a big, big improvement over the current system.

Random notes/thoughts:

  • Would be nice to test in go-ipfs (sharness / real network / gateways)
  • If we had a way to suspend pipelines, and an additional transport layer, we might be able to implement connection migration quite easily. (leave this for the future)
  • I like this, it seems to provide a good base for further development

case StatusBlocked:
return "Status(Blocked)"
default:
return fmt.Sprintf("Status(%d)", s)
Contributor:

I'd use %x (or %b) for readability.
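
For illustration, assuming Status is a bitmask type, the default case could read as follows (the constant is hypothetical):

```go
package dial

import "fmt"

type Status uint32

const StatusBlocked Status = 1 << 3 // illustrative flag value

func (s Status) String() string {
	switch s {
	case StatusBlocked:
		return "Status(Blocked)"
	default:
		// Binary output makes combined flags legible,
		// e.g. Status(101) when bits 0 and 2 are set.
		return fmt.Sprintf("Status(%b)", s)
	}
}
```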

released <- job
}

func (t *throttler) throttleFd(job *Job) (throttle bool) {
Contributor:

In the future it might be nice to expose this (and other throttler info) to planners, so that, e.g., planners can defer dials to relays and/or expensive transports.


raulk commented May 22, 2019

@Stebalien @vyzo we probably want to start moving on this soon.


raulk commented May 28, 2019

Calling out @vyzo @bigs specifically for a first round of review. A bunch of things now depend on the dialer changes (self-dial, prioritising non-relay addresses, etc.), so we should move on this urgently to avoid dropping the ball.


@magik6k thanks for the review! 🎉

> Would be nice to test in go-ipfs (sharness / real network / gateways)

Agreed. So far I've tested with the DHT crawler, which is the most dial-intensive thing we have, and it looks good memory- and CPU-wise. But I agree we should integrate into IPFS and test.

> If we had a way to suspend pipelines, and an additional transport layer, we might be able to implement connection migration quite easily. (leave this for the future)

Could you elaborate on this? Suspending pipelines was on my mind, but for a different use case (e.g. activating transports only if we need to dial peers with those transports). It allows for a more dynamic environment, but it's not an immediate requirement.

> I like this, it seems to provide a good base for further development

Yeah, optional self-dial can now be added by removing the default Validator and adding one that recognises self-dials and returns a singleton in-memory net.Pipe (see libp2p/go-libp2p#638 (review)). Also, we can order, prioritise, and stagger dials for transports we'd like to avoid (e.g. relay).
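
A rough sketch of what such a self-dial component could look like. The Prepare signature, the Request methods, and the wiring of the pipe's listen side are all assumptions for illustration, not the PR's actual API.

```go
package dial

import (
	"net"

	"github.com/libp2p/go-libp2p-core/peer"
)

// Minimal hypothetical stand-in for the PR's Request type.
type Request struct {
	id   peer.ID
	conn net.Conn
	err  error
}

func (r *Request) PeerID() peer.ID              { return r.id }
func (r *Request) Complete(c net.Conn, e error) { r.conn, r.err = c, e }

// selfDialPreparer recognises dials to our own peer ID and completes them
// with a singleton in-memory connection instead of hitting the network.
type selfDialPreparer struct {
	self peer.ID
	conn net.Conn
}

func newSelfDialPreparer(self peer.ID) *selfDialPreparer {
	c1, c2 := net.Pipe() // synchronous, in-memory duplex connection
	_ = c2               // the other end would back the "listen" side
	return &selfDialPreparer{self: self, conn: c1}
}

func (p *selfDialPreparer) Prepare(req *Request) error {
	if req.PeerID() == p.self {
		req.Complete(p.conn, nil) // short-circuit: no network dial needed
	}
	return nil
}
```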

@vyzo vyzo (Contributor) left a comment

This is going to take a while to review properly.

One first observation: this all seems a little static. I would like a mechanism to supply, at a minimum, the address resolver for individual dial jobs.

Use case in mind: direct connection upgrade. We will want to dial while already holding an existing (relay) connection. For that to happen, we will have to bypass the peerstore completely and dial only user-specified addresses. Can this be done with this design? Hard to say; if not, something must change.


raulk commented Aug 14, 2019

@bigs has stepped up to move this PR over the finish line :-)


bigs commented Aug 15, 2019

@raulk @Stebalien just rebased this and updated the branch to reflect the core refactor, amongst other things. Tests are passing; this should be ready for review.

//
// Likewise, the background discovery process must stop when the provided context is closed.
type AddressResolver interface {
Resolve(req *Request) (known []ma.Multiaddr, more <-chan []ma.Multiaddr, err error)
Contributor:

@raulk this could probably just return a buffered channel with size len(known). Would simplify the API a bit. Thoughts?
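
A sketch of how the simplified API could look under this suggestion; the Request stub and the address-lookup hook are hypothetical.

```go
package dial

import (
	"github.com/libp2p/go-libp2p-core/peer"
	ma "github.com/multiformats/go-multiaddr"
)

// Request is a minimal stand-in for the PR's dial request (hypothetical).
type Request struct{ id peer.ID }

func (r *Request) PeerID() peer.ID { return r.id }

// AddressResolver, simplified per the suggestion: a single channel,
// pre-seeded with the already-known addresses.
type AddressResolver interface {
	Resolve(req *Request) (addrs <-chan []ma.Multiaddr, err error)
}

// peerstoreResolver sketches an implementation backed by a peerstore.
type peerstoreResolver struct {
	addrsOf func(p peer.ID) []ma.Multiaddr // e.g. the peerstore's Addrs method
}

func (r *peerstoreResolver) Resolve(req *Request) (<-chan []ma.Multiaddr, error) {
	known := r.addrsOf(req.PeerID())
	out := make(chan []ma.Multiaddr, 1) // buffered, so the seed send never blocks
	out <- known
	go func() {
		defer close(out)
		// Forward batches from a background discovery process here,
		// stopping when the request context is done.
	}()
	return out, nil
}
```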

Member Author:

Good idea! Mind creating a branch with that change, and sending a WIP PR to this branch to see how the code simplifies?

// channel to signal it has finished planning.
//
// The lifetime of asynchronous processes should obey the request context.
NewPlan(req *Request, initial []ma.Multiaddr, out chan<- []*Job) (Plan, error)
Contributor:

Furthermore, it would remove the initial argument here, since the known addresses would arrive on the resolver's channel instead.

JobComplete(completed *Job)

// ResolutionDone is called when the AddressResolver has finished.
ResolutionDone()
Contributor:

Should this become an event on the eventbus? Perhaps a future change.

// Run starts the throttling process. It receives planned jobs via the incoming channel, and emits jobs to dial via
// the released channel. Both channels are provided by the caller, who should abstain from closing either during
// normal operation to avoid panics. To safely shut down a throttler, first call Close() and await return.
Run(incoming <-chan *Job, released chan<- *Job)
Contributor:

Since this also implements Closer, maybe it makes sense to have the Throttler own the released channel and thus be responsible for closing it. This would require an API change to:

Run(incoming <-chan *Job) (released <-chan *Job)
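
A minimal sketch of that ownership model (hypothetical implementation; the Job type is a stand-in): the throttler creates and closes the released channel itself, so callers never send on, or close, a channel they don't own.

```go
package dial

// Job is a minimal stand-in for the PR's dial job (hypothetical).
type Job struct{}

type throttler struct {
	closing chan struct{}
}

func newThrottler() *throttler {
	return &throttler{closing: make(chan struct{})}
}

// Run owns the released channel and closes it on shutdown or when the
// incoming channel is drained.
func (t *throttler) Run(incoming <-chan *Job) <-chan *Job {
	released := make(chan *Job)
	go func() {
		defer close(released) // the owner closes; receivers simply see it drain
		for {
			select {
			case job, ok := <-incoming:
				if !ok {
					return
				}
				// Throttling checks (fd limits, per-peer limits) would go here.
				select {
				case released <- job:
				case <-t.closing:
					return
				}
			case <-t.closing:
				return
			}
		}
	}()
	return released
}

func (t *throttler) Close() error {
	close(t.closing)
	return nil
}
```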


raulk commented Feb 24, 2020

Resuming this thread.

@vyzo:

> I would like to have a mechanism to supply at a minimum the address resolver for individual dial jobs.

IMO, that would be too complex. We really don't want users of the swarm/network providing address resolvers per dial; that leaks an abstraction. Instead, the address resolver can make contextual decisions based on the dial it's processing.


raulk commented Feb 24, 2020

If this becomes a common case, we can introduce a higher-level AddressResolverDispatcher that conforms to the AddressResolver interface and proxies a set of address resolvers behind predicates. The dispatcher evaluates the predicates sequentially against the incoming dial request and chooses the first address resolver that matches, or errors if none match.
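
A sketch of that dispatcher, reusing the Resolve signature quoted earlier in this thread; the predicate type, the rule plumbing, and the Request stub are illustrative.

```go
package dial

import (
	"errors"

	ma "github.com/multiformats/go-multiaddr"
)

// Minimal stand-ins (hypothetical) for the PR's types.
type Request struct{}

type AddressResolver interface {
	Resolve(req *Request) (known []ma.Multiaddr, more <-chan []ma.Multiaddr, err error)
}

// Predicate decides whether a resolver applies to a given dial request.
type Predicate func(req *Request) bool

type dispatchRule struct {
	match    Predicate
	resolver AddressResolver
}

// AddressResolverDispatcher proxies to the first resolver whose predicate
// matches the incoming request.
type AddressResolverDispatcher struct {
	rules []dispatchRule
}

// Conforms to the AddressResolver interface.
var _ AddressResolver = (*AddressResolverDispatcher)(nil)

func (d *AddressResolverDispatcher) Resolve(req *Request) ([]ma.Multiaddr, <-chan []ma.Multiaddr, error) {
	for _, rule := range d.rules {
		if rule.match(req) {
			return rule.resolver.Resolve(req)
		}
	}
	return nil, nil, errors.New("no address resolver matched the dial request")
}
```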
