Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AutoNAT discovery time is unexpectedly long #538

Closed
ZenGround0 opened this issue Feb 14, 2019 · 9 comments
Closed

AutoNAT discovery time is unexpectedly long #538

ZenGround0 opened this issue Feb 14, 2019 · 9 comments

Comments

@ZenGround0
Copy link

AutoNAT discovery takes a long time ~5 minutes in the filecoin devnets. Talking with @raulk today this appears to be unexpected.

@raulk
Copy link
Member

raulk commented Feb 14, 2019

@ZenGround0 pointed out that we appear to need several rounds of AutoNAT to assert our NAT status, and that creates too much of a latency as it means that a NAT’ted peer just sits idle for that long before autorelay can jump into action.

@vyzo – could you have a look?

@ZenGround0
Copy link
Author

ZenGround0 commented Feb 14, 2019

NAT status is asserted after 3 tries. Here are some concrete logs:

...
12:40:49.631  INFO    autonat: Discovered AutoNAT peer QmXq6XEYeEmUzBFuuKbVEGgxEpVD4xbSkG2Rhek6zkFMp4 notify.go:33
12:40:51.168  INFO    autonat: Discovered AutoNAT peer QmXhxqTKzBKHA5FcMuiKZv8YaMPwpbKGXHRVZcFB2DX9XY notify.go:33
12:40:51.168  INFO    autonat: Discovered AutoNAT peer QmZGDLdQLUTi7uYTNavKwCd7SBc5KMfxzWxAyvqRQvwuiV notify.go:33
12:40:51.304  INFO    autonat: Discovered AutoNAT peer QmZRnwmCjyNHgeNDiyT8mXRtGhP6uSzgHtrozc42crmVbg notify.go:33
12:40:51.569  INFO    autonat: Discovered AutoNAT peer Qmd6xrWYHsxivfakYRy6MszTpuAiEoFbgE1LWw4EvwBpp4 notify.go:33
...
12:43:00.401 DEBUG    autonat: dial error through Qmd6xrWYHsxivfakYRy6MszTpuAiEoFbgE1LWw4EvwBpp4: AutoNAT error: dial failed (E_DIAL_ERROR) autonat.go:153
...
12:43:42.629 DEBUG    autonat: dial error through QmXhxqTKzBKHA5FcMuiKZv8YaMPwpbKGXHRVZcFB2DX9XY: AutoNAT error: dial failed (E_DIAL_ERROR) autonat.go:153
...
12:44:24.755 DEBUG    autonat: dial error through QmXq6XEYeEmUzBFuuKbVEGgxEpVD4xbSkG2Rhek6zkFMp4: AutoNAT error: dial failed (E_DIAL_ERROR) autonat.go:153
12:44:24.755 DEBUG    autonat: NAT status is private autonat.go:156

@vyzo
Copy link
Contributor

vyzo commented Feb 15, 2019

If it discovers multiple autonat peers, then it will try to use multiple data points to make the determination with confidence (3 is the current confidence level).
The issue here appears to be that the autonat peers are tried sequentially, which adds 42s per peer, giving us a 126s autonat determination delay.

@vyzo
Copy link
Contributor

vyzo commented Feb 15, 2019

there is also the AutoNATBootdelay = 30s and AutoNATRetryInterval = 90s.
Presumably it didn't make the determination immediately at boot, possibly because the autonat peers hadn't been discovered yet, so it made the determination in the retry.
That gives us a total 246s delay for the determination, which matches the observed time.

@vyzo
Copy link
Contributor

vyzo commented Apr 2, 2019

These two PRs fix the main source of latency:

Will update the timings in autorelay to reflect those changes once they both merge, and we should be under 1min in total time to relay advertisement with them.

@raulk
Copy link
Member

raulk commented Sep 25, 2019

@ZenGround0 revisiting this topic again to clean the backlog. Could you confirm the PRs @vyzo mentioned fixed the slowness for you? Cheers!

@ZenGround0
Copy link
Author

I'll take a look today, thanks!

@ZenGround0
Copy link
Author

logs from first autonat peer discover to nat status detection:

10:57:33.453  INFO    autonat: Discovered AutoNAT peer Qmd6xrWYHsxivfakYRy6MszTpuAiEoFbgE1LWw4EvwBpp4 notify.go:34
...
10:59:16.501  INFO    autonat: Discovered AutoNAT peer Qmd6xrWYHsxivfakYRy6MszTpuAiEoFbgE1LWw4EvwBpp4 notify.go:34
autonat: Dialback through QmZRnwmCjyNHgeNDiyT8mXRtGhP6uSzgHtrozc42crmVbg failed autonat.go:175
10:59:17.657 DEBUG    autonat: Dialback through QmXq6XEYeEmUzBFuuKbVEGgxEpVD4xbSkG2Rhek6zkFMp4 failed autonat.go:175
10:59:17.657 DEBUG    autonat: NAT status is private autonat.go:200

Empirically that's 1 minute 48 seconds from first peer detection in the logs to status detection. It is a bit longer than the 1 minute expected. This could be related to go-filecoin's peer management interfering with this loose measurement (for example go-filecoin might be shutting down the first several logged connections). In any case this is enough of an improvement that I'm good to close on my end.

@raulk
Copy link
Member

raulk commented Sep 27, 2019

@ZenGround0 cool, thanks for that confirmation (and the logs!). If this becomes a problem again, let's do another round of optimisation.

@raulk raulk closed this as completed Sep 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants