
Fix dynamic discovery timeout to not retry sending requests, but wait for the same request to complete #2337

Merged
leszko merged 2 commits into livepeer:master on Mar 24, 2022

Conversation

@leszko (Contributor) commented Mar 24, 2022

What does this pull request do? Explain your changes. (required)
The last change to the dynamic timeout caused an issue in teststreams, described on Discord.

The dynamic timeout PR changed the discovery between B<>O to work as follows:

  1. B tries to discover Os within 500 ms
  2. If no Os are found, increase the timeout to 1s and send the requests again
  3. If no Os are found, increase the timeout to 2s and send the requests again

The problem is that if O responds in, say, 1.5s, then discovery now takes: 0.5s + 1s + 1.5s = 3s.
So even though the response arrives in 1.5s, the overall time from the black-box perspective is 3s. That pollutes the orch teststream data.

This PR changes the dynamic timeout so that it does not send new requests, but instead waits for the initial requests to complete (as originally proposed by @yondonfu).
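To illustrate the idea, here is a minimal Go sketch of the "extend the wait, don't resend" behavior. It is not the actual go-livepeer code; the `send` callback, the `orchInfo` type, and the exact loop shape are assumptions for the example, with the 500ms/1s/2s timeouts taken from the description above:

```go
package main

import (
	"context"
	"time"
)

// orchInfo stands in for an orchestrator discovery response.
type orchInfo struct{ URI string }

// discoverWithDynamicTimeout sends the discovery requests exactly once and,
// whenever the current timeout fires with no response, keeps waiting on the
// same in-flight requests with a doubled timeout instead of re-sending them.
func discoverWithDynamicTimeout(ctx context.Context, send func(chan<- *orchInfo)) []*orchInfo {
	results := make(chan *orchInfo, 64)
	send(results) // requests go out once; responses arrive on results

	timeout := 500 * time.Millisecond
	maxTimeout := 2 * time.Second
	timer := time.NewTimer(timeout)
	defer timer.Stop()

	var found []*orchInfo
	for {
		select {
		case o := <-results:
			// A response arrived; for this sketch one O is enough.
			return append(found, o)
		case <-timer.C:
			if timeout >= maxTimeout {
				return found // give up after the maximum timeout
			}
			timeout *= 2
			timer.Reset(timeout) // extend the wait; do NOT resend the requests
		case <-ctx.Done():
			return found
		}
	}
}
```

With this shape, an O that answers after 1.5s is seen by B after roughly 1.5s (the 0.5s and 1s timers simply expire and extend the wait), rather than after the 3s accumulated by re-sending.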

Specific updates (required)

How did you test each of these updates (required)

Tested with local geth. Introduced artificial delay in O and checked the logs in B.

Does this pull request close any open issues?

Checklist:

@leszko leszko requested a review from yondonfu March 24, 2022 12:38
@leszko leszko changed the title Fix dynamic timeout to not retry sending requests, but wait for the same request to complete Fix dynamic discovery timeout to not retry sending requests, but wait for the same request to complete Mar 24, 2022
@yondonfu (Member) commented

> The problem is that if O responds in, say, 1.5s, then discovery now takes: 0.5s + 1s + 1.5s = 3s.
> So even though the response arrives in 1.5s, the overall time from the black-box perspective is 3s. That pollutes the orch teststream data.

Hm yeah I think this makes sense.

orch-tester relies on metrics scraped from B for the avg upload, download and round-trip times. When B receives a segment, it will record the start time here and then run discovery here. At the beginning of a stream, discovery will block until the working O set is populated (during the stream, discovery happens in the background to re-populate the working set), so if discovery takes longer, that would be reflected in the times recorded for the first segment of the stream.

Additionally, discovery will block again if the working O set is empty for any subsequent segment, which I think might happen in the scenario where there is only 1 O available and it doesn't return a response fast enough before the next segment arrives [1]; in this scenario discovery would add to the recorded times as well.

[1] I think the threshold is 2x the segment duration, so if the segment duration is 2s and the previous segment has taken longer than 4s, then we have to re-run discovery; but if it took less than 4s, we just send the next segment to the same O.

// If no new sessions are available, re-use last session when oldest segment is in-flight for < 2 * segDur
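As a hedged illustration of that threshold (the function and parameter names below are made up for the example, not the actual go-livepeer code):

```go
package main

import "time"

// shouldReuseLastSession mirrors the comment above: when no new sessions are
// available, keep using the last O as long as its oldest in-flight segment has
// been pending for less than 2x the segment duration; otherwise discovery has
// to be re-run, which can block and add to the times recorded for the segment.
func shouldReuseLastSession(oldestSegStart time.Time, segDur time.Duration) bool {
	return time.Since(oldestSegStart) < 2*segDur
}
```

For a 2s segment duration, this means the previous O keeps being reused as long as its oldest in-flight segment has been pending for less than 4s, matching the example above.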

Two resolved review threads on discovery/discovery.go
@leszko leszko requested a review from yondonfu March 24, 2022 14:52
@yondonfu (Member) left a comment


LGTM after squashing

@leszko leszko force-pushed the rl/fix-dynamic-discovery-timeout branch from 81a6a8a to 1ee7a27 on March 24, 2022 16:09
@leszko leszko merged commit ce5ab12 into livepeer:master Mar 24, 2022
@leszko leszko deleted the rl/fix-dynamic-discovery-timeout branch March 24, 2022 16:27
@leszko leszko mentioned this pull request May 10, 2022