Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

etcdserver: Send requested progress notifications through watchStream (fix #15220) #15237

Merged
merged 3 commits into from
Apr 5, 2023

Conversation

scpmw
Copy link
Contributor

@scpmw scpmw commented Feb 2, 2023

Progress notifications requested using ProgressRequest were sent directly using the ctrlStream, which means that they could race against watch responses in the watchStream.

This would especially happen when the stream was not synced - e.g. if you requested a progress notification on a freshly created unsynced watcher, the notification would typically arrive indicating a revision for which not all watch responses had been sent.

This changes the behaviour so that v3rpc always goes through the watch stream, using a new RequestProgressAll function that closely matches the behaviour of the v3rpc code - i.e.

  1. Generate a message with WatchId -1, indicating the revision for all watchers in the stream

  2. Guarantee that a response is (eventually) sent

The latter might require us to defer the response until all watchers are synced, which is likely as it should be. Note that we do not guarantee that the number of progress notifications matches the number of requests, only that eventually at least one gets sent.

@scpmw
Copy link
Contributor Author

scpmw commented Feb 2, 2023

Previous discussion on this commit: scpmw@8de2177#r98961857

@ahrtr
Copy link
Member

ahrtr commented Feb 3, 2023

Please signoff the commit and resolve the pipeline errors.

cc @mitake

@ahrtr
Copy link
Member

ahrtr commented Feb 3, 2023

$ git rebase HEAD~1 --signoff
$ git push origin guarantee_progress_notify_order -f

@scpmw scpmw force-pushed the guarantee_progress_notify_order branch from 785735a to 7e589e5 Compare February 3, 2023 08:23
@codecov-commenter
Copy link

codecov-commenter commented Feb 3, 2023

Codecov Report

Merging #15237 (c055f3c) into main (9505d72) will decrease coverage by 0.17%.
The diff coverage is 73.33%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@            Coverage Diff             @@
##             main   #15237      +/-   ##
==========================================
- Coverage   74.76%   74.59%   -0.17%     
==========================================
  Files         416      416              
  Lines       34351    34369      +18     
==========================================
- Hits        25682    25638      -44     
- Misses       7029     7075      +46     
- Partials     1640     1656      +16     
Flag Coverage Δ
all 74.59% <73.33%> (-0.17%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
server/storage/mvcc/watchable_store.go 87.11% <61.90%> (-2.85%) ⬇️
server/etcdserver/api/v3rpc/watch.go 84.12% <100.00%> (-1.59%) ⬇️
server/storage/mvcc/watcher.go 100.00% <100.00%> (ø)
client/v3/concurrency/mutex.go 61.03% <0.00%> (-5.20%) ⬇️
client/v3/concurrency/session.go 85.10% <0.00%> (-4.26%) ⬇️
server/storage/mvcc/kvstore_compaction.go 95.74% <0.00%> (-4.26%) ⬇️
server/etcdserver/api/v3rpc/util.go 70.96% <0.00%> (-3.23%) ⬇️
client/v3/leasing/txn.go 88.09% <0.00%> (-3.18%) ⬇️
server/etcdserver/api/v3rpc/interceptor.go 74.47% <0.00%> (-3.13%) ⬇️
server/etcdserver/api/v3election/election.go 66.66% <0.00%> (-2.78%) ⬇️
... and 21 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@scpmw scpmw force-pushed the guarantee_progress_notify_order branch from 7e589e5 to c055f3c Compare February 3, 2023 09:16
@scpmw
Copy link
Contributor Author

scpmw commented Feb 3, 2023

Done, I think.

Copy link
Member

@serathius serathius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't dig into code, however I run #15235 on this and found an issue with implementation.

Looks like same watch event is emitted in twice. From the local run:

panic: duplicate operation in two responses on revision 332
---
first:
{
    "Header": {
        "cluster_id": 4512387775551231959,
        "member_id": 11173105686495536258,
        "revision": 332,
        "raft_term": 2
    },
    "Events": [
        {
            "kv": {
                "key": "MA==",
                "create_revision": 2,
                "mod_revision": 332,
                "version": 78,
                "value": "MzI4"
            }
        }
    ],
    "CompactRevision": 0,
    "Canceled": false,
    "Created": false
}
---
second:
{
    "Header": {
        "cluster_id": 4512387775551231959,
        "member_id": 11173105686495536258,
        "revision": 333,
        "raft_term": 3
    },
    "Events": [
        {
            "kv": {
                "key": "MA==",
                "create_revision": 2,
                "mod_revision": 332,
                "version": 78,
                "value": "MzI4"
            }
        },
        {
            "kv": {
                "key": "Mw==",
                "create_revision": 3,
                "mod_revision": 333,
                "version": 92,
                "value": "MzMw"
            }
        }
    ],
    "CompactRevision": 0,
    "Canceled": false,
    "Created": false
}

This shows two consecutive watch responses that include same event. First watch returned events for revisions 332-332 and second 332-333.
This situation doesn't happen on main branch nor when manual progress notifications are disabled.

@scpmw
Copy link
Contributor Author

scpmw commented Feb 3, 2023

Uh, that's concerning. This doesn't happen for my test script - how do I reproduce this? I tried make test-linearizability for your branch, but that seems to yield me a "connection refused" error from a goPanicFailpoint. I hope I'm not overlooking something obvious here?

@serathius
Copy link
Member

Sorry I still need to document linearizability tests properly. Linearizability tests require a etcd server binary compiled with gofail enabled. To do that please run

make gofail-enable
make build
make gofail-diable

@scpmw
Copy link
Contributor Author

scpmw commented Feb 5, 2023

Ah, thanks, that made it work. Yes, I am getting the same error, but this might be a separate issue... I disabled the progress notification entirely server-side (i.e. commented out the code answering ProgressRequest) for both my branch and the version from your branch, and after disabling the Expected at least one notify check it still reproduced your error. So fairly confident that my fix is not what made this appear (that would have been really puzzling).

I'll see whether I can dig some more to find out why it happened in the first place. Using my test script I was unable to reproduce responses getting sent twice, though interestingly enough I found other instances where the revision went backwards. This time it was on actual watch responses, not progress notifications though. Could even be a connection there - if your script does reconnects, a wrong revision in the header might lead to duplicates? In any case, need to make sure.

@ahrtr ahrtr marked this pull request as draft February 6, 2023 07:07
@ahrtr
Copy link
Member

ahrtr commented Feb 6, 2023

The PR #15248 should can fix the duplicate event issue. No matter how we fix the RequestProgress issue, we should guarantee that the watchRequest.rev never decrease. cc @mitake

@ahrtr
Copy link
Member

ahrtr commented Feb 6, 2023

Mark this PR as draft until it's confirmed that it indeed fixes the RequestProgress issue.

@ahrtr
Copy link
Member

ahrtr commented Feb 7, 2023

FYI. The root cause of the duplicated event issue: #15248 (comment)

@ahrtr
Copy link
Member

ahrtr commented Feb 8, 2023

Confirmed that this PR indeed can fix this issue #15220. But the solution has major issue, because it requires all watchers (from all clients) to be synced, obviously it isn't good. Instead, we should only require all the watchers which share the same gRPC stream to be synced.

We can manage to resolve it, but I don't see a good reason to continue to support it as we already have timed progress notification. It may complicate the overall implementation, and is also error prone. The source of the issue is coming from the addition of RequestProgress. Please read #9855 and kubernetes/kubernetes#59848 to get more background. The motivation of adding the RequestProgress is to get the answer to question "what's my progress", and accordingly know whether the local cache is up to date. Read kubernetes/kubernetes#59848 (comment) for a quick reference. But the existing timed progress notification should be enough.

I am thinking probably we should deprecate the manual RequestProgress. Actually K8s doesn't use the manual RequestProgress at all. See #15260. @mitake @ptabor @serathius @spzala
@wojtek-t @liggitt @dims any objections or comments?

@liggitt
Copy link
Contributor

liggitt commented Feb 8, 2023

@wojtek-t or @jpbetz might know more about the plans for requested progress notification than I do... I thought there was still a desire to use that in Kubernetes and it just hadn't happened yet

@scpmw scpmw force-pushed the guarantee_progress_notify_order branch from c055f3c to 3e79020 Compare February 8, 2023 19:32
@scpmw
Copy link
Contributor Author

scpmw commented Feb 8, 2023

@ahrtr - ah, good point. I had overlooked that the watchable is shared between multiple watchStreams. That obviously meant that the fix was wrong. Should be fixed now, also addressing @mitake's comments.

Deprecating manual progress notifications would send me back to the drawing board on how to determine the database state of watched keys from a watch stream with multiple watchers - this was the only feature in the API that looked like it would cross-synchronise watchers (after all, the documentation effectively claimed it would!). So not a fan, but obviously not up to me in the end.

@scpmw scpmw force-pushed the guarantee_progress_notify_order branch 2 times, most recently from 416eda7 to 1725aa9 Compare February 8, 2023 21:18
server/storage/mvcc/watchable_store.go Outdated Show resolved Hide resolved
@@ -58,6 +58,13 @@ type WatchStream interface {
// of the watchers since the watcher is currently synced.
RequestProgress(id WatchID)

// RequestProgressAll requests a progress notification for the
// entire watcher group. The responses will be sent through
Copy link
Member

@ahrtr ahrtr Feb 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change the comment to something like below, (not a native English speaker, please feel free to adjust the wording.)

RequestProgressAll requests a progress notification for all
the watchers (substreams) which share the same gRPC stream. 
......

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-worded it. Felt like it might be the wrong level of abstraction to talk about gRPC (strictly speaking we don't know where the WatchResponses go) and substreams (that's a client-side idea?), so I left it just at "share the same stream".

server/storage/mvcc/watchable_store.go Outdated Show resolved Hide resolved
@ahrtr
Copy link
Member

ahrtr commented Feb 8, 2023

this was the only feature in the API that looked like it would cross-synchronise watchers

Yes, this is still true. The overall complexity seems acceptable, and that's good news, so I am not insist on deprecating it. But the manual progress notification needs to require the write lock, so I think it may cause reduce of performance if users use it too much.

So the timed progress notification is still recommended.

@ahrtr
Copy link
Member

ahrtr commented Apr 5, 2023

@scpmw could you please backport this PR to 3.5 and 3.4?

Note: we need to get the fix included in 3.5.8 and 3.4.25.

@scpmw
Copy link
Contributor Author

scpmw commented Apr 6, 2023

I can give it a try - separate PRs, I assume? And I'd assume I can skip the linearisability / robustness tests, because that would basically require also backporting the entire infrastructure?

@ahrtr
Copy link
Member

ahrtr commented Apr 6, 2023

I can give it a try - separate PRs, I assume? And I'd assume I can skip the linearisability / robustness tests, because that would basically require also backporting the entire infrastructure?

Thanks @scpmw . Yes, please submit two separate PRs, one for release-3.5, and the other for release-3.4 branch. Regarding to robustness test, It's OK to skip it to me, but defer to @serathius to comment.

@ahrtr
Copy link
Member

ahrtr commented Apr 6, 2023

@scpmw please backport this PR to release-3.5 firstly, because we need to release 3.5.8 firstly. Thank you!

@ahrtr
Copy link
Member

ahrtr commented Apr 7, 2023

@scpmw Please let me know if you need my assistance on this.

My goal is to get it done by the end of next Monday, so that we can get 3.5.8 out before KubeCon. @serathius Do you have concern on this?

@serathius
Copy link
Member

serathius commented Apr 7, 2023

If the original PR is done, I don't see anything wrong with others finishing work by backporting them. Let's wait till Monday for @scpmw to respond.

@ahrtr
Copy link
Member

ahrtr commented Apr 7, 2023

If the original PR is done, I don't see anything wrong with others finishing work by backporting them

Agreed, but @scpmw already said he can give it a try. I will take over if no any response till next Monday.

@ahrtr ahrtr mentioned this pull request Apr 10, 2023
@scpmw
Copy link
Contributor Author

scpmw commented Apr 10, 2023

You guys move fast :)

Haven't got to it yet, and likely won't until the end of the week. I'm not confident enough in my general Go skills to squeeze something like this into the margins around Easter... So if you need this done more quickly, please don't wait for me.

@ahrtr
Copy link
Member

ahrtr commented Apr 10, 2023

Thanks @scpmw for the feedback. Then let me take care of it.

serathius added a commit to serathius/etcd that referenced this pull request Mar 8, 2024
…t doesn't match watch range.

When implementing the fix for progress notifications
(etcd-io#15237) we made a incorrect
assumption that that unsynched watches will always get at least one event.

Unsynched watches include not only slow watchers, but also newly created
watches that requested old revision. In case that non of the events
match watch filter, those newly created watches might become synched
without any event going through.

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
serathius added a commit to serathius/etcd that referenced this pull request Mar 8, 2024
When implementing the fix for progress notifications
(etcd-io#15237) we made a incorrect
assumption that that unsynched watches will always get at least one event.

Unsynched watches include not only slow watchers, but also newly created
watches that requested current or older revision. In case that non of the events
match watch filter, those newly created watches might become synched
without any event going through.

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
serathius added a commit to serathius/etcd that referenced this pull request Mar 8, 2024
When implementing the fix for progress notifications
(etcd-io#15237) we made a incorrect
assumption that that unsynched watches will always get at least one event.

Unsynched watches include not only slow watchers, but also newly created
watches that requested current or older revision. In case that non of the events
match watch filter, those newly created watches might become synched
without any event going through.

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
@serathius
Copy link
Member

This PR introduced another bug #17557

@scpmw @ahrtr do you know why we decided to defer progress notification until an event? It makes sense that we don't send a progress notification if any watch is unsynced, but why we defer notification? There is no guarantee that event comes.

The latter might require us to defer the response until all watchers are synced, which is likely as it should be. Note that we do not guarantee that the number of progress notifications matches the number of requests, only that eventually at least one gets sent.

Eventually is good, but we currently get never.

serathius added a commit to serathius/etcd that referenced this pull request Mar 8, 2024
When implementing the fix for progress notifications
(etcd-io#15237) we made a incorrect
assumption that that unsynched watches will always get at least one event.

Unsynched watches include not only slow watchers, but also newly created
watches that requested current or older revision. In case that non of the events
match watch filter, those newly created watches might become synched
without any event going through.

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
@scpmw
Copy link
Contributor Author

scpmw commented Mar 8, 2024

Right. Mea culpa - I actually found this issue myself. The only situation this really happens is indeed on creation, because typically the reason we aren't synchronised is because there are messages left to send. So I tested that it can be fixed by just sending an "empty" message downstream in that particular case (i.e. at https://github.com/etcd-io/etcd/blob/main/server/storage/mvcc/watchable_store.go#L378). But this struck me as a fairly ugly way to approach this and I subsequently got a bit stuck trying to think of a better way.

Your proposed solution is to just not send a response to a progress request if the stream isn't fully synchronised? I think the key comment is in your patch:

As the client is unsynchronised, the server will not sent any notification, as client can infer progress from events.

With your patch that is not quite true for two reasons. Firstly - ironically - in the problematic scenario (create a new watcher on a "quiet" key, immediately send progress notification request) the client gets nothing back at all, so the client can in fact not infer progress (inconsequential as it would be in this case).

Secondly - and more subtly, so apologies in advance if I get this slightly wrong - if you see "progress" as defined in https://github.com/etcd-io/etcd/blob/main/api/etcdserverpb/rpc.proto:

// For watch progress responses, the header.revision indicates progress. All future events
// received in this stream are guaranteed to have a higher revision number than the
// header.revision number.

Then that is something you cannot tell from standard events, as in a situation where we are watching many keys it is always possible that you will more events for a certain revision. Therefore there are situations where you really want to force a progress notification to get that particular guarantee, not because you are particularly worried that you might be falling behind enough that compaction might have caught up with you.

Now admittedely this could be characterised as somewhat misusing progress notifications, and your change isn't even catastrophic (you'd just have to poll a bit until you get a notification back). There might even be conceptually better ways to do this - e.g. have some sort of flag in the response of whether there are more events at this revision? However, that introduces a protocol change and wouldn't even be that much easier to implement. So that's where I got stuck on this question...

@serathius
Copy link
Member

Then that is something you cannot tell from standard events, as in a situation where we are watching many keys it is always possible that you will more events for a certain revision. Therefore there are situations where you really want to force a progress notification to get that particular guarantee, not because you are particularly worried that you might be falling behind enough that compaction might have caught up with you.

In standard events you just use event.Kv.ModRevision to get progress.

serathius added a commit to serathius/etcd that referenced this pull request Mar 11, 2024
When implementing the fix for progress notifications
(etcd-io#15237) we made a incorrect
assumption that that unsynched watches will always get at least one event.

Unsynched watches include not only slow watchers, but also newly created
watches that requested current or older revision. In case that non of the events
match watch filter, those newly created watches might become synched
without any event going through.

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
serathius added a commit to serathius/etcd that referenced this pull request Mar 11, 2024
When implementing the fix for progress notifications
(etcd-io#15237) we made a incorrect
assumption that that unsynched watches will always get at least one event.

Unsynched watches include not only slow watchers, but also newly created
watches that requested current or older revision. In case that non of the events
match watch filter, those newly created watches might become synched
without any event going through.

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
serathius added a commit to serathius/etcd that referenced this pull request Mar 11, 2024
When implementing the fix for progress notifications
(etcd-io#15237) we made a incorrect
assumption that that unsynched watches will always get at least one event.

Unsynched watches include not only slow watchers, but also newly created
watches that requested current or older revision. In case that non of the events
match watch filter, those newly created watches might become synched
without any event going through.

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
serathius added a commit to serathius/etcd that referenced this pull request Mar 11, 2024
When implementing the fix for progress notifications
(etcd-io#15237) we made a incorrect
assumption that that unsynched watches will always get at least one event.

Unsynched watches include not only slow watchers, but also newly created
watches that requested current or older revision. In case that non of the events
match watch filter, those newly created watches might become synched
without any event going through.

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
serathius added a commit to serathius/etcd that referenced this pull request Mar 11, 2024
When implementing the fix for progress notifications
(etcd-io#15237) we made a incorrect
assumption that that unsynched watches will always get at least one event.

Unsynched watches include not only slow watchers, but also newly created
watches that requested current or older revision. In case that non of the events
match watch filter, those newly created watches might become synched
without any event going through.

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
serathius added a commit to serathius/etcd that referenced this pull request Mar 11, 2024
When implementing the fix for progress notifications
(etcd-io#15237) we made a incorrect
assumption that that unsynched watches will always get at least one event.

Unsynched watches include not only slow watchers, but also newly created
watches that requested current or older revision. In case that non of the events
match watch filter, those newly created watches might become synched
without any event going through.

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
serathius added a commit to serathius/etcd that referenced this pull request Mar 11, 2024
When implementing the fix for progress notifications
(etcd-io#15237) we made a incorrect
assumption that that unsynched watches will always get at least one event.

Unsynched watches include not only slow watchers, but also newly created
watches that requested current or older revision. In case that non of the events
match watch filter, those newly created watches might become synched
without any event going through.

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

6 participants