Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: let etcd v3 read_watch handle the case where a chunk contains partial event or multiple events #154

Merged
merged 14 commits into from
Mar 3, 2022

Conversation

nic-6443
Copy link
Contributor

@nic-6443 nic-6443 commented Feb 18, 2022

Try to fix #153, we have verified in our environment that can solve that problem, but the test cases still need time to complete.

@CLAassistant
Copy link

CLAassistant commented Feb 18, 2022

CLA assistant check
All committers have signed the CLA.

Copy link
Contributor

@membphis membphis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing test cases

@nic-6443
Copy link
Contributor Author

We have a tricky problem, our CI includes a test of ETCD version 3.3.0, but this version of ETCD has a bug in the grpc-gateway dependency it relies on to implement HTTP API, which causes every event json to not have a trailing newline, which is the key I use to split the event.
This means the code in this PR does not work properly on version 3.3.0 of ETCD.

References:

Upgrade github.com/grpc-ecosystem/grpc-gateway v1.2.2 to v1.3.0.

Mapping streaming APIs to newline-delimited JSON streams.

To summarize, ETCD uses grpc-gateway to implement the HTTP API, and grpc-gateway explicitly states that it will use newlines as a separator for stream-type grpc responses, but because of a bug introduced in the historical version, it does not pass the test cases of v3.3.0.

@nic-6443
Copy link
Contributor Author

@tokers @starsz @membphis @spacewander Any idea?

@membphis
Copy link
Contributor

@spacewander
Copy link
Contributor

We can drop the support of etcd < 3.4

@nic-6443
Copy link
Contributor Author

We can drop the support of etcd < 3.4

Got it, thanks.

@nic-6443 nic-6443 marked this pull request as ready for review February 27, 2022 11:45
return nil, err
body = chunk
else
body = body .. chunk
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use table.concat to avoid creating too much temp str?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

end
if not utils.is_empty_str(body) then

if not utils.is_empty_str(chunk) and sub_str(chunk, -1) == "\n" then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use string.byte would be better

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

lib/resty/etcd/v3.lua Outdated Show resolved Hide resolved
end
if not utils.is_empty_str(body) then

if not utils.is_empty_str(chunk) and str_byte(chunk, -1) == str_byte("\n") then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if the response chunk doesn't contain the \n and we'll continue reading the next chunk? But if the connection is aborted / timed out this time, so the last chunk was missed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this case, the caller can rewatch based on the last revision it got, so it can start consuming from the correct revision again without miss any event.

return nil, err
body = chunk
else
body = tab_concat({body, chunk})
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can call tab_contat out of the while.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not good, loop may execute more than twice.

Copy link
Contributor

@membphis membphis Mar 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this way is worse than body = body .. chunk

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not good, loop may execute more than twice.

I mean just tab_concat once to concat all chunk

Copy link
Contributor Author

@nic-6443 nic-6443 Mar 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, I misunderstand advice from spacewander.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it carefully, and since this concat branch will only be executed in rare cases(while one chunk larger than nginx proxy_buffer_size), so I think it would be more appropriate to use a string directly, which avoid creating a table with only one element in most cases.
@membphis @spacewander @starsz Do you agree with this conclusion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think just use .. is enough.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use .. but add a comment to descript why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

if not chunk then
break
end

if not body then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look like this branch is not executed, as body has default value ""?

Copy link
Contributor Author

@nic-6443 nic-6443 Mar 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, my fault, body should initial with nil.

if event.kv.value then -- DELETE not have value
event.kv.value = decode_base64(event.kv.value or "")
event.kv.value = self.serializer.deserialize(event.kv.value)
local chunks, error = split(body, [[\n]], "jo")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use err would be better? error is a function in Lua.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, but since err is already defined outside of read_watch, if I use err, our lint will complain about duplicate variable names, so I replace error with split_err.

t/v3/key.t Show resolved Hide resolved
@nic-6443 nic-6443 merged commit 5347a42 into master Mar 3, 2022
@nic-6443 nic-6443 deleted the fix-read-watch branch March 3, 2022 01:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug: watch responses may be incorrectly parsed in the case of ETCD being proxied by a load balancer like nginx
6 participants