-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
change(server-info): use a new approach(keepalive) to report DP info #6202
change(server-info): use a new approach(keepalive) to report DP info #6202
Conversation
@zaunist any update? |
I have update the plugin code, still need to update test case. |
cc @tzssangglass @shuaijinchao. Please take a look, thanks. |
apisix/plugins/server-info.lua
Outdated
if not newres then | ||
core.log.error("failed to set server_info: ", err) | ||
end | ||
|
||
-- set lease_id to ngx dict | ||
local ok, err = internal_status:set("lease_id", newres.body.lease_id) | ||
if not ok then | ||
core.log.error("failed to save boot_time to shdict: ", err) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should not continue to set ngx dict
if newres
fetch fails
apisix/plugins/server-info.lua
Outdated
local key = "/data_plane/server_info/" .. server_info.id | ||
local ok, err = core.etcd.set(key, server_info, report_ttl) | ||
local res, _ = core.etcd.get(key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to judge failure
apisix/plugins/server-info.lua
Outdated
return | ||
end | ||
|
||
local res, _ = core.etcd.get(key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
apisix/plugins/server-info.lua
Outdated
@@ -203,8 +236,7 @@ end | |||
|
|||
|
|||
function _M.init() | |||
core.log.info("server info: ", core.json.delay_encode(get())) | |||
|
|||
-- core.log.info("server info: ", core.json.delay_encode(get())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove debug and redundant code
apisix/plugins/server-info.lua
Outdated
@@ -217,13 +249,14 @@ function _M.init() | |||
return | |||
end | |||
|
|||
-- local report_ttl = attr and attr.report_ttl or default_report_ttl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
misoperation |
apisix/plugins/server-info.lua
Outdated
local ok = core.table.deep_eq(server_info, res.body.node.value) | ||
-- not equal, update it | ||
if not ok then | ||
core.log.error("failed to report server info to etcd: ", err) | ||
local newres, err = core.etcd.set(key, server_info, report_ttl) | ||
if not newres then | ||
core.log.error("failed to set server_info to etcd: ", err) | ||
return false, err | ||
end | ||
|
||
-- update lease_id | ||
local _, err = internal_status:set("lease_id", res.body.lease_id) | ||
if err ~= nil then | ||
core.log.error("failed to set lease_id to shdict: ", err) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This paragraph is same to the one in the if not res.body.node then .... end
block, we can create a function to reuse it.
apisix/plugins/server-info.lua
Outdated
end | ||
|
||
if not res.body.node then | ||
local newres, err = core.etcd.set(key, server_info, report_ttl) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the key
for individual instances should be unique but should we use the atomic_set
method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it.
apisix/plugins/server-info.lua
Outdated
local server_info, err = get() | ||
if not server_info then | ||
core.log.error("failed to get server_info: ", err) | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why removing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, careless.
apisix/plugins/server-info.lua
Outdated
core.log.error("failed to get server_info from etcd: ", err) | ||
end | ||
|
||
if not res.body.node then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if not res.body.node then | |
if not res.body.node then |
local ok, err = core.etcd.set(key, server_info, report_ttl) | ||
local res, err = core.etcd.get(key) | ||
if err ~= nil then | ||
core.log.error("failed to get server_info from etcd: ", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if err
is not nil
, why can we go ahead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a apisix node be initialed, the ETCD don't have server-info data, I think the get
func will report error, so I continue the code and set the info to ETCD. Maybe use if not res
be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, if get
method reports error when no data, you should check the error type and decide whether returning or going ahead.
apisix/plugins/server-info.lua
Outdated
end | ||
|
||
if not res.body.node then | ||
local res_new, err = core.etcd.set(key, server_info, report_ttl) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the logic for initial setting can be merged to the below logic (modified_version
is zero).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we don't have two similar code blocks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we set modified_version
to nil
, I'm not sure if it can be set to nil.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have already said you can set it to 0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated it.
origin issue has we need to add a test case to cover this? |
Hi, Community. I have encountered follow problems.
Should I choose to turn on the server-info plugin by default or turn it off by default? If it is turned on by default, I don't know how to deal with the error here, can anyone give me some help? |
How do I change the |
apisix/plugins/server-info.lua
Outdated
@@ -132,51 +125,107 @@ local function get_server_info() | |||
end | |||
|
|||
|
|||
local function set(key,value,ttl, modified_index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
local function set(key,value,ttl, modified_index) | |
local function set(key, value, ttl, modified_index) |
return 503, {error_msg = err} | ||
end | ||
|
||
lease_id = res_new.body.lease_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we check if lease_id
is not nil?
apisix/plugins/server-info.lua
Outdated
local res_new, err = core.etcd.atomic_set(key, value, ttl, modified_index) | ||
if not res_new then | ||
core.log.error("failed to set server_info: ", err) | ||
return 503, {error_msg = err} | ||
end | ||
|
||
lease_id = res_new.body.lease_id | ||
|
||
-- set or update lease_id | ||
local ok, err = internal_status:set("lease_id", lease_id) | ||
if not ok then | ||
core.log.error("failed to set lease_id to shdict: ", err) | ||
return 503, {error_msg = err} | ||
end | ||
|
||
return 200, nil | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the two steps should be in sync, and we can use lock in the next PR to keep them in sync.
It doesn't have to be hostname, it can be any other action that can be used to punish service info refresh. |
Hi, @tzssangglass . I checked |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What we need to test is keepalive instead of getting server info, so we could modify the info stored in shared DICT to test.
apisix/plugins/server-info.lua
Outdated
local res_new, err = core.etcd.set(key, value, ttl) | ||
if not res_new then | ||
core.log.error("failed to set server_info: ", err) | ||
return 503, {error_msg = err} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why should we use 503
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
503 Service Unavailable, I think it is OK to return this error when the plugin cannot provide the service, what do you suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think 500
is better. 503
is commonly used for overload.
I have tried to change the server-info information in the share dict in the |
you could push the test case again to see why it failed. |
My mistake, I should have pointed out that the test cases are written in shell script. Let's drop this test case for now and focus on finishing this PR. |
Yes, if we need to get the latest |
No need to change in this PR. |
apisix/plugins/server-info.lua
Outdated
end | ||
|
||
if server_info.etcd_version == "unknown" then | ||
local res, err = core.etcd.server_version() | ||
if not res then | ||
core.log.error("failed to fetch etcd version: ", err) | ||
return | ||
return 500, {error_msg = err} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use 503 like other similar cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it.
lease_id, err = internal_status:get("lease_id") | ||
if not lease_id then | ||
core.log.error("failed to get lease_id from shdict: ", err) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing a return
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already add return code block
apisix/plugins/server-info.lua
Outdated
local data, err = core.json.encode(server_info) | ||
if not data then | ||
core.log.error("failed to encode server_info: ", err) | ||
return | ||
return false, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The returned value is inconsistent?
And it seems the returned value of report
is not checked:
apisix/apisix/plugins/server-info.lua
Line 278 in d5166c4
report(nil, report_ttl) |
@zaunist |
Hi, @spacewander. Previously, server-info would write a lot of dirty data and cause performance problems with ETCD, but with keepalive, this problem is avoided and a simple http request does not take up any more performance. So a shorter ttl would allow for more timely reporting of Apache APISIX node status. One minute heartbeat is too long in my opinion, so it was changed to 36 seconds, or we could even set a shorter heartbeat, such as nacos, which has a default of 5 seconds. |
Yes. But keepalive request is still a write operation. Since this PR is to reduce the usage of etcd, using a lower interval will increase the number of requests to etcd.
Nacos doesn't write the heartbeat to etcd... |
OK, I will change it to 60s. |
end | ||
|
||
if not res.body.node then | ||
local ok, err = set(key, server_info, report_ttl) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's strange that get returns nil, err
while set returns status, table
...
Can we use nil, err
in both cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it.
apisix/plugins/server-info.lua
Outdated
local ok, err = set(key, server_info, report_ttl) | ||
if not ok then | ||
core.log.error("failed to set server_info to etcd: ", err) | ||
return 503, {error_msg = err} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now we get a nested err table
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I just return it here without the err? That means if there is an error, it will just be printed to error.log
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see #6202 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your reply.
apisix/plugins/server-info.lua
Outdated
local ok, err = set(key, server_info, report_ttl) | ||
if not ok then | ||
core.log.error("failed to set server_info to etcd: ", err) | ||
return 503, {error_msg = err} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should return directly as we don't (and can't) care about the result of report
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I will fix it.
What this PR does / why we need it:
fix #6176
Pre-submission checklist: