-
Notifications
You must be signed in to change notification settings - Fork 518
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
agent config GA #2747
agent config GA #2747
Conversation
0de5ad6
to
00af2db
Compare
Codecov Report
@@ Coverage Diff @@
## master #2747 +/- ##
=========================================
- Coverage 80.16% 78.8% -1.37%
=========================================
Files 83 82 -1
Lines 4589 4288 -301
=========================================
- Hits 3679 3379 -300
+ Misses 910 909 -1
|
failing test can't be fixed until Kibana side is merged, as Kibana returns |
depends on elastic/kibana#46995 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You explained that GA changes will be breaking. Does that mean that the agents using the beta implementation would break when talking to APM Server or would they just stop applying ACM?
agentcfg/model.go
Outdated
func dotKey(k1, k2 string) string { | ||
if k1 == "" { | ||
return k2 | ||
// ZeroResult creates a Result struct |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for exporting this.
beater/api/config/agent/handler.go
Outdated
return c.Request.URL.Query().Get(agentcfg.Etag) | ||
} | ||
|
||
func sanitize(isRum bool, settings agentcfg.Settings) agentcfg.Settings { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The agentcfg
package has knowledge about RUM
, and the Query
has an IsRum
flag. IMO this sanitize method should be part of fetching the result from the agentcfg
package.
agentcfg/cache.go
Outdated
if !c.logger.IsDebug() { | ||
return | ||
if !authorized(q.IsRum, result.Source.Agent) { | ||
return ZeroResult(), nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this implementation non authorized RUM results are not cached. IMO this should be changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. If then the same request comes trough the right endpoint, the result would be a wrong empty result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kibana is responding the same way to this query, no matter who requests it, right? So I don't see why caching the result
here would lead to wrong results.
Ofc I did not mean to cache the ZeroResult
but the result
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in that case i don't know why i would want to cache something that i am not even returning, but im fine either way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My reasoning is the following:
What is the purpose of caching here?
To avoid unnecessary calls to Kibana based on legitimate requests from agents. But as soon as those requests can be triggered from requests to unsecured RUM endpoint, also to prevent overloading Kibana on any kind of attacks.
The authorized
check would only return false
if the agents had a bug or for malicious requests. In the case of malicious requests it makes sense to also avoid sending the same requests over and over to Kibana.
However, if you add it to the cache, you would probably also need to size limit the cache, as otherwise you open up another attack vector.
I don't see why the decision in which cases something should be added to the cache is in direct relation on whether or not you are returning this information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure you find this useful, but I gave this some thought when introducing the caching #2220 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
I am changing the caching key to be service name + env, instead of Etag.
When there are many RUM agents, they will not send an ifnonematch
arg until they get one from a previous call, which means that all of them would have to hit Kibana at least one.
Caching the service name solves this problem.
On the other side, I don't think is necessary to have a cache limit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -167,6 +177,7 @@ func backendMiddleware(cfg *config.Config, m map[request.ResultID]*monitoring.In | |||
|
|||
func rumMiddleware(cfg *config.Config, m map[request.ResultID]*monitoring.Int) []middleware.Middleware { | |||
return append(apmMiddleware(m), | |||
middleware.SetRumFlagMiddleware(), | |||
middleware.SetRateLimitMiddleware(cfg.RumConfig.EventRate), | |||
middleware.CORSMiddleware(cfg.RumConfig.AllowOrigins), | |||
middleware.KillSwitchMiddleware(cfg.RumConfig.IsEnabled())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It still bothers me that all of these middleware functions are run if someone has RUM disabled; as a user I'd expect the KillswitchMiddleware
to be run before the CORS and rate limiter middleware at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i actually think we should do the CORS dance properly and return to the browsers the response they expect, otherwise the browser might show an error in the lines of "domain not allowed" when the root cause is other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I lack knowledge about what is the standard behavior related to CORS and browsers. If the advantage is worth the potential additional costs I'm ok with it.
@@ -31,7 +31,7 @@ func SetRateLimitMiddleware(cfg *config.EventRate) Middleware { | |||
|
|||
return func(h request.Handler) (request.Handler, error) { | |||
return func(c *request.Context) { | |||
c.RateLimiter = store | |||
c.RateLimiter = store.ForIP(c.Request) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you change the name to include IP
now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ha! i was about to do that, but then i thought you will reply with something like this: #2706 (comment)
not sure what is the criteria now, but will change as I agree
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't follow. The logic that the rate limiter limits by IP is now in the middleware. It was not before. So now it makes sense to reflect it in the middleware name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As clarified offline, this is a comment on L29, not L34.
Changed in 0788847
they will just not get any config updates, so they won't apply it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sanitize
method is based on the querie's isRum
, the cache key is based on Service
attributes. You call sanitize
before adding the value to the cache. In case a request is made for a backend service via the RUM endpoint and then again via the backend endpoint this will lead to wrong results.
The authorized
check occurs only for non-cached information. As soon as information is cached, which is based on service attributes and ignoring RUM/backend, the information is just returned. If an unauthorized RUM request follows an authorized backend request, this will result in non-authorized response.
On the other side, I don't think is necessary to have a cache limit.
From what I see the cache can potentially grow infinite with requests for non-existing services. Since we had a bit of an offline discussion regarding caching, please describe why you decided a cache limit is not necessary.
Not true, as far I can see.
Right, will update.
Lets talk about that offline |
Within the |
This test 8bbae34#diff-6756162984ecc44fbead19b809440eaeR305 covers that scenario and passes locally, I have to see what's up |
it always worked because each handler has its own cache, added another test completeness. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except for one comment related to removing sourrinding "
of etags I only left nitpick comments.
agentcfg/cache.go
Outdated
func (c *cache) fetchAndAdd(q Query, fn func(Query) (*Doc, error)) (doc *Doc, err error) { | ||
id := q.ID() | ||
|
||
func (c *cache) fetch(q Query, fetch func() (Result, error)) (Result, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand from your arguments in an offline discussion, you are in favor of not using abbreviations, so please use query
here instead of q
.
agentcfg/cache.go
Outdated
if !c.logger.IsDebug() { | ||
return | ||
if c.logger.IsDebug() { | ||
c.logger.Debugf("Cache size %v. Added ID %v.", c.gocache.ItemCount(), result.Source.Etag) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logging result.Source.Etag
is not correct anymore.
agentcfg/cache.go
Outdated
return nil, found | ||
} | ||
return val.(*Doc), found | ||
func authorized(isRum bool, agent string) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's only used in tests now
if err != nil { | ||
return nil, err | ||
} | ||
func (f *Fetcher) request(r io.Reader) ([]byte, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove empty line 72
agentcfg/fetch.go
Outdated
func (f *Fetcher) Fetch(query Query) (Result, error) { | ||
req := func() (Result, error) { | ||
result, error := newResult(f.request(convert.ToReader(query))) | ||
return result, error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can directly say return newResult...
now
agentcfg/model.go
Outdated
if err := parse(settings, out, "", h); err != nil { | ||
return nil, err | ||
} | ||
func (q Query) toString() string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer calling this ID
as that's better describing what it does, and toString
is usually used for a nicer presentation of an instance.
tests/system/test_integration_acm.py
Outdated
config_overrides = { | ||
"logging_json": "true", | ||
"kibana_enabled": "true", | ||
"acm_cache_expiration": "1s", | ||
} | ||
|
||
@classmethod | ||
def setUpClass(cls): | ||
super(AgentConfigurationTest, cls).setUpClass() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this necessary?
tests/system/test_integration_acm.py
Outdated
@@ -43,7 +49,10 @@ def create_service_config(self, settings, name, env=None, _id="new"): | |||
) | |||
|
|||
def update_service_config(self, settings, _id, name, env=None): | |||
return self.create_service_config(settings, name, env, _id=_id) | |||
return self.create_service_config(settings, name, agent="python", env=env, _id=_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: No need for setting agent="python"
, it reflects the default.
tests/system/test_integration_acm.py
Outdated
etag = r1.headers["Etag"] | ||
|
||
r2 = requests.get(self.rum_agent_config_url, | ||
params={"service.name": service_name, "ifnonematch": etag.replace('"', '')}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is it necessary to remove surrounding "
of etag? It should be possible to just use the etag that is returned by the server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because the rum agent will actually send it without double quotes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But shouldn't the server be able to process with or without quotes? And why is the RUM agent sending without quotes (I probably missed that discussion somewhere).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I approved the PR, as I haven't followed discussions about this. Please ensure then RUM agent properly removes the quotes from the response they get if you don't change it in the server.
07e4e65
to
520fc45
Compare
Revisiting this question:
What do you actually meant with "break"? I bumped the min version from 7.3 to 7.5, so the agents would get an error response code, and the should handle it, therefore, not applying any config |
8f94157
to
3ced356
Compare
3ced356
to
5e0a3b2
Compare
5e0a3b2
to
e62788f
Compare
fixes #2694, fixes #2630 , fixes #2646
depends on elastic/kibana#46995
required by elastic/apm-agent-rum-js#439