Track HTTP client requests for /write and /query with /debug/requests #8350
Conversation
This is an initial try at doing this. It likely needs a configuration option to turn it off (or on), and it likely needs to keep more statistics than it currently does.
services/httpd/requests.go
Outdated
    }

    // There is an entry in the request tracker. Increment that.
    if n, ok := rt.requests[info]; ok {
This needs an RLock to prevent a concurrent map panic.
Ah, I was under the impression that maps could be read from concurrently. That may mean that we can't return the in-use map to be written to /debug/vars.
Concurrent readers are fine provided there are no writers. There is a writer here: https://github.com/influxdata/influxdb/pull/8350/files#diff-badb764b9e9789c5a3bf719f6c815aadR71
There is also a data race because the map is replaced here: https://github.com/influxdata/influxdb/pull/8350/files#diff-badb764b9e9789c5a3bf719f6c815aadR110
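For anyone following along, here is a minimal sketch (not the PR's actual code) of the locking pattern being suggested: hold the read lock for the common increment path and take the write lock only when a new entry has to be inserted. The RequestInfo fields and helper names are placeholders.

```go
package httpd

import (
	"sync"
	"sync/atomic"
)

// RequestInfo identifies a client; the fields here are illustrative.
type RequestInfo struct {
	IPAddr   string
	Username string
}

// RequestTracker guards its counter map with an RWMutex so concurrent
// readers never race with the goroutine that inserts or replaces entries.
type RequestTracker struct {
	mu       sync.RWMutex
	requests map[RequestInfo]*int64
}

func NewRequestTracker() *RequestTracker {
	return &RequestTracker{requests: make(map[RequestInfo]*int64)}
}

// Increment bumps the counter for info. The read lock covers the common case
// (the entry exists and only its counter is incremented atomically); the write
// lock is taken only when a new map entry must be created.
func (rt *RequestTracker) Increment(info RequestInfo) {
	rt.mu.RLock()
	if n, ok := rt.requests[info]; ok {
		atomic.AddInt64(n, 1)
		rt.mu.RUnlock()
		return
	}
	rt.mu.RUnlock()

	rt.mu.Lock()
	defer rt.mu.Unlock()
	if n, ok := rt.requests[info]; ok { // re-check after upgrading the lock
		atomic.AddInt64(n, 1)
		return
	}
	n := int64(1)
	rt.requests[info] = &n
}
```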
services/httpd/requests.go
Outdated
    }

    requests := rt.requests
    rt.requests = make(map[RequestInfo]*int64, len(requests))
Wiping the map each minute might be confusing to a user. In one second you would see a bunch of writers, and in the next they would all be gone, then they might start to show up again. Recording it over time might show very spiky usage too.
It might be better to just take the aggregated map, store it, and return that aggregated map from /debug/vars.
Maybe it would be better to just increment the request info always? But then the person would need to read the values over time and use derivative like they do with the other internal monitoring. It seems to me like any attempt to do this without storing some historical data in the database may not work. We would also have to eventually cull any clients that just disappear.
You might be able to use a ring buffer of these maps where each slot is some fixed-size time window (e.g. 10s). As you roll over to new slots, you would wipe them clean of old entries.
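A rough sketch of that ring-buffer idea, assuming fixed 10s slots and a one-minute ring; the type and the map key are simplified placeholders, not the PR's structures.

```go
package httpd

import (
	"sync"
	"time"
)

const (
	slotWindow = 10 * time.Second // assumed slot size
	numSlots   = 6                // keeps roughly the last minute of counts
)

// ringTracker keeps one counter map per time slot. Rolling into a slot wipes
// whatever it held on the previous cycle, so old entries age out naturally.
type ringTracker struct {
	mu    sync.Mutex
	slots [numSlots]map[string]int64
	cur   int
	last  time.Time
}

func newRingTracker() *ringTracker {
	rt := &ringTracker{last: time.Now()}
	for i := range rt.slots {
		rt.slots[i] = make(map[string]int64)
	}
	return rt
}

// advance steps to the next slot for every full window that has elapsed,
// clearing each slot it rolls into.
func (rt *ringTracker) advance(now time.Time) {
	for now.Sub(rt.last) >= slotWindow {
		rt.cur = (rt.cur + 1) % numSlots
		rt.slots[rt.cur] = make(map[string]int64)
		rt.last = rt.last.Add(slotWindow)
	}
}

// Add records one request from client in the current slot.
func (rt *ringTracker) Add(client string) {
	rt.mu.Lock()
	defer rt.mu.Unlock()
	rt.advance(time.Now())
	rt.slots[rt.cur][client]++
}

// Counts merges every slot into a single view covering the whole ring, so a
// reader sees a sliding window rather than a map that is wiped all at once.
func (rt *ringTracker) Counts() map[string]int64 {
	rt.mu.Lock()
	defer rt.mu.Unlock()
	rt.advance(time.Now())
	out := make(map[string]int64)
	for _, slot := range rt.slots {
		for client, n := range slot {
			out[client] += n
		}
	}
	return out
}
```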
Force-pushed from fcf09dd to b994f76.
This is a neat idea. I tried this locally, but didn't get much value out of the resulting JSON. For example, running concurrent writes and queries, I just see:
{"::1":{"counter":3846}}
I think it would be helpful to segment the counter so you can see /write and /query counts. Also, a user agent or some other identifier would be useful when auth is not enabled.
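A quick sketch of what that segmentation could look like; the field names, JSON tags, and the sample output below are assumptions, not the PR's final format.

```go
package httpd

// RequestInfo identifies a client; a user agent (or similar) helps tell
// clients apart when authentication is not enabled.
type RequestInfo struct {
	IPAddr    string
	Username  string
	UserAgent string
}

// RequestStats splits the single counter into per-endpoint counts.
type RequestStats struct {
	Writes  int64 `json:"writes"`
	Queries int64 `json:"queries"`
}

// With the counter segmented, the example output above would look more like
//   {"::1":{"writes":2100,"queries":1746}}
// instead of one opaque counter.
```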
services/httpd/handler.go
Outdated
    	return
    }

    w.Header().Add("Connection", "close")
Why the explicit close?
Anything that uses CloseNotify() needs to have this since the HTTP connection can't be reused. I can remove the CloseNotify() part, but then a lost connection will result in requests continuing to be tracked after the connection is lost.
Ok. Makes sense.
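A stripped-down sketch of the interplay described above, with illustrative names rather than the PR's exact handler: the handler watches CloseNotify() so a lost connection stops the tracking window early, and because of that the connection is explicitly marked `Connection: close`.

```go
package httpd

import (
	"net/http"
	"time"
)

func serveTrackRequests(w http.ResponseWriter, r *http.Request) {
	// CloseNotify prevents the connection from being reused, so tell the
	// client (and Go's HTTP server) to close it when the response is done.
	w.Header().Add("Connection", "close")

	timer := time.NewTimer(30 * time.Second) // assumed default window
	defer timer.Stop()

	var closing <-chan bool
	if notifier, ok := w.(http.CloseNotifier); ok {
		closing = notifier.CloseNotify()
	}

	select {
	case <-timer.C:
		// Normal path: the tracking window elapsed; encode and write the
		// collected request stats here.
	case <-closing:
		// The client went away; stop tracking instead of letting it run on.
		return
	}
}
```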
services/httpd/handler.go
Outdated
    @@ -257,6 +268,8 @@ func (h *Handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    		}
    	} else if strings.HasPrefix(r.URL.Path, "/debug/vars") {
    		h.serveExpvar(w, r)
    	} else if strings.HasPrefix(r.URL.Path, "/debug/track-requests") {
nit: could this just be /debug/requests?
    	h.httpError(w, err.Error(), http.StatusBadRequest)
    	return
    } else {
    	d = time.Duration(seconds) * time.Second
Should probably have a cap on this. If you did s=99999999999, you might be able to DoS/OOM the server.
services/httpd/handler.go
Outdated
    func (h *Handler) serveTrackRequests(w http.ResponseWriter, r *http.Request) {
    	var d time.Duration
    	if s := r.URL.Query().Get("seconds"); s == "" {
    		h.httpError(w, "missing required seconds parameter", http.StatusBadRequest)
Could we have a default of, say, 10s if one is not specified?
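A small sketch combining both suggestions above (a 10s default and a cap on the window); the 300-second limit and the helper name are assumptions for illustration.

```go
package httpd

import (
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// trackingDuration returns how long to track requests, defaulting to 10s when
// no seconds= parameter is given and capping the value so an enormous number
// can't tie up the handler.
func trackingDuration(r *http.Request) (time.Duration, error) {
	const (
		defaultSeconds = 10  // assumed default window
		maxSeconds     = 300 // assumed cap
	)

	s := r.URL.Query().Get("seconds")
	if s == "" {
		return defaultSeconds * time.Second, nil
	}

	seconds, err := strconv.ParseInt(s, 10, 64)
	if err != nil || seconds <= 0 {
		return 0, fmt.Errorf("invalid seconds parameter: %q", s)
	}
	if seconds > maxSeconds {
		seconds = maxSeconds // cap before converting so the Duration can't overflow
	}
	return time.Duration(seconds) * time.Second, nil
}
```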
services/httpd/handler.go
Outdated
    if h.requestTracker != nil {
    	return
    }
    h.requestTracker = NewRequestTracker()
This looks like a data race as there could be other goroutines checking for h.requestTracker. Maybe use an atomic.Value to avoid using a lock?
This can be removed. It was only needed previously because the request tracker had a timer. Now that it no longer has a timer, I can just move it to the handler creation.
services/httpd/service.go
Outdated
    @@ -151,6 +151,10 @@ func (s *Service) Open() error {
    		time.Sleep(10 * time.Millisecond)
    	}

    	// Begin to track requests.
    	s.Logger.Info("Tracking HTTP requests")
This looks scary. Remove?
Also, since this is behind the /debug endpoint, we probably need to check that pprof is enabled before enabling this.
services/httpd/requests.go
Outdated
    for {
    	n := rt.rand.Int63()
    	if _, ok := rt.profiles[n]; ok {
Why not just use a slice and append the new profile to it? The id would be len(rt.profiles).
Wouldn't that make it difficult to remove? A linked list might actually be better for this kind of thing.
Is there one profiles map for each request to /track-requests? I can't imagine there would be a lot of concurrent tracking requests in practice.
The idea is that each request to profile gets its own map so it can have its own tracking. Then, internally, there's a map for the requests themselves.
So there's one profiles map (or list), and there's one entry in that map (or list) per request. A slice would technically work, but removing the entry when the time is finished would be O(n) and would shift the indexes of all the other profiles, making it harder for them to remove themselves.
I've pushed the change to use list.List instead, which should be better for this situation.
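For context, here is a sketch of how the list.List approach can work: each active profile keeps its own *list.Element, so removing it when its window ends is O(1) and doesn't shift any other profile's position. Names and the counter type are simplified placeholders, not the PR's exact code.

```go
package httpd

import (
	"container/list"
	"sync"
)

// Profile collects requests for a single /debug/requests call.
type Profile struct {
	mu       sync.Mutex
	Requests map[string]int64

	tracker *RequestTracker
	elem    *list.Element // our slot in the tracker, kept for O(1) removal
}

// RequestTracker holds all profiles that are currently collecting.
type RequestTracker struct {
	mu       sync.RWMutex
	profiles *list.List
}

func NewRequestTracker() *RequestTracker {
	return &RequestTracker{profiles: list.New()}
}

// TrackRequests registers a new profile and remembers its list element so
// Stop can unlink it directly.
func (rt *RequestTracker) TrackRequests() *Profile {
	p := &Profile{Requests: make(map[string]int64), tracker: rt}
	rt.mu.Lock()
	p.elem = rt.profiles.PushBack(p)
	rt.mu.Unlock()
	return p
}

// Stop unregisters the profile in O(1); after this, Add no longer sees it.
func (p *Profile) Stop() {
	p.tracker.mu.Lock()
	p.tracker.profiles.Remove(p.elem)
	p.tracker.mu.Unlock()
}

// Add records one request from client in every active profile.
func (rt *RequestTracker) Add(client string) {
	rt.mu.RLock()
	defer rt.mu.RUnlock()
	for e := rt.profiles.Front(); e != nil; e = e.Next() {
		p := e.Value.(*Profile)
		p.mu.Lock()
		p.Requests[client]++
		p.mu.Unlock()
	}
}
```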
Other dimensions that might be useful to record as well are …
When it comes to the output, it could be organized one of two ways. If you're only looking for one of the two, the second might be more useful since it limits the information to only the other data that's related to it. But the first might make more sense organizationally.
I was thinking the first, mainly because I was looking for a specific IP first and then figuring out what it was doing. The latter would require me to look at writes and find my IP, then queries and find my IP again.
Were you going to change the JSON output?
services/httpd/handler.go
Outdated
    @@ -283,6 +292,10 @@ func (h *Handler) serveQuery(w http.ResponseWriter, r *http.Request, user *meta.
    	atomic.AddInt64(&h.stats.QueryRequestDuration, time.Since(start).Nanoseconds())
    }(time.Now())

    if h.requestTracker != nil {
Looks like this should never be nil now, right?
services/httpd/requests.go
Outdated
    Requests map[RequestInfo]*RequestStats
    tracker  *RequestTracker
    elem     *list.Element
    mu       sync.RWMutex
Can you move this above the items that it protects? So above Requests.
services/httpd/handler.go
Outdated
    w.Header().Add("Connection", "close")
    requests := make(map[string]*RequestStats, len(profile.Requests))
    for req, st := range profile.Requests {
Requests is accessed w/o a lock here but is accessed w/ one in Add: https://github.com/influxdata/influxdb/pull/8350/files#diff-badb764b9e9789c5a3bf719f6c815aadR40
Looks like a possible data race.
Since the profile has been stopped, it shouldn't be modified any longer. The lock should only be needed while the profile is happening.
services/httpd/requests.go
Outdated
    type RequestTracker struct {
    	profiles *list.List
    	mu       sync.RWMutex
Can you move this above profiles so it's clear what it protects?
services/httpd/handler.go
Outdated
    @@ -585,6 +598,10 @@ func (h *Handler) serveWrite(w http.ResponseWriter, r *http.Request, user *meta.
    	atomic.AddInt64(&h.stats.WriteRequestDuration, time.Since(start).Nanoseconds())
    }(time.Now())

    if h.requestTracker != nil {
Same here.
I'm also thinking this should output the JSON in a similar manner to …
Changelog?
After using `/debug/requests`, the client will wait for 30 seconds (configurable by specifying `seconds=` in the query parameters) and the HTTP handler will track every incoming query and write to the system. After that time period has passed, it will output a JSON blob that looks very similar to `/debug/vars` that shows every IP address and user account (if authentication is used) that connected to the host during that time. In the future, we can add more metrics to track. This is an initial start to aid with debugging machines that connect too often by looking at a sample of time (like `/debug/pprof`).
Force-pushed from 194364c to 2780630.
I squashed the PR and modified the format to look more like …