Track HTTP client requests for /write and /query with /debug/requests #8350

Merged: jsternberg merged 1 commit into master from js-request-tracker on May 9, 2017

Conversation

@jsternberg (Contributor) commented May 2, 2017:

After using /debug/requests, the client will wait for 30 seconds
(configurable by specifying seconds= in the query parameters) and the
HTTP handler will track every incoming query and write to the system.
After that time period has passed, it will output a JSON blob, very
similar to /debug/vars, that shows every IP address and user account
(if authentication is used) that connected to the host during that
time.

In the future, we can add more metrics to track. This is an initial
start to aid with debugging machines that connect too often by looking
at a sample of time (like /debug/pprof).
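
(For illustration, a tracking request might look like the following, assuming the default bind address and the per-IP writes/queries shape discussed later in this thread; the exact keys in the merged output may differ:)

$ curl 'http://localhost:8086/debug/requests?seconds=30'
{
    "127.0.0.1": {
        "writes": 12,
        "queries": 5
    }
}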

@jsternberg requested a review from jwilder on May 2, 2017 20:49

@jsternberg (Contributor Author):

This is an initial try at doing this. Likely, this needs a configuration option to turn it off (or turn it on) and likely needs to keep more statistics than it currently keeps.

}

// There is an entry in the request tracker. Increment that.
if n, ok := rt.requests[info]; ok {

Contributor:

This needs an RLock to prevent a concurrent map panic.
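
(A minimal sketch of the suggested guard, using the rt.mu and rt.requests names from the diff; incrementing via atomic.AddInt64 is an assumption based on the *int64 map values:)

// Sketch: take a read lock around the map lookup to avoid a concurrent
// map read/write panic, then increment the counter atomically.
rt.mu.RLock()
n, ok := rt.requests[info]
rt.mu.RUnlock()
if ok {
	atomic.AddInt64(n, 1)
	return
}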

Contributor Author:

Ah, I was under the impression that maps could be read from concurrently. That may mean that we can't return the in-use map to be written to /debug/vars.

Contributor:

Concurrent readers are fine provided there are no writers. There is a writer here: https://github.com/influxdata/influxdb/pull/8350/files#diff-badb764b9e9789c5a3bf719f6c815aadR71

There is also a data race because the map is replaced here: https://github.com/influxdata/influxdb/pull/8350/files#diff-badb764b9e9789c5a3bf719f6c815aadR110

}

requests := rt.requests
rt.requests = make(map[RequestInfo]*int64, len(requests))

Contributor:

Wiping the map each minute might be confusing to a user. In one second you would see a bunch of writers, in the next they would all be gone, and then they might start to show up again. Recording it over time might show very spiky usage too.

Contributor Author:

It might be better to just take the aggregated map and store it and return that aggregated map from /debug/vars.

Maybe it would be better to just increment the request info always? But then the person would need to read the values over time and use derivative like they do with the other internal monitoring. It seems to me like any attempt to do this without storing some historical data in the database may not work. We would also have to eventually cull any clients that just disappear.

Contributor:

You might be able to use a ring buffer of these maps where each slot is some fixed-size time window (e.g. 10s). As you roll over to new slots, you would wipe them clean of old entries.
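
(A rough sketch of that idea; the type and method names here are hypothetical, not from the PR:)

// Hypothetical ring buffer of per-window maps. Each slot covers a fixed
// window (e.g. 10s); rolling to the next slot wipes only the slot being
// reused, so counts age out gradually instead of vanishing all at once.
type requestRing struct {
	mu    sync.Mutex
	slots [6]map[RequestInfo]int64 // 6 slots x 10s = the last minute
	cur   int
}

func (r *requestRing) roll() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.cur = (r.cur + 1) % len(r.slots)
	r.slots[r.cur] = make(map[RequestInfo]int64) // wipe the reused slot
}

func (r *requestRing) add(info RequestInfo) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.slots[r.cur] == nil {
		r.slots[r.cur] = make(map[RequestInfo]int64)
	}
	r.slots[r.cur][info]++
}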

@jsternberg force-pushed the js-request-tracker branch 2 times, most recently from fcf09dd to b994f76 on May 4, 2017 20:21

@jwilder (Contributor) left a comment:

This is a neat idea. I tried this locally, but didn't get much value out of the resulting JSON returned. For example, running concurrent writes and queries, I just see:

{"::1":{"counter":3846}}

I think it would be helpful to segment the counter so you can see /write and /query counts. Also, user-agent or some other identifier would be useful when auth is not enabled.

return
}

w.Header().Add("Connection", "close")

Contributor:

Why the explicit close?

Contributor Author:

Anything that uses CloseNotify() needs this since the HTTP connection can't be reused afterwards. I can remove the CloseNotify() part, but then requests would continue to be tracked even after the client's connection is lost.
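
(A sketch of the pattern being described, with the surrounding handler shape assumed; http.CloseNotifier was the pre-request-context way to observe a dropped client:)

// Sketch: the connection can't be reused once CloseNotify is armed,
// hence the explicit "Connection: close" header.
w.Header().Add("Connection", "close")

timer := time.NewTimer(d)
defer timer.Stop()

if notifier, ok := w.(http.CloseNotifier); ok {
	select {
	case <-timer.C:
		// tracking window elapsed; fall through and write out the profile
	case <-notifier.CloseNotify():
		return // client went away; stop tracking instead of running to completion
	}
} else {
	<-timer.C
}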

Contributor:

Ok. Makes sense.

@@ -257,6 +268,8 @@ func (h *Handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
}
} else if strings.HasPrefix(r.URL.Path, "/debug/vars") {
h.serveExpvar(w, r)
} else if strings.HasPrefix(r.URL.Path, "/debug/track-requests") {

Contributor:

nit: could this just be /debug/requests?

h.httpError(w, err.Error(), http.StatusBadRequest)
return
} else {
d = time.Duration(seconds) * time.Second

Contributor:

Should probably have a cap on this. If you did s=99999999999, you might be able to DOS/OOM the server.

func (h *Handler) serveTrackRequests(w http.ResponseWriter, r *http.Request) {
var d time.Duration
if s := r.URL.Query().Get("seconds"); s == "" {
h.httpError(w, "missing required seconds parameter", http.StatusBadRequest)

Contributor:

Could we have a default of say 10s if one is not specified?
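
(Taken together with the cap suggested above, the parameter handling might look roughly like this; the 10s default and the 5-minute cap are illustrative values, not what was merged:)

// Sketch: default the tracking window when "seconds" is absent and clamp
// it so an enormous value can't tie the handler up indefinitely.
d := 10 * time.Second
if s := r.URL.Query().Get("seconds"); s != "" {
	seconds, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		h.httpError(w, err.Error(), http.StatusBadRequest)
		return
	}
	d = time.Duration(seconds) * time.Second
}
if d > 5*time.Minute {
	d = 5 * time.Minute
}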

if h.requestTracker != nil {
return
}
h.requestTracker = NewRequestTracker()

Contributor:

This looks like a data race as there could be other goroutines checking for h.requestTracker. Maybe use an atomic.Value to avoid using a lock?

Contributor Author:

This can be removed. This was only used previously because the request tracker had a timer. Now it doesn't have a timer anymore so I can just move it to the handler creation.

@@ -151,6 +151,10 @@ func (s *Service) Open() error {
time.Sleep(10 * time.Millisecond)
}

// Begin to track requests.
s.Logger.Info("Tracking HTTP requests")

Contributor:

This looks scary. Remove?

Also, since this is behind the /debug endpoint, we probably need to check that pprof is enabled before enabling this.
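
(A sketch of that gate; the PprofEnabled config field name is an assumption about the httpd config, not taken from this diff:)

// Sketch: refuse the endpoint when the profiling endpoints are disabled.
if !h.Config.PprofEnabled {
	h.httpError(w, "profiling endpoints are disabled", http.StatusForbidden)
	return
}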


for {
n := rt.rand.Int63()
if _, ok := rt.profiles[n]; ok {

Contributor:

Why not just use a slice and append the new profile to it? The id would be len(rt.profiles).

Contributor Author:

Wouldn't that make it difficult to remove? A linked list might actually be better for this kind of thing.

Contributor:

Is there 1 profiles map for each request to /track-requests? I can't imagine that there would be a lot of concurrent tracking requests in practice.

Contributor Author:

The idea is that each tracking request gets its own map so it can do its own tracking. Then, internally, there's a map for the requests themselves.

So there's one profiles map (or list) and one entry in that map (or list) per request. A slice would technically work, but then removing an entry when its time is finished is O(n) and would shift the indexes of all of the other profiles, making it harder for them to remove themselves.

Contributor Author:

I've pushed the change that I made to use list.List instead which should be better for this situation.
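
(For reference, the add/remove pattern with container/list looks roughly like this; the method names are illustrative:)

// Sketch: each tracking request registers its own profile; keeping the
// returned *list.Element lets the profile unlink itself in O(1) later.
func (rt *RequestTracker) track(profile *RequestProfile) *list.Element {
	rt.mu.Lock()
	defer rt.mu.Unlock()
	return rt.profiles.PushBack(profile)
}

func (rt *RequestTracker) untrack(elem *list.Element) {
	rt.mu.Lock()
	defer rt.mu.Unlock()
	rt.profiles.Remove(elem)
}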

@jwilder (Contributor) commented May 4, 2017:

Other dimensions that might be useful to record as well are db and rp.

@jsternberg (Contributor Author):

When it comes to /write and /query, do we want the format to be:

{
    "127.0.0.1": {
        "writes": 12,
        "queries": 5,
    },
}

Or

{
    "/query": {
        "127.0.0.1": 5
    },
    "/write": {
        "127.0.0.1": 12,
    },
}

If you're only looking for one of the two, the second might be more useful since it groups each kind of information with only the data that's related to it. But the first might make more sense organizationally.

@jwilder (Contributor) commented May 5, 2017:

I was thinking the first:

{
    "127.0.0.1": {
        "writes": 12,
        "queries": 5,
    },
}

Mainly because I was looking for a specific IP first and then figuring out what it was doing. The latter would require me to look at writes and find my IP, then queries and find my IP again.

@jwilder (Contributor) left a comment:

Were you going to change the JSON output?

@@ -283,6 +292,10 @@ func (h *Handler) serveQuery(w http.ResponseWriter, r *http.Request, user *meta.
atomic.AddInt64(&h.stats.QueryRequestDuration, time.Since(start).Nanoseconds())
}(time.Now())

if h.requestTracker != nil {

Contributor:

Looks like this should never be nil now, right?

Requests map[RequestInfo]*RequestStats
tracker *RequestTracker
elem *list.Element
mu sync.RWMutex

Contributor:

Can you move this above the items that it protects, i.e. above Requests?


w.Header().Add("Connection", "close")
requests := make(map[string]*RequestStats, len(profile.Requests))
for req, st := range profile.Requests {

Contributor:

Requests is accessed w/o a lock here but is accessed w/ one in Add: https://github.com/influxdata/influxdb/pull/8350/files#diff-badb764b9e9789c5a3bf719f6c815aadR40

Looks like a possible data race.

Contributor Author:

Since the profile has been stopped, it shouldn't be modified any longer. The lock should only be needed while the profile is happening.


type RequestTracker struct {
profiles *list.List
mu sync.RWMutex

Contributor:

Can you move this above profiles so it's clear what it protects?
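
(Applying that convention to both structs would look like this; the field sets are taken from the diff, only the ordering changes:)

// Sketch: place each mutex directly above the fields it guards.
type RequestTracker struct {
	mu       sync.RWMutex
	profiles *list.List
}

type RequestProfile struct {
	tracker *RequestTracker
	elem    *list.Element

	mu       sync.RWMutex
	Requests map[RequestInfo]*RequestStats
}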

@@ -585,6 +598,10 @@ func (h *Handler) serveWrite(w http.ResponseWriter, r *http.Request, user *meta.
atomic.AddInt64(&h.stats.WriteRequestDuration, time.Since(start).Nanoseconds())
}(time.Now())

if h.requestTracker != nil {

Contributor:

Same here.

@jsternberg (Contributor Author):

I'm also thinking this should output the JSON in a similar manner to /debug/vars, but I'll do that after we work out the other details.

@jwilder (Contributor) left a comment:

Changelog?

@jsternberg force-pushed the js-request-tracker branch from 194364c to 2780630 on May 9, 2017 15:21
@jsternberg changed the title from "Track the HTTP requests that come into the system for debugging" to "Track HTTP client requests for /write and /query with /debug/requests" on May 9, 2017

@jsternberg (Contributor Author):

I squashed the PR, modified the format to look more like /debug/vars for easier readability, and updated the changelog.

@jsternberg merged commit 38735b2 into master on May 9, 2017
@jsternberg deleted the js-request-tracker branch on May 9, 2017 18:53
@jsternberg removed the review label on May 9, 2017
@rbetts added this to the 1.3.0 milestone on May 25, 2017