Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ProfileResult and CollectorResult should print machine readable timing information #22561

Merged
merged 4 commits into from
Jan 16, 2017

Conversation

cbuescher
Copy link
Member

When trying to write parsing code for the SearchResponse profile section I ran into problems getting back the original time information, because both ProfileResult and CollectorResult currently print their time parameter in a human readable string format (e.g. "time": "55.20315000ms"). When trying to parse this back to a long value, for example to use in the planned high level java rest client, we can lose precision because of conversion and rounding issues. I was wondering if we should introduce an additional field (e.g. time_in_nanos) to be able to get the raw value back and only print the time field for human consumption if requested with the human=true flag.
Another question would be if we can backport this in some way to 5.x, I'm not sure if the fact that the profile API is experimental means we can introduce additional fields in the response in 5.x. This PR is only a basis for discussion, maybe @polyfractal or @clintongormley have an opinion on this.

@javanna
Copy link
Member

javanna commented Jan 11, 2017

I think profile api should be consistent with all our other apis and support the human flag. In any case, we should allow for printing out machine readable time rather than only human readable time. I also wonder why we went for nanoseconds here knowing that we have milliseconds everywhere else.

@polyfractal
Copy link
Contributor

++ to returning machine readable by default, and supporting human flag. This was just a gross oversight on my part in the original design :)

Nanos probably make sense since the breakdown object is all in nanoseconds. And that was in nanoseconds since some methods can have very small times and since this was a profiling tool, I wanted those to be accurate without rounding issues.

@javanna
Copy link
Member

javanna commented Jan 11, 2017

Nanos probably make sense since the breakdown object is all in nanoseconds.

yes it makes sense, I (mis)understood that we were printing out millis on the REST layer although we had nanos on the java object.

@@ -944,6 +944,14 @@ public XContentBuilder percentageField(String rawFieldName, String readableField
return this;
}

public XContentBuilder nanosAsMillis(String rawFieldName, String readableFieldName, long nanos) throws IOException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why nanosAsMillis? Does this one print out nanos or millis?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, the "machine readable" value is printed out "as is" (that its a nanos time value is actually only clear by the field and variable names). The human readable string assumes the input is nanos and prints its as millis (adding "ms" suffix, dividing by the appropriate constant). Thats why I choose that name, but now I start to think that this formatting is probably only useful for the two cases in the Profile output and we should makes this more generic?
I was thinking about putting it somewhere else (some static util in one of the Profile response classes), but since this method deals with the "humanReadable" flag its probably best here alongside the similar other methods?
One idea would make the signature (String rawFieldName, String readableFieldName, long rawLong, String formattedLong) and keeping the formatting in the respective classes, wdyt? Then the name could be sth. like "formattedLongField"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't you use the existing timeValueField that takes the TimeUnit too as an argument? Then the conversion to readable should come for free, and the raw field gets printed out as-is.

Copy link
Member Author

@cbuescher cbuescher Jan 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that TImeUnit.toString() is not very accurate, also it rounds to the "larges" unit that needs to be represented, so you get all kinds of variations like "1.1m", "50.9s", "761.5ms", "45585997303" (<- thats nanos then) whereas the current format always prints a fixed-length "ms" with 10 digits precision. I think we need to keep that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds like we never used TimeValue#toString for nanoseconds then? I tend to think that we should fix that instead. WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. TimeUnit#toString doesn't seem to be used for precise output, but I also don't think we should change that because it is certainly used all over the place. If we are fine with reducing the profile output precision for the human readable format we can use TimeUnit, otherwise we should stick to the current 10-digit-ms format.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok then we should stick with the current api, we are already printing out nanoseconds in a few places, I found for instance running_time in TaskInfo. We should be consistent with how we do things elsewhere. That means that the human readable time in the profile api will lose some precision, unless we want to go change how we print out nanoseconds at some point later.

@clintongormley clintongormley added the :Search/Search Search-related issues that do not fall into other categories label Jan 12, 2017
@clintongormley
Copy link

I'm not sure if the fact that the profile API is experimental means we can introduce additional fields in the response in 5.x.

We can certainly introduce new fields in 5.x

@cbuescher
Copy link
Member Author

Thanks for the input, so starting with for 6.0 we would have

GET ...?human=true
{
  ...
  "time": "55.20315000ms",
  "time_in_nanos": 55203150
}

GET ...?human=false
{
  ...  
  "time_in_nanos": 55203150
}

So for the default case where the "human" flag isn't set, the field name and value would change. I think thats okay, but should be marked as "breaking" and we need to adapt the docs etc.

For introducing parsing of the profile section in 5.x, it would be great if we could already introduce the "time_in_nanos" field in addition to the "time" field and print out both, regardless of the "human" flag. @clintongormley Is that something we can do in terms of breaking changes? Its an "experimental" API after all.

@clintongormley
Copy link

@cbuescher our comments crossed in the ether. yes, we can.

@cbuescher
Copy link
Member Author

Cool, so I will work in the plan above, introduce the new field in 5.x and make it the default in 6.0 with the optional "human" flag, changing documentation etc... there.

@cbuescher
Copy link
Member Author

After discussion with @javanna I added commits to change the human readable "time" field output to use XContentBuilder#timeValueField() which we use in several other places. This leads to less precision in the output than we had previously (e.g. instead of output of 5505.203150ms we will now print 5.5s) but we now always have the exact value in time_in_nanos along side of it. On 5.x. we would keep the current format and simply add the new "raw" field.

@cbuescher cbuescher removed the discuss label Jan 12, 2017
@cbuescher
Copy link
Member Author

@polyfractal could you check if changing the human readable time field format like stated above might lead to any problems on current client or UI implementations? I don't really know who might rely on that format. We will have the exact nanoseconds available always, but under a different field name, the time field needs to be activated by using ?human=true. There are some unit tests in the last commit that show both versions of the output.

@polyfractal
Copy link
Contributor

LGTM! It'll break the profiler UI since I'm a bad person and rely on that in one place for actual calculation (and not display). But it'll be trivial to support in a bwc manner...I can just apply ?human=true on all requests and check if time_in_nanos is available or not and use the appropriate field.

I can make that fix quickly/easily after this merges.

@cbuescher
Copy link
Member Author

cbuescher commented Jan 12, 2017

@polyfractal great, thanks. time_in_nanos should always be set and reflect the underlying time in nanoseconds. In 5.x. I will just add this as an additional field and keep the current time field with the current format, so there you will also be able to use time_in_nanos after this goes in. So on 5.x. nothing should break unless someone throws erros on yet unknown fields.

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a couple of comments on docs, LGTM otherwise. I guess that 5.x will have its own separate PR.


As with other statistics apis, the Profile API supports human readable outputs for the time values. Those can be turned on by adding
`?human=true` to the query string. In this case, in addition to the `"time_in_nanos"` field, the output contains the additional `"time"`
field with the time in milliseconds, (e.g. `"time": "0.3919430000ms"`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is that always in milliseconds?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, after the last change (using XContentHelper#timeValueField) this can be other units too, and the format will not be old the fixed 10-digits. I will update this accordingly.

@@ -276,7 +282,7 @@ the equivalent name in Elasticsearch. The `"description"` field displays the Lu
is made available to help differentiating between parts of your query (e.g. both `"message:search"` and `"message:test"`
are TermQuery's and would appear identical otherwise.

The `"time"` field shows that this query took ~15ms for the entire BooleanQuery to execute. The recorded time is inclusive
The `"time_in_nanos"` field shows that this query took ~1.8ms for the entire BooleanQuery to execute. The recorded time is inclusive
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it changed from 15ms to 1.8ms?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thing the 15ms was an error, the example above used to have "1.873811000ms" in its main node output, so I changed this accordingly. Maybe @polyfractal can confirm this as an existing error in the old docs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In all likelihood just a typo :) I generated those by running the query and c/p the results. But it changed a few times, so I probably just missed it during one of the changes

@cbuescher
Copy link
Member Author

@clintongormley I wasn't sure if this change requires a section in the migrate.asciidocs, so I added one an will mark this as "breaking", since it changes the name of the field that we print by default in the profile response (please correct if thats wrong). I will add a warning to the 5.x documentation where we will print out both "time" and the new "time_in_nanos" in a follow up PR against 5.x.

@cbuescher cbuescher merged commit 59a48ff into elastic:master Jan 16, 2017
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Jan 16, 2017
* master: (131 commits)
  Replace EngineClosedException with AlreadyClosedExcpetion (elastic#22631)
  Remove HttpServer and HttpServerAdapter in favor of a simple dispatch method (elastic#22636)
  move ignore parameter support from yaml test client to low level rest client (elastic#22637)
  Fix thread safety of Stempel's token filter factory (elastic#22610)
  Update profile.asciidoc
  Wrap rest httpclient with doPrivileged blocks (elastic#22603)
  Revert "Add a deprecation notice to shadow replicas (elastic#22025)"
  Revert "Don'y use `INDEX_SHARED_FS_ALLOW_RECOVERY_ON_ANY_NODE_SETTING` directly as it triggers (many) deprecation logging"
  Don'y use `INDEX_SHARED_FS_ALLOW_RECOVERY_ON_ANY_NODE_SETTING` directly as it triggers (many) deprecation logging
  Add a deprecation notice to shadow replicas (elastic#22025)
  [Docs] Fix section title in profile.asciidoc
  ProfileResult and CollectorResult should print machine readable timing information (elastic#22561)
  Add replica ops with version conflict to translog
  Replace custom Functional interface in ElasticsearchException with CheckedFunction
  Make RestChannelConsumer extend CheckedConsumer<RestChannel, Exception>
  replace ShardSearchRequest.FilterParser functional interface with CheckedFunction
  replace custom functional interface with CheckedFunction in percolate module
  [TEST] replace SizeFunction with Function<Integer, Integer>
  Expose CheckedFunction
  Expose logs base path
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>breaking :Search/Search Search-related issues that do not fall into other categories v6.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants