-
Notifications
You must be signed in to change notification settings - Fork 24.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Docs: Improved tokenizer docs #18356
Conversation
used to build <<analysis-custom-analyzer,custom analyzers>>. | ||
The tokenizer is also responsible for recording the order or _position_ of | ||
each term (used for phrase and word proximity queries) and the start and end | ||
_character offsets_ of the original word which the term represents (used foor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/foor/for/
Left some points about wording. I'd love to be able to test those results arrays. We could probably build something to make that work, given the number of _analyze results that we have floating around the in the docs. It is nice to have such a short response form in the docs. |
I think the "it'd be nice to test" part can come later. This already adds more tests by adding |
Thanks for the review @nik9000. I addressed most of your comments in a9b3db65625b52664825ba2db18057f45c12a9c6 In e79d5a85afc6fde119c7ab58f04a1076a1456391 I added TESTRESPONSEs for all of the tokenizer examples, wrapping them in comment blocks so that they don't render in the docs, eg:
There's just one test in the edge ngrams which I'm not sure how to fix, as the |
} | ||
----------------------------------- | ||
// CONSOLE | ||
// TEST[skip:The took value will change on every request] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This GET request will return a different took
value every time. I tried just removing it with with TEST[s/"took".*"//]
but that didn't work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/"took": 123/"took": "$body.took"/ is the way I've done it. Might be the
best way but it works.
On May 15, 2016 8:07 AM, "Clinton Gormley" notifications@github.com wrote:
In docs/reference/analysis/tokenizers/edgengram-tokenizer.asciidoc
#18356 (comment)
:+POST my_index/_refresh
+
+GET my_index/_search
+{
- "query": {
- "match": {
"title": {
"query": "Quick Fo", <2>
"operator": "and"
}
- }
- }
+}
+-----------------------------------
+// CONSOLE
+// TEST[skip:The took value will change on every request]This GET request will return a different took value every time. I tried
just removing it with with TEST[s/"took".*"//] but that didn't work.—
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
https://github.com/elastic/elasticsearch/pull/18356/files/e79d5a85afc6fde119c7ab58f04a1076a1456391#r63290592
-------------------------------------------------- | ||
---------------------------- | ||
<1> Note the incorrect highlight. | ||
// TESTRESPONSE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get the following exception when running the docs check:
TESTRESPONSE not paired with a snippet at .../charfilters/pattern-replace-charfilter.asciidoc:296
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That means it didn't match the // TESTRESPONSE with the snippet above it.
It doesn't like if there is a newline between the snippet and the marker.
Or something like that.
On May 15, 2016 10:59 AM, "Clinton Gormley" notifications@github.com
wrote:
In docs/reference/analysis/charfilters/pattern-replace-charfilter.asciidoc
#18356 (comment)
:}
+----------------------------
+<1> Note the incorrect highlight.
+// TESTRESPONSEI get the following exception when running the docs check:
TESTRESPONSE not paired with a snippet at .../charfilters/pattern-replace-charfilter.asciidoc:296
—
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
https://github.com/elastic/elasticsearch/pull/18356/files/89bbc7f861ea99c3af0f0db8718be64f7d62db79#r63292994
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switch the <1> and the comment. Would you like to propose a more useful error message?
Thanks @nik9000 - fixed that. There are two issues which remain:
|
|
this worked, thanks
That won't work because it changes what is displayed/sent to ES during the test. It's skipped for now. |
|
||
The output from the above is: | ||
|
||
[source,js] | ||
---------------------------- | ||
{ | ||
"timed_out": false, | ||
"took": $body.took, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I usually put a number here and use the replacement below to put the "$body.took" bit in. I usually use however long it took on my laptop so a particularly savvy ready gets a sense of how long it might take.
LGTM. If you merge I'll have a look at the escape thing. |
60d9e13
to
0c6446e
Compare
Added descriptions and runnable examples
One interprets "$1" as a stack variable - same problem exists with the REST tests The other because the "took" value is always different
0c6446e
to
3817516
Compare
Added descriptions and runnable examples to all tokenizers