Docs: Improved tokenizer docs #18356

Merged
merged 8 commits into elastic:master from analysis_docs on May 19, 2016

Conversation

clintongormley

Added descriptions and runnable examples to all tokenizers

@clintongormley added the >docs, review, and :Search Relevance/Analysis labels May 14, 2016
used to build <<analysis-custom-analyzer,custom analyzers>>.
The tokenizer is also responsible for recording the order or _position_ of
each term (used for phrase and word proximity queries) and the start and end
_character offsets_ of the original word which the term represents (used foor
nik9000 (Member):

s/foor/for/

nik9000 (Member) commented May 15, 2016

Left some points about wording. I'd love to be able to test those results arrays. We could probably build something to make that work, given the number of _analyze results that we have floating around in the docs. It is nice to have such a short response format in the docs.
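
(For context, the "results arrays" here are the token lists returned by the _analyze API; a response has roughly this shape. This is a sketch with illustrative offsets and positions, abridged to a single token:)

[source,js]
----------------------------
{
  "tokens": [
    {
      "token": "quick",
      "start_offset": 4,
      "end_offset": 9,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
----------------------------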

nik9000 (Member) commented May 15, 2016

I think the "it'd be nice to test" part can come later. This already adds more tests by adding // CONSOLE snippets. I'd be happy to make a quick change to get the responses tested after this is merged.
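
(For reference, a // CONSOLE snippet in the docs is just a runnable request like the following sketch; the standard tokenizer and the sample text are only illustrative:)

[source,js]
----------------------------
POST _analyze
{
  "tokenizer": "standard",
  "text": "The quick brown fox."
}
----------------------------
// CONSOLE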

clintongormley (Author):

Thanks for the review @nik9000. I addressed most of your comments in a9b3db65625b52664825ba2db18057f45c12a9c6

In e79d5a85afc6fde119c7ab58f04a1076a1456391 I added TESTRESPONSEs for all of the tokenizer examples, wrapping them in comment blocks so that they don't render in the docs, eg:

/////////////////////

[source,js]
----------------------------
{  response body }
----------------------------
// TESTRESPONSE

/////////////////////
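
(Filled in, such a hidden block might look something like this sketch, here for a keyword tokenizer example; the token values are illustrative:)

/////////////////////

[source,js]
----------------------------
{
  "tokens": [
    {
      "token": "New York",
      "start_offset": 0,
      "end_offset": 8,
      "type": "word",
      "position": 0
    }
  ]
}
----------------------------
// TESTRESPONSE

/////////////////////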

There's just one test in the edge ngrams docs which I'm not sure how to fix, as the took value will be different on every test run:

}
-----------------------------------
// CONSOLE
// TEST[skip:The took value will change on every request]
clintongormley (Author):

This GET request will return a different took value every time. I tried just removing it with TEST[s/"took".*"//] but that didn't work.

nik9000 (Member):

s/"took": 123/"took": "$body.took"/ is the way I've done it. Might be the
best way but it works.

(The snippet under discussion, in docs/reference/analysis/tokenizers/edgengram-tokenizer.asciidoc:)

POST my_index/_refresh

GET my_index/_search
{
  "query": {
    "match": {
      "title": {
        "query": "Quick Fo", <2>
        "operator": "and"
      }
    }
  }
}
-----------------------------------
// CONSOLE
// TEST[skip:The took value will change on every request]

clintongormley (Author):

I've added TESTRESPONSEs for analyzers in 38b89f9, and docs and TESTRESPONSEs for character filters in 89bbc7f.

The pattern_replace tests aren't passing because // TEST[continued] isn't working as expected.
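
(For reference, // TEST[continued] is meant to chain a snippet onto the test started by the preceding // CONSOLE snippet, roughly like this sketch; the index name and requests are hypothetical:)

[source,js]
----------------------------
PUT my_index
{
  "settings": {
    "number_of_shards": 1
  }
}
----------------------------
// CONSOLE

[source,js]
----------------------------
GET my_index/_settings
----------------------------
// CONSOLE
// TEST[continued]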

In docs/reference/analysis/charfilters/pattern-replace-charfilter.asciidoc:

----------------------------
<1> Note the incorrect highlight.
// TESTRESPONSE
clintongormley (Author):

I get the following exception when running the docs check:

TESTRESPONSE not paired with a snippet at .../charfilters/pattern-replace-charfilter.asciidoc:296

nik9000 (Member):

That means it didn't match the // TESTRESPONSE with the snippet above it. It doesn't like it if there is a newline between the snippet and the marker. Or something like that.

nik9000 (Member):

Switch the <1> and the comment. Would you like to propose a more useful error message?
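
(That is, the // TESTRESPONSE marker goes directly under the closing delimiter and the callout moves below it, along these lines:)

----------------------------
// TESTRESPONSE
<1> Note the incorrect highlight.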

clintongormley (Author):

Thanks @nik9000 - fixed that. There are two issues which remain:

  • The replacement of "$1" is interpreted as a stash variable and so fails. This would fail in the REST tests too.
  • The "took" value is always different. I tried `TESTRESPONSE[s/"took".*//]` but it seems to remove this from the provided response snippet, rather than from the received response.

nik9000 (Member) commented May 16, 2016

The replacement of "$1" is interpreted as a stash variable and so fails. This would fail in the REST tests too

\\$1 I think will do it, but I'm not sure. If not then maybe disable that test and merge and I'll hack on the rest infrastructure to make it work?

TESTRESPONSE[s/"took".*//]

`// TESTRESPONSE[s/"took".*/"took": "$body.took"/]`

s// just changes the generated tests. "$body.took" gets it from the stash.

clintongormley (Author):

`// TESTRESPONSE[s/"took".*/"took": "$body.took"/]`

this worked, thanks

$1 I think will do it,

That won't work because it changes what is displayed/sent to ES during the test. It's skipped for now.


The output from the above is:

[source,js]
----------------------------
{
  "timed_out": false,
  "took": $body.took,
nik9000 (Member):

I usually put a number here and use the replacement below to put the "$body.took" bit in. I usually use however long it took on my laptop so a particularly savvy reader gets a sense of how long it might take.
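
(Concretely, that pattern would look something like the following sketch; the 3 is just whatever the request took locally, and the rest of the response body is elided here:)

[source,js]
----------------------------
{
  "took": 3,
  "timed_out": false,
  ...
}
----------------------------
// TESTRESPONSE[s/"took": 3/"took": "$body.took"/]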

nik9000 (Member) commented May 17, 2016

That won't work because it changes what is displayed/sent to ES during the test. It's skipped for now.

LGTM. If you merge I'll have a look at the escape thing.

Added descriptions and runnable examples
@clintongormley clintongormley merged commit 5da9e5d into elastic:master May 19, 2016
@clintongormley clintongormley deleted the analysis_docs branch May 19, 2016 17:42