refactor HTMLWriter search index construction #966

visr · 2019-02-18T12:16:48Z

The mutable SearchIndexBuffer that held an IOBuffer that was filled
with JSON text is replaced by the immutable SearchRecord.

The HTMLContext now instead holds a Vector{SearchRecord}, which is
serialized to JSON at the end, done so with the help of
JSON.lower(::SearchRecord).

This should make it easier to possibly upload the search index to a
search server in the future. (Working on it, thought to try to get this in first)

Note that this adds JSON as a dependency. But this also allows code deletion in this package. It REQUIREs JSON 0.19, does the UUID in the TOML need to match this version? Now I believe it matches 0.20.

The aim was for this to be a nonfunctional change, but I see the generated search index has almost twice as many entries here. I'm not sure if that is because of inadvertent bugfixes or that I made a mistake, would be great to have some input there. Here are two files two compare:
https://gist.github.com/visr/fffcfb3d62c9a5d99f86104d3ac62ac5. This index is quite large and hard to compare, so here is a simpler one: https://gist.github.com/visr/8247c5c44344e995b38c4ee6d934bfbb/revisions?diff=split#diff-964a59d491880661889b0e908c47a1e0. One big difference is that now each paragraph gets it's own entry. Maybe it was always intended like this? I didn't change it on purpose. Actually I prefer one record per paragraph, since it can make search results better.

fredrikekre · 2019-02-18T12:23:37Z

does the UUID in the TOML need to match this version

UUID is an identifier of a package, not a version.

mortenpi · 2019-02-24T23:24:42Z

Apologies for not reviewing this earlier.

First, the refactoring of the search generation looks great. How do you intend to use this when uploading an index? Do you need a separate serializer for the SearchRecords, hence the need for a better intermediate representation of the index?

I am a bit sceptical about the JSON.jl dependency. In general, we have been very conservative with dependencies, as this this effectively adds JSON as a dependency to Julia. A JSON serializer for this limited case should be simple enough that we could maintain it here. On the other hand, in the 1.x era, a pinned JSON.jl should never really break and the package is pretty lightweight. So I need to think about this a bit more, but would love to hear opinions.

On a separate note, we're actually not using JSON, but JavaScript in the index file. Do you know if JSON.jl escapes the problematic Unicode characters in strings (this should be the only thing where JSON is not a subset of JS)?

.travis.yml

KristofferC · 2019-02-25T09:24:32Z

this effectively adds JSON as a dependency to Julia

FWIW, the benchmarking framework that runs daily on Julia also requires JSON.

…Buffer` that held an `IOBuffer` that was filledwith JSON text is replaced by the immutable `SearchRecord`.The `HTMLContext` now instead holds a `Vector{SearchRecord}`, which isserialized to JSON at the end, done so with the help of`JSON.lower(::SearchRecord)`.This should make it easier to possibly upload the search index to asearch server in the future.

To try to fix error about missing JSON

And git ignore them all. Also adds Documenter to the Project.toml in docs.

visr · 2019-02-25T12:46:41Z

How do you intend to use this when uploading an index? Do you need a separate serializer for the SearchRecords, hence the need for a better intermediate representation of the index?

So far I believe I can use the same index for both uses. The easiest way would be to add a keyword argument search to HTML where you can set "lunr" or "algolia". For "lunr" we keep the current general behavior. For "algolia" we use HTTP.jl to upload the JSON index to Algolia servers, and write a different search.js in the build, which uses Algolia search.

I am a bit sceptical about the JSON.jl dependency.

One of the reasons I grabbed for JSON.jl was because the JSON currently generated was not valid, despite attempts at this with jsescape. If for instance from here you take the list (value of "docs"), and copy it in jsonlint.com, you can see there are still parser errors. So I don't think it's that simple to do it right without JSON as a dependency. Though ideas welcome, I understand you want to keep the dependencies as light as possible.

Though, seeing my comment about using HTTP.jl to upload the index, would it be a problem to depend on that as well? Do you see any alternative? Taking into account that hopefully in the future Base search will use the "algolia" option.

We're actually not using JSON, but JavaScript.

Good point, I just added a commit to just use JSON: aceb999

EDIT: I verified that search still works with a simple fork of Example.jl that uses this branch: https://visr.github.io/Example.jl/dev/.

mortenpi · 2019-02-25T22:38:41Z

Does it work locally with file:// URLs? I am worried about same-origin policy.. but it might not actually be an issue here.

visr · 2019-02-25T22:52:17Z

Locally I need to run a http-server for search to work, just like before. And on GitHub pages it seems to work fine: https://visr.github.io/Example.jl/dev/. Seems to me this would be same-origin, unless I misunderstand.

mortenpi · 2019-02-25T22:54:42Z

One of the reasons I grabbed for JSON.jl was because the JSON currently generated was not valid, despite attempts at this with jsescape. If for instance from here you take the list (value of "docs"), and copy it in jsonlint.com, you can see there are still parser errors.

FWIW, it seems to be valid JS, which is what it was supposed to be. It looks like you're not allowed to escape ' characters in JSON. An MWE would be {"docs": "\'"}.

Locally I need to run a http-server for search to work, just like before. And on GitHub pages it seems to work fine: https://visr.github.io/Example.jl/dev/. Seems to me this would be same-origin, unless I misunderstand.

No, search should work without a local server too. And it's when you open with file:// URLs that you can run into problems with same-origin policy (if I remember correctly, when you try to access files from parent directories, but I am not sure).

mortenpi · 2019-02-25T23:09:29Z

Yeah, unfortunately we need to stick to JS files unfortunately. With pretty URLs off:

jquery.min.js:4 Access to XMLHttpRequest at 'file:///home/mortenpi/Julia/JuliaDocs/Documenter/docs/build/search_index.json' from origin 'null' has been blocked by CORS policy: Cross origin requests are only supported for protocol schemes: http, data, chrome, chrome-extension, https.

However, that does not mean we have to give up on JSON.jl necessarily, as long as we can add in the additional Unicode escapes. And then for Algolia one can, without much extra code, output a proper JSON file.

Speaking of Algolia and HTTP.jl, one possible alternative would be to just use curl for uploading perhaps?

visr · 2019-02-25T23:22:47Z

Hmm interesting, how did you check this? If I just open an old build and try to search anything, I just get this:

Good to know it's valid JS. If I want to use the same for Algolia it indeed needs to be JSON. It seems possible to keep most of this PR intact and drop the JSON dependency if you prefer. I might copy some serialisation code from JSON.jl.

I guess curl may work instead, not sure about Windows. Though I mostly expect it to be built from Linux CI, where you then set secure environment variables with the API keys.

mortenpi · 2019-02-25T23:53:11Z

Hmm interesting, how did you check this?

You need to build with HTML(prettyurls=false) to get a "local" build (it doesn't create all the directories then, but .html files instead). With Documenter's docs you can just run julia docs/make.jl local.

It seems possible to keep most of this PR intact and drop the JSON dependency if you prefer. I might copy some serialisation code from JSON.jl.

My current feeling is that JSON.jl would actually be fine. Let's see if we get any strong objections, and if not, we can keep JSON.

HTTP.jl might be a bit too large and has too many deps of its own I feel. But that is not directly relevant for this PR.

By escaping two Unicode characters

visr · 2019-02-26T14:55:01Z

Since for now there are no strong objections yet to JSON.jl, I went ahead a bit. I removed the commit that created a JSON file in the index, and instead added a commit that makes the extra escapes for JS that you mentioned.

As far as I'm concerned, this is now good to go. Though have you looked at the index differences mentioned in my first post?

mortenpi · 2019-03-03T02:51:11Z

Apologies for the delay, but LGTM! And yeah, let's keep JSON.jl.

The index differences seem to be due to the fact that previously we created a record for each header, with everything up to the next header appended to it, whereas with this we have one record for each top-level MD block. No objections to that from me. The search results are a bit different, but not obviously better or worse as far as I could tell. It seems to lower the priority of the "section" results slightly, which is probably a good thing.

mortenpi reviewed Feb 24, 2019

View reviewed changes

.travis.yml Show resolved Hide resolved

mortenpi added this to the 0.22.0 milestone Feb 24, 2019

mortenpi added Type: Maintenance Type: Enhancement labels Feb 24, 2019

visr added 3 commits February 25, 2019 12:00

Travis dev Documenter for docs/pdf as well

2dcb102

To try to fix error about missing JSON

remove checked in Manifest

301edc3

And git ignore them all. Also adds Documenter to the Project.toml in docs.

visr force-pushed the search branch from d3a4c12 to aceb999 Compare February 25, 2019 12:21

convert JSON into valid JS

73281c2

By escaping two Unicode characters

visr force-pushed the search branch from aceb999 to 73281c2 Compare February 26, 2019 09:03

Merge branch 'master' into search

c8331d0

Add CHANGELOG

3688874

mortenpi merged commit 28de68c into JuliaDocs:master Mar 3, 2019

visr mentioned this pull request Mar 3, 2019

add option to upload search index to Algolia #976

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor HTMLWriter search index construction #966

refactor HTMLWriter search index construction #966

visr commented Feb 18, 2019

fredrikekre commented Feb 18, 2019

mortenpi commented Feb 24, 2019

KristofferC commented Feb 25, 2019

visr commented Feb 25, 2019 •

edited

Loading

mortenpi commented Feb 25, 2019

visr commented Feb 25, 2019

mortenpi commented Feb 25, 2019

mortenpi commented Feb 25, 2019

visr commented Feb 25, 2019

mortenpi commented Feb 25, 2019

visr commented Feb 26, 2019

mortenpi commented Mar 3, 2019

refactor HTMLWriter search index construction #966

refactor HTMLWriter search index construction #966

Conversation

visr commented Feb 18, 2019

fredrikekre commented Feb 18, 2019

mortenpi commented Feb 24, 2019

KristofferC commented Feb 25, 2019

visr commented Feb 25, 2019 • edited Loading

mortenpi commented Feb 25, 2019

visr commented Feb 25, 2019

mortenpi commented Feb 25, 2019

mortenpi commented Feb 25, 2019

visr commented Feb 25, 2019

mortenpi commented Feb 25, 2019

visr commented Feb 26, 2019

mortenpi commented Mar 3, 2019

visr commented Feb 25, 2019 •

edited

Loading