-
Notifications
You must be signed in to change notification settings - Fork 482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor HTMLWriter search index construction #966
Conversation
UUID is an identifier of a package, not a version. |
Apologies for not reviewing this earlier. First, the refactoring of the search generation looks great. How do you intend to use this when uploading an index? Do you need a separate serializer for the I am a bit sceptical about the JSON.jl dependency. In general, we have been very conservative with dependencies, as this this effectively adds JSON as a dependency to Julia. A JSON serializer for this limited case should be simple enough that we could maintain it here. On the other hand, in the 1.x era, a pinned JSON.jl should never really break and the package is pretty lightweight. So I need to think about this a bit more, but would love to hear opinions. On a separate note, we're actually not using JSON, but JavaScript in the index file. Do you know if JSON.jl escapes the problematic Unicode characters in strings (this should be the only thing where JSON is not a subset of JS)? |
FWIW, the benchmarking framework that runs daily on Julia also requires JSON. |
…Buffer` that held an `IOBuffer` that was filledwith JSON text is replaced by the immutable `SearchRecord`.The `HTMLContext` now instead holds a `Vector{SearchRecord}`, which isserialized to JSON at the end, done so with the help of`JSON.lower(::SearchRecord)`.This should make it easier to possibly upload the search index to asearch server in the future.
To try to fix error about missing JSON
And git ignore them all. Also adds Documenter to the Project.toml in docs.
So far I believe I can use the same index for both uses. The easiest way would be to add a keyword argument
One of the reasons I grabbed for JSON.jl was because the JSON currently generated was not valid, despite attempts at this with Though, seeing my comment about using HTTP.jl to upload the index, would it be a problem to depend on that as well? Do you see any alternative? Taking into account that hopefully in the future Base search will use the
Good point, I just added a commit to just use JSON: aceb999 EDIT: I verified that search still works with a simple fork of Example.jl that uses this branch: https://visr.github.io/Example.jl/dev/. |
Does it work locally with |
Locally I need to run a http-server for search to work, just like before. And on GitHub pages it seems to work fine: https://visr.github.io/Example.jl/dev/. Seems to me this would be same-origin, unless I misunderstand. |
FWIW, it seems to be valid JS, which is what it was supposed to be. It looks like you're not allowed to escape
No, search should work without a local server too. And it's when you open with |
Yeah, unfortunately we need to stick to JS files unfortunately. With pretty URLs off:
However, that does not mean we have to give up on JSON.jl necessarily, as long as we can add in the additional Unicode escapes. And then for Algolia one can, without much extra code, output a proper JSON file. Speaking of Algolia and HTTP.jl, one possible alternative would be to just use |
Hmm interesting, how did you check this? If I just open an old build and try to search anything, I just get this: Good to know it's valid JS. If I want to use the same for Algolia it indeed needs to be JSON. It seems possible to keep most of this PR intact and drop the JSON dependency if you prefer. I might copy some serialisation code from JSON.jl. I guess |
You need to build with
My current feeling is that JSON.jl would actually be fine. Let's see if we get any strong objections, and if not, we can keep JSON. HTTP.jl might be a bit too large and has too many deps of its own I feel. But that is not directly relevant for this PR. |
By escaping two Unicode characters
Since for now there are no strong objections yet to JSON.jl, I went ahead a bit. I removed the commit that created a JSON file in the index, and instead added a commit that makes the extra escapes for JS that you mentioned. As far as I'm concerned, this is now good to go. Though have you looked at the index differences mentioned in my first post? |
Apologies for the delay, but LGTM! And yeah, let's keep JSON.jl. The index differences seem to be due to the fact that previously we created a record for each header, with everything up to the next header appended to it, whereas with this we have one record for each top-level MD block. No objections to that from me. The search results are a bit different, but not obviously better or worse as far as I could tell. It seems to lower the priority of the "section" results slightly, which is probably a good thing. |
The mutable
SearchIndexBuffer
that held anIOBuffer
that was filledwith JSON text is replaced by the immutable
SearchRecord
.The
HTMLContext
now instead holds aVector{SearchRecord}
, which isserialized to JSON at the end, done so with the help of
JSON.lower(::SearchRecord)
.This should make it easier to possibly upload the search index to a
search server in the future. (Working on it, thought to try to get this in first)
Note that this adds JSON as a dependency. But this also allows code deletion in this package. It
REQUIRE
s JSON 0.19, does the UUID in the TOML need to match this version? Now I believe it matches 0.20.The aim was for this to be a nonfunctional change, but I see the generated search index has almost twice as many entries here. I'm not sure if that is because of inadvertent bugfixes or that I made a mistake, would be great to have some input there. Here are two files two compare:
https://gist.github.com/visr/fffcfb3d62c9a5d99f86104d3ac62ac5. This index is quite large and hard to compare, so here is a simpler one: https://gist.github.com/visr/8247c5c44344e995b38c4ee6d934bfbb/revisions?diff=split#diff-964a59d491880661889b0e908c47a1e0. One big difference is that now each paragraph gets it's own entry. Maybe it was always intended like this? I didn't change it on purpose. Actually I prefer one record per paragraph, since it can make search results better.