Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

doc,tools: Replace client-side syntax highlighting for API docs with compile-time syntax highlighting #34148

Closed

Conversation

tolmasky
Copy link
Contributor

@tolmasky tolmasky commented Jul 1, 2020

This PR removes the need to do syntax highlighting at runtime whenever a user loads the API docs by instead doing it during the generation of the documentation. I pulled these changes out of the larger RunKit docs changes since these have been ready to go for a while and I think we could benefit from them now (you can read more here). Originally this change made the RunKit changes significantly simpler to implement, but it has a number of benefits on its own:

  1. The page will load much faster since there is much less JavaScript taking place.
  2. Users with slow connections or JavaScript turned off will still see highlighted code.

To expand a bit more and bring some of the information from the other PR into here:

  1. This change keeps all the fundamental mechanics of the current highlighting system:
    a. We still use highlightjs to produce the markup.
    b. We use the same theme.
    c. Syntax highlighting is triggered in the same way from markdown.
  2. Instead of committing a vendored copy of the bundled highlightjs code that is loaded on the client and run after load, we instead include highlightjs in the build tools and run it at documentation generation time. For pages like /docs/api/all.html, where the client highlightjs has to highlight 27,325 code examples, I think this makes a dramatic difference in the experience (not to mention the electricity we'll be saving spread across all the computers that load these pages).
  3. One additional nice side-effect is that the tool will now warn when you create a code block without a language specified, leading to a reduction of accidental unhighlighted cases in the future. I know a lot of work recently went in to manually spot these in the existing docs, so it'll be nice to keep things at 100% highlighted going forward.
  4. I have made the allhtml.js code a little less brittle by specifically looking for a marker in the code instead of relying on the regex for establishing the correct boundary.

I'm happy to get this back-ported to earlier versions too if someone can direct me to the right way to do that.

  • documentation is changed or added (technically only generated documentation is changed)
  • commit message follows commit guidelines

@nodejs-github-bot nodejs-github-bot added doc Issues and PRs related to the documentations. tools Issues and PRs related to the tools directory. labels Jul 1, 2020
@tolmasky tolmasky force-pushed the syntax-highlight-at-compile-time branch 3 times, most recently from 8beb944 to b607359 Compare July 3, 2020 02:47
@@ -16,7 +16,8 @@
"unified": "^7.0.0",
"unist-util-find": "^1.0.1",
"unist-util-select": "^1.5.0",
"unist-util-visit": "^1.4.0"
"unist-util-visit": "^1.4.0",
"highlightjs": "^9.10.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised npm didn't insert this in alphabetical order. Are you using an old version of npm possibly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is my fault from doing a rebase/merge. I'll put it in the right place and re-push.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, a rebase, that makes sense. It doesn't much matter. I was just wondering how it happened.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and fixed it (and rebased again) -- that way if someone else adds a package they won't be confused having it move around.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably an oversight, but the highlightjs package is now deprecated, so you'd want the newer highlight.js one instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Good catch! I went ahead and changed it to highlight.js. Thanks!

@Trott
Copy link
Member

Trott commented Jul 3, 2020

Seems like a good idea to me, but wondering if @DerekNonGeneric has thoughts they'd like to share.

@tolmasky tolmasky force-pushed the syntax-highlight-at-compile-time branch from b607359 to f3fdfe4 Compare July 3, 2020 19:07
@DerekNonGeneric
Copy link
Contributor

I've not looked at the other PR referenced since it doesn't seem related, but please clarify if I'm mistaken.


What you are doing in this PR is something that I had considered and decided against. It's also not the way syntax highlighting was being performed in the past.

The page will load much faster since there is much less JavaScript taking place.

I don't believe this to be true, but there is a way to confirm. Have you compared the size of the highlight.js bundle to the size difference of the HTML documents before and after you perform SSR of the syntax highlighting? If the differences are larger than the bundle, they will generally take longer to download (there is also HTTP request times to take into account).

Users with slow connections or JavaScript turned off will still see highlighted code.

The point I've made above affects users w/ slow connections, and almost nobody has JavaScript turned off (especially in the target audience of these documents).


Doing what is proposed in this PR will bloat the HTML files, which is why syntax highlighting is a good candidate for client-side JavaScript. Unless I'm missing something, I don't believe these changes are beneficial, however, it's possible that don't understand the motivation behind them.

I'd also like to add that I don't really hold too strong of an opinion on the matter, so no big deal if others have a preference.

@tolmasky
Copy link
Contributor Author

tolmasky commented Jul 3, 2020

It's also not the way syntax highlighting was being performed in the past.

Perhaps there is some confusion here (very much possibly on my end), but the only point I was making here is that this change doesn't change the syntax highlighter -- it only changes when the highlighting is done. Both use highlightjs and thus should generate the same final HTML and require no changes to CSS, etc. In other words, no one should have to change anything in response to this change.

If the differences are larger than the bundle, they will generally take longer to download (there is also HTTP request times to take into account).

From my own tests on a MacBook 2.3 GHz, all.html loads half a second faster with this change, and almost the entirety of this can be traced to the fact that it takes a full 450ms+ for the syntax highlighting to currently sequentially pop in each individual example. This is also particularly noticeable on mobile.

This is at the cost of a difference of 32KB in gzipped size (729KB for inlined, 697KB for original). Which, depending on cache, is further offset by highlight.pack.js which is 12KB zipped and not present in the new version. So a worst case scenario difference of 32KB increase over the wire, and best case scenario difference of 20KB over the wire. This is why the profile difference is dominated by the JavaScript execution vs. byte transmission.

The individual documentation files are even less of a concern. For example, async_hooks.html has the exact same gzipped size of 16KB, and crypto.html is only 1KB larger.

One other benefit to this is that people can now freely include whatever language they want in the markdown examples without having to recompile highlight.pack.js if they include a language that wasn't previously used.

Prior to this commit, API docs would be highlighted after the page
loaded using `highlightjs`. This commit still uses `highlightjs`, but
runs the generation during the compilation of the documentation, making
it so no script needs to load on the client. This results in a faster
load, as well as allowing the pages to fully functional even when
JavaScript is turned off.
@tolmasky tolmasky force-pushed the syntax-highlight-at-compile-time branch from f3fdfe4 to 1e73b8c Compare July 5, 2020 16:56
@tolmasky
Copy link
Contributor Author

tolmasky commented Jul 7, 2020

Just wanting to close the loop on this -- there are no other changes you want me to make right?

@addaleax addaleax added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Jul 7, 2020
@addaleax
Copy link
Member

addaleax commented Jul 7, 2020

Fwiw, I also don’t have a strong preference here, but I’m okay with landing this and keeping the possibility of changing it back open if somebody complains or longer page load times do turn out to be problematic.

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@tolmasky
Copy link
Contributor Author

Just following up here again, not sure if what is needed is approval from all the reviewers listed on the right or if anything else is required from me.

@DerekNonGeneric
Copy link
Contributor

@tolmasky, I've removed my thumbs down in case that might help you get this landed. :)

@addaleax
Copy link
Member

Yeah, thanks for the ping. I’ve kicked off a new CI run. 👍

@nodejs-github-bot
Copy link
Collaborator

@tolmasky
Copy link
Contributor Author

Is passing CI a requirement for this commit to go in? I was under the impression that the main branch was failing/flaky generally (just my impression from here: https://github.com/nodejs/node/commits/master). Should I rebase to get this on top of the latest stuff to see if that helps? Or is CI automatically attempting the build as if it was merged on top of the main branch?

@@ -3,6 +3,7 @@
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width">
<meta name="nodejs.org:node-version" content="__VERSION__">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tolmasky, would you mind explaining what this meta tag will be used for (if anything)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem it solves is that there is currently no easy way to determine what version of the docs you are viewing without falling back regex hacks on the title or h1 element, which should probably not be relied on as a future-proof method, especially when localizing the docs. The URL itself also isn't reliable since these docs often live at vXX-latest. This meta tag provides a place you can reliably query the version of the node docs, whether as a script on the page (as in the other commit), or a program operating on the page (for example if you are running end-to-end tests with puppeteer), or even when visually comparing a diff.
#34148 (comment)

I wonder if this meta tag would be useful for @zeke's localization work.

@addaleax
Copy link
Member

Is passing CI a requirement for this commit to go in?

@tolmasky Green or yellow – so yes, this is ready and I’ll go ahead and merge it. (Yellow means known flaky tests failed, which is okay.)

I know @DerekNonGeneric just posted another comment, but it doesn’t seem like something that needs to be figured out before merging. (As mentioned above, merging something doesn’t need to mean that the conversation has come to an end.)

I was under the impression that the main branch was failing/flaky generally (just my impression from here: https://github.com/nodejs/node/commits/master).

That impression is correct, unfortunately 😕

Should I rebase to get this on top of the latest stuff to see if that helps? Or is CI automatically attempting the build as if it was merged on top of the main branch?

Yes, CI automatically rebases, so nothing to do there – rebasing can be helpful if it resolves merge conflicts or it’s a very outdated PR, but otherwise, it can obscure whether new changes have been added that would need to be reviewed and tested separately or not – so no rebasing is definitely okay here.

@addaleax
Copy link
Member

Landed in fa2e460

@addaleax addaleax closed this Jul 15, 2020
addaleax pushed a commit that referenced this pull request Jul 15, 2020
Prior to this commit, API docs would be highlighted after the page
loaded using `highlightjs`. This commit still uses `highlightjs`, but
runs the generation during the compilation of the documentation, making
it so no script needs to load on the client. This results in a faster
load, as well as allowing the pages to fully functional even when
JavaScript is turned off.

PR-URL: #34148
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
@tolmasky
Copy link
Contributor Author

tolmasky commented Jul 15, 2020

Hi @DerekNonGeneric, I wanted to make sure I answered your question:

This is a hold-over from the previous commit, but I kept it in because it is generally useful for working in this part of the code base, both for debugging and also for consumers of the page (for example in the companion scripts I wrote to test my changes).

The problem it solves is that there is currently no easy way to determine what version of the docs you are viewing without falling back regex hacks on the title or h1 element, which should probably not be relied on as a future-proof method, especially when localizing the docs. The URL itself also isn't reliable since these docs often live at vXX-latest. This meta tag provides a place you can reliably query the version of the node docs, whether as a script on the page (as in the other commit), or a program operating on the page (for example if you are running end-to-end tests with puppeteer), or even when visually comparing a diff.

@tolmasky
Copy link
Contributor Author

@addaleax

Yes, CI automatically rebases, so nothing to do there – rebasing can be helpful if it resolves merge conflicts or it’s a very outdated PR, but otherwise, it can obscure whether new changes have been added that would need to be reviewed and tested separately or not – so no rebasing is definitely okay here.

Makes sense, thanks!

DerekNonGeneric added a commit to DerekNonGeneric/remark-preset-lint-node that referenced this pull request Jul 17, 2020
cjihrig pushed a commit that referenced this pull request Jul 23, 2020
Prior to this commit, API docs would be highlighted after the page
loaded using `highlightjs`. This commit still uses `highlightjs`, but
runs the generation during the compilation of the documentation, making
it so no script needs to load on the client. This results in a faster
load, as well as allowing the pages to fully functional even when
JavaScript is turned off.

PR-URL: #34148
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Trott pushed a commit to nodejs/remark-preset-lint-node that referenced this pull request Jul 27, 2020
MylesBorins pushed a commit that referenced this pull request Jul 27, 2020
Prior to this commit, API docs would be highlighted after the page
loaded using `highlightjs`. This commit still uses `highlightjs`, but
runs the generation during the compilation of the documentation, making
it so no script needs to load on the client. This results in a faster
load, as well as allowing the pages to fully functional even when
JavaScript is turned off.

PR-URL: #34148
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
@ruyadorno ruyadorno mentioned this pull request Jul 28, 2020
addaleax pushed a commit that referenced this pull request Sep 22, 2020
Prior to this commit, API docs would be highlighted after the page
loaded using `highlightjs`. This commit still uses `highlightjs`, but
runs the generation during the compilation of the documentation, making
it so no script needs to load on the client. This results in a faster
load, as well as allowing the pages to fully functional even when
JavaScript is turned off.

PR-URL: #34148
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
addaleax pushed a commit that referenced this pull request Sep 22, 2020
Prior to this commit, API docs would be highlighted after the page
loaded using `highlightjs`. This commit still uses `highlightjs`, but
runs the generation during the compilation of the documentation, making
it so no script needs to load on the client. This results in a faster
load, as well as allowing the pages to fully functional even when
JavaScript is turned off.

PR-URL: #34148
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
@codebytere codebytere mentioned this pull request Sep 28, 2020
Marlyfleitas added a commit to Marlyfleitas/Node-remark-preset-lint that referenced this pull request Aug 26, 2022
…s (#131)

These instructions are no longer relevant due to nodejs/node#34148.
patrickm68 added a commit to patrickm68/Node-preset-lint that referenced this pull request Sep 14, 2023
…s (#131)

These instructions are no longer relevant due to nodejs/node#34148.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. doc Issues and PRs related to the documentations. tools Issues and PRs related to the tools directory.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants