Implement size threshold for HTML pages #2205

Merged (12 commits), Aug 17, 2023

@mortenpi (Member) commented Aug 10, 2023:

Adds `size_threshold` and `size_threshold_warn` keyword arguments to the `Documenter.HTML` constructor, which set a maximum size for the generated HTML files. If a page exceeds `size_threshold`, the build fails with an error; if it only exceeds `size_threshold_warn`, Documenter prints a warning like this:

┌ Warning: Generated HTML over size_threshold_warn limit: release-notes/index.html
│     Generated file size: 181277 (bytes)
│     size_threshold_warn: 102400 (bytes)
│     size_threshold:      204800 (bytes)
└ @ Documenter.HTMLWriter ~/juliadocs/Documenter/src/html/HTMLWriter.jl:1761
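To illustrate how the two limits interact, here is a minimal sketch of a two-tier check. It assumes a strict "greater than" comparison against each limit and treats `nothing` as "no limit"; the function name `classify_page_size` is hypothetical, and this is not Documenter's actual implementation.

```julia
# Hypothetical helper illustrating the two-tier size check; not Documenter's code.
# Thresholds are in bytes; `nothing` is treated as "no limit" (an assumption here).
function classify_page_size(bytes::Integer;
                            size_threshold::Union{Integer,Nothing} = 200 * 2^10,
                            size_threshold_warn::Union{Integer,Nothing} = 100 * 2^10)
    if size_threshold !== nothing && bytes > size_threshold
        # Over the hard limit: the build would fail.
        return :error
    elseif size_threshold_warn !== nothing && bytes > size_threshold_warn
        # Over only the soft limit: the build would emit a warning.
        return :warn
    end
    # Within both limits.
    return :ok
end

# The page from the warning above: 181277 bytes is between 100 KiB and 200 KiB.
classify_page_size(181_277)  # :warn
```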

The current defaults are 200 KiB for `size_threshold` and 100 KiB for `size_threshold_warn`. This is fairly conservative, but I reckon most manuals without huge pages or a lot of generated content should be fine. Also, the usual SEO recommendation is to keep HTML pages under 100 KiB, which is what informed this choice.

Closes #2142.

@mortenpi mortenpi added Type: Enhancement Type: Breaking Format: HTML Related to the default HTML output labels Aug 10, 2023
@mortenpi mortenpi added this to the 1.0.0 milestone Aug 10, 2023
@mortenpi mortenpi marked this pull request as ready for review August 16, 2023 01:17
@mortenpi (Member Author):

I think I'm happy with the API here now, though this could use some tests.


The size threshold, with a reasonable default, exists so that users do not deploy huge pages
accidentally (which, among other things, results in bad UX for readers and negatively impacts
SEO). It is relatively easy to have e.g. an `@example` block produce a lot of output.
Collaborator:

What is the alternative? Could Documenter automatically save plots to files and load them from there?

@mortenpi (Member Author), Aug 16, 2023:

Good point, I should also check the JuMP docs. But yes: #2143. It won't be in 1.0, but hopefully in 1.1 or so. In the meantime, the workaround is to set a large enough `size_threshold` (or set it to `nothing`).
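In a `docs/make.jl`, that workaround might look roughly like the sketch below. The site name and the specific limits are placeholders; the thresholds are byte counts passed as keyword arguments to `Documenter.HTML`, and `nothing` disables the hard limit, as described in this PR.

```julia
using Documenter

makedocs(
    sitename = "MyPackage.jl",  # placeholder site name
    format = Documenter.HTML(
        # Raise the limits (in bytes) for a manual with large generated pages:
        # fail above 500 KiB, warn above 250 KiB.
        size_threshold = 500 * 2^10,
        size_threshold_warn = 250 * 2^10,
        # Alternatively, disable the hard limit entirely:
        # size_threshold = nothing,
    ),
)
```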

Collaborator:

What if there were an `exclude::Vector{String}` argument that let you explicitly ignore specific large pages?

@mortenpi (Member Author):

That seems like a good idea. Forcing people to set a global size setting to ignore one page is not ideal. I'll merge this as is, but let's get that into 1.0 in a follow-up PR.
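The proposed `exclude` list could work roughly as in this sketch: pages whose paths appear in the list are skipped before the size check. Everything here, including `oversized_pages`, its `exclude` keyword, and the page sizes, is hypothetical and only illustrates the idea; it is not an API added by this PR.

```julia
# Hypothetical sketch of the proposed per-page exclusion; not a Documenter API.
# `pages` maps output paths to file sizes in bytes.
function oversized_pages(pages::AbstractDict{String,<:Integer};
                         size_threshold::Integer = 200 * 2^10,
                         exclude::Vector{String} = String[])
    # Keep only pages that are over the limit and not explicitly excluded.
    return sort([path for (path, bytes) in pages
                 if bytes > size_threshold && !(path in exclude)])
end

# Made-up page sizes, for illustration only.
pages = Dict(
    "index.html" => 12_000,
    "release-notes/index.html" => 230_000,
    "tutorials/plots/index.html" => 800_000,
)
oversized_pages(pages)  # ["release-notes/index.html", "tutorials/plots/index.html"]
oversized_pages(pages; exclude = ["tutorials/plots/index.html"])  # ["release-notes/index.html"]
```

The point of the design is that one known-large page can be exempted without loosening the global limit for every other page.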

@mortenpi (Member Author) commented Aug 16, 2023:

As some real-world examples of file sizes, the main SciML docs have the following pages that are at least 50 KiB (sizes in bytes):

  "dev/showcase/optimization_under_uncertainty/index.html" => 19_434_876
             "dev/showcase/bayesian_neural_ode/index.html" =>  1_452_564
               "dev/showcase/symbolic_analysis/index.html" =>  1_096_047
                        "dev/showcase/gpu_spde/index.html" =>    710_297
                 "dev/showcase/missing_physics/index.html" =>    646_781
                       "dev/showcase/ode_types/index.html" =>    417_528
         "dev/getting_started/first_simulation/index.html" =>    328_700

I suspect all those pages have huge figures.

The Julia manual generally has much bigger pages, with quite a few over 100 KiB, and the size falloff is actually pretty gradual:

              "en/v1.11-dev/manual/unicode-input/index.html" => 717_743
                         "en/v1.11-dev/base/base/index.html" => 449_392
              "en/v1.11-dev/stdlib/LinearAlgebra/index.html" => 434_960
                  "en/v1.11-dev/base/collections/index.html" => 228_140
                       "en/v1.11-dev/base/arrays/index.html" => 220_060
                         "en/v1.11-dev/base/math/index.html" => 212_540
                    "en/v1.11-dev/stdlib/LibGit2/index.html" => 163_118
                      "en/v1.11-dev/base/strings/index.html" => 148_760
                      "en/v1.11-dev/stdlib/Dates/index.html" => 141_631
                   "en/v1.11-dev/base/io-network/index.html" => 133_591
           "en/v1.11-dev/manual/performance-tips/index.html" => 105_290
                      "en/v1.11-dev/base/numbers/index.html" => 103_572
                      "en/v1.11-dev/manual/types/index.html" => 102_737
      "en/v1.11-dev/manual/distributed-computing/index.html" => 101_829
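To make the comparison with the defaults concrete, the sketch below counts how many of the Julia manual pages listed above would fail the build or trigger a warning. It assumes the default limits of 200 KiB (204800 bytes) and 100 KiB (102400 bytes) and a strict "greater than" comparison.

```julia
# Sizes in bytes of the Julia manual pages listed above.
sizes = [717_743, 449_392, 434_960, 228_140, 220_060, 212_540, 163_118,
         148_760, 141_631, 133_591, 105_290, 103_572, 102_737, 101_829]

size_threshold = 200 * 2^10       # 204800 bytes
size_threshold_warn = 100 * 2^10  # 102400 bytes

n_error = count(>(size_threshold), sizes)
n_warn = count(s -> size_threshold_warn < s <= size_threshold, sizes)
n_ok = length(sizes) - n_error - n_warn

println("error: $n_error, warn: $n_warn, ok: $n_ok")  # error: 6, warn: 7, ok: 1
```

Under these assumptions, six of the pages would fail the build, and the last entry (101829 bytes, about 99.4 KiB) would pass without a warning.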

This is a harder case, I think. JuMP is somewhat similar:

                                   "dev/tutorials/conic/ellipse_approx/index.html" => 804_455
                                                         "dev/api/JuMP/index.html" => 546_327
             "dev/tutorials/nonlinear/space_shuttle_reentry_trajectory/index.html" => 448_344
                             "dev/tutorials/applications/power_systems/index.html" => 299_415
                                    "dev/tutorials/nonlinear/portfolio/index.html" => 296_243
 "dev/tutorials/getting_started/getting_started_with_data_and_plotting/index.html" => 285_223
                                "dev/tutorials/linear/factory_schedule/index.html" => 262_728
                                                    "dev/release_notes/index.html" => 227_653
                               "dev/tutorials/nonlinear/rocket_control/index.html" => 210_290
                                                "dev/moi/release_notes/index.html" => 205_582
                           "dev/moi/submodules/Bridges/list_of_bridges/index.html" => 164_758
                                                        "dev/changelog/index.html" => 157_416
                               "dev/moi/submodules/Utilities/reference/index.html" => 154_978
                                             "dev/moi/reference/models/index.html" => 151_846
                                      "dev/moi/reference/standard_form/index.html" => 151_187
                               "dev/tutorials/linear/facility_location/index.html" => 139_441
                                                    "dev/moi/changelog/index.html" => 134_961
                        "dev/tutorials/linear/multi_objective_knapsack/index.html" => 119_384
                                 "dev/moi/submodules/Bridges/reference/index.html" => 116_642
                        "dev/tutorials/algorithms/tsp_lazy_constraints/index.html" => 106_673

Documenter's own manual pages are all at most 13 KiB, though (except for the release notes, which are ~120 KiB).
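One way to produce listings like the ones above is a short script that walks the build directory and reports every HTML file of at least 50 KiB, largest first. The sketch below uses only Julia's Base (`walkdir`, `filesize`, `relpath`); `large_html_pages` is a hypothetical name, `docs/build` is a placeholder path, and the thread does not say how these numbers were actually collected.

```julia
# Hypothetical helper: list generated HTML files of at least `min_bytes`, largest first.
function large_html_pages(builddir::AbstractString; min_bytes::Integer = 50 * 2^10)
    results = Pair{String,Int}[]
    for (root, _, files) in walkdir(builddir)
        for file in files
            endswith(file, ".html") || continue
            path = joinpath(root, file)
            bytes = filesize(path)
            bytes >= min_bytes && push!(results, relpath(path, builddir) => bytes)
        end
    end
    # Sort by file size, descending, to match the listings above.
    return sort!(results; by = last, rev = true)
end

# Print in the same `"path" => bytes` style, if the (placeholder) build directory exists.
builddir = "docs/build"
if isdir(builddir)
    for (path, bytes) in large_html_pages(builddir)
        println(repr(path), " => ", bytes)
    end
end
```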

@mortenpi merged commit 03a99e0 into master Aug 17, 2023
20 of 21 checks passed
@mortenpi deleted the mp/html-size-limit branch August 17, 2023 04:00
Linked issue: Warn or error if generated HTML page is too large
2 participants