Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: do not fail on empty # and #top fragments #1609

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

MichaIng
Copy link
Member

@MichaIng MichaIng commented Jan 8, 2025

The empty "#" and "#top" fragments are always valid without related HTML element. Browser will scroll to the top of the page. Hence lychee must not fail on those.

Fixed #1599

@MichaIng MichaIng requested a review from mre January 8, 2025 07:05
@MichaIng MichaIng added bug Something isn't working false-positive labels Jan 8, 2025
@MichaIng MichaIng force-pushed the fix-empty+top-fragments branch from 21b9188 to bd21857 Compare January 8, 2025 07:10
@MichaIng
Copy link
Member Author

MichaIng commented Jan 8, 2025

Went with the two == first, but lint suggests is_empty(). The two test runs show the currently inconsistent results:

  1. https://github.com/lycheeverse/lychee/actions/runs/12665712696/job/35295837656
    #top in file.html succeeds, # in file.html and # + #top in file1.md fail.
  2. https://github.com/lycheeverse/lychee/actions/runs/12665772582/job/35296001037
    # in file.html succeeds, #top in file.html and # + #top in file1.md fail.

On my local tests, randomly, at least one, up to all of those failed. Would be interesting to know where this inconsistency is coming from, as I would suggest the same fragment to succeed or fail every time the same way 🤔.

@thiru-appitap
Copy link

updated my findings, the resolution and test information in here

please let me know if this works.

@MichaIng MichaIng force-pushed the fix-empty+top-fragments branch 4 times, most recently from 2a46b4d to 4158f2a Compare January 13, 2025 06:56
@MichaIng
Copy link
Member Author

Awesome, the embarrassingly missed empty/top fragment handling in case of existing cache entry was the solution 👍. The redundant map_or is unrelated to this PR.

Entry::Occupied(entry) => {
Ok(entry.get().contains(fragment)
|| entry.get().contains(&fragment_decoded as &str))
Ok(is_emtpty_or_top_fragment || contains_fragment)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably Rust optimises such automatically, but in code logic, it does not make sense to check first whether the fragment is contained, if it is empty or top. So instead of creating contains_fragment, it could be better in terms of performance/logic to align it with below, to skip contains_fragment = and instead do:

Suggested change
Ok(is_emtpty_or_top_fragment || contains_fragment)
Ok(is_emtpty_or_top_fragment
|| file_frags.contains(fragment)
|| file_frags.contains(&fragment_decoded as &str))

Copy link
Member Author

@MichaIng MichaIng Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah no, that does not work since file_frags is moved by insert. Another approach, insert it if missing, and check the cache entry in any case afterwards. But I am not sure how to implement this correctly, with this Entry:: structure, which expects a result in both cases.

Wait a second, I have a much better idea!

Copy link
Member Author

@MichaIng MichaIng Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it is checked and returned much earlier, if the fragment is empty or "top". No point to further process the URL and Markdown file, create or check the cache entry etc. Those cases can be handled just the same way as if there was no fragment at all.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MichaIng
yes, indeed - this is a much better option (I'll start to to look from these perspectives too - thank you!)

one quick observation - by exiting earlier, we're postponing the extraction operation (to later) and thereby the cache updates - I don't see it creating any issues but wanted to note it here.

Copy link
Member Author

@MichaIng MichaIng Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was also thinking about the skipped/delayed cache entry, but it actually makes sense:

  • Either the same URL is not checked with any other fragment later, then the cache entry is now skipped. It would have remained unused anyway.
  • Or the same URL is checked with another fragment later, then the cache entry is created at this later point, which does not cause any downside. And the cache is not filled by an empty top "top" fragment anymore, which theoretically speeds up lookups for a nanosecond? 😄

@MichaIng MichaIng force-pushed the fix-empty+top-fragments branch 6 times, most recently from 76c771f to 3cac5a2 Compare January 13, 2025 07:44
The empty "#" and "#top" fragments are always valid without related HTML element. Browser will scroll to the top of the page. Hence lychee must not fail on those.

Credits go to @thiru-appitap for initial attempt and helping to find missing parts of the implementation.

Solves: lycheeverse#1599

Signed-off-by: MichaIng <micha@dietpi.com>
@MichaIng MichaIng force-pushed the fix-empty+top-fragments branch from 3cac5a2 to 17bf32c Compare January 13, 2025 07:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working false-positive
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Always succeed empty fragment checks
2 participants