-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor filepath equality checks #197
Comments
Problem: The current usage of filepaths is error-prone and can be simplified. Solution: Canonicalize filepaths at the boundaries, so their management will be safer and will simplify the codebase.
I have been trying to tackle this refactor with the proposed approach and I think that the result looks promising. Just some details: With this change, files in the output which used to be relative become absolute (canonical) unless we store also their non canonical representation or manage to get the relative path again before printing. Should we add this overhead in order to keep relative paths in the program output? Or let the output to show canonical paths from now on? Also, currently |
Good concern. I would go with printing the old, non-canonicalized filepaths. Let me read through the code for a while, I'll respond to your last paragraph then. |
You are referring to "(current file)" next to the link in the xrefcheck's output? I guess something went wrong here. Initially it was named "local" and was intended to be a property related to the link's format (i.e. denoted whether the link contained the part with filepath or not). And in fd96731 it was renamed, and you well spotted that "current file" is likely to be interpreted differently now 🤔. It was not meant to be smart in understanding whether we point to the same file or not. Let's probably take that commit as example and rename everything related to |
It makes sense, so that tag just classifies file links into relative (./file.md#example), absolute (/file.md#example) or file-local (#example). As you said, I misunderstood it because of the "current file" text. I will change it before applying the refactor, "file-local" sounds good and less unambiguous to me. |
Problem: The current usage of filepaths is error-prone and can be simplified. Solution: Canonicalize filepaths at the boundaries, so their management will be safer and will simplify the codebase.
I have been applying the refactor in a branch. My main concern regarding the result is currently that I think that we should try to specify how we want to show filepaths in the program output, because the current behaviour may be difficult to exactly replicate after the refactor. These are some details that I have noticed by observing the golden test expected outputs:
I will try to think about a possible strategy and propose it here. |
I think, let's not care about preserving this. Running from within the repository can produce the same output as if run from the repository root, this may be even considered as a benign behaviour. Although to make sure I get it correctly, I would be glad to see the diff myself. Could you create a pull request, maybe in a draft state, so that we could continue the discussion there?
Not vice versa? |
👍
Right, not vice versa. These are examples from the current expected outputs in golden tests. Look at how the "file that does not exist" is shown:
|
That's hilarious. I believe displaying it like this can be considered reasonable from several points of view, but anyway there is no need to stick to it, let's consider on a PR basis whether the change in the output format is worth it or not. |
Problem: The current usage of filepaths is error-prone and can be simplified. Solution: Canonicalize filepaths at the boundaries, so their management will be safer and will simplify the codebase.
Problem: The current usage of filepaths is error-prone and can be simplified. Solution: Canonicalize filepaths at the boundaries, so their management will be safer and will simplify the codebase.
Problem: the current usage of filepaths is error-prone and can be simplified. Solution: canonicalize filepaths at the boundaries, so their management will be safer and will simplify the codebase.
Problem: the current usage of filepaths is error-prone and can be simplified. Solution: canonicalize filepaths at the boundaries, so their management will be safer and will simplify the codebase.
Clarification and motivation
In the past, we've had many bugs related to path equality. A few examples (there are probably more):
./
in paths #106We've solved them mainly by sprinkling the codebase with
normalise
,normaliseWithNoTrailing
,expandIndirections
,dropTrailingPathSeparator
,canonizeLocalRef
etc.But this approach is extremely error-prone.
A more principled approach might be to canonicalize filepaths using
canonicalizePath
.We should canonicalize:
git ls-files
Here's a suggestion (haven't actually tried it):
Then we should be able to simply use the
Eq
andOrd
instances as usual for comparingCanonicalPath
s, checking if aCanonicalPath
exists in aMap
/Set
, etc.Let's review the uses of
normalise
and similar functions, and try to get rid of as many as possible.Acceptance criteria
git ls-files
and from scanning markdown files)normalise/dropTrailingPathSeparator/etc
throughout the codebaseThe text was updated successfully, but these errors were encountered: