Tree difference does not work correctly with unnormalized Unicode names #1057
Linux doesn't normalize Unicode in the filesystem, so these really are two different directories. Does Git behave differently?
Yes, cgit also treats the files as different.
However:
This is why our tree diff algorithm thinks those directories are the same: it relies on Git hashes.
Tagging @alcortesm to let him know.
Because the directory names are precomposed, as @jfontan found: https://sourcegraph.com/github.com/git/git@d62dad7/-/blob/compat/precompose_utf8.c#L113:25
Interesting observation: the original tree diff algorithm checks directory names first and only then proceeds to check the hashes: https://github.com/git/git/blob/master/tree-diff.c#L483
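The name-first order described above can be sketched as a merge walk over two sorted entry lists. This is a simplified, hypothetical Go model, not git's or go-git's code: `Entry` and `diffTrees` are made-up names, hashes are placeholder strings, and git's rule that directories sort as if their names ended in `/` is ignored. Because names are compared byte-wise, the two spellings of a directory show up as a deletion plus an addition even though their hashes match.

```go
package main

import "fmt"

// Entry is a hypothetical, simplified tree entry: a raw name and an object hash.
type Entry struct {
	Name string
	Hash string
}

// diffTrees walks two name-sorted entry lists: names are compared first
// (byte-wise, which is what Go's < on strings does), hashes only when names are equal.
func diffTrees(from, to []Entry) []string {
	var changes []string
	i, j := 0, 0
	for i < len(from) || j < len(to) {
		switch {
		case j == len(to) || (i < len(from) && from[i].Name < to[j].Name):
			changes = append(changes, "D "+from[i].Name) // only in "from"
			i++
		case i == len(from) || to[j].Name < from[i].Name:
			changes = append(changes, "A "+to[j].Name) // only in "to"
			j++
		default: // same name: now compare hashes
			if from[i].Hash != to[j].Hash {
				changes = append(changes, "M "+from[i].Name)
			}
			i++
			j++
		}
	}
	return changes
}

func main() {
	// Same placeholder hash, names differing only in Unicode form.
	from := []Entry{{Name: "cafe\u0301", Hash: "hash-1"}}
	to := []Entry{{Name: "caf\u00e9", Hash: "hash-1"}}

	for _, c := range diffTrees(from, to) {
		fmt.Println(c)
	}
}
```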
Removing the normalization in https://github.com/src-d/go-git/blob/master/utils/merkletrie/noder/path.go#L83 resolves the original issue. I am going to prepare a PR with tests.
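The effect of normalizing names before comparing them can be shown with a small, hypothetical Go sketch; it models the idea only and is not the code in `path.go`. A real implementation would use an NFC normalizer such as `norm.NFC.String` from `golang.org/x/text/unicode/norm`; to stay within the standard library, the made-up `toyNFC` below only composes `e` + U+0301 into `é`. With normalized names and equal hashes, the renamed directory is reported as unchanged; with byte-wise names it is not.

```go
package main

import (
	"fmt"
	"strings"
)

// toyNFC is a hypothetical stand-in for a real NFC normalizer. It only
// composes "e" + U+0301 into U+00E9, which is enough for this example.
func toyNFC(s string) string {
	return strings.ReplaceAll(s, "e\u0301", "\u00e9")
}

// identity leaves names untouched, i.e. a byte-wise comparison.
func identity(s string) string { return s }

type entry struct {
	name string
	hash string
}

// unchanged reports whether two entries look identical: names are compared
// after applying key, then hashes are compared.
func unchanged(a, b entry, key func(string) string) bool {
	return key(a.name) == key(b.name) && a.hash == b.hash
}

func main() {
	old := entry{"cafe\u0301", "same-tree-hash"}
	cur := entry{"caf\u00e9", "same-tree-hash"}

	fmt.Println(unchanged(old, cur, toyNFC))   // true: the rename is hidden
	fmt.Println(unchanged(old, cur, identity)) // false: byte-wise names differ
}
```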
Fixes src-d#1057
Signed-off-by: Vadim Markovtsev <vadim@sourced.tech>
The tree difference algorithm does not handle unnormalized Unicode names correctly (tree listing, however, does). Here is how to reproduce it:
bug.go:
We see:
The expected output is empty.
Here is what is happening. 55c59d621ea22921ecaabd99266d45a7921aab70 and 6fcbefa4f7a0016a68d3cda52779298a5cd20837 are two consecutive commits.
Output:
There are two almost identical directories: one name is in normalized Unicode, the other is not.