Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pre-commit keeps doing whitespace changes #3017

Open
peternewman opened this issue Aug 9, 2023 · 19 comments
Open

Pre-commit keeps doing whitespace changes #3017

peternewman opened this issue Aug 9, 2023 · 19 comments

Comments

@peternewman
Copy link
Collaborator

If you edit via the GitHub web UI, you get a Pre-commit afterwards that just does whitespace changes:
174a5bf

@peternewman peternewman added the bug label Aug 9, 2023
@peternewman
Copy link
Collaborator Author

Worse, this means you can't review and make or merge suggested changes because the diff is now too large. 😢

@DimitriPapadopoulos
Copy link
Collaborator

Ah, I thought this was somehow because the dictionary file had grown too large, but yes, GitHub complains the diff is too large (50,000 lines). I was wondering where this comes from.

Which files are being changed by pre-commit? 50,000 lines means it's the dictionary file, but then why does it mess with the dictionary file?

@DimitriPapadopoulos
Copy link
Collaborator

Is 174a5bf just about whitespace changes? How can a couple of whitespace changes result in a diff of 50,000 lines? I'm lost.

@peternewman
Copy link
Collaborator Author

I've not looked properly, but I suspect it's changing all the line endings back and forth.

@DimitriPapadopoulos
Copy link
Collaborator

To/from DOS/POSIX line endings? But then it would be an effect of mixed-line-ending, not trailing-whitespace.

It's still a mystery to me what triggers the massive diffs. Does the GitHub editor write files with DOS endings?

@DimitriPapadopoulos
Copy link
Collaborator

In the short term, the --fix=no option of mixed-line-ending might work around these back & forth line endings changes.

@DimitriPapadopoulos
Copy link
Collaborator

It looks like the editor you are using in GitHub changes the newlines in all of dictionary.txt.

  • I am at a loss why it does that, a Google search comes up with nothing.
  • I am at a loss why we hadn't noticed that before.

@peternewman
Copy link
Collaborator Author

I wonder if something has changed, or it's just that pre-commit is changing them to say LF, whereas GitHub generates CRLF, and most other editors won't explicitly change things, so it never used to have an impact.

@DimitriPapadopoulos
Copy link
Collaborator

DimitriPapadopoulos commented Aug 11, 2023

Some of your commits change all newlines to CR LF. For example, after 95a2fbc, all lines end with \r\n:

$ git checkout peternewman-obsloete 
$ 
$ git checkout 95a2fbc985065cff6f47c982e82355bb22d885b4
$ 
$ od -c dictionary.txt
0000000   1   n   d   -   >   1   s   t  \r  \n   2   r   d   -   >   2
0000020   n   d  \r  \n   2   s   t   -   >   2   n   d  \r  \n   3   n
0000040   d   -   >   3   r   d  \r  \n   3   r   t   -   >   3   r   d
0000060  \r  \n   3   s   t   -   >   3   r   d  \r  \n   4   r   d   -
0000100   >   4   t   h  \r  \n   _   _   a   t   t   r   i   b   y   t
[...]

I suspect this is new behaviour. I mean, either we would have CR LF in all files by now, or we would have seen similar diffs of 50,000 lines in the past. The question is why now? Does pre-commit somehow affect the GitHub editor? Was this specific commit really created using the GitHub editor?

@DimitriPapadopoulos
Copy link
Collaborator

Which of the two following options did you choose to edit the dictionary?

Edit

@peternewman
Copy link
Collaborator Author

They were all edit in place. I used find (and maybe replace), which is a fairly new feature in the basic editor, but that was about all of note.

@DimitriPapadopoulos
Copy link
Collaborator

I was able to reproduce this issue in #3027. Indeed, the GitHub in place editor replaces all POSIX LF newlines by CR LF.

I wonder whether this is somehow caused by the addition of pre-commit. I'll try with a repository that doesn't have pre-commit enabled.

@peternewman
Copy link
Collaborator Author

Well I'm glad I'm not imagining it!

I wonder if it could be that maybe the first run of pre-commit fixed all the line endings to be consistent and the presumed auto-mode behaves differently when the file is consistent. Or worse it's clever enough to realise I'm editing on Windows and behave accordingly!

We could try something like this to fix it (if that editor honours it):
https://www.aleksandrhovhannisyan.com/blog/crlf-vs-lf-normalizing-line-endings-in-git/

@DimitriPapadopoulos
Copy link
Collaborator

DimitriPapadopoulos commented Aug 11, 2023

The newlines in dictionaries had always been a simple LF. I checked old commits to make sure.

I have already run dos2unix on dictionaries, the dictionaries haven't changed. This means all newlines are standard POSIX LF newlines.

@DimitriPapadopoulos
Copy link
Collaborator

DimitriPapadopoulos commented Aug 11, 2023

This is a bug of the GitHub in place editor. What appears to trigger it is not pre-commit, but the size/contents of the modified file.

The size of file dictionary.txt increased considerably in the last few weeks. Between 8d0d82b and current master branch:

  • 813 →1,129 KB
  • 37,406 →51,304 lines

I suspect the size or number of lines triggered some bug, perhaps a variable overflow:

@DimitriPapadopoulos
Copy link
Collaborator

DimitriPapadopoulos commented Aug 11, 2023

The bug is triggered when adding just 1 byte (and no new lines) to a file of size > 1 MB.
It is not triggered when adding 1 byte (or more) to a file of size <= 1 MB.

Tried with these two files:

I guess this clears things up: this really is a bug of the GitHub in place editor. Does this mean we should split dictionary.txt into chunks < 1 MB? I would recommend we leave it as is for now. I have notified GitHub about the bug.

@peternewman
Copy link
Collaborator Author

I guess this clears things up: this really is a bug of the GitHub in place editor. Does this mean we should split dictionary.txt into chunks < 1 MB? I would recommend we leave it as is for now.

That's some excellent digging! Can't we fix this in the short term by adding a pre-commit or repo .gitattributes to force it back to LF, which will at least mean the overall diff of a PR will be small, even if the changes per commit are significant?

I have notified GitHub about the bug.

Excellent, is that something public you can link to, or a private report option they have?

@DimitriPapadopoulos
Copy link
Collaborator

We already have a pre-commit hook that forces back to LF (hence the unreadable diffs). I have tried a .gitattributes file that would force LF, but it doesn't seem to be taken into account by GitHub.

Unfortunately, the ticket I managed to create appears to be private – makes sense since GitHub is not open source.

@DimitriPapadopoulos
Copy link
Collaborator

DimitriPapadopoulos commented Aug 22, 2023

This is a know GitHub bug when editing files > 1 MB. it had already been reported to them. They will probably fix it in time.

Also, see #1275 about breaking up dictionary.txt into separate files, which might result in files <= 1 MB.

@peternewman Can we close this ticket now that it is clear this is a GutHub bug for files > 1 MB, unrelated to pre-comit?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants