Skip UTF-8 BOM sequence when reading BUILD & .bzl files. #4551
Comments
@philwo Did we have a discussion about this a long time ago? |
Well... if you ask me, then BUILD files should be parsed as UTF-8 by default, file names contain human readable text and we should support this and also https://www.python.org/dev/peps/pep-0263/. Unfortunately BUILD files are parsed as latin1 and.. you know what, we could just still support this and ignore the BOM, it can't get much worse than the current situation anyway. Let's do this. |
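To illustrate why the encoding matters here, the following sketch decodes the same bytes three ways using only the Python standard library. The BUILD content is a made-up example, not taken from the reporter's repository, and the snippet says nothing about how Bazel's parser is actually implemented.

```python
import codecs

# Hypothetical BUILD content, as saved by an editor that prepends a UTF-8 BOM.
raw = codecs.BOM_UTF8 + b'cc_library(name = "demo")\n'

# Decoded as latin-1, the three BOM bytes become three separate characters.
as_latin1 = raw.decode("latin-1")

# Decoded as UTF-8, the BOM becomes the single code point U+FEFF.
as_utf8 = raw.decode("utf-8")

# The "utf-8-sig" codec drops a leading BOM while decoding.
as_utf8_sig = raw.decode("utf-8-sig")

print(repr(as_latin1[:3]))
print(repr(as_utf8[0]))
print(as_utf8_sig.startswith("cc_library"))
```

In the first two cases the text handed to a parser no longer starts with the rule name, which is consistent with the failure described in this issue.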
@dslomov @laurentlb Your opinion on this? |
Seems reasonable to skip those bytes. |
In the short term, skipping the BOM bytes is an obvious win. |
Now that I understand Bazel internals better, I have updated my opinion from 2018.
Changing this behavior is expensive, so I am back to just skipping the BOM sequence as the win. |
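A minimal sketch of what "skipping the BOM bytes" could look like, written in Python for brevity. The function name `skip_utf8_bom` is hypothetical and this is not Bazel's code; it only shows the byte-level check, which works regardless of whether the remaining bytes are later treated as latin-1 or UTF-8.

```python
UTF8_BOM = b"\xef\xbb\xbf"  # the UTF-8 encoding of U+FEFF


def skip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present.

    Only a BOM at offset 0 is removed; the same byte sequence
    appearing later in the file is left untouched.
    """
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data


# Hypothetical file contents for illustration.
with_bom = UTF8_BOM + b'filegroup(name = "demo")\n'
without_bom = b'filegroup(name = "demo")\n'

print(skip_utf8_bom(with_bom) == without_bom)
print(skip_utf8_bom(without_bom) == without_bom)
```

Doing the check on raw bytes before any decoding keeps the change independent of the decoder that runs afterwards.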
Cool! |
I agree: UTF-8 completely eliminates the need for BOMs; the fact that some Microsoft tools put BOMs into UTF-8 text files is really a bug. (And I agree that for our purposes here, Bazel treats BUILD files as UTF-8, even though internally it's a horror show.) |
Shall we label this "help wanted" and/or good first issue? |
Because U+FEFF is a space (albeit zero-width), I was hoping we could just leave it there and let the Starlark scanner treat it like any other space---except of course spaces are significant to the syntax, so we can't do that. I think the correct solution involves two changes:
|
Update: Frontend team takes the position that BUILD and .bzl files are UTF-8, and has no plans to add a PEP 263-style encoding declaration. It sounds like we're in agreement that the BOM should be permitted, and I've filed bazelbuild/starlark#170 to change the Starlark spec accordingly (not that a Starlark change is necessarily a prerequisite for a Bazel change). Since we intend to eliminate the latin-1 hack I don't think we need to worry about accommodating the BOM in it. |
Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team ( |
This issue has been automatically closed due to inactivity. If you're still interested in pursuing this, please reach out to the triage team ( |
Description of the problem / feature request:
Unfortunately bazel does not support UTF-8 BOM sequences in BUILD files.
Output of bazel build ...:
Please find an example in my repository:
https://github.com/excitoon/bazel-issues/tree/master/utf-8-bom-support
Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
https://github.com/excitoon/bazel-issues/tree/master/utf-8-bom-support
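For readers who cannot reach the linked repository, a comparable input can be generated with a short script. This is a sketch under assumptions: the rule in the BUILD file and the empty WORKSPACE file are made up for illustration and are not the contents of the reporter's example.

```python
import codecs
import tempfile
from pathlib import Path

# Create a throwaway workspace directory (hypothetical layout).
workspace = Path(tempfile.mkdtemp())
(workspace / "WORKSPACE").write_bytes(b"")

# Write a BUILD file whose first three bytes are the UTF-8 BOM,
# mimicking what some Windows editors produce when saving as UTF-8.
build_file = workspace / "BUILD"
build_file.write_bytes(codecs.BOM_UTF8 + b'filegroup(name = "demo", srcs = [])\n')

print(build_file)
print(build_file.read_bytes()[:3])
```

Running `bazel build ...` inside the printed directory should then exercise the BOM handling discussed in this issue.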
What operating system are you running Bazel on?
Windows 10 x64
What's the output of bazel info release?
Have you found anything relevant by searching the web?
Nothing.