Allow any characters in filenames / labels #374
Comments
In POSIX, filenames are "bags of bytes"--there is no encoding; however,
Well, I think we can probably require valid UTF-8 file names and strongly recommend that people use UTF-8 for their file system. For labels / BUILD files, we probably need an escaping scheme, at least for the control characters. If there's a file that isn't valid UTF-8, we give an error message?
Our company codes mainly in C++, but our frontend uses a lot of JS and nodejs modules which have all sorts of characters in the filenames--for example, -, #, @, (, and ). Right now this is a major blocker for getting all our codebase under one build system since we can't reference files with semi-special characters. I don't think Bazel should decide what characters are acceptable in file names, as that reduces file names to those that fit both (1) supported languages and (2) supported platforms. This seems unnecessarily restrictive, and is becoming a major pain point for us.
Agreed. Unfortunately, it's a bit tricky to fix, as a lot of code assumes that the mapping from labels to file names (and vice versa) is trivial, and doesn't require escaping. Any suggestions on an escaping scheme?
URL-based?
You mean a custom URI scheme? Sounds good.
I mean replacing each special character with %XX, where XX is the hexadecimal value of a byte in its UTF-8 encoding.
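For illustration, the %XX idea above can be sketched with Python's standard `urllib.parse` module. This is only a sketch of the suggestion, not anything Bazel implements: the helper names `escape_path` and `unescape_path` are hypothetical, and the choice of which characters stay unescaped (letters, digits, `_.-~` and `/`) is an assumption taken from the URL quoting defaults.

```python
from urllib.parse import quote, unquote


def escape_path(path: str) -> str:
    """Percent-encode a file path (hypothetical helper, not a Bazel API).

    Each character outside the safe set is encoded as UTF-8 and every
    resulting byte is written as %XX. "/" is kept so that the directory
    structure stays readable.
    """
    return quote(path, safe="/", encoding="utf-8")


def unescape_path(escaped: str) -> str:
    """Invert escape_path: turn %XX sequences back into UTF-8 text."""
    return unquote(escaped, encoding="utf-8")


if __name__ == "__main__":
    for name in ["src/a#b@(c).js", "é.txt", "100%/x y"]:
        escaped = escape_path(name)
        print(name, "->", escaped, "->", unescape_path(escaped))
```

Because `%` itself is escaped as `%25`, the mapping is reversible, which is the property the label-to-filename mapping would need.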
Sorry, I won't be able to work on this. @philwo had an interest, maybe he can make some progress here. :-/
This blocks our Bazel deployment as well.
This is blocking us. We have a templating system where we need to build our template files. The filenames themselves contain template variables (e.g.
I totally agree that this is important and should be done, and I want it myself. However, I don't have the time to work on it in the coming months, so I have to unassign it.
Here is my proposal:
Plain ASCII (and even that only partially) makes this feel like we are in the early 90s. There are reasonable ways to handle that. If my project is C/C++ and cross-platform, and I have problems handling Unicode, then I will not use Unicode in file names. And the fact that bazel "explodes" is not such a problem. Even better would be to allow a character-set option in the project file. I did not move one project to bazel because its unit tests check that Unicode file names work.
Thanks!
Stardoc assumes Latin-1 for docstrings, though. Encoding a Starlark file in UTF-8 will result in double-encoding, cf. https://github.com/phst/rules_elisp/blob/master/docs/generate.py#L236-L239
If Starlark files are now assumed to be UTF-8, then I guess that, for Stardoc, https://github.com/bazelbuild/bazel/blob/8.0.0/src/main/java/com/google/devtools/build/lib/starlarkdocextract/RuleInfoExtractor.java#L65 and similar occurrences (basically wherever a string proto field in the Stardoc proto is set) need to be fixed.
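To make the double-encoding problem mentioned above concrete, here is a small Python sketch. It does not reproduce Stardoc's code; the function names `double_encode` and `repair` are hypothetical and only show what happens when UTF-8 bytes are read as Latin-1 and then written out as UTF-8 again.

```python
def double_encode(text: str) -> bytes:
    """Simulate a tool that reads UTF-8 bytes as Latin-1, then emits UTF-8."""
    utf8_bytes = text.encode("utf-8")
    # Wrong assumption: every byte is treated as one Latin-1 character.
    misread = utf8_bytes.decode("latin-1")
    return misread.encode("utf-8")


def repair(double_encoded: bytes) -> str:
    """Undo one round of double-encoding (the inverse of double_encode)."""
    misread = double_encoded.decode("utf-8")
    return misread.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    broken = double_encode("naïve")
    print(broken.decode("utf-8"))  # mojibake instead of the original text
    print(repair(broken))
```

A consumer that has to work around this would do something like `repair`, which is why changing the producer's behavior requires consumers to adapt.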
Yes, all output files produced by Bazel should use UTF-8 and \n line endings on all platforms, including Windows.
Thanks for pointing that out, I sent #24935 to fix this.
See bazelbuild/bazel#374 (comment): "all output files produced by Bazel should use UTF-8 and \n line endings on all platforms, including Windows."
Here's another doc that I guess is outdated now: https://bazel.build/concepts/labels
OK, then the runfiles libraries also need to be adapted.
Thanks for sending the fix for Python!
Microsoft now recommends using the
…2568) See bazelbuild/bazel#374 (comment): "all output files produced by Bazel should use UTF-8 and \n line endings on all platforms, including Windows." Previously this would use the legacy ANSI codepage on Windows.
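As a rough sketch of what "UTF-8 and \n line endings on all platforms" means for a Python tool, the snippet below writes a text file with an explicit encoding and newline setting. It is not the actual change referenced above; the helper name `write_text_portable` is hypothetical. Without the explicit arguments, `open` in text mode uses the locale's preferred encoding and translates `\n` to the platform line separator, which on Windows typically means a legacy codepage and `\r\n`.

```python
import os
import tempfile


def write_text_portable(path: str, text: str) -> None:
    """Write text as UTF-8 with \\n line endings, regardless of platform."""
    # encoding="utf-8" avoids the locale-dependent default encoding;
    # newline="\n" disables translation of "\n" to os.linesep.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(text)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "out.txt")
        write_text_portable(target, "café\nsecond line\n")
        with open(target, "rb") as f:
            print(f.read())
```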
bazelbuild/bazel#24935 changes the observable behavior of starlark_doc_extract, and consumers need to adapt. Work towards bazelbuild/bazel#374 Work towards phst/rules_elisp#818
Work towards bazelbuild#374 Closes bazelbuild#24935. PiperOrigin-RevId: 718549143 Change-Id: Ibe6c685a2f8dd75430cae7f770d392de35bdeb68
If enabled (or set to `error`), fail if Starlark files are not UTF-8 encoded. If set to `warning` (the default), emits a warning instead. Bazel already assumes that Starlark files are UTF-8 encoded for e.g. filenames in actions executed remotely. This flag doesn't affect this, it only makes encoding failures more visible. Work towards #374 Closes #24944. PiperOrigin-RevId: 721513249 Change-Id: I1d3363168c6cd5d37abf96e0401e34866b6679d7
If enabled (or set to `error`), fail if Starlark files are not UTF-8 encoded. If set to `warning` (the default), emits a warning instead. Bazel already assumes that Starlark files are UTF-8 encoded for e.g. filenames in actions executed remotely. This flag doesn't affect this, it only makes encoding failures more visible. Work towards bazelbuild#374 Closes bazelbuild#24944. PiperOrigin-RevId: 721513249 Change-Id: I1d3363168c6cd5d37abf96e0401e34866b6679d7 (cherry picked from commit e7934ce)
If enabled (or set to `error`), fail if Starlark files are not UTF-8 encoded. If set to `warning` (the default), emits a warning instead. Bazel already assumes that Starlark files are UTF-8 encoded for e.g. filenames in actions executed remotely. This flag doesn't affect this, it only makes encoding failures more visible. Work towards #374 Closes #24944. PiperOrigin-RevId: 721513249 Change-Id: I1d3363168c6cd5d37abf96e0401e34866b6679d7 (cherry picked from commit e7934ce) Fixes #25148
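The warning-versus-error behavior described in these commit messages can be sketched in Python as follows. This is only an illustration of the idea, not Bazel's implementation: the function `check_utf8` and its `mode` parameter are hypothetical names, and Python's strict UTF-8 decoder stands in for whatever validation Bazel performs.

```python
import warnings


def check_utf8(data: bytes, mode: str = "warning") -> bool:
    """Return True if data is valid UTF-8.

    On invalid input, raise ValueError when mode is "error"; otherwise
    emit a warning and return False (mirroring the "warning" default
    described above).
    """
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError as exc:
        message = f"file is not valid UTF-8: {exc}"
        if mode == "error":
            raise ValueError(message) from exc
        warnings.warn(message)
        return False


if __name__ == "__main__":
    print(check_utf8("naïve".encode("utf-8")))
    print(check_utf8(b"\xff\xfe", mode="warning"))
```

As the commit message notes, such a check only makes encoding failures more visible; it does not change how the bytes are interpreted elsewhere.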
Ultimately any character can be part of a filename. We should probably allow that.
Some mangling to generate the corresponding label should probably be done.
Original report on the mailing-list:
https://groups.google.com/d/msgid/bazel-discuss/CAN0GiO3__5jXo5rZqroSj0mFxpqCzUZZVkY%3DSNsJK1%2BZ1BdJLg%40mail.gmail.com