The language spec currently says that files are UTF-8 encoded. Following the FR in bazelbuild/bazel#4551, we should decide whether to allow an optional BOM (EF BB BF) at the beginning of the file, which would be stripped before lexing.
BOMs are unnecessary and not recommended for UTF-8, but prohibiting them is hostile to some Windows text editors. Conversely, allowing them seems harmless.
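A minimal sketch of the proposed stripping step (the helper name is hypothetical, not part of any spec or implementation):

```python
# UTF-8 encoding of U+FEFF, the byte order mark.
BOM = b"\xef\xbb\xbf"

def strip_bom(src: bytes) -> bytes:
    """Drop a single optional leading BOM before handing the bytes to the lexer."""
    if src.startswith(BOM):
        return src[len(BOM):]
    return src
```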
From what I can tell, standard UTF-8 passes a decoded BOM through unmodified without stripping. But that doesn't stop plenty of decoders from stripping the BOM, e.g. Python's utf-8-sig codec (as distinct from its utf-8 codec).
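The difference between the two Python codecs can be shown directly:

```python
data = b"\xef\xbb\xbf" + b"x = 1\n"  # file bytes with a leading BOM

# The plain utf-8 codec passes the decoded BOM through as U+FEFF...
plain = data.decode("utf-8")      # '\ufeffx = 1\n'

# ...while utf-8-sig strips it, so the lexer never sees it.
sig = data.decode("utf-8-sig")    # 'x = 1\n'
```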
Not harmless: it has a complexity cost, and the lexer is already complicated.
From what I can tell, standard UTF-8 passes a decoded BOM through unmodified without stripping.
A BOM is just a special kind of space character that our lexer rejects (outside of a string literal). I would prefer that we teach people to fix their misconfigured editors to stop putting unwanted invisible spaces in text files.