spec: bytes data type #161
(Still reviewing)
@@ -419,6 +475,41 @@ b" # "a\\\nb"
It is an error for a backslash to appear within a string literal other
GitHub won't let me comment on the above lines, so I'm quoting them here:
Regardless of the platform's convention for text line endings---for example, a linefeed (\n) on UNIX, or a carriage return followed by a linefeed (\r\n) on Microsoft Windows---an unescaped line ending in a multiline string literal always denotes a line feed (\n).
Do we specify what constitutes a line ending in a Starlark source file? Specifically, what the algorithm is for converting a raw U+000D or U+000A or a pair of them, into a single U+000A?
Starlark also supports raw string literals, which look like an ordinary single- or double-quotation preceded by r. Within a raw string literal, there is no special processing of backslash escapes, other than an escaped quotation mark (which denotes a literal quotation mark), or an escaped newline (which denotes a backslash followed by a newline). This form of quotation is typically used when writing strings that contain many quotation marks or backslashes (such as regular expressions or shell commands) to reduce the burden of escaping:
r'\'' denotes a literal backslash and quote, not a quote by itself. Also, raw string literals don't help you when there are many literal quotes, since you still have to escape them (if they match the string's opening and closing quote type), but they do help you when there are many literal backslashes.
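To make the point about r'\'' concrete, here is a minimal sketch in Python, used as a stand-in on the assumption that its raw string literals behave the same way as the ones described in the quoted paragraph. The variable names are illustrative only.

```python
# In a raw string literal the backslash is kept, so r'\'' has two
# characters: a backslash and a single quote.
raw = r'\''
print(len(raw))        # 2
print(raw == "\\'")    # True

# Raw strings do help with many literal backslashes, e.g. a regular expression.
pattern = r'\d+\.\d+'
print(pattern == "\\d+\\.\\d+")  # True
```

The escaped quote keeps the literal from ending early, but it does not remove the backslash from the resulting value.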
Do we specify what constitutes a line ending in a Starlark source file? Specifically, what the algorithm is for converting a raw U+000D or U+000A or a pair of them, into a single U+000A?
What more needs to be said? The scanner needs to recognize line endings (however they are defined by the platform), in three places:
- escaped, in which case they are ignored;
- unescaped, outside a string literal, where they make a NEWLINE token;
- unescaped, in a multiline string literal, where they make a \n (as the quoted paragraph explains).
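A small illustration of these three cases, using Python as a stand-in on the assumption that its scanner treats line endings the same way here. It is not taken from the Starlark spec or from any Starlark implementation.

```python
import io
import tokenize

# Case 1: an escaped line ending is ignored (shown here inside a string literal).
escaped = "a\
b"

# Case 2: an unescaped line ending outside a string literal makes a NEWLINE token.
source = "x = 1\ny = 2\n"
newline_tokens = [
    tok
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type == tokenize.NEWLINE
]

# Case 3: an unescaped line ending in a multiline string literal denotes "\n".
multiline = """a
b"""

print(repr(escaped))        # 'ab'
print(len(newline_tokens))  # 2
print(repr(multiline))      # 'a\nb'
```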
The scanner is part of the spec, no? I believe Python defines newlines in a platform-independent way, and we should probably do the same. A raw \r\n on unix should still produce a single \n, not a \r\n, inside a multiline string literal.
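One way to picture a platform-independent rule is the sketch below. The function name `normalize_line_endings` is hypothetical, and treating a lone \r as a line ending is an assumption borrowed from Python's behavior; the thread does not settle that case.

```python
def normalize_line_endings(src: str) -> str:
    """Map every line ending in src to a single line feed.

    Hypothetical helper, not part of any Starlark implementation.
    CRLF pairs are replaced first so that each pair yields one "\n";
    any remaining lone carriage return is then replaced as well.
    """
    return src.replace("\r\n", "\n").replace("\r", "\n")


print(repr(normalize_line_endings("a\r\nb")))  # 'a\nb'
print(repr(normalize_line_endings("a\rb")))    # 'a\nb'
print(repr(normalize_line_endings("a\nb")))    # 'a\nb'
```

Running such a pass before scanning would make the result independent of the host platform, which is the behavior being asked for.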
A TODO is fine for unblocking this PR.
My point is that the only place that needs to take a stance on the concrete representation of a line ending is case 3, which already spells it out thus:
Regardless of the platform's convention for text line endings---for
example, a linefeed (\n) on UNIX, or a carriage return followed by a
linefeed (\r\n) on Microsoft Windows---an unescaped line ending in a
multiline string literal always denotes a line feed (\n).
Driveby: "type(x) returns a string describing the type of its operand." -> backticks around |
Also:
Needs a |
Thanks Jon; PTAL.
This change adds initial specification of the bytes data type following lengthy discussion in bazelbuild#112. It also explains the implementation-dependent encoding of text strings, and the \u and \U escapes. More will follow, but let's get the easy parts out of the way first.

Updates bazelbuild#112

Change-Id: I8cfbb4910c2f85a1076f9b8bdf1081c89dd5948a
The scanner is part of the spec, no? I believe Python defines newlines in a platform-independent way, and we should probably do the same. A raw \r\n on unix should still produce a single \n, not a \r\n, inside a multiline string literal.
Done.
Done.
Yes. We say that here: |
This change adds initial specification of the bytes data type
following lengthy discussion in #112.
It also explains the implementation-dependent encoding of
text strings, and the \u \U \X escapes.
More will follow, but let's get the easy parts out of the way first.
Updates #112