-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove backslash escapes from TextElement #123
Comments
Note that |
Regardless of the outcome of the discussion in #115, I'd like to go ahead with this issue and remove all escape sequences from
This proposal leverages If we introduce a new Unicode literal in #115, then it can be used inside of |
I'm not fond of a proposal that keeps No HTML entities, please. My expectation is to make writing text easy, not just possible. |
Spelling out a Unicode escape is already not-easy and in fact, it's also harder to spot while reading. Fluent should encourage most Unicode characters to be written as-is, which also means that Unicode escapes are edge-cases. |
I think we're in the make complex things possible territory. |
A couple of revised thoughts: To me it's straight-forward that there are things working in text that don't work in quoted-text. Simply by the virtue that quoted text is quoted means that quotes need special treatment. The opposite direction, things that work in quoted text but not in normal text, is rubbing me backwards. An additional aspect is that we're forgiving in unreadable content, and more so now in the end-game of Fluent 1.0. Invalidating a message just because the unicode escape isn't in a placeable is weird. Also, in normal text, a unicode literal would be in a string literal, but in a string literal, it wouldn't be. I do like the idea to encourage people to put unicode escapes into string literals if possible, and I think it'd be a good use of a compare-locales warning to encourage to use |
I agree with the second sentence. I feel differently about the first one. It's because string literals already require special handling of escaped quotes that I see an opportunity to streamline text elements.
I think it's generally agreed that string literals are easier to scan for and notice. Compare: toc1 = Terms\u00a0and\u00a0Conditions
toc2 = Terms{"\u00a0"}and{"\u00a0"}Conditions Furthermore, forbidding
Which is OK in my book, because string literals are not meant to hold too much translation content anyways. It should go into text elements. Localizers should prefer I'd like to make a decision about this soon, ideally early next week. Right now I'm 90% in support of removing all escapes sequences from |
Just dawned on me that if we were to make the grammar simpler, toc1 = Terms\u00a0and\u00a0Conditions would actually need to render as Otherwise we need to explicitly forbid the escape sequences in text that are allowed in string literals. I'd be willing to look at a patch if this would actually improve the readability of the spec. We also hear about string literals as one tool to extend Variant Keys, at which point there might not be a way to denote a readable nbsp as an individual string literal. Also, I still think that syntax isn't the right hammer for this nail. With 0.7 we made our syntax accepting ugly Fluent, and I think unicode escapes etc are along those lines. |
Yes, that was my intent from the beginning, sorry for not making this clear! :) I'll prepare a patch on Monday. |
I opened #181 with the proposed change. |
@Pike I'd like to move ahead with this proposal. Did you have time to look at the proposed PR? |
I looked at the PR, and I learned stuff. That's what lead me to say "We don't have a spec for the resolver." yesterday. I take it that the intent here is that the following code parses successfully: msg = This is a <strong>Mozilla</strong>\u00A0<em>Monitor</em> product{"\u00A0"}statement. and that it'd render as (the ' ' aside, can't get my markdown down): This is a Mozilla\u00A0 Monitor product statement. I'd see this behavior as a bug. Big picture architecture view I developed this week: What I took from the patch is that it doesn't demonstrate the intended change. So I tried to find what we're doing, and what I found doesn't make me happier :-/ . The way it's currently implemented is by unspecified runtime parser "bugs", i.e., /* The opening brace in text starts a placeable, \ an escape sequence. */
let text_char =
either(
and(
not(string("{")),
not(string("\\")),
regular_char),
escape_sequence,
string("\u0020"));
let escape_sequence =
either(
LiteralEscape,
UnicodeEscape);
let LiteralEscape =
either(
"\\\\",
"\\{",
"\\\"");
let UnicodeEscape =
sequence(
"\\u",
regex(/[0-9a-fA-F]{4}/)); w/out quoted_text and probably ordering/defer etc. And avoiding to write tests for the POC. And I haven't sprinkled the |
Can you explain what's buggy about this? I see this as perfectly valid and consistent product of the translation in question. All text outside of
The runtime version of I agree that it would help to have an official spec for the resolver.
I like your suggestion to add the |
Yes, I don't think that To use a name that shows up in the grammar, I feel Practically, I'd expect to be able to copy and paste content between |
Thanks for filing #195. I agree that Fluent would benefit from some way of defining the expected behavior of known escape sequences. I don't think it should be done with AST nodes, however. Let's continue this conversation in the issue you filed.
Would you require escaping |
I made a few changes to the PR implementing this proposal (#181) in 5ea35db which are inspired by @Pike's suggestions to lexically define the escape sequences in the grammar from #123 (comment).
|
@Pike asked me to clarify the impact of this proposal, with examples. I'll start with the role of
This proposal together with #186 are about making the
It's the Simplicity principle applied.
I'd like Recognizing backslash escapes in OK, time for some examples. The # █ denotes a non-breaking space.
terms = Terms{"\u00A0"}and{"\u00A0"}Privacy
↓
Terms█and█Privacy terms = Terms\u00A0and\u00A0Privacy
↓
Terms\u00A0and\u00A0Privacy open-placeable = Use '{"{"}' to open a placeable.
↓
Use '{' to open a placeable. # · denotes a leading space.
leading-whitespace = {" "}There are 5 spaces in front of this.
↓
·····There are 5 spaces in front of this. # █ denotes a non-breaking space.
markup = <em>Terms</em>{"\u00A0}"and{"\u00A0"}<em>Privacy</em>
↓
<em>Terms</em>█and█<em>Privacy</em> In the browser, the above would render as:
markup = <em>Terms</em>\u00A0and\u00A0<em>Privacy</em>
↓
<em>Terms</em>\u00A0and\u00A0<em>Privacy</em> In the browser, the above would render as:
markup = <em>Terms</em> and <em>Privacy</em>
↓
<em>Terms</em> and <em>Privacy</em> In the browser, the above would render as:
As you can see from these example, only |
Thanks for that post. For a hypothetical variant name, we'd have? city-name = { $city ->
[ "New\u00a0York" ] Nueva{"\u00a0"}York
*[unknown] { $city }
}
↓
Nueva█York Similar for arguments, though I couldn't make up a use-case there :-/ |
Yes, that's right. For the arguments, I could imagine a far far future with |
We currently allow a few backspace-prefixed escape sequences inside of
TextElements
. I think there's an opportunity to simplify the grammar and make the syntax for special characters more consistent and user-friendly by leveragingStringExpressions
.\uXXXX
could be recognized inStringExpressions
or replaced by a new Unicode literal syntax (New syntax for Unicode literals #115).\{
could be written as{"{"}
, providedStringExpressions
continue to forbid placeables in them. This is consistent with how explicit whitespace can be currently spelled out with{" "}
.\\
would become unnecessary.The text was updated successfully, but these errors were encountered: