-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parameterized Terms #176
Comments
This is a re-write of the runtime parser. It supports Fluent Syntax 0.7, runs against the reference fixtures, has half the lines of code, and is as fast in SpiderMonkey as the old one (and slightly faster in V8). Goals 1. Support 100% of Fluent Syntax 0.7. This includes the indentation relaxation, dropping tabs and CR as syntax whitespace, normalizing new lines to LF, and only allowing numbers and identifiers as variant keys. 2. Maintain good performance. The parser is used in performance-critical code paths. Back in the days of Firefox OS it had to be both fast _and_ produce tightly packed results so that translations don't take up too much space on the device. I think the storage requirements can be relaxed these days. 3. Write code which will be easy to maintain in the future. The parser was first written even before Fluent branched off from L20n. It's seen many changes and additions over the last two years. As new features accrued it became hard to maintain it and also to keep track of all known bugs. My goal for the re-write was not only to clean it up but also to define the conformance story for the future and to improve the testing infrastructure. Design The parser focuses on minimizing the number of false negatives at the expense of increasing the risk of false positives. In other words, it aims at parsing _valid_ Fluent messages with a success rate of 100%, but it may also parse some invalid messages which the reference parser would reject. The parser doesn't perform any validation and may produce entries which wouldn't make sense in the real world. For best results users are advised to validate translations with the fluent-syntax parser pre-runtime. The main parser loop iterates over the beginnings of messages and terms. This is to efficiently skip over comments (which have no use on runtime), and to recover from errors. When a fatal error is encountered, the parser instantly bails out of the currently-parsed message and moves on to the next one. Errors are discarded and are not visible to the users of `FluentResource`. The do carry a minimal description of what went wrong which may be useful when reading the code and for debugging, though. The parser makes an extensive use of sticky regexes which can be anchored to any offset of the source string without slicing it. In some places, it's easier to just check the character currently at the cursor, so it does a fair share of that, too. Conformance My original plan was to base the parser on the EBNF and only parse well-formed syntax. In this PR, I went for something a bit wider than that: a superset of well-formed syntax. The main deviation from the EBNF is related to parsing `VariantExpressions` and `CallExpressions`. The EBNF verifies that the they are called on `Terms` and `Functions` respectively. The optimistic parser doesn't differentiate between `Messages`, `Terms` and `Functions`. I decided to implement it this way because this code might soon change anyways (see projectfluent/fluent#176). Another deviation is that the parser treats commas in argument lists as whitespace, similar to how Clojure treats them in sequence lists. I might suggest we upstream this in the spec, too, because it makes the implementation of args lists _much_ simpler. I based this PR on top of the `zeroseven` branch. The `fluent-syntax` parser already supports Syntax 0.7 and passes the [reference fixtures](https://github.com/projectfluent/fluent/tree/master/test/fixtures). This made it possible to also turn on the reference testing in the runtime parser, too. `make fixtures` creates the parsed results for all reference fixtures; for now they must be verified manually before they're committed. `make test` can be used in development to assert that the output of the runtime parser still matches the committed one.
This is a re-write of the runtime parser. It supports Fluent Syntax 0.7, runs against the reference fixtures, has half the lines of code, and is as fast in SpiderMonkey as the old one (and slightly faster in V8). Goals 1. Support 100% of Fluent Syntax 0.7. This includes the indentation relaxation, dropping tabs and CR as syntax whitespace, normalizing new lines to LF, and only allowing numbers and identifiers as variant keys. 2. Maintain good performance. The parser is used in performance-critical code paths. Back in the days of Firefox OS it had to be both fast _and_ produce tightly packed results so that translations don't take up too much space on the device. I think the storage requirements can be relaxed these days. 3. Write code which will be easy to maintain in the future. The parser was first written even before Fluent branched off from L20n. It's seen many changes and additions over the last two years. As new features accrued it became hard to maintain it and also to keep track of all known bugs. My goal for the re-write was not only to clean it up but also to define the conformance story for the future and to improve the testing infrastructure. Design The parser focuses on minimizing the number of false negatives at the expense of increasing the risk of false positives. In other words, it aims at parsing _valid_ Fluent messages with a success rate of 100%, but it may also parse some invalid messages which the reference parser would reject. The parser doesn't perform any validation and may produce entries which wouldn't make sense in the real world. For best results users are advised to validate translations with the fluent-syntax parser pre-runtime. The main parser loop iterates over the beginnings of messages and terms. This is to efficiently skip over comments (which have no use on runtime), and to recover from errors. When a fatal error is encountered, the parser instantly bails out of the currently-parsed message and moves on to the next one. Errors are discarded and are not visible to the users of `FluentResource`. The do carry a minimal description of what went wrong which may be useful when reading the code and for debugging, though. The parser makes an extensive use of sticky regexes which can be anchored to any offset of the source string without slicing it. In some places, it's easier to just check the character currently at the cursor, so it does a fair share of that, too. Conformance My original plan was to base the parser on the EBNF and only parse well-formed syntax. In this PR, I went for something a bit wider than that: a superset of well-formed syntax. The main deviation from the EBNF is related to parsing `VariantExpressions` and `CallExpressions`. The EBNF verifies that the they are called on `Terms` and `Functions` respectively. The optimistic parser doesn't differentiate between `Messages`, `Terms` and `Functions`. I decided to implement it this way because this code might soon change anyways (see projectfluent/fluent#176). Another deviation is that the parser treats commas in argument lists as whitespace, similar to how Clojure treats them in sequence lists. I might suggest we upstream this in the spec, too, because it makes the implementation of args lists _much_ simpler. I based this PR on top of the `zeroseven` branch. The `fluent-syntax` parser already supports Syntax 0.7 and passes the [reference fixtures](https://github.com/projectfluent/fluent/tree/master/test/fixtures). This made it possible to also turn on the reference testing in the runtime parser, too. `make fixtures` creates the parsed results for all reference fixtures; for now they must be verified manually before they're committed. `make test` can be used in development to assert that the output of the runtime parser still matches the committed one.
More open questions from the last time we talked about this in person:
|
I have a proposal which solves questions: 1, 4, 5, 6, and maybe 7 and 8. # @param $case - Letter case to use.
-fxaccount-brand-name = { $case ->
[lowercase] account Firefox
*[uppercase] Account Firefox
}
sync-signedout-account-title =
Connetti il tuo { -fxaccount-brand-name($case: "lowercase") } Let's treat all variables referenced in Terms don't need to define their format parameters in this proposal at all ( When calling a term, both positional and named arguments can be used, but positional are ignored, because, by their anonymous nature, they can't be looked up via variable names. This answers question 5 and 6. I'm still not entirely sure about question 7 and 8. I can see arguments to both using |
As far as questions 2 and 3 are concerned, and assuming term attributes are only used to describe grammatical or stylistic properties of the term itself... do we actually need parameterization there? -brand-name =
{ $case ->
*[nominative] Firefox
[accusative] Firefoxa
}
.gender = masculine
has-updated =
{ -brand-name.gender ->
[masculine] { -brand-name($case: "nominative") } został zaktualizowany.
[feminine] { -brand-name($case: "nominative") } została zaktualizowana.
*[other] Program { -brand-name($case: "nominative") } został zaktualizowany.
} |
Vowel harmony could be a use-case (didn't read the article in full) where you'd like to denote to the calling site that the term's dominating vowels depend on case or plurality? |
Interesting, thanks for the link. I'd also like to point out that it's not possible to achieve this with the current design of |
a.k.a. The Return of Macros.
In #102 we added
VariantLists
. They are a solution to abusingSelectExpressions
by not defining theselector
. Variants of aVariantList
can be retrieved externally from where the term is referenced with aVariantExpression
.The design of
VariantLists
has its own shortcomings which I outlined in #102 (comment) and which required the addition of specialized code inabstract.mjs
to tightly control where exactlyVariantLists
are allowed.At the time, we accepted these shortcomings as trade-offs and also because we didn't find other alternatives. I think I may have stumbled across an idea which is worth bringing it up here.
Proposal
Rather than treating terms with
VariantLists
for value as dictionaries whose members must be accessed from the outside, what if we encapsulated the member access in a form of a function?In other words, rather than defining the following term as a
VariantList
:We could parameterize it with the
$case
local variable and use a regularPattern
as the value:Using the default variant, or referencing a term which doesn't have any parameters would use the current
TermReference
syntax without any paranthesis:Advantages
Patterns
again.CallExpression
syntax. No need for the specializedVariantExpression
.SelectExpressions
inPatterns
do right now. New features ofSelectExpressions
, like Allow selectors to be lists #4, would automatically be available in terms.$number
variable. In the current design, the name of this variable is effectively hardcoded in the term and all messages referencing it must have the variable of this exact name passed into them by the app. This would limit the utility of private terms, which we might introduce in the future.Open questions
{-brand-name("lowercase").gender}
) or to the attribute ({-brand-name.gender("lowercase")}
)?The text was updated successfully, but these errors were encountered: