-
Notifications
You must be signed in to change notification settings - Fork 236
Unicode Support in F★
All versions of F* up to 0.9.5.0 were completely unicode unaware. Only ASCII characters could appear in F* sources, except in comments and strings, which were previously treated as byte arrays, and were allowed to contain any sequence of byte.
In the F* standard library, FStar.String
was for all purposes equivalent to OCaml's Bytes
or F#'s array<byte>
, and the FStar.Char.char
type was equivalent to OCaml's char
or F#'s byte
type.
Starting from September 8, 2017, F* introduces Unicode support in the compiler and in some standard libraries. From this point on, ALL F* source files MUST BE encoded in UTF-8, including all comments and strings.
Non-UTF-8 literals must be either encoded, or escaped using \x
or \u
(which is new, and may appear in strings, comments, and character literals). For instance, to write an euro sign, one can write the UTF-8 literal € (\xE2\x82\xAC
in the source file) or escape it as \u20AC
.
Unicode letters are allowed to appear in identifiers and constructor names, as long as they follow the capitalization rules. For instance, Étoile
is a valid constructor name, and étoile
is a valid variable name. What constitutes lowercase or uppercase letters is defined according to the Letter, lowercase
and Letter, uppercase
planes of Unicode.
Mathematical operators from the Symbols, mathematical
plane can be used as operator names. Some of those have been defined as being used for prefix operators: ¬±∁∂√
. Others can be used as infix operators. In fact, some of these are pre-defined and can be used as a replacement for some literals, as shown in the table below:
Literal | Equivalent |
---|---|
∀ | forall |
∃ | exists |
⊤ | True |
⊥ | False |
⇒ | ==> |
⇔ | <==> |
→ | -> |
∧ | /\ |
∨ | \/ |
λ | fun |
The model of strings is now Unicode-aware, which means that string and bytes are no longer equivalent (e.g. String.length and Bytes.length may differ), and must be explicitly converted using the verified UTF-8 encoding functions.