-
Notifications
You must be signed in to change notification settings - Fork 13.2k
/
Copy pathrust.texi
3628 lines (2932 loc) · 131 KB
/
rust.texi
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
\input texinfo @c -*-texinfo-*-
@c %**start of header
@setfilename rust.info
@settitle Rust Documentation
@setchapternewpage odd
@c %**end of header
@include version.texi
@ifinfo
This manual is for the ``Rust'' programming language.
@uref{http://www.rust-lang.org}
Version: @gitversion
Copyright 2006-2010 Graydon Hoare
Copyright 2009-2011 Mozilla Foundation
See accompanying LICENSE.txt for terms.
@end ifinfo
@dircategory Programming
@direntry
* rust: (rust). Rust programming language
@end direntry
@titlepage
@title Rust
@subtitle A safe, concurrent, practical language.
@author Graydon Hoare
@author Mozilla Foundation
@page
@vskip 0pt plus 1filll
@uref{http://rust-lang.org}
Version: @gitversion
@sp 2
Copyright @copyright{} 2006-2010 Graydon Hoare
Copyright @copyright{} 2009-2011 Mozilla Foundation
See accompanying LICENSE.txt for terms.
@end titlepage
@everyfooting @| @emph{-- Draft @today --} @|
@ifnottex
@node Top
@top Top
Rust Documentation
@end ifnottex
@menu
* Disclaimer:: Notes on a work in progress.
* Introduction:: Background, intentions, lineage.
* Tutorial:: Gentle introduction to reading Rust code.
* Reference:: Systematic reference of language elements.
* Index:: Index
@end menu
@ifnottex
Complete table of contents
@end ifnottex
@contents
@c ############################################################
@c Disclaimer
@c ############################################################
@node Disclaimer
@chapter Disclaimer
To the reader,
Rust is a work in progress. The language continues to evolve as the design
shifts and is fleshed out in working code. Certain parts work, certain parts
do not, certain parts will be removed or changed.
This manual is a snapshot written in the present tense. Some features
described do not yet exist in working code. Some may be temporary. It
is a @emph{draft}, and we ask that you not take anything you read here
as either definitive or final. The manual is to help you get a sense
of the language and its organization, not to serve as a complete
specification. At least not yet.
If you have suggestions to make, please try to focus them on @emph{reductions}
to the language: possible features that can be combined or omitted. At this
point, every ``additive'' feature we're likely to support is already on the
table. The task ahead involves combining, trimming, and implementing.
@c ############################################################
@c Introduction
@c ############################################################
@node Introduction
@chapter Introduction
@quotation
We have to fight chaos, and the most effective way of doing that is
to prevent its emergence.
@flushright
- Edsger Dijkstra
@end flushright
@end quotation
@sp 2
Rust is a curly-brace, block-structured expression language. It visually
resembles the C language family, but differs significantly in syntactic and
semantic details. Its design is oriented toward concerns of ``programming in
the large'', that is, of creating and maintaining @emph{boundaries} -- both
abstract and operational -- that preserve large-system @emph{integrity},
@emph{availability} and @emph{concurrency}.
It supports a mixture of imperative procedural, concurrent actor,
object-oriented and pure functional styles. Rust also supports generic
programming and metaprogramming, in both static and dynamic styles.
@menu
* Goals:: Intentions, motivations.
* Sales Pitch:: A summary for the impatient.
* Influences:: Relationship to past languages.
@end menu
@node Goals
@section Goals
The language design pursues the following goals:
@sp 1
@itemize
@item Compile-time error detection and prevention.
@item Run-time fault tolerance and containment.
@item System building, analysis and maintenance affordances.
@item Clarity and precision of expression.
@item Implementation simplicity.
@item Run-time efficiency.
@item High concurrency.
@end itemize
@sp 1
Note that most of these goals are @emph{engineering} goals, not showcases for
sophisticated language technology. Most of the technology in Rust is
@emph{old} and has been seen decades earlier in other languages.
All new languages are developed in a technological context. Rust's goals arise
from the context of writing large programs that interact with the internet --
both servers and clients -- and are thus much more concerned with
@emph{safety} and @emph{concurrency} than older generations of program. Our
experience is that these two forces do not conflict; rather they drive system
design decisions toward extensive use of @emph{partitioning} and
@emph{statelessness}. Rust aims to make these a more natural part of writing
programs, within the niche of lower-level, practical, resource-conscious
languages.
@page
@node Sales Pitch
@section Sales Pitch
The following comprises a brief ``sales pitch'' overview of the salient
features of Rust, relative to other languages.
@itemize
@sp 1
@item No @code{null} pointers
The initialization state of every slot is statically computed as part of the
typestate system (see below), and requires that all slots are initialized
before use. There is no @code{null} value; uninitialized slots are
uninitialized and can only be written to, not read.
The common use for @code{null} in other languages -- as a sentinel value -- is
subsumed into the more general facility of disjoint union types. A program
must explicitly model its use of such types.
@sp 1
@item Lightweight tasks with no shared values
Like many @emph{actor} languages, Rust provides an isolation (and concurrency)
model based on lightweight tasks scheduled by the language runtime. These
tasks are very inexpensive and statically unable to manipulate one another's
local memory. Breaking the rule of task isolation is possible only by calling
external (C/C++) code.
Inter-task communication is typed, asynchronous, and simplex, based on passing
messages over channels to ports.
@sp 1
@item Predictable native code, simple runtime
The meaning and cost of every operation within a Rust program is intended to
be easy to model for the reader. The code should not ``surprise'' the
programmer once it has been compiled.
Rust compiles to native code. Rust compilation units are large and the
compilation model is designed around multi-file, whole-library or
whole-program optimization. The compiled units are standard loadable objects
(ELF, PE, Mach-O) containing standard debug information (DWARF) and are
compatible with existing, standard low-level tools (disassemblers, debuggers,
profilers, dynamic loaders). The compiled units include custom metadata that
carries full type and version information.
The Rust runtime library is a small collection of support code for scheduling,
memory management, inter-task communication, reflection and runtime
linkage. This library is written in standard C++ and is quite
straightforward. It presents a simple interface to embeddings. No
research-level virtual machine, JIT or garbage collection technology is
required. It should be relatively easy to adapt a Rust front-end on to many
existing native toolchains.
@sp 1
@item Integrated system-construction facility
The units of compilation of Rust are multi-file amalgamations called
@emph{crates}. A crate is described by a separate, declarative type of source
file that guides the compilation of the crate, its packaging, its versioning,
and its external dependencies. Crates are also the units of distribution and
loading. Significantly: the dependency graph of crates is @emph{acyclic} and
@emph{anonymous}: there is no global namespace for crates, and module-level
recursion cannot cross crate barriers.
Unlike many languages, individual modules do @emph{not} carry all the
mechanisms or restrictions of crates. Modules and crates serve different
roles.
@sp 1
@item Static control over memory allocation, packing and aliasing.
Many values in Rust are allocated @emph{within} their containing stack-frame
or parent structure. Numbers, records, tuples and tags are all allocated this
way. To allocate such values in the heap, they must be explicitly
@emph{boxed}. A @dfn{box} is a pointer to a heap allocation that holds another
value, its @emph{content}. Boxes may be either shared or unique, depending
on which sort of storage management is desired.
Boxing and unboxing in Rust is explicit, though in some cases (such as
name-component dereferencing) Rust will automatically dereference a
box to access its content. Box values can be passed and assigned
independently, like pointers in C; the difference is that in Rust they always
point to live contents, and are not subject to pointer arithmetic.
In addition to boxes, Rust supports a kind of pass-by-pointer slot called a
reference. Forming or releasing a reference does not perform reference-count
operations; references can only be formed on values that will provably outlive
the reference. References are not ``general values'', in the sense that they
cannot be independently manipulated. They are a lot like C++'s references,
except that they are safe: the compiler ensures that they always point to live
values.
In addition, every slot (stack-local allocation or reference) has a static
initialization state that is calculated by the typestate system. This permits
late initialization of slots in functions with complex control-flow, while
still guaranteeing that every use of a slot occurs after it has been
initialized.
@sp 1
@item Immutable data by default
All types in Rust are immutable by default. A field within a type must be
declared as @code{mutable} in order to be modified.
@sp 1
@item Move semantics and unique pointers
Rust differentiates copying values from moving them, and permits moving and
swapping values explicitly rather than copying. Moving can be more efficient and,
crucially, represents an indivisible transfer of ownership of a value from its
source to its destination.
In addition, pointer types in Rust come in several varieties. One important
type of pointer related to move semantics is the @emph{unique} pointer,
denoted @code{~}, which is statically guaranteed to be the only pointer
pointing to its referent at any given time.
Combining move-semantics and unique pointers, Rust permits a very lightweight
form of inter-task communication: values are sent between tasks by moving, and
only types composed of unique pointers can be sent. This statically ensures
there can never be sharing of data between tasks, while keeping the costs of
transferring data between tasks as cheap as moving a pointer.
@sp 1
@item Stack-based iterators
Rust provides a type of function-like multiple-invocation iterator that is
very efficient: the iterator state lives only on the stack and is tightly
coupled to the loop that invoked it.
@sp 1
@item Direct interface to C code
Rust can load and call many C library functions simply by declaring
them. Calling a C function is an ``unsafe'' action, and can only be taken
within a block marked with the @code{unsafe} keyword. Every unsafe block
in a Rust compilation unit must be explicitly authorized in the crate file.
@sp 1
@item Structural algebraic data types
The Rust type system is primarily structural, and contains the standard
assortment of useful ``algebraic'' type constructors from functional
languages, such as function types, tuples, record types, vectors, and
nominally-tagged disjoint unions. Such values may be @emph{pattern-matched} in
an @code{alt} expression.
@sp 1
@item Generic code
Rust supports a simple form of parametric polymorphism: functions, iterators,
types and objects can be parametrized by other types.
@sp 1
@item Argument binding
Rust provides a mechanism of partially binding arguments to functions,
producing new functions that accept the remaining un-bound arguments. This
mechanism combines some of the features of lexical closures with some of the
features of currying, in a smaller and simpler package.
@sp 1
@item Local type inference
To save some quantity of programmer key-pressing, Rust supports local type
inference: signatures of functions, objects and iterators always require type
annotation, but within the body of a function or iterator many slots can be
declared without a type, and Rust will infer the slot's type from its uses.
@sp 1
@item Structural object system
Rust has a lightweight object system based on structural object types: there
is no ``class hierarchy'' nor any concept of inheritance. Method overriding
and object restriction are performed explicitly on object values, which are
little more than order-insensitive records of methods sharing a common private
value.
@sp 1
@item Static metaprogramming (syntactic extension)
Rust supports a system for syntactic extensions that can be loaded into the
compiler, to implement user-defined notations, macros, program-generators and
the like. These notations are @emph{marked} using a special form of
bracketing, such that a reader unfamiliar with the extension can still parse
the surrounding text by skipping over the bracketed ``extension text''.
@sp 1
@item Idempotent failure
If a task fails due to a signal, or if it evaluates the special @code{fail}
expression, it enters the @emph{failing} state. A failing task unwinds its
control stack, frees all of its owned resources (executing destructors) and
enters the @emph{dead} state. Failure is idempotent and non-recoverable.
@sp 1
@item Supervision hierarchy
Rust has a system for propagating task-failures, either directly to a
supervisor task, or indirectly by sending a message into a channel.
@sp 1
@item Resource types with deterministic destruction
Rust includes a type constructor for @emph{resource} types, which have an
associated destructor and cannot be moved in memory. Resources types belong to
the kind of @emph{pinned} types, and any value that directly contains a
resource is implicitly pinned as well.
Resources can only contain types from the pinned or unique kinds of type,
which means that unlike finalizers, there is always a deterministic, top-down
order to run the destructors of a resource and its sub-resources.
@sp 1
@item Typestate system
Every storage slot in a Rust frame participates in not only a conventional
structural static type system, describing the interpretation of memory in the
slot, but also a @emph{typestate} system. The static typestates of a program
describe the set of @emph{pure, dynamic predicates} that provably hold over
some set of slots, at each point in the program's control-flow graph within
each frame. The static calculation of the typestates of a program is a
function-local dataflow problem, and handles user-defined predicates in a
similar fashion to the way the type system permits user-defined types.
A short way of thinking of this is: types statically model values,
typestates statically model @emph{assertions that hold} before and
after statements and expressions.
@end itemize
@page
@node Influences
@section Influences
@sp 2
@quotation
The essential problem that must be solved in making a fault-tolerant
software system is therefore that of fault-isolation. Different programmers
will write different modules, some modules will be correct, others will have
errors. We do not want the errors in one module to adversely affect the
behaviour of a module which does not have any errors.
@flushright
- Joe Armstrong
@end flushright
@end quotation
@sp 2
@quotation
In our approach, all data is private to some process, and processes can
only communicate through communications channels. @emph{Security}, as used
in this paper, is the property which guarantees that processes in a system
cannot affect each other except by explicit communication.
When security is absent, nothing which can be proven about a single module
in isolation can be guaranteed to hold when that module is embedded in a
system [...]
@flushright
- Robert Strom and Shaula Yemini
@end flushright
@end quotation
@sp 2
@quotation
Concurrent and applicative programming complement each other. The
ability to send messages on channels provides I/O without side effects,
while the avoidance of shared data helps keep concurrent processes from
colliding.
@flushright
- Rob Pike
@end flushright
@end quotation
@sp 2
@page
Rust is not a particularly original language. It may however appear unusual by
contemporary standards, as its design elements are drawn from a number of
``historical'' languages that have, with a few exceptions, fallen out of
favour. Five prominent lineages contribute the most:
@itemize
@sp 1
@item
The NIL (1981) and Hermes (1990) family. These languages were developed by
Robert Strom, Shaula Yemini, David Bacon and others in their group at IBM
Watson Research Center (Yorktown Heights, NY, USA).
@sp 1
@item
The Erlang (1987) language, developed by Joe Armstrong, Robert Virding, Claes
Wikstr@"om, Mike Williams and others in their group at the Ericsson Computer
Science Laboratory (@"Alvsj@"o, Stockholm, Sweden) .
@sp 1
@item
The Sather (1990) language, developed by Stephen Omohundro, Chu-Cheow Lim,
Heinz Schmidt and others in their group at The International Computer Science
Institute of the University of California, Berkeley (Berkeley, CA, USA).
@sp 1
@item
The Newsqueak (1988), Alef (1995), and Limbo (1996) family. These languages
were developed by Rob Pike, Phil Winterbottom, Sean Dorward and others in
their group at Bell labs Computing Sciences Reserch Center (Murray Hill, NJ,
USA).
@sp 1
@item
The Napier (1985) and Napier88 (1988) family. These languages were developed
by Malcolm Atkinson, Ron Morrison and others in their group at the University
of St. Andrews (St. Andrews, Fife, UK).
@end itemize
@sp 1
Additional specific influences can be seen from the following languages:
@itemize
@item The structural algebraic types and compilation manager of SML.
@item The deterministic destructor system of C++.
@end itemize
@c ############################################################
@c Tutorial
@c ############################################################
@node Tutorial
@chapter Tutorial
@emph{TODO}.
@c ############################################################
@c Reference
@c ############################################################
@node Reference
@chapter Reference
@menu
* Ref.Lex:: Lexical structure.
* Ref.Path:: References to items.
* Ref.Gram:: Grammar.
* Ref.Comp:: Compilation and component model.
* Ref.Mem:: Semantic model of memory.
* Ref.Task:: Semantic model of tasks.
* Ref.Item:: The components of a module.
* Ref.Type:: The types of values held in memory.
* Ref.Typestate:: Predicates that hold at points in time.
* Ref.Stmt:: Components of an executable block.
* Ref.Expr:: Units of execution and evaluation.
* Ref.Run:: Organization of runtime services.
@end menu
@node Ref.Lex
@section Ref.Lex
@c * Ref.Lex:: Lexical structure.
@cindex Lexical structure
@cindex Token
The lexical structure of a Rust source file or crate file is defined in terms
of Unicode character codes and character properties.
Groups of Unicode character codes and characters are organized into
@emph{tokens}. Tokens are defined as the longest contiguous sequence of
characters within the same token type (identifier, keyword, literal, symbol),
or interrupted by ignored characters.
Most tokens in Rust follow rules similar to the C family.
Most tokens (including whitespace, keywords, operators and structural symbols)
are drawn from the ASCII-compatible range of Unicode. Identifiers are drawn
from Unicode characters specified by the @code{XID_start} and
@code{XID_continue} rules given by UAX #31@footnote{Unicode Standard Annex
#31: Unicode Identifier and Pattern Syntax}. String and character literals may
include the full range of Unicode characters.
@emph{TODO: formalize this section much more}.
@menu
* Ref.Lex.Ignore:: Ignored characters.
* Ref.Lex.Ident:: Identifier tokens.
* Ref.Lex.Key:: Keyword tokens.
* Ref.Lex.Res:: Reserved tokens.
* Ref.Lex.Num:: Numeric tokens.
* Ref.Lex.Text:: String and character tokens.
* Ref.Lex.Syntax:: Syntactic extension tokens.
* Ref.Lex.Sym:: Special symbol tokens.
@end menu
@node Ref.Lex.Ignore
@subsection Ref.Lex.Ignore
@c * Ref.Lex.Ignore:: Ignored tokens.
Characters considered to be @emph{whitespace} or @emph{comment} are ignored,
and are not considered as tokens. They serve only to delimit tokens. Rust is
otherwise a free-form language.
@dfn{Whitespace} is any of the following Unicode characters: U+0020 (space),
U+0009 (tab, @code{'\t'}), U+000A (LF, @code{'\n'}), U+000D (CR, @code{'\r'}).
@dfn{Comments} are @emph{single-line comments} or @emph{multi-line comments}.
A @dfn{single-line comment} is any sequence of Unicode characters beginning
with U+002F U+002F (@code{"//"}) and extending to the next U+000A character,
@emph{excluding} cases in which such a sequence occurs within a string literal
token.
A @dfn{multi-line comments} is any sequence of Unicode characters beginning
with U+002F U+002A (@code{"/*"}) and ending with U+002A U+002F (@code{"*/"}),
@emph{excluding} cases in which such a sequence occurs within a string literal
token. Multi-line comments may be nested.
@node Ref.Lex.Ident
@subsection Ref.Lex.Ident
@c * Ref.Lex.Ident:: Identifier tokens.
@cindex Identifier token
Identifiers follow the rules given by Unicode Standard Annex #31, in the form
closed under NFKC normalization, @emph{excluding} those tokens that are
otherwise defined as keywords or reserved
tokens. @xref{Ref.Lex.Key}. @xref{Ref.Lex.Res}.
That is: an identifier starts with any character having derived property
@code{XID_Start} and continues with zero or more characters having derived
property @code{XID_Continue}; and such an identifier is NFKC-normalized during
lexing, such that all subsequent comparison of identifiers is performed on the
NFKC-normalized forms.
@emph{TODO: define relationship between Unicode and Rust versions}.
@footnote{This identifier syntax is a superset of the identifier syntaxes of C
and Java, and is modeled on Python PEP #3131, which formed the definition of
identifiers in Python 3.0 and later.}
@node Ref.Lex.Key
@subsection Ref.Lex.Key
@c * Ref.Lex.Key:: Keyword tokens.
The keywords are:
@cindex Keywords
@sp 2
@include keywords.texi
@node Ref.Lex.Res
@subsection Ref.Lex.Res
@c * Ref.Lex.Res:: Reserved tokens.
The reserved tokens are:
@cindex Reserved
@sp 2
@multitable @columnfractions .15 .15 .15 .15 .15
@item @code{f16}
@tab @code{f80}
@tab @code{f128}
@item @code{m32}
@tab @code{m64}
@tab @code{m128}
@tab @code{dec}
@end multitable
@sp 2
At present these tokens have no defined meaning in the Rust language.
These tokens may correspond, in some current or future implementation,
to additional built-in types for decimal floating-point, extended
binary and interchange floating-point formats, as defined in the IEEE
754-1985 and IEEE 754-2008 specifications.
@node Ref.Lex.Num
@subsection Ref.Lex.Num
@c * Ref.Lex.Num:: Numeric tokens.
@cindex Number token
@cindex Hex token
@cindex Decimal token
@cindex Binary token
@cindex Floating-point token
@c FIXME: This discussion isn't quite right since 'f' and 'i' can be used as
@c suffixes
A @dfn{number literal} is either an @emph{integer literal} or a
@emph{floating-point literal}.
@sp 1
An @dfn{integer literal} has one of three forms:
@enumerate
@item A @dfn{decimal literal} starts with a @emph{decimal digit} and continues
with any mixture of @emph{decimal digits} and @emph{underscores}.
@item A @dfn{hex literal} starts with the character sequence U+0030
U+0078 (@code{"0x"}) and continues as any mixture @emph{hex digits}
and @emph{underscores}.
@item A @dfn{binary literal} starts with the character sequence U+0030
U+0062 (@code{"0b"}) and continues as any mixture @emph{binary digits}
and @emph{underscores}.
@end enumerate
By default, an integer literal is of type @code{int}. An integer literal may
be followed (immediately, without any spaces) by a @dfn{integer suffix}, which
changes the type of the literal. There are three kinds of integer literal
suffix:
@enumerate
@item The @code{u} suffix gives the literal type @code{uint}.
@item The @code{g} suffix gives the literal type @code{big}.
@item Each of the signed and unsigned machine types @code{u8}, @code{i8},
@code{u16}, @code{i16}, @code{u32}, @code{i32}, @code{u64} and @code{i64}
give the literal the corresponding machine type.
@end enumerate
@sp 1
A @dfn{floating-point literal} has one of two forms:
@enumerate
@item Two @emph{decimal literals} separated by a period
character U+002E ('.'), with an optional @emph{exponent} trailing after the
second @emph{decimal literal}.
@item A single @emph{decimal literal} followed by an @emph{exponent}.
@end enumerate
By default, a floating-point literal is of type @code{float}. A floating-point
literal may be followed (immediately, without any spaces) by a
@dfn{floating-point suffix}, which changes the type of the literal. There are
only two floating-point suffixes: @code{f32} and @code{f64}. Each of these
gives the floating point literal the associated type, rather than
@code{float}.
A set of suffixes are also reserved to accommodate literal support for
types corresponding to reserved tokens. The reserved suffixes are @code{f16},
@code{f80}, @code{f128}, @code{m}, @code{m32}, @code{m64} and @code{m128}.
@sp 1
A @dfn{hex digit} is either a @emph{decimal digit} or else a character in the
ranges U+0061-U+0066 and U+0041-U+0046 (@code{'a'}-@code{'f'},
@code{'A'}-@code{'F'}).
A @dfn{binary digit} is either the character U+0030 or U+0031 (@code{'0'} or
@code{'1'}).
An @dfn{exponent} begins with either of the characters U+0065 or U+0045
(@code{'e'} or @code{'E'}), followed by an optional @emph{sign character},
followed by a trailing @emph{decimal literal}.
A @dfn{sign character} is either U+002B or U+002D (@code{'+'} or @code{'-'}).
Examples of integer literals of various forms:
@example
123; // type int
123u; // type uint
123_u; // type uint
0xff00; // type int
0xffu8; // type u8
0b1111_1111_1001_0000_i32; // type i32
0xffff_ffff_ffff_ffff_ffff_ffffg; // type big
@end example
Examples of floating-point literals of various forms:
@example
123.0; // type float
0.1; // type float
0.1f32; // type f32
12E+99_f64; // type f64
@end example
@node Ref.Lex.Text
@subsection Ref.Lex.Text
@c * Ref.Lex.Key:: String and character tokens.
@cindex String token
@cindex Character token
@cindex Escape sequence
@cindex Unicode
A @dfn{character literal} is a single Unicode character enclosed within two
U+0027 (single-quote) characters, with the exception of U+0027 itself, which
must be @emph{escaped} by a preceding U+005C character ('\').
A @dfn{string literal} is a sequence of any Unicode characters enclosed
within two U+0022 (double-quote) characters, with the exception of U+0022
itself, which must be @emph{escaped} by a preceding U+005C character
('\').
Some additional @emph{escapes} are available in either character or string
literals. An escape starts with a U+005C ('\') and continues with one
of the following forms:
@itemize
@item An @dfn{8-bit codepoint escape} escape starts with U+0078 ('x') and is
followed by exactly two @dfn{hex digits}. It denotes the Unicode codepoint
equal to the provided hex value.
@item A @dfn{16-bit codepoint escape} starts with U+0075 ('u') and is followed
by exactly four @dfn{hex digits}. It denotes the Unicode codepoint equal to
the provided hex value.
@item A @dfn{32-bit codepoint escape} starts with U+0055 ('U') and is followed
by exactly eight @dfn{hex digits}. It denotes the Unicode codepoint equal to
the provided hex value.
@item A @dfn{whitespace escape} is one of the characters U+006E, U+0072, or
U+0074, denoting the unicode values U+000A (LF), U+000D (CR) or U+0009 (HT)
respectively.
@item The @dfn{backslash escape} is the character U+005C ('\') which must be
escaped in order to denote @emph{itself}.
@end itemize
@node Ref.Lex.Syntax
@subsection Ref.Lex.Syntax
@c * Ref.Lex.Syntax:: Syntactic extension tokens.
Syntactic extensions are marked with the @emph{pound} sigil U+0023 (@code{#}),
followed by an identifier, one of @code{fmt}, @code{env},
@code{concat_idents}, @code{ident_to_str}, @code{log_syntax}, @code{macro}, or
the name of a user-defined macro. This is followed by a vector literal. (Its
value will be interpreted syntactically; in particular, it need not be
well-typed.)
@emph{TODO: formalize those terms more}.
@node Ref.Lex.Sym
@subsection Ref.Lex.Sym
@c * Ref.Lex.Sym:: Special symbol tokens.
@cindex Symbol
@cindex Operator
The special symbols are:
@sp 2
@multitable @columnfractions .1 .1 .1 .1 .1 .1
@item @code{@@}
@tab @code{_}
@item @code{#}
@tab @code{:}
@tab @code{.}
@tab @code{;}
@tab @code{,}
@item @code{[}
@tab @code{]}
@tab @code{@{}
@tab @code{@}}
@tab @code{(}
@tab @code{)}
@item @code{=}
@tab @code{<-}
@tab @code{<->}
@tab @code{->}
@item @code{+}
@tab @code{++}
@tab @code{+=}
@tab @code{-}
@tab @code{--}
@tab @code{-=}
@item @code{*}
@tab @code{/}
@tab @code{%}
@tab @code{*=}
@tab @code{/=}
@tab @code{%=}
@item @code{&}
@tab @code{|}
@tab @code{!}
@tab @code{~}
@tab @code{^}
@item @code{&=}
@tab @code{|=}
@tab @code{^=}
@tab @code{!=}
@item @code{>>}
@tab @code{>>>}
@tab @code{<<}
@tab @code{<<=}
@tab @code{>>=}
@tab @code{>>>=}
@item @code{<}
@tab @code{<=}
@tab @code{==}
@tab @code{>=}
@tab @code{>}
@item @code{&&}
@tab @code{||}
@end multitable
@page
@page
@node Ref.Path
@section Ref.Path
@c * Ref.Path:: References to items.
@cindex Names of items or slots
@cindex Path name
@cindex Type parameters
A @dfn{path} is a sequence of one or more path components separated by a
namespace qualifier (@code{::}). If a path consists of only one component, it
may refer to either an item or a slot in a local control
scope. @xref{Ref.Mem.Slot}. @xref{Ref.Item}. If a path has multiple
components, it refers to an item.
Every item has a @emph{canonical path} within its crate, but the path naming
an item is only meaningful within a given crate. There is no global namespace
across crates; an item's canonical path merely identifies it within the
crate. @xref{Ref.Comp.Crate}.
Path components are usually identifiers. @xref{Ref.Lex.Ident}. The last
component of a path may also have trailing explicit type arguments.
Two examples of simple paths consisting of only identifier components:
@example
x;
x::y::z;
@end example
In most contexts, the Rust grammar accepts a general @emph{path}, but
subsequent passes may restrict paths occurring in various contexts to refer to
slots or items, depending on the semantics of the occurrence. In other words:
in some contexts a slot is required (for example, on the left hand side of the
copy operator, @pxref{Ref.Expr.Copy}) and in other contexts an item is
required (for example, as a type parameter, @pxref{Ref.Item}). In no case is
the grammar made ambiguous by accepting a general path and interpreting the
reference in later passes. @xref{Ref.Gram}.
An example of a path with type parameters:
@example
m::map<int,str>;
@end example
@page
@node Ref.Gram
@section Ref.Gram
@c * Ref.Gram:: Grammar.
@emph{TODO: mostly LL(1), it reads like C++, Alef and bits of Napier;
formalize here}.
@page
@node Ref.Comp
@section Ref.Comp
@c * Ref.Comp:: Compilation and component model.
@cindex Compilation model
Rust is a @emph{compiled} language. Its semantics are divided along a
@emph{phase distinction} between compile-time and run-time. Those semantic
rules that have a @emph{static interpretation} govern the success or failure
of compilation. A program that fails to compile due to violation of a
compile-time rule has no defined semantics at run-time; the compiler should
halt with an error report, and produce no executable artifact.
The compilation model centres on artifacts called @emph{crates}. Each
compilation is directed towards a single crate in source form, and if
successful produces a single crate in executable form.
@menu
* Ref.Comp.Crate:: Units of compilation and linking.
* Ref.Comp.Attr:: Attributes of crates, modules and items.
* Ref.Comp.Syntax:: Syntax extensions.
@end menu
@node Ref.Comp.Crate
@subsection Ref.Comp.Crate
@c * Ref.Comp.Crate:: Units of compilation and linking.
@cindex Crate
A @dfn{crate} is a unit of compilation and linking, as well as versioning,
distribution and runtime loading. Crates are defined by @emph{crate source
files}, which are a type of source file written in a special declarative
language: @emph{crate language}.@footnote{A crate is somewhat analogous to an
@emph{assembly} in the ECMA-335 CLI model, a @emph{library} in the SML/NJ
Compilation Manager, a @emph{unit} in the Owens and Flatt module system, or a
@emph{configuration} in Mesa.} A crate source file describes:
@itemize
@item Metadata about the crate, such as author, name, version, and copyright.
@item The source-file and directory modules that make up the crate.
@item Any external crates or native modules that the crate imports to its top level.
@item The organization of the crate's internal namespace.
@item The set of names exported from the crate.
@end itemize
A single crate source file may describe the compilation of a large number of
Rust source files; it is compiled in its entirety, as a single indivisible
unit. The compilation phase attempts to transform a single crate source file,
and its referenced contents, into a single compiled crate. Crate source files
and compiled crates have a 1:1 relationship.
The syntactic form of a crate is a sequence of @emph{directives}, some of
which have nested sub-directives.
A crate defines an implicit top-level module: within this module,
all members of the crate have canonical path names. @xref{Ref.Path}. The
@code{mod} directives within a crate file specify sub-modules to include in
the crate: these are either directory modules, corresponding to directories in
the filesystem of the compilation environment, or file modules, corresponding
to Rust source files. The names given to such modules in @code{mod} directives
become prefixes of the paths of items defined within any included Rust source
files.
If a .rs file exists in the filesystem alongside the .rc crate file, then it
will be used to provide the top-level module of the crate. Similarly,
directory modules may be paired with .rs files of the same name as the
directory to provide the code for those modules. These source files are never
mentioned explicitly in the crate file; they are simply used if they are
present.
The @code{use} directives within the crate specify @emph{other crates} to scan
for, locate, import into the crate's module namespace during compilation, and
link against at runtime. Use directives may also occur independently in rust
source files. These directives may specify loose or tight ``matching
criteria'' for imported crates, depending on the preferences of the crate
developer. In the simplest case, a @code{use} directive may only specify a
symbolic name and leave the task of locating and binding an appropriate crate
to a compile-time heuristic. In a more controlled case, a @code{use} directive
may specify any metadata as matching criteria, such as a URI, an author name
or version number, a checksum or even a cryptographic signature, in order to
select an an appropriate imported crate. @xref{Ref.Comp.Attr}.
The compiled form of a crate is a loadable and executable object file full of
machine code, in a standard loadable operating-system format such as ELF, PE
or Mach-O. The loadable object contains metadata, describing:
@itemize