dnl $ Id: $
dnl Copyright{2000,2024}: Albert van der Horst, HCC FIG Holland by GNU Public License
undefine({worddoc})
\input texinfo
@setfilename thisfilename
@afourpaper
@settitle Generating ciforth's
@setchapternewpage odd
@titlepage
@title ciforth Manual
A system to generate
a ciforth together with
its documentation.
@author Albert van der Horst
Dutch Forth Workshop
@page
@c @vskip Opt plus 1fill
Copyright @copyright{{}} 2024 Albert van der Horst
Permission is granted to copy with attribution.
Program is protected by the GNU Public License.
@end titlepage
@node top, , ,
@chapter Overview
forthvar({ci86.gnr}) is a system to build a ciforth in diverse
configurations.
This is a configurator's manual.
For each ciforth there is a corresponding documentation;
there is however just this one documentation for the generic
system.
What is common to all generated Forth's is that they
are intended for the Intel 86 family and
comply in detail with DPANS 94 (ISO/IEC 15145), for the CORE wordset
at least.
It is assumed that you are familiar with Forth and with ciforth
in particular.
Linux is used for a development system,
and the main tool is forthprog({m4}) , the macro preprocessor.
@pindex m4
This extracts an assembler source file and a raw documentation file
out of the single generic source, controlled by a configuration
file.
In addition there is a file with blocks, that is common to all Forth's.
This library is a concatenation of blocks that handle options,
a system dependent file of error messages, and blocks with
utilities, the library proper.
For further processing you need an assembler, such as forthprog({nasm})
and one or more documentation tools, such as forthprog({info}). The raw
documentation file can be ordered only by a more sophisticated tool
than the usual forthprog({sort}).
Formerly I used forthprog({ssort}), a tool I wrote in C++.
In order not to make an already complicated
forthprog({ciforth}) project dependent on C++,
I have used a ciforth script forthprog({sortworddoc.frt}) to handle the sorting
since 2017.
@pindex sortworddoc.frt
The library contains indications in each index line (the first line of
a 16 line screen) that tell for what configuration
the screen is intended, e.g. 32 or 64 bit, and Linux or Microsoft.
@chapter Non-technical background.
@section Legalese
The Forth's called ciforth are made available by Albert van der Horst, a
member of the foundation DFW, the "Dutch Forth Workshop".
The copyright still resides with him.
All publications of the DFW are available under GPL, the GNU public license.
The file COPYING containing the legal expression of these lines must
accompany it.
This forthsamp({ci86.gnr}) system is protected by GPL.
This applies to the generic source, the macro files and the Forth
source in the block file.
@subsection Copyright of the ciforth's built by this tool.
A ciforth extracted from ci86.gnr is probably not a derived work
(a thesis written in TeX is not a derived work from TeX).
So Albert van der Horst separately claims copyright for the different
versions of ciforth generated by him using this tool.
On the other hand any copyright for a version of ciforth you build by
this tool is waived explicitly.
The following is present in all documentation of ciforth's:
forthquotation
Because Forth is ``programming by extending the language'' the GPL
could be construed to mean that systems based on ciforth
always are legally obliged to make the source available.
But we consider this ``fair use in the Forth sense''.
forthendquotation
In addition to the GPL, Albert van der Horst states the
following:
forthquotation
The GPL is interpreted in the sense that a system based on ciforth
and intended to serve a particular purpose, that purpose not being a
``general purpose Forth system'', is fair use of the system, even if it
could accomplish everything ciforth could, under the condition that the
ciforth it is based on is available in accordance to the GPL rules,
and this is made known to the user of the derived system.
Consequently, for these systems the obligation to make the source available
does not apply.
forthendquotation
@section Legal matters
My extensions are GPL-ed or library GPL-ed.
See the copyright documents that go with the distribution.
The original figforth is public domain and is still available.
@section Rationale
This has been split off of a similar generic Forth, that is intended
to be the last of the fig-Forth's. This generic system is no longer
maintained, but the last fig-Forth is, in the sense that if there
ever should be found a bug, it can be fixed. It also features
the Fig glossary, which I made available
in electronic form after scanning and OCR. I shamelessly
copied from it.
Apart from being ISO compliant, the Forth you have here is similar in many
respects to fig-Forth.
The motivation for having this type of Forth available follows from its
characteristics. It is available as an assembler source, and it is an
indirect threaded Forth.
An assembler source has distinct advantages for getting started.
An engineer might balk at the description of how to use a meta
compiler, but feels at ease with an assembler source.
(eForth, FIGForth, CamelForth, JonesForth prove the popularity of
assembler source as a starting point for one's own developments.)
Although speed is currently in fashion, using subroutine threaded Forth's
with optimizers, indirect threading is the preferred choice for some
applications. I did this work, because I needed it.
I also have the firm belief that an optimizer on an indirect threaded system
has more information to work with and can ultimately outperform any
other system in speed.
@subsection Source and Copyright
As discussed, ciforth is released under the GPL.
In practice the GPL
means (note: this is an explanation and has no legal value!)
that the files may be
further reproduced and distributed subject to the following conditions:
The three files comprising it must be kept together, and in particular
the reference section with the World Wide Web sites.
This Forth builds on figforth, for its source see the next section.
The maintainer can be reached at forthmail({ciforth@@spenarnc.xs4all.nl})
@section History
From the introduction to the figforth installation manual:
forthquotation
The figforth implementation project occurred because a key group of Forth
fanciers wished to make this valuable tool available on a personal computing
level. In June of 1978, we gathered a team of nine systems level
programmers, each with a particular target computer. The charter of the
group was to translate a common model of Forth into assembly language
listings for each computer. It was agreed that the group's work would be
distributed in the public domain by FIG.
We intend that our primary recipients of the Implementation Project be
computer users groups, libraries, and commercial vendors.
We expect that each will further customize for particular computers and
redistribute. No restrictions are placed on cost, but we expect faithfulness
to the model. FIG does not intend to distribute machine readable versions,
as that entails customization, revision, and customer support better
reserved for commercial vendors.
Of course, another broad group of recipients of the work is the community of
personal computer users. We hope that our publications will aid in the use
of Forth and increase the user expectation of the performance of high level
computer languages.
forthendquotation
ciforth wants to follow in those footsteps.
@subsection Evolution off the FIG model
The first version of ciforth complied faithfully with the fig model,
at least as faithfully as is customary.
Now it is ISO compliant, which
means a lot of details are changed about how words work.
In the following we will discuss some details and the changes
made to the general build up of the Forth.
The rigid subdivision in 7 areas was never adhered to.
In particular the boot up parameters
are not up front as CP/M and MS-DOS require a 100H byte reserved
area there.
There is mention of forthvar({(KEY)}) being ``implementation dependent code'',
but such code was not often present in
fig implementations.
It was based on the idea that there was some EPROM with console commands.
This has been replaced by calls to an operating system, which do
not comply with a simple function that could be called.
Here
the code definitions for forthcode({KEY}) itself
become implementation dependent code, but often they can be written in
high level.
An important change is that the character by character i/o of fig
( forthcode({KEY}) and forthcode({EMIT}) ) is replaced by the unix idea of buffer by buffer i/o
( forthcode({READ-FILE}) and forthcode({WRITE-FILE}) ).
All documentation has been updated to accurately describe
ciforth, and only ciforth.
The forthcode({RUBOUT}) key is a bona-fide forthcode({USER}) variable
and now has a name.
DR0 and DR1 are removed. There is only one consecutive mass storage area, be
it a disk or a file.
The assumption in using forthcodeni({OFFSET}) was that you could switch
the blocks to a different area on disk in order to accommodate users
with different needs.
Multi usage of the same Forth is nowadays an unlikely scenario.
Instead I put forthcodeni({OFFSET})
to good use to screen off a part of the floppy that must not be used (such
as an MS-DOS directory or the hard disk part that contains the forth system.)
forthvar({MOVE MON BLOCK-READ BLOCK-WRITE DLIST}) are not present.
Altering OUT to influence formatting doesn't work here, nor on
any figforth I know of.
forthcode({+ORIGIN}) now points to the boot up version of the first user
variable instead of to some not well defined start of the boot image.
The layout has not changed, so the negative offsets can be used for
the other data traditionally there.
Their indices have not changed, but there is now a boot-up
parameter for each and every user variable. A system with other boot
parameters can now be generated in an even more portable fashion by using
a phrase like forthsamp({7 CELLS +ORIGIN}).
Where possible, installation dependent code uses a generic call to the
operating system, in particular MS-DOS BIOS or XOS. See below.
The false urban legend that one could forthcodeni({FORGET}) forthcodeni({TASK})
has been replaced by an accurate description of forthcodeni({TASK}).
The following words have been documented for the first time:
forthcodeni({FLUSH}) forthcodeni({CURRENT}) forthcodeni({2DUP})
forthcodeni({RP@@}) forthcodeni({U.}).
Some non-substantial deviations from the original FIG source have been made
for good reasons.
The FIG philosophy is that sectors, blocks and screens must be compatible, but
may be all different. The original 8086 FIG had one sector for a block. I
changed that in having one block for a screen. This is a boon for those
wanting to ISO-fy the sources.
The way I coded the character I/O points ahead to vectoring
forthcodeni({TYPE}) and forthcodeni({EXPECT}) rather than
forthcodeni({EMIT}) and forthcodeni({KEY}) . This way I can have the
host system handle the rub out key. See the above remark about unix
philosophy.
I added generic words for accessing system resources
forthcodeni({BIOS}) , forthcodeni({BDOS}) and forthcodeni({XOS}) .
(See subsection The joy of genericity.)
Some real errors were fixed:
forthenumerate
forthitem
The redefine forthsamp({NULL}) bug is fixed.
It is no longer possible to redefine this word,
that handles the refill of the forthcode({TIB}),
by typing a <ret> immediately after a defining word.
forthitem
Forgetting part of a namespace, other than the forthcode({FORTH}) namespace
no longer crashes.
forthitem
Loading a screen with characters having an 8th bit set,
no longer crashes.
forthendenumerate
@subsection Evolution of ciforth
The first version of
ciforth was in fact the
figforth for the 8086
that was put in the framework of this manual.
By adding a 32 bits macro file, programming
I/O for Linux, programming I/O with non-obsolete MS-DOS calls
and a way to switch to protected mode,
this figforth became available in all ciforth configurations.
The RCS version numbers of the generic file fig86.gnr
are in the 2-branch and the latest version is available still.
(The 1-branch was experimental).
This version has however a manual not split between a generic and
a user part. But the user part of the manual forthemph({is})
generated from the generic source.
This version 2 can be seen as a 32-bit figForth.
The third and fourth
versions of ciforth (RCS branch 3 and 4)
are generated according to this manual.
As you see there is little pertinent information about these Forth's
in this manual.
All the information you need to use it is in the user manual,
generated with that version.
Branch 3 evolves towards an ISO compatible system.
Version 4 is a stable maintained ISO compatible system,
a balance between
technical criteria and compatibility issues.
It has a load-on-demand library facility: forthcode({REQUIRE}) which is a
quantum leap as regards usability.
@ Release 5
The forthcode({REQUIRE}) facility of release 4 conflicted with usage of
forthcode({REQUIRE}) in other Forths.
So it was renamed to forthcode({WANT}) and this warrants a
major new release as it affects nearly all code.
Release 5 also sees an MS-Windows configuration that is based on calls
to DLL's instead of usage of DPMI (Dos Protected Mode Interface). That
means ciforth runs on the latest 32 and 64 bit MS-Windows OSes.
@section Acknowledgment
ciforth is based on the figforth
of Charlie Krajewski and Thomas Newman, Hayward, Ca.
This figforth (as are all figforth's) is public domain.
It is still available via taygeta. And of course kudos to FIG.
forthurl({ftp://ftp.forth.org/pub/Forth/compilers/native/dos})
This original version is public domain according to the
following statement:
forthquotation
All publications of the Forth Interest Group are public domain. They may be
further reproduced and distributed by inclusion of this credit notice:
This publication has been made available by the Forth Interest Group,
P. O. Box 1105, San Carlos, Ca 94070
forthendquotation
I also want to thank J. E. Smith, Philadelphia for another fig Intel86
implementation that was obtainable from
forthurl({http://www.simtel.net/pub/simtelnet/msdos/forth/fig86.zip})
(You could use the Wayback Machine though, as of 2018.)
This is a fairly well documented FIG Forth for IBM PC, but its
"Seattle Computer 8086 assembler" format makes it less practical.
@chapter Background.
If you are a Unix and a Forth guru, you can skip this chapter.
_VERBOSE_({ If you think you are,
you can read this chapter and discover you are not.})
This chapter is about pervading concepts and
how tools are used, conceptually.
@section Orthogonality
The concept of orthogonality is central to this effort.
Orthogonality means that different aspects of configuration
(in this case)
are made independent of each other.
For example, ciforth can be bootable or started by MSDOS,
it can be assembled by forthprog({nasm}) or by forthprog({MASM.EXE}) .
@pindex nasm
@pindex MASM.EXE
These two choices can be made independently from each other,
and every combination ought to work.
Each choice is associated with a file with macros for forthprog({m4}) ,
so ideally if you need to specify that the assembler source
is extracted in a form suitable for forthprog({nasm}) you
only need to use the file forthfile({nasm.m4}).
This is, of course, as far as it goes.
Try as you may to separate all information about header layout
in the forthfile({header.m4}) configuration file,
a change to the order of the fields in a header will certainly have
its impact at certain places in the source.
(This applies to early versions of forthprog({gas}), and it required
forthprog({sed}) scripts on top of forthprog({m4}) )
@section Metacompilation a remark
Meta compilation, the generation of a new version of a Forth system
by ``similar tools as compilation'', was already done
with the cassette based computer systems of the late seventies.
Metacompilation is mainly used to generate
a similar forth for a forthemph({different}) processor or system.
This would properly be called cross-compilation, by the way.
On a half-decent (or better) disk operating system like MSDOS the use
of meta-compilation works more smoothly.
We want our Forth to be able to generate standalone programs anyway.
(a forthdefi({turnkey}) facility.)
So what do we need?
forthenumerate
forthitem
A facility to save a running system with all that is loaded on it,
in the configuration it currently has.
forthitem
A facility to remove parts of a running system, that are not needed
for an application after it has been built. (E.g. the assembler.)
forthitem
A facility to optimize some parts of a system. (Then remove the
possibly large optimizer.)
forthendenumerate
If you have the first facility,
you can build a powerful Forth from a small kernel and regular source code.
If you have all of them, you can build a truly
optimal Forth from a small kernel.
The forthcodeni({SAVE-SYSTEM}) facility of course requires
in depth knowledge of the operating system.
This doesn't mean it is cumbersome or difficult.
Under Linux (a.d. 2000) we need
forthexample(
dnl NOTE! All @ must be doubled for tex.
{HEX
\ The magic number marking the start of an ELF header
CREATE MAGIC 7F C, &E C, &L C, &F C,
\ Return the START of the ``ELF'' header.
: SM BM BEGIN DUP @@ MAGIC @@ <> WHILE 1 CELLS - REPEAT ;
\ Return the VALUE of ``HERE'' when this forth started.
: HERE-AT-STARTUP ' DP >DFA @@ +ORIGIN @@ ;
\ Save the system in a file with NAME .
: SAVE-SYSTEM
\ Increment the file and dictionary sizes
HERE HERE-AT-STARTUP - DUP SM 20 + +! SM 44 + +!
U0 @@ 0 +ORIGIN 40 CELLS MOVE \ Save user variables
\ Now write it. Consume NAME here.
SM HERE OVER - 2SWAP PUT-FILE ; DECIMAL
})
Actual code as of 2018 may be different, but is hardly more involved.
@section How m4 is used.
The Unix macroprocessor forthprog({m4}) is very powerful indeed.
@pindex m4
Testimony is that the description of its usage in here
is longer than its man-pages.
As you know,
forthprog({m4}) is a text substitution tool.
A macro is like a function. In the macro call the text is replaced by
the text present in the function.
Within the text the placeholders for the parameters are replaced
by the actual parameters.
In forthprog({m4}) the placeholders are forthsamp({$}{1}) ... forthsamp({$}{9}).
Parameters can be passed, and any (even multiline)
text can be given as a parameter, provided it is quoted.
We will use forthsamp({_lbracket_}) and forthsamp({_rbracket_}) (braces) as quotes throughout.
This is convenient, because they are
not used in a Basic Forth system
and they are
special anyway (e.g. for TeX).
The use of quotation is very critical at times,
and the fine points are not covered in the following.
@subsection Customization
Plain use of m4 is the use of macro's to just define a string
that is different under different circumstances.
Simple customization can be done by forthprog({m4}) as follows:
forthsamp({define(_lbracket_version_rbracket_,2.149)})
Within the text treated the version number is substituted.
Among those we also have segment definitions, defined in the m4 file
belonging to the assembler used.
_TEXT_ : This introduces the part of the Forth with the definitions.
It is supposed to be stored on disk.
Normally it is modifiable, so e.g. VARIABLE's can be in here.
In the circumstance that there is separation of code and data,
there is another segment
_DATA_ : modifiable data, not machine-executable code
This is important if the underlying operating system does not
allow the modification of code.
In that case the _TEXT_ segment is non-writable.
Most Forth's to date can get by with this data in the _TEXT_
segment.
This macro is conditional on _SEPARATED_.
So mostly _DATA_ is a void statement.
_BSS_ : This part of the Forth imposes a layout without
initialising the data.
It agrees with a traditional bss segment in e.g. an a.out format.
_IDATA_ : As an exception extra segments have to be introduced.
@subsection Selection
Selection, often one of alternatives, is in general done as follows
forthsamp({_BITS16_(32)_BITS32_(64)_BITS64_(128)}) ,
which gives, of course, the size of a double number.
This is accomplished by
forthsamp({define(_lbracket__BITSxx_ _rbracket_,_lbracket_$1 _rbracket_)})
for the actual bitsize and
forthsamp({define(_lbracket__BITSxx_ _rbracket_,)})
for others.
This is made easier with the forthsamp({_yes}) and forthsamp({_no}) macro's,
see forthfile({prelude.m4}) .
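The selection mechanism can be imitated outside forthprog({m4}).
The following Python sketch is only a model under assumptions:
the names make_selectors and double_size are invented for illustration
and do not occur in the ciforth sources.
Each selector passes its argument through for the configured bit size and
yields nothing otherwise, so concatenating all three leaves exactly one alternative.

```python
# Model of m4 selection macros: exactly one selector passes its argument
# through, the others expand to nothing. All names here are hypothetical.

def make_selectors(actual_bits, sizes=(16, 32, 64)):
    """Return one selector function per bit size, keyed by size."""
    def keep(text):
        return str(text)      # plays the role of a macro expanding to its argument

    def drop(text):
        return ""             # plays the role of a macro expanding to nothing

    return dict((bits, keep if bits == actual_bits else drop) for bits in sizes)


def double_size(actual_bits):
    """Mimic the three-way selection of the double number size."""
    sel = make_selectors(actual_bits)
    return sel[16](32) + sel[32](64) + sel[64](128)


if __name__ == "__main__":
    for bits in (16, 32, 64):
        print(bits, "->", double_size(bits))
```

The point of the design is that the text containing the alternatives never changes;
only the definitions of the selectors differ per configuration.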
Selections can be nested within other forthprog({m4}) macro constructs.
As in
forthexample(
{{_VERBOSE_}_lbracket_({_BITS64_}(_lbracket_The possibility to cycle through all (64-bit)
numbers by {forthsamp}(_lbracket_0 0 DO ... LOOP_rbracket_) is very useful indeed._rbracket_)_rbracket_)})
Here you see at work, apart from forthmacro({_BITS64_}) , the macro forthmacro({_VERBOSE_})
that allows (if turned on)
verbosity that can help understanding but is not always appreciated.
You also see forthmacro({forthsamp}) that is in fact
a markup to indicate we have a piece of Forth code there.
Selections can be used to throw out a block of
word definitions and their documentation as a whole.
For example words interfacing with an MS-Windows o.s. make
no sense in a Linux Forth.
The braces are essential here.
Without them the introduction of a comma somewhere in the text
results in forthprog({m4}) interpreting the remainder as a second parameter,
which it will ignore.
@subsection A postponed markup language.
In documentation files you
just say forthsamp({forthcode(_lbracket_+LOOP _rbracket_)}) to indicate that you want
formatting as for ``code'' words.
Later you can decide to use
forthbreak
forthsamp({define(_lbracket_forthcode _rbracket_,_lbracket_@@code_lbracket_$}{1_rbracket__rbracket_)})
forthbreak
for forthsamp({texinfo}) or
forthbreak
forthsamp({define(_lbracket_forthcode _rbracket_,_lbracket_<B>$1</B> _rbracket_)})
forthbreak
for forthsamp({html}).
@subsection Defining structures
Some macro calls must be considered to define a structure, in particular
forthsamp({worddoc}) .
Suppose we have a list of structures, meaning that the first person is
a child of the second and third person:
parents(_lbracket_Alice_rbracket_,_lbracket_Mary_rbracket_,_lbracket_John_rbracket_)
parents(_lbracket_Fred_rbracket_,_lbracket_Mary_rbracket_,_lbracket_Henry_rbracket_)
parents(_lbracket_Aayilah_rbracket_,_lbracket_Sjantil_rbracket_,_lbracket_Bodaji_rbracket_)
...
With
forthsamp({define(_lbracket_parents_rbracket_,_lbracket_$2_rbracket_)}) we get a list of (you guessed) the mothers.
The usage of forthmacro({divert()}) can best be explained with an example in this context.
forthexample(
{{define(_lbracket_parents_rbracket_,
_lbracket__lbracket_divert(3)dnl_rbracket_
$}{2
_lbracket_divert(6)dnl_rbracket_
$}{3
_rbracket_)}})
will give out the mothers on channel 3 and fathers on channel 6.
The output will be concatenated,
but all mothers and all fathers stay together.
For forthsamp({dnl}) see the forthprog({m4}) man-page.
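The effect of diversions can be modelled in a few lines of Python.
This is a sketch under assumptions, not how forthprog({m4}) is implemented:
the class Diversions and the function parents are hypothetical names,
and the only behaviour modelled is that text is collected per channel and
the channels are emitted in ascending order at the end.

```python
# Model of m4 diversions: text is appended to numbered channels and the
# channels are emitted in ascending order at the end. Names are hypothetical.
from collections import defaultdict


class Diversions:
    def __init__(self):
        self.channels = defaultdict(list)   # channel number -> collected text
        self.current = 0                    # channel 0 is normal output

    def divert(self, channel):
        self.current = channel

    def emit(self, text):
        self.channels[self.current].append(text)

    def flush(self):
        """Concatenate all channels, lowest number first."""
        out = []
        for channel in sorted(self.channels):
            out.extend(self.channels[channel])
        return out


def parents(div, child, mother, father):
    """Mimic the parents macro: mother to channel 3, father to channel 6."""
    div.divert(3)
    div.emit(mother)
    div.divert(6)
    div.emit(father)


if __name__ == "__main__":
    d = Diversions()
    parents(d, "Alice", "Mary", "John")
    parents(d, "Fred", "Mary", "Henry")
    parents(d, "Aayilah", "Sjantil", "Bodaji")
    print(d.flush())
```

All mothers come out before all fathers, although the calls interleave them.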
@subsection Defining lists
By using an extra pair of braces you can have a list in forthprog({m4}) .
So forthsamp({_lbracket__lbracket_A_rbracket_,_lbracket_B_rbracket_,_lbracket_C_rbracket_,_lbracket_D_rbracket__rbracket_}) is
a single parameter to a macro and can
be passed to other macro's as a whole.
The outer braces are removed and
without special measures (reinstalling extra braces again)
the macro called forthemph({sees})
the comma's and concludes there are four parameters.
This is put to good use in the ``See also'' and ``Test''
fields of the forthsamp({worddoc}) structure.
These fields may have zero or more parts.
The ``Test'' field contains the tests in the odd fields, and the
expected outcome in the following even fields.
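How a flat ``Test'' list is read as pairs can be sketched in Python.
The function name pair_tests and the sample tests are assumptions made
for this illustration; the real processing is done by macros.

```python
# Pair up a flat Test list: odd positions hold tests, even positions hold
# the expected outcomes. The function name and sample data are hypothetical.

def pair_tests(fields):
    """Turn [test1, outcome1, test2, outcome2, ...] into (test, outcome) pairs."""
    if len(fields) % 2 != 0:
        raise ValueError("every test needs an expected outcome")
    return list(zip(fields[0::2], fields[1::2]))


if __name__ == "__main__":
    test_field = ["1 2 + .", "3", "5 DUP * .", "25"]
    for test, outcome in pair_tests(test_field):
        print(test, "=>", outcome)
```

An empty list is valid and simply yields no pairs, which matches a field with zero parts.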
@subsection Defining aliases
Sometimes you need aliases, i.e. other names for macro's.
Although it doesn't properly belong here as a technique, I want
to mention it, because the number of brackets is hard to sort
out. An alias is useful in a transition period, where you want
to rename something, but where you want to be able to do that
gradually on a file by file basis.
forthsamp({define(_lbracket__OLDNAME__rbracket_,_lbracket__NEWNAME_(_lbracket_$1_rbracket_,_lbracket_$2_rbracket_)_rbracket_)})
After forthvar({_OLDNAME_}) is phased out everywhere this definition can be deleted.
Note that for this to work all parameters applicable to forthvar({_NEWNAME_}) must be
taken into account, the two shown here are just an example.
@subsection Impress the crowd
By using macro's to define other macro's, then pass the result through
forthprog({m4}) another time, severe stress can be laid upon the intelligence
of the everyday person.
The very inconvenient way nodes must be linked in texinfo even forced
me to define part of the macro in one macro and the remainder in
another.
@section How the glossary is ordered.
The sorting tool forthprog({ssort}) can order multiple field records, with
different sorting criteria for each field.
The fields can be defined by regular expressions, such that
the forthsamp({worddoc}) structures can be sorted by name, or by wordset
then by name, or in about any way you want.
Because such a tool didn't exist, I had to write it.
This tool was in use until release 5.3.
From that time on ciforth itself is used for ordering,
using a small program forthfile({sortworddoc.frt}) .
@subsection Analyzing forthsamp({worddoc}) using forthcode({ssort})
forthprog({ssort}) captures the structure of a forthsamp({worddoc}) as follows:
forthsamp({^worddoc(_lbracket_@@_rbracket_,_lbracket_@@_rbracket_.*\n$worddoc})
The part between forthsamp({^}) and forthsamp({$}) matches the record.
The part after the last forthsamp({$})
is for synchronization, to make sure the record doesn't end early.
This would result in an error ``not according to structure'': the next
line doesn't start with ``worddoc'' and so it just doesn't match the record
description.
The forthsamp({$}) is merely a separation, (newlines are indicated by forthsamp({\n}) ).
The forthsamp({.*}) matches anything, including new lines.
But it isn't greedy as in ordinary regular expressions,
because not being stopped by forthsamp({\n}) ,
it would match the whole file.
Here it tries to match as little as possible.
forthsamp(@@_rbracket_) is shorthand for forthsamp([^_rbracket_]*_rbracket_$)
so a ``sequence of anything except
right braces followed by a right brace''.
It also contains the forthsamp({$}) to
mark the end of a field.
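The same idea can be approximated with the regular expressions of Python.
This is not the forthprog({ssort}) syntax, only a sketch that mirrors it:
two brace-delimited fields, then a remainder that is matched as short as possible
and stops just before the next line starting with ``worddoc''.
The name split_records and the sample text are invented.

```python
# Approximate the record pattern with Python's re module. This is not the
# ssort syntax; it only mirrors the idea. The sample glossary text is invented.
import re

RECORD = re.compile(
    r"^worddoc\(\{([^}]*)\},\{([^}]*)\}"   # first and second field
    r"(.*?\n)"                              # remainder, as short as possible
    r"(?=worddoc|\Z)",                      # synchronization: next record or end
    re.DOTALL | re.MULTILINE,
)


def split_records(text):
    """Return (field1, field2, whole_record) for every worddoc in text."""
    return [(m.group(1), m.group(2), m.group(0)) for m in RECORD.finditer(text)]


if __name__ == "__main__":
    sample = (
        "worddoc({STACK},{SWAP},{swap},\n{a b --- b a},{ISO},\n{Exchange.})\n"
        "worddoc({STACK},{DROP},{drop},\n{a ---},{ISO},\n{Discard.})\n"
    )
    for wordset, name, record in split_records(sample):
        print(wordset, name, record.count("\n"), "lines")
```

The lookahead plays the role of the synchronization part: it is checked but not consumed,
so the next record can start at the same place.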
@subsection Sorting fields
Once we know what the fields are,
forthsamp({-M 1S2S }) sorts on the first field
and within that field on the second. We just use the ordinary ASCII collating
sort, indicated by forthsamp({S}) .
The program forthfile({sortworddoc.frt}) does the same,
but it is coded in Forth.
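Sorting on the first field and within it on the second amounts to
sorting on a two-part key.
A minimal Python sketch, assuming records have already been split into tuples
(the name sort_records and the glossary entries are hypothetical):

```python
# Sort records on the first field and, within it, on the second, using plain
# code-point (ASCII) order. Records are modelled as tuples; data is invented.

def sort_records(records):
    """Order (wordset, name, body) tuples by wordset, then by name."""
    return sorted(records, key=lambda rec: (rec[0], rec[1]))


if __name__ == "__main__":
    glossary = [
        ("STACK", "SWAP", "..."),
        ("CONTROL", "IF", "..."),
        ("STACK", "DROP", "..."),
        ("CONTROL", "BEGIN", "..."),
    ]
    for wordset, name, _ in sort_records(glossary):
        print(wordset, name)
```

Python compares strings by code point, which for ASCII names is the ordinary ASCII collating order.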
@chapter Structures and processes
@section The generic source file
The generic source file forthfile({ci86.gnr}) mostly
consists of Intel assembly code, with which, I assume,
you are familiar. All macro's in the following are forthfile({m4}) macro's.
Words are divided into small (<20) groups of cooperating words,
the forthdefi({wordset}). See also ``thinking Forth''.
The things that differ among assemblers, are taken care of by
macro's, e.g. forthmacro({_COMMENTED},$1) brackets a comment.
One of the most important is forthmacro({DC}) that lays down a cell
in memory. Not only is this different between 32 and 64 bit Forth's,
even such a simple command differs among assemblers.
Most of the time they don't have parameters.
The selection of parts that go or don't go into a particular configuration
is done by multiline macro's, generally with a call on a separate line.
Such as:
forthexample({
{_HIGH_BUF_}(_lbracket_
BUF1 EQU EM-(KBBUF+2*2)*NBUF ;_lbracket_ FIRST DISK BUFFER_rbracket_
STRUSA EQU BUF1-US ;_lbracket_ User area_rbracket_
_rbracket_);{_END_}(_lbracket_ _HIGH_BUF__rbracket_) })
Note how comments are protected from macro expansion by quotation.
The forthmacro({_END_}) is an adornment. It expands to nothing.
So it doesn't show up in the output,
but it helps to keep the generic source organized.
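The idea of such a selection macro can be modelled in a few lines.
This is a toy sketch in Python, not real m4: the representation of a configuration as a table of truth values and the names make_selector, end_marker and extract are assumptions made for illustration only.

```python
# Toy model of configuration driven selection, NOT real m4.

def make_selector(enabled):
    """Return a function that keeps its body if enabled, else drops it."""
    return (lambda body: body) if enabled else (lambda body: "")

def end_marker(_label):
    """Model of the _END_ adornment: it always expands to nothing."""
    return ""

BODY = "BUF1 EQU EM-(KBBUF+2*2)*NBUF\nSTRUSA EQU BUF1-US\n"

def extract(config):
    """Expand the example above under a given (hypothetical) configuration."""
    high_buf = make_selector(config["_HIGH_BUF_"])
    return high_buf(BODY) + end_marker("_HIGH_BUF_")

print(extract({"_HIGH_BUF_": True}), end="")
print(repr(extract({"_HIGH_BUF_": False})))
```

With the option on, the two assembler lines come through unchanged. With the option off, nothing at all is left, not even a comment.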
The forthmacro({worddoc}) macro defines a structure with additional information
about a word.
Generally it is placed in front of the word.
The same word can be found several times in the input file,
but only one is selected in a particular configuration.
The same goes for the corresponding forthmacro({worddoc}) .
forthbreak
Its fields are:
forthenumerate
forthitem
Wordset name.
forthitem
Word name.
forthitem
Pronunciation.
This is a purely textual and pronounceable identification of the word.
It is also used in forthfile({texinfo}) , which doesn't handle special characters well.
forthitem
Stack effect.
The stack effect obeys all the conventions put forth in the user manual.
forthitem
Properties.
Properties are, among others, whether the word is immediate, and the standards with which
the word complies.
Again this is described in the user manual.
forthitem
Description.
forthitem
References.
This is a list of names of other Forth words,
that can be studied to better understand this one.
forthitem
Tests.
This is a list.
The first and all other odd members are tests,
code that can be passed to Forth.
The second and all other even members are each the expected outcome of the
preceding test.
forthendenumerate
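The eight fields, and the pairing of tests with expected outcomes, can be pictured as a record.
The following is a sketch in Python under stated assumptions: the class WordDoc, its attribute names and the sample entry are all invented for illustration and are not taken from the generic source.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class WordDoc:
    """The eight fields of a worddoc, in the order listed above."""
    wordset: str
    name: str
    pronunciation: str
    stack_effect: str
    properties: str
    description: str
    references: List[str] = field(default_factory=list)
    tests: List[str] = field(default_factory=list)

    def test_pairs(self) -> List[Tuple[str, str]]:
        """Pair each test (odd member) with its expected outcome (even member)."""
        return list(zip(self.tests[0::2], self.tests[1::2]))

# Hypothetical entry; all texts are invented.
doc = WordDoc(
    wordset="OPERATORS", name="+", pronunciation="plus",
    stack_effect="n1 n2 --- sum", properties="",
    description="Leave the sum of n1 and n2.",
    references=["-", "*"],
    tests=["1 2 + .", "3", "-1 1 + .", "0"],
)

for test, expected in doc.test_pairs():
    print(test, "->", expected)
```

A test run then amounts to passing each first member of a pair to Forth and comparing the output with the second member.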
A forthmacro({worddoc}) is laid out such that the structure starts with forthsamp({worddoc( }) and
ends with a forthsamp({_rbracket_)}) at the end of a line.
This means that a worddoc
can simply be skipped if it occurs in Forth code,
by defining a word forthcode({worddoc(}) that reads and ignores source up to the
end sentinel.
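The skipping rule is simple enough to sketch outside Forth.
The function below is Python and only illustrates the rule, it is not the Forth word itself.
It assumes that the structure starts at the beginning of a line, and the sample input is invented, including the way its fields are quoted.

```python
def strip_worddocs(source: str) -> str:
    """Drop every block from a line starting with 'worddoc(' up to and
    including the first following line that ends in '})'."""
    kept = []
    skipping = False
    for line in source.splitlines(keepends=True):
        if not skipping and line.lstrip().startswith("worddoc("):
            skipping = True
        if skipping:
            # The end sentinel is '})' at the end of a line.
            if line.rstrip().endswith("})"):
                skipping = False
            continue
        kept.append(line)
    return "".join(kept)

# Invented sample: Forth code with one documentation block in between.
sample = (
    ": SQUARE DUP * ;\n"
    "worddoc( {MATH},{SQUARE},{square},\n"
    "{n --- n*n},{},\n"
    "{Leave the square of n.},\n"
    "{},{})\n"
    ": CUBE DUP SQUARE * ;\n"
)

print(strip_worddocs(sample), end="")
```

Only the two colon definitions survive. The sentinel needs no parsing of the fields, which is the point of requiring it at the end of a line.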
The forthmacro({worddocchapter}) macro defines a wordset.
It has the same fields as a forthmacro({worddoc}) macro,
but most are left empty.
It is primarily used for its ``description'' field,
which serves as an overview description for the wordset in glossaries.
These macros can be put anywhere,
but take care to exclude macros for wordsets that are not present.
@section The process
The ultimate information about how a ciforth is generated is found in the makefiles :
forthfile({Makefile}) and forthfile({test.mak}) .
The process of generating a program proceeds along the following steps:
forthenumerate
forthitem
Extract the assembler source from the generic source via a configuration file.
The file suffix indicates which assembler to use.
forthitem
Generate an object file.
forthitem
Link the object file.
forthendenumerate
Once you have an assembler file, you can do what you want with it.
Proceeding from an assembler source file to a binary is in general
straightforward.
The process of generating the program's documentation
(TeX and info)
proceeds along the following steps:
forthenumerate
forthitem
Generate the raw glossary documentation
from the generic source via a configuration file.
The file suffix is forthfile({.rawdoc}) .
forthitem
Sort the forthfile({.rawdoc}) file,
such that words of a wordset appear together,
and are preceded by a wordset documentation.
The file suffix is forthfile({.mig}) .
forthitem
Generate the glossary documentation from the forthfile({.mig})
by expanding the forthmacro({worddoc}) 's into glossary entries by
forthfile({gloss.m4}) or forthfile({glosshtml.m4}) .
This, for a second time (!), takes into account the configuration file{}_VERBOSE_(
{, to generate exactly fitting information}).
forthitem
Expand the ``postponed markups'' in forthfile({ciforth.mi})
by the macros from forthfile({manual.m4}) to
generate the texinfo commands.
This file includes all the other forthfile({.mi}) files with postponed markups.
forthendenumerate
The process for generating html has the postponed markups and the
expansion into glossary entries in the same file forthfile({glosshtml.m4}).
Only the documentation of the glossary enters into html ,
and the forthfile({ciforth.html}) is generated from the intermediate
file forthfile({.mig}).
Generating documents is made more complicated by the requirements for special tables.
For forthsamp({html}) we want an extra alphabetic list of all the words, where a click on a word
leads to its glossary entry immediately.
forthbreak
In forthfile({texinfo}) we need to build complicated menu structures, that refer back and forth.
forthbreak
This is done by separate passes over the forthfile({.rawdoc}) or forthfile({.mig})
files, with other macros.
@chapter Production
After all the explanations of tools, we're now in a position to actually
build a Forth compiler.
@section Overview
There is one generic source file, where documentation, tests and code
of each Forth definition are kept together. m4 selects and extracts the
assembler code, the tests, and the documentation paragraphs, meanwhile making
slight adaptations. The assembler file is turned into a runnable program.
The texinfo file is turned into whatever documentation format is wanted.
The test files consist of runnable code, and the expected outcome.
@section Design decisions
It has been explained why this Forth is restricted to the Intel 86.
Not much more can be said for such a highly configurable system,
but in this section we will try to illuminate major design decisions.
@subsection Assembler language
ciforth borrows its distribution strategy from the old figforth.
It is in fact based on it, and its documentation was in first draft
copied from it.
The Forths are built from an assembler source, and it is in general an
indirect threaded Forth.
FIG produced a first source for a new processor as
a text file, using an existing Forth,
with many parts just copied from that Forth.
In this way the FIG working group produced a printed copy
of an assembler source for every processor in existence in 1980.
Strictly adhering to the assembly language as defined by the chip
manufacturer allowed one to type in the source from hardcopy, then build the
Forth using any conforming assembler and little, or nothing, more.
ciforth tries to reproduce this desirable situation.
An engineer might balk at the description of how to use a meta
compiler, which means specifying a Forth at an abstract level,
but feel at ease with a (much larger) assembler source and
easily make modifications. This is an empowerment much talked about
in the era of the Free Software Movement,
but not much practiced.
@subsection Indirect threaded
Although speed is currently in fashion, using subroutine threaded Forths
with optimizers, indirect threading is the preferred choice for some
applications.
I did this work, because I planned to use it
for work in artificial intelligence.
A Forth has the property, like Lisp, that a running system can
add definitions to itself.
An indirect threaded Forth shares with some Lisps the further property
that even a very basic word can be rewritten
on a running system and take effect immediately.
Furthermore the current trend of subroutine threaded Forths may very
well be unsuitable for 64-bit processors like the Alpha.
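Why a rewritten basic word takes effect immediately can be shown with a toy model.
The sketch below is Python, not ciforth: the class Word, the helper colon and the words defined with them are invented for illustration.
The one thing it has in common with indirect threading is that a high level word refers to the entries of the words it uses, and the code is fetched through those entries each time the word runs.

```python
class Word:
    """A dictionary entry: a name plus a code field that can be patched."""
    def __init__(self, name, code):
        self.name = name
        self.code = code              # a function working on the data stack

def colon(name, *body):
    """Build a high level word whose body is a thread of Word entries."""
    def execute_thread(stack):
        for w in body:
            w.code(stack)             # fetched through the entry at run time
    return Word(name, execute_thread)

def dup(stack):
    stack.append(stack[-1])

def mul(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a * b)

def add(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)

DUP = Word("DUP", dup)
STAR = Word("*", mul)
SQUARE = colon("SQUARE", DUP, STAR)

def run(word, *args):
    """Execute a word on a fresh stack and return the stack."""
    stack = list(args)
    word.code(stack)
    return stack

before = run(SQUARE, 7)

# Rewrite the basic word "*" on the running system: it now adds.
STAR.code = add
after = run(SQUARE, 7)

print(before, after)
```

SQUARE itself was never recompiled, yet its behaviour changed. With subroutine threading and inlining optimizers the old code may have been copied into SQUARE, and then the change would not be seen.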
@subsection bit-size indifferent
It is unusual for a Forth to be configurable as 16, 32 or 64 bits.
It turned out that the addition of forthcode({CELL+}) goes a long way
toward allowing
utilities like a decompiler to be 16/32/64 bit clean.
In the documentation one can mostly just refer to cells.
But the macros
forthsamp({_BITS_}) ,
forthsamp({_BIT16_}) ,
forthsamp({_BIT32_}) and
forthsamp({_BIT64_})
can be used to signify the actual number of bits, and to mark parts that apply
to 16, 32 and 64 bits only, respectively.
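How one word for stepping over a cell keeps a utility clean across cell sizes can be sketched as follows.
This is a model in Python under the assumption of a byte addressed machine; the function names are invented and the addresses are arbitrary.

```python
def cell_size(bits):
    """Size of a cell in bytes for a 16, 32 or 64 bit Forth."""
    if bits not in (16, 32, 64):
        raise ValueError("unsupported cell width")
    return bits // 8

def cell_plus(addr, bits):
    """Model of CELL+ : advance an address by one cell."""
    return addr + cell_size(bits)

def walk_cells(start, count, bits):
    """Addresses of consecutive cells, as a decompiler might visit them."""
    addrs = []
    addr = start
    for _ in range(count):
        addrs.append(addr)
        addr = cell_plus(addr, bits)
    return addrs

for bits in (16, 32, 64):
    print(bits, [hex(a) for a in walk_cells(0x1000, 3, bits)])
```

The loop in walk_cells never mentions a number of bytes. A utility written that way needs no change when the cell size changes, whereas one that adds a literal 2 or 4 to an address does.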
@section System requirements
The host requirements, for building, are different from the
target requirements, for running.
In practice the generic system must be run on a GNU Linux system.
The generated Forth can target almost anything.
At least it runs on industry standard hardware
("PCs") : standalone, under Linux and under MSDOS/MSWINDOWS.
To build, you need a version of forthprog({nasm}) , forthprog({TASM.EXE}) or forthprog({MASM.EXE}) on your system.
As of 2018 forthprog({fasm}) is the assembler of choice as it allows
the building of binaries without a linker.
Also it can build MS-Windows executables on a Linux system.
@section Different assemblers
The difference between high level languages and assembler can be summarized as follows.
Two compilers for e.g. forthprog({ADA}) accept the same source and produce different binaries,
that however are functionally equivalent programs.
Two assemblers produce byte-for-byte identical binaries, but the
functionally equivalent sources may differ substantially.
A diff file may be as long as the source itself, on account
of differing comment symbols, or the habit of prefixing register names
with '%', or the order of operands.
The ciforth system has a provision that during extraction of the
actual assembler source such things as the comment symbols and the directives
are adapted to the actual assembler.
@subsection Actual assemblers used
To build, you need a version of forthprog({nasm}) , forthprog({TASM.EXE}) or forthprog({MASM.EXE}) on your system.
As of 2018 forthprog({fasm}) is the assembler of choice as it allows
the building of binaries without a linker.
Also it can build MS-Windows executables on a Linux system.
Another commendable assembler is forthprog({nasm}) . It is an open source assembler and available on different
platforms, at least MSDOS and Unix, and most importantly OSX.
It fixes a lot of the design errors I
find in the Intel ways of forthprog({MASM.EXE}) .
It is easier to use than the GNU forthprog({as}) because it adheres more
to the Intel syntax.
Another advantage is that it can generate a binary without a linker.
At the other extreme, Borland's forthprog({TASM.EXE}) can be bought (as of 2000)
only as part of a giant C++ package.
If you want to use the generic possibilities you will need a Unix system
with all of its tools.
I have successfully used GNU-Linux (RedHat, Suse, Debian) to
do the makes and the version
control. If you want your bootable floppies made from Linux to be
MSDOS-compatible you need mtools.
@section example of how sources are extracted.
The file forthfile({alone.asm}) can be assembled using forthprog({nasm}) ,
as can be concluded from the extension forthdefi({asm}) .
It includes a boot
sector such that it can boot from a standard floppy on an industry standard
Intel PC.
If you have the mtools set (most Linux distributions have it) the Makefile
shows you how to make the floppy.
On MSDOS you can use forthprog({DEBUG.EXE}) .
If you run on Linux with
forthsamp({mtools}) , forthsamp({make boot}) will do it.
The resulting floppy will even be recognized by
MSDOS, such that you can copy block sources to it.
forthsamp({make moreboot})
will do this from Linux, then you will have forthfile({forth.lab})
available.
forthsamp({make allboot})
will do it all, but it needs a working forth
on Linux for doing some calculations.
Otherwise on MSDOS (I recommend version 3.3, the most stable MSDOS ever)
adapt the example forthfile({genboot.asm}) .
The file forthfile({msdos.msm}) can be assembled
using forthprog({TASM.EXE}) and forthprog({MASM.EXE}).
The resulting Forth
executable can be run off hard disk and respects the file system on it.
It uses the file forthfile({forth.lab}) .
@section customizations
This section describes the different ways the generic ciforth system can
be used to produce a Forth of your liking. They require different
expertise and target different audiences.
@subsection Overview
As was mentioned before, ciforth has one single source file: the generic forthfile({ci86.gnr}) .
All advantages of assembler source would be gone, if an engineer were
confronted with conditional compilation and lots of code for other systems
he doesn't want to learn or assemblers he doesn't want to use.
So we proceed in two steps. First a clean assembler source is generated from
the generic Forth using configuration files. Then the assembler source is
processed in one of a number of ways, each way familiar to one brand of
engineers.
The clean assembler source is the preferred form of publication,
because it is the most usable.
You can customize at a number of levels.
forthenumerate
forthitem
Configuration files have extension forthsamp({.cfg}) ; these are files with forthprog({m4})
commands. They are intended for use at the highest and easiest level of
configuration. The usage is simple. If you want a Linux Forth use forthfile({lina.cfg}) .
forthitem
forthprog({m4}) files have extension forthvar({.m4}), and control one aspect of genericity, such
as which assembler or the protection mode. You definitely need to know forthprog({m4})
to adapt or create these.
forthitem
Assembler files can be customized in the traditional way by adapting
constants, or commenting in source lines. The assembler files are distinct
from the one generic source. There is no forthprog({m4}) ; you need only cope with the directives
of your assembler, and you will not see any code applicable to other operating
systems or I/O systems. (It is not commented out, it is just not there.)
forthitem
You can adapt the generic system.
This is difficult, but can accomplish a lot.
If you manage to adapt it to the ARM,
the result is a lot of ARM Forths.
One of those will run on ARM Linux, but, try as you may,
none will run on MS-Windows.
forthendenumerate
@subsection Level 1 customization.
This is assuming you run on some sort of Unix.
This amounts to using, maybe adapting or creating one of the configuration
files with extension forthcode({.cfg}) .
Sensible choices will in general lead to a usable Forth.
A source file that is extracted may need to be assembled on a target
system. Fortunately Unix, in particular GNU/Linux, can emulate
almost everything.
By specifying what you want in a configuration file you can generate a host
of assembler listings. This is as simple as replacing ``_yes'' with ``_no'' in
configuration files.
See the examples forthfile({msdos.cfg}) and forthfile({alone.cfg}) and the Makefile.
You can find out what the options are by inspecting forthfile({prelude.m4}) .