Porting 8086-toolchain and cross-compiling for ELKS #2159

Open
ghaerr opened this issue Dec 29, 2024 · 77 comments

Comments

@ghaerr
Owner

ghaerr commented Dec 29, 2024

This issue is a continuation of #2112, which was getting very long.

The topics covered are getting the 8086 toolchain compiled on the host, using it to cross-compile applications for ELKS, and native compiling of programs on ELKS itself.

For newcomers, the first two steps above (compiling the toolchain, then cross-compiling an example) require some setup, as there are three compilers involved:

  • ia16-elf-gcc (GCC): used to compile ELKS and some ELKS applications. Set up with . env.sh in ELKS root.
  • Open Watcom (OWC): used to compile most of the 8086 toolchain. Set up with . wcenv.sh in ELKS libc.
  • 8086 Toolchain (C86): used to both cross-compile and natively compile ELKS apps. Set up with . c86env.sh in ELKS libc.

First, the location of the OWC installation directory (WATCOM=) must be set by editing libc/wcenv.sh.
Second, the location of the C86 repo (C86=) must be set by editing libc/c86env.sh.
Then the following steps are used to build each piece in order:

$ cd ELKS
$ . env.sh    # setup TOPDIR= needed for using GCC
$ cd libc
$ . wcenv.sh  # setup WATCOM= needed for using OWC
$ . c86env.sh # setup C86= needed for using C86 (when built)
(TOPDIR=, WATCOM= and C86= now set)

$ cd ELKS
$ make        # normal ELKS full build
$ cd 8086-toolchain
$ make host   # make host version of C86 using host OWC and GCC
(host c86 cross compiler toolchain now in 8086-toolchain/host-bin)

$ cd ELKS
$ make owc    # make native OWC library libc/libc.a using OWC
$ cd 8086-toolchain
$ make elks   # make ELKS version of C86 using host OWC and native OWC libc
(native c86 compiler toolchain now in 8086-toolchain/elks-bin)

$ cd ELKS
$ make c86    # make native C86 library libc/libc86.a using host C86
$ cd 8086-toolchain/examples
$ make        # make ELKS example apps using host C86 and native C86 libc
(native chess and test examples are now in 8086-toolchain/examples)

After all this, in the 8086-toolchain directory, you will have the host C86 toolchain executables in 8086-toolchain/host-bin,
and the ELKS native C86 toolchain executables in 8086-toolchain/elks-bin. The native C library is in ELKS/libc/libc86.a.
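
Since every step above depends on TOPDIR=, WATCOM= and C86= being set, a small pre-flight check can save a failed build. The following is only a sketch: check_toolchain_env is a hypothetical helper name, not something shipped in ELKS or the 8086-toolchain repo.

```shell
# Hypothetical helper (not part of ELKS): report which of the three
# toolchain variables are still unset before starting a build.
check_toolchain_env() {
    missing=""
    for var in TOPDIR WATCOM C86; do
        # eval expands the variable whose name is stored in $var
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            missing="$missing $var"
        fi
    done
    if [ -n "$missing" ]; then
        echo "unset:$missing"
        return 1
    fi
    echo "ok"
}
```

Running check_toolchain_env after sourcing env.sh, wcenv.sh and c86env.sh should print "ok"; otherwise it lists the variables that still need one of those scripts.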

After the three environment variables are set up and all repos have been built at least once, the update cycle is quite a bit simpler, since the process no longer needs to be bootstrapped as above.

When either repo is updated and the three environment variables are set, only the following needs to be done:

$ cd ELKS
$ git pull ...
$ make
$ make owc c86        # builds OWC and C86 libraries

$ cd 8086-toolchain
$ git pull ...
$ make clean
$ make                # builds both host and elks C86
$ cd examples; make   # builds examples
@toncho11
Contributor

toncho11 commented Dec 29, 2024

Thanks @ghaerr !
One might need to modify: #add_path "$WATCOM/binl64" # for Linux-64 (Intel CPU) in wcenv.sh
It is better to install the full binary version of OWC. I use the open-watcom-2_0-c-linux-x64 installer on Ubuntu.
All the above should be in a wiki page as well.

Also a binary release in the form of zip file should be released. This way anyone can start compiling.

@floriangit
Contributor

One might need to modify: #add_path "$WATCOM/binl64" # for Linux-64 (Intel CPU) in wcenv.sh

I stumbled over that one as well. Seems the l is for linux and the o is for OSX?
Otherwise things are building as described 👍. Other things I noted:

  • Without all this (OWC and C86), i.e. using stock ELKS as-is cloned from the repo, make clean in ELKS is now broken, since it depends on a watcom build file.
  • The 8086-toolchain compiles with -O3, which I find a bit scary 😃. Could it be -O2?

@ghaerr
Owner Author

ghaerr commented Dec 29, 2024

@toncho11 and @floriangit, thanks for your comments. I've fixed "make clean" and added a note about add_path in #2160.

The 8086-toolchain compiles with -O3, which I find a bit scary 😃. Could it be -O2?

I hadn't worried about -O3 for host builds; have you had issues with -O3 with various software? It can nonetheless be easily changed.

Also a binary release in the form of zip file should be released. This way anyone can start compiling.

I'll leave it to @rafael2k to continue building binary dev kits with the binary libc86.a and more testing on ELKS itself, but am considering an option to build an HD image that contains everything prebuilt for ELKS, discussed in #2157. We won't be able to easily build any host binary distribution of the cross-compiler(s) though.

@ghaerr
Owner Author

ghaerr commented Dec 29, 2024

All the above should be in a wiki page as well.

Added https://github.com/ghaerr/elks/wiki/Setting-up-the-8086-toolchain-(C86-compiler-and-tools) to Wiki.

@floriangit
Contributor

have you had issues with -O3 with various software?

Years ago I had issues with -O3 unexpectedly (a) re-ordering code and (b) optimizing out instructions that were not supposed to be deleted. Linus sticks with -O2 for various reasons for his kernel, too ;-) I would only use -O3 for contained modules that really need a hot path optimized, and look at the assembler after; with the 8086 toolchain it looks like it's set in a top-level Makefile. Anyhow, that may only be my experience...

@toncho11
Contributor

@ghaerr Can you make a binary native release please? Just a tar or zip file with everything needed, as Rafael did with the elks-devdisk? For example, I am not sure which header files it should contain on top of the bin and examples folders. Also, should there be a .sh file that configures some variables? A binary release will help with testing. I will try the Makefile.elks in the examples and check if it works.

@ghaerr
Owner Author

ghaerr commented Dec 29, 2024

Can you make a binary native release please? Just a tar or zip file with everything needed

It's going to be a lot of work to produce that, as we're now talking about the entire set of C library header files, which also include the ELKS kernel header files, in both linuxmt/ and arch/, etc. I don't think it has much chance of fitting on a floppy, and I don't have a native ELKS machine set up to even test or try out such a thing. That's why I brought up the subject of how best to distribute all the stuff we have now in #2157. Since @rafael2k's last dev kit, the number of files has increased enormously, since we're now providing the ability to compile anything with the full ELKS C library, which isn't small.

If you just want to play around with c86, you can copy the 8086-toolchain/examples and elks-bin/ folders to your hard drive. But the bigger problem is that as soon as you include something as seemingly simple as <stdio.h>, the tangled mess of include files means lots of directory organization on the target in order to work. We don't yet have a script in ELKS that can easily create a floppy or tar file from a list of other files, like we do in elkscmd/Applications.

Since I'm still very knee deep in getting the tools themselves working, with a huge list of enhancements needed, I will probably leave a binary distribution to @rafael2k for when he returns. I'm not sure whether he was manually creating the dev disk or had a script to do it, but it's not checked in.

In the meantime, I appreciate your desire to test and play with things - perhaps you can experiment with more cross-compilation using the existing tools by adding more sample C programs of your own choosing. This is easily done by copying them into 8086-toolchain/examples and updating 8086-toolchain/examples/Makefile. Then run "make" and continue testing with the cross-compilation environment. We still need to run lots more programs through the process to see if they'll work, even with host-based cross-compilation. I'm sorry, but things are getting very complicated and time consuming and I'm doing the best I can!

@ghaerr
Owner Author

ghaerr commented Dec 30, 2024

Years ago I had issues with -O3 unexpectedly (a) re-ordering code and (b) optimizing out instructions that were not supposed to be deleted. Linus sticks with -O2 for various reasons for his kernel, too

Yes, I understand what you're saying. ia16-elf-gcc with -Os is a fully optimizing compiler, with code re-ordering and unused code deletion, and that is what builds the ELKS kernel now, so I've gotten quite used to it. But yes, it's refreshingly simple to watch C86 spit out decent but definitely non-optimized code, where you can almost see the ASM output per C statement. Nowadays I view it as a balance: it's a good idea to turn on heavy code optimization and see whether your program still works, as in almost all cases the optimization is actually correct and matches the C abstract machine specification; it's just your expectation that isn't matching. This is especially true of asserts and "noreturn"-marked functions. This kind of thing has also forced me to jump much deeper into the "undefined behavior" compiler and execution issues lots of people have been talking about these days.

and look at the assembler after

Definitely a good idea! I have object file disassemblers for GCC (ia16-elf-objdump) and OWC (wdis), but not yet AS86 (objdump86 will dump hex .text/.data w/relocations but not actual disassembly). I'm working on getting objdump86 to disassemble AS86 .o files right now for the very reason you are describing. It's been a learning process with no documentation on the AS86 .o format, but I pretty much have that figured out now.

@floriangit
Contributor

floriangit commented Dec 30, 2024

Thanks for the -O3/-Os clear-up, I understand.

Now, I poked a bit more after following the instructions from the wiki.
I can compile the examples on Linux with the host-bin binaries, and they do work on the target after transferring them.
I cannot compile on ELKS with elks-bin binaries, only a few are working; it seems they themselves got compiled incorrectly (OS/2)? The first image is Linux and the second is ELKS.

[Image: 20241230_133526]

[Image: 20241230_133117]

@toncho11
Contributor

toncho11 commented Dec 30, 2024

I'm not sure, but the ones that show as OS/2 are probably compiled with OWC, and this might be normal. It says OS/2, but these are still ELKS binaries.
Nice. You are doing what I wanted to do!
But how do you manage the headers? In order to compile something inside ELKS to final binaries you need headers and the libc86.a to link with. If you use only the compiler, you still need headers. I am looking at:

C86LIB=$(TOPDIR)/libc
INCLUDES=-I$(TOPDIR)/libc/include -I$(TOPDIR)/elks/include -I$(C86LIB)/include/c86

So you need to copy all these headers (from the 3 folders specified above) into one or several folders and adjust the above in Makefile.elks to point to them. TOPDIR is the ELKS base folder created when you git clone ELKS. Also the C86LIB must contain the libc86.a that you previously compiled on your Linux host. And you do:

make -f Makefile.elks
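
The header copying described above could be scripted roughly as follows. This is a sketch under assumptions: collect_headers and the destination directory are hypothetical, and only the three source directories come from the INCLUDES line quoted above. Directories are copied in reverse -I order so that, on a name clash, the header from the first -I directory ends up in the merged tree, which approximates the compiler's search order.

```shell
# Hypothetical helper: merge the three include directories named in
# INCLUDES into one destination tree. $1 is the ELKS checkout (TOPDIR),
# $2 is the directory to create.
collect_headers() {
    topdir=$1
    dest=$2
    mkdir -p "$dest"
    # Reverse -I order: later copies overwrite earlier ones, so the
    # first -I directory (libc/include) wins on duplicate names.
    for dir in "$topdir/libc/include/c86" "$topdir/elks/include" "$topdir/libc/include"; do
        if [ -d "$dir" ]; then
            # copy directory contents, preserving subdirectories
            cp -R "$dir/." "$dest/"
        fi
    done
}
```

With the merged tree in place, INCLUDES in Makefile.elks would need only a single -I pointing at it.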

@toncho11
Contributor

Are you also using the latest source of ELKS? @ghaerr has done memory allocation optimization recently.

@floriangit
Contributor

Thanks toncho11,
I was aware that the early compiler stage needs the header files and that the later stage needs to link with libc. But the error also occurs when providing no file at all. I would expect c86 to check argv first and bail out with an error message if you don't provide anything. That led me to check the file magic, and then I got worried about the OS/2 thing.

That said, your 2nd comment may be spot on! D'oh, I have copied some things around, but I might well still be on 0.8.0 kernel-wise, lol. Let me check!

thanks.

@toncho11
Contributor

toncho11 commented Dec 30, 2024

Some of the @ghaerr memory allocation changes are just 1-2 days old. Example: #2152
Also @ghaerr did something about argv recently related to the toolchain: #2150
These are all in the ELKS source code, not the toolchain.

@ghaerr
Owner Author

ghaerr commented Dec 30, 2024

@floriangit,

I cannot compile on ELKS with elks-bin binaries, only a few are working,

There are a number of problems here. See that "MZ" on the 2nd screenshot? That's the OS/2 (DOS) MZ executable header. The shell thinks the executable is a shell script since the kernel isn't configured to run OS/2 binaries. Set CONFIG_EXEC_OS2=y to fix that.
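
Both conditions can be checked from a host shell before copying binaries over. The helpers below are a hedged sketch: exe_kind and os2_exec_enabled are hypothetical names, and the only facts taken from this thread are the two-byte "MZ" signature and the CONFIG_EXEC_OS2=y setting.

```shell
# Hypothetical helper: print "mz" if a file starts with the two-byte
# "MZ" signature, otherwise "other".
exe_kind() {
    magic=$(head -c 2 "$1" 2>/dev/null)
    if [ "$magic" = "MZ" ]; then
        echo "mz"
    else
        echo "other"
    fi
}

# Hypothetical helper: succeeds only when the given kernel config file
# contains the line CONFIG_EXEC_OS2=y.
os2_exec_enabled() {
    grep -q '^CONFIG_EXEC_OS2=y' "$1"
}
```

For example, exe_kind elks-bin/c86 together with os2_exec_enabled .config would show whether an MZ binary is about to be run on a kernel that cannot load it.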

The second problem is that the toolchain "make" doesn't support ifdef or ifndef. That may be your problem. I think the supplied examples/Makefile.elks works, but I haven't actually been testing with that one; I've been testing with another.
Here's the Makefile I've been using. (As I previously mentioned, I don't actually have the devdisk set up on my system, since I'm running off of QEMU with standard ELKS images for the moment):

C86LIB=.

CPP=./cpp86
CC=./c86
AS=./as86
LD=./ld86

INCLUDES=-I.
DEFINES=

CPPFLAGS=-0 $(INCLUDES) $(DEFINES)
CFLAGS=-g -v -O -bas86 -separate=yes -warn=4 -lang=c99 -align=yes -stackopt=minimum -peep=all -stackcheck=no
ASFLAGS=-0 -O -j -w-
LDFLAGS=-0 -i -L$(C86LIB)

all: chess test

test: test.o cprintf.o
    $(LD) $(LDFLAGS) test.o cprintf.o -lc86 -o test

test.o: test.asm
    $(AS) $(ASFLAGS)  test.asm -o test.o

test.asm: test.i
    $(CC) $(CFLAGS) test.i test.asm

test.i: test.c nanoprintf.h
    $(CPP) $(CPPFLAGS) test.c -o test.i

chess: chess.o
    $(LD) $(LDFLAGS) chess.o -lc86 -o chess

chess.o: chess.asm
    $(AS) $(ASFLAGS)  chess.asm -o chess.o

chess.asm: chess.i
    $(CC) $(CFLAGS) chess.i chess.asm

chess.i: chess.c
    $(CPP) $(CPPFLAGS) chess.c -o chess.i

cprintf.o: cprintf.asm
    $(AS) $(ASFLAGS)  cprintf.asm -o cprintf.o

cprintf.asm: cprintf.i
    $(CC) $(CFLAGS) cprintf.i cprintf.asm

cprintf.i: cprintf.c
    $(CPP) $(CPPFLAGS) cprintf.c -o cprintf.i

clean:
    rm -f *.o *.i *.asm  test chess
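
Each program in the Makefile above repeats the same four-step chain (.c to .i to .asm to .o to executable). When adding a sample program, those rules could be generated rather than typed. This is only a sketch: gen_rules is a hypothetical host-side helper, and it assumes a single-file program that links only against -lc86.

```shell
# Hypothetical generator: print the preprocess/compile/assemble/link
# rules for one single-file program, in the style of the Makefile above.
# Single quotes keep $(LD) etc. literal; printf expands \t and \n.
gen_rules() {
    name=$1
    printf '%s: %s.o\n' "$name" "$name"
    printf '\t$(LD) $(LDFLAGS) %s.o -lc86 -o %s\n\n' "$name" "$name"
    printf '%s.o: %s.asm\n' "$name" "$name"
    printf '\t$(AS) $(ASFLAGS) %s.asm -o %s.o\n\n' "$name" "$name"
    printf '%s.asm: %s.i\n' "$name" "$name"
    printf '\t$(CC) $(CFLAGS) %s.i %s.asm\n\n' "$name" "$name"
    printf '%s.i: %s.c\n' "$name" "$name"
    printf '\t$(CPP) $(CPPFLAGS) %s.c -o %s.i\n' "$name" "$name"
}
```

Appending the output of gen_rules hello to the Makefile, and adding hello to the all: and clean: lines by hand, would be enough to build a hypothetical hello.c.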

@ghaerr
Owner Author

ghaerr commented Dec 30, 2024

The above Makefile requires that libc/include/c86/stdarg.h and stddef.h be copied to /root also. It is not set up for the devdisk, sorry.

Also the C86LIB must contain the libc86.a that you previously compiled on your Linux host.

[EDIT: Also, libc/libc86.a must be copied to /root.]

@floriangit
Contributor

OK, I can confirm the executables (c86, as86) are now executing normally after I added CONFIG_EXEC_OS2 (how much other-OS history you wanna put into ELKS, Greg? :-P) and installed 0.9.0-dev on my HD. As for make I don't care yet, it feels already great to be able to natively compile a C file into assembly for me!

Without any header it's quite entertaining to see the poor compiler cope, or not:

[Image: 20241230_173137]

@ghaerr
Owner Author

ghaerr commented Dec 30, 2024

I can confirm the executables (c86, as86) are now executing normally after I added CONFIG_EXEC_OS2

Nice! I'm pretty sure that CONFIG_EXEC_OS2 defaults to ON, but I'll double check. The build probably picked up the settings from your existing .config, so you didn't get it.

how much other-OS history you wanna put into ELKS?

Well we needed a new executable format that allowed for any number of code and data segments, since ELKS and MINIX a.out didn't do that... and whaddya know, Open Watcom supported the format along with large model, so ... "a child was (re)born".

Without any header it's quite entertaining to see the poor compiler cope

I really like our new C86 compiler. It's very well engineered internally and has lots of potential; I'm very surprised I had never heard of it until @rafael2k found it. But yes, its error handling isn't the greatest...

It is strange to see the ".byte ..." output after max error count termination - that's supposed to stay in the output file (unless you were running c86 without a second argument, in which case it just writes ASM output to stdout which I bet is happening).

You are aware that you have to run cpp86 before running C86, right? C86 doesn't have an internal preprocessor.

@floriangit
Contributor

floriangit commented Dec 30, 2024

(unless you were running c86 without a second argument, in which case it just writes ASM output to stdout which I bet is happening).
You are aware that you have to run cpp86 before running C86, right? C86 doesn't have an internal preprocessor.

I was not aware of either point, since I was merely tinkering a bit. But having to provide two arguments to a C compiler, and learning that CPP means a C preprocessor (I thought of course it's C++), surely makes me feel a bit younger today! :-) I grew up with gcc already, lol (where pre-processing, checking the C language conformance, optimizing, linking and all in-between is done by one executable hiding all those executables doing the grunt work).

@ghaerr
Owner Author

ghaerr commented Dec 30, 2024

(I thought of course it's C++)

LOLOL!!! We're talking about running on ELKS here! Amazing to even get a C compiler going :)

where pre-processing, checking the C language conformance, optimizing, linking and all in-between is done by one executable hiding all those executables doing the grunt work).

Well, that's the plan here too - writing 'cc' to run 'em all... but it's further down on the huge list of things to be done. On the host, you can use ecc (a shell script) to compile random files, and it now accepts multiple .c files or -c, and will link using ld86 if no -c is given. When -c is specified, the preprocessed .c file is left as .i and the output asm file is .as. For your viewing pleasure, since you like to look at assembly :)
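
To make the stages such a driver has to chain concrete, here is a dry-run sketch that only prints the commands. It is not the real ecc script: show_build_steps is a hypothetical name, the tool names come from this thread, and the flags are a reduced subset of those in the Makefile posted earlier.

```shell
# Hypothetical dry-run driver, not the real ecc script: print the four
# commands needed to turn NAME.c into an executable NAME.
# Flags are a reduced subset of the Makefile posted earlier in the thread.
show_build_steps() {
    name=${1%.c}    # strip a trailing .c if present
    echo "cpp86 -0 -I. $name.c -o $name.i"
    echo "c86 -O -bas86 $name.i $name.asm"
    echo "as86 -0 -j $name.asm -o $name.o"
    echo "ld86 -0 -i -L. $name.o -lc86 -o $name"
}
```

Because each stage is a separate process that reads and writes a file, only one tool has to be resident in memory at a time.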

[EDIT: I'll update the examples/Makefile.elks so that it runs on ELKS with ELKS make].

@ghaerr
Owner Author

ghaerr commented Dec 30, 2024

@floriangit, you've come this far, we definitely need to have you try building the example chess program so you can actually run it on ELKS. I've updated examples/Makefile.elks in ghaerr/8086-toolchain#23. Pull that down, and copy examples/Makefile.elks to your ELKS /root directory as Makefile. Then type "./make".

@floriangit
Contributor

floriangit commented Dec 30, 2024

Disclaimer: I never learnt how to play chess. But maybe I learnt something else. :-D

libc.a was already there, now I copied all libc/include headers into /root. Of course make (well: c86) complained that stdio.h lines 5 and 6 include other headers that get included system-wide. I stopped there (it's getting late to construct a fuller system like Linux, haha) and commented out the #include "/root/stdio.h" altogether in chess.c... And then again ./make:

[Image: 20241230_202856]

My system has a total of 1M, can the compiler diagnostic message be meaningful/trusted?
chess.c doesn't look excessive, but then again, I talked about a C++ compiler in this thread, so I better let you guys take over. :)

edit: Yes, I pulled your Makefile.elks and put that into /root/Makefile

@floriangit
Contributor

I'm pretty sure that CONFIG_EXEC_OS2 is set default ON, but I'll double check

It was not for me. And pretty sure I never touched that setting.

@ghaerr
Owner Author

ghaerr commented Dec 30, 2024

Of course make (well: c86) complained that stdio.h lines 5 and 6 include other headers that get included system-wide.

I just fixed that.

My system has a total of 1M, can the compiler diagnostic message be meaningful/trusted?

Are you running networking? The message is saying that there's not enough memory in the 640k address space to compile. Perhaps turn networking off; you can do that with 'net stop'.

It was not for me. And pretty sure I never touched that setting.

If you have previously installed ELKS, the setting won't get updated unless you copy ibmpc-1440.config to .config

@ghaerr
Owner Author

ghaerr commented Dec 30, 2024

I just created an automated way to both build C86 and copy it to /root in #2163. It has the stdio.h fix just mentioned above.

This will allow users to (hopefully) build and test C86 more conveniently, although I can see we're already very tight on RAM.

@ghaerr
Owner Author

ghaerr commented Dec 30, 2024

@floriangit:

You can do that with 'net stop'.

If you still get the out of memory message after turning off most things, including perhaps an "init 1" to stop multiuser mode, run "meminfo" and take a screenshot. That'll allow me to look at what's happening in more detail. Thanks!

@ghaerr
Owner Author

ghaerr commented Dec 30, 2024

You can also try running "./make" without "time" which will run with slightly more RAM.

Also, if chess.c is too big for reasons yet unknown on your system, try "./make test".

[EDIT: On my system, neither "./make" nor "./make test" will work with networking running. After "net stop", "./make" works].

@toncho11
Contributor

toncho11 commented Dec 30, 2024

OK, so in the Makefile rule "test.i: test.c nanoprintf.h", nanoprintf.h must be removed.

And for test I get:
[Image: toolchain]

This is my ELKS image with the toolchain in root, but without the nanoprintf.h correction.
toolchain.zip

You can load it here online: https://copy.sh/v86

@floriangit
Contributor

What do you mean "blasted a floppy"... do you have a limited supply of working floppies?

Yes, exactly, I have around 20 floppies here, and around 10 have already died. Today I was greeted while booting the ELKS floppy (don't remember exactly):

ELKS: ..******1 Press any key to reboot

Tried some more times, but then changed the floppy and that one is working again. You can buy those floppies "new", but they are 5-20 years old in the sense of production date. So there is luck involved. They got rather expensive, too!

This is a 286 AMD/Intel and I think it's clocked at 8MHz (so maybe 4MHz w/o turbo?). I like the slow feeling and the grumbling HDD sound (see the only other repo on my github profile), but the power supply is so old and noisy that I keep the PC only running when fiddling with ELKS :-)

I'll revert the "enhancement" after we get a proper dev disk put together that puts the binaries somewhere other than /root. :)

Thanks! If ELKS ever aims at POSIX compliance, we gotta play by the rules, lol!

@ghaerr
Owner Author

ghaerr commented Dec 31, 2024

see the only other repo on my github profile

hdd_sound, huh?! Nice. Back when Doom was being ported to ELKS, I noticed a very cool source file that implements PC speaker sound effects for Doom WAV files. It's very basic but well written - I was thinking how nice it would be to be able to play WAV files on ELKS. The big problem is actually creating or finding any "WAV" files in this particular format (basically just time-sliced speaker frequencies). There's supposedly a program "Muse" that creates them, but I've been unable to find out how that worked or whether it's still available. Of course, my other big problem is actually listening to the PC speaker for development, when all I've actually got is a MacBook Pro running QEMU!

Totally off-topic, but hey, who wants to fix FAT filesystem problems right now anyways...

I have around 20 floppies here, and around 10 have already died.

Let me know if you get near to running out. Are these 5 1/4" or 3 1/2"? I think I've got several boxes of old 3.5" floppies around here; I could send some of them to you, since I'll probably never get a chance to use them.

@floriangit
Contributor

Thank you so much, but no worries, the shipping to Europe would not be worth it, I assume.
I like to order my FujiFilm floppies - and if worried, I'll ask @tyama501, since he is in Japan and his/her country still seems to have real floppy manufacturers! :-) Although I read that the Japanese government has now banned floppies for internal affairs.

Sorry, also off-topic ;-)
A good start into 2025!

@tyama501
Contributor

Happy New Year!

Unfortunately, Japanese manufacturers also quit making floppies a long time ago.
(Some are still available on Amazon, but fewer and fewer.)

It is only for PC-98, but I have seen a post on X from someone who got WAV playback working on the speaker on ELKS using OpenWatcom and inline assembler :)

@toncho11
Contributor

toncho11 commented Jan 1, 2025

Happy new year!

I just tested on the 86Box emulator emulating an 8 MHz Amstrad 8086, and it took 3m20s to compile test and chess.
Only test is 1m17s
Only chess is 2m17s
Only test and not using cprintf (only printf) 38s (2 x faster)

I did clean each time.
86Box is a cycle-accurate emulator, so it should be close to real hardware.

OK, so the as86 is the main bottleneck I think. If you do time as86 only, it takes 21 seconds! So the moment you need to call it a second time as in the case of cprintf you add at least 21 seconds.

Also optimizations can be disabled by default to speed up the compile time. I mean commented out, so that they can be enabled easily.

@toncho11
Contributor

toncho11 commented Jan 1, 2025

OK, so this is what is taking a lot of time in as86:

    init_heap();
    initp1();
    initp1p2();
    inst_keywords();
    initbin();
    initobj();
    initsource();
    typeconv_init(INT_BIG_ENDIAN, LONG_BIG_ENDIAN);

@ghaerr
Owner Author

ghaerr commented Jan 1, 2025

@toncho11, how do you know? Are you measuring between printf statements to learn that's where lots of time is being spent?
I'm looking at those routines and they don't really do much.

so the as86 is the main bottleneck I think.

AS86 is much faster than NASM, and we are purposely testing with a "large" (at least for ELKS' purposes) chess.c file that produces a 39k .as ASM file. We have also had to turn on the "automatic" jump statement handling that requires multiple passes - this is probably taking quite a bit of time. (Jump handling is required because 8086 conditional jumps only allow a short hop of -128 to +127 bytes to another instruction; if the code in between is longer, then the jump can't assemble, so AS86 reverses the condition code and issues a direct (+/- 32k) jump instead).
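
The decision described in the parenthesis can be sketched as a small range check. This is an illustration only, not AS86's actual code: jump_form is a hypothetical name, and it assumes the architectural signed 8-bit displacement of 8086 conditional jumps (-128 to +127, measured from the end of the jump instruction).

```shell
# Sketch of the short-vs-reversed decision (not AS86's actual code):
# a conditional jump fits the short form only if its displacement
# fits in a signed 8-bit value.
jump_form() {
    disp=$1
    if [ "$disp" -ge -128 ] && [ "$disp" -le 127 ]; then
        echo "short"        # e.g. "je target"
    else
        echo "reversed"     # e.g. "jne skip; jmp target; skip:"
    fi
}
```

The multiple passes come from the fact that replacing a short jump with the longer reversed form moves every later label, which can push other jumps out of range in turn.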

There is a possibility to turn on C86 jump reversal as standard. It isn't implemented yet for AS86 output, and it would make the code quite a bit larger but the assembly time shorter. I will look into that. We can probably put in a display that shows how long AS86 is taking within each pass to get an idea of what's happening.

Also optimizations can be disabled by default to speed up the compile time.

If you mean C86 optimizations, I've tried that - and the resulting code is terrible. So we really need to turn it on. If you're talking about the AS86 -O, it may be that can be turned off, but I think I tried that already and it needs to be on with -j (jump optimizations).

@ghaerr
Owner Author

ghaerr commented Jan 1, 2025

it takes 21 seconds! So the moment you need to call it a second time as in the case of cprintf you add at least 21 seconds.

I see, what you're saying is that AS86 is taking 21 seconds regardless of the input .as file size? Wow. I will look into that.

@ghaerr
Owner Author

ghaerr commented Jan 1, 2025

@toncho11,

Taking from your suggestions, I removed "-O -w-" from the ASFLAGS= flags line in Makefile and that cut the AS86 build time down by 40%!

We need the -j option for jump handling, but it seems the -O goes further and tries to reduce any long jumps to short jumps by running another compiler pass. The -w- option will normally produce a warning to that effect without -O, so that gets removed too.

It looks like the code file size increase is very small without these options. I'll make your idea the standard, and remove -O -w- from the standard Makefile(s).

I see that it is taking a bit of time on my speedy QEMU to run through the init statements in AS86. I'm still looking into why these are slow.

Thank you!

@ghaerr
Owner Author

ghaerr commented Jan 2, 2025

so the as86 is the main bottleneck I think. If you do time as86 only, it takes 21 seconds!

@toncho11, your comments and debugging were spot on, and helped to find a major bug in CPP86 that had been lurking inside the toolchain since the beginning. In addition, the speed issues you pointed out were caused by a bad bug in the debug malloc routine. Both have been fixed in #2169, and running "make" is several times faster than before.

Thank you!

@toncho11
Contributor

toncho11 commented Jan 2, 2025

Thank you @ghaerr!

Now ./make takes 2m20s in 86Box, exactly 1 minute less: 3m20s -> 2m20s.
Only test takes 50s instead of 77s. This is with cprintf always there (original source).
And test with removed cprintf in both source and Makefile results in 26s instead of 38.

This means that we cannot go below 26s whatever we compile. All tested on 86Box.
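
For comparing runs like these, the relative savings can be computed with a little shell arithmetic. This is just a convenience sketch with hypothetical helper names; it accepts durations written as "3m20s" or "50s" and truncates percentages to whole numbers.

```shell
# Convert a duration such as "3m20s" or "50s" to whole seconds.
to_seconds() {
    case $1 in
        *m*) min=${1%%m*}; rest=${1#*m} ;;
        *)   min=0; rest=$1 ;;
    esac
    sec=${rest%s}
    echo $(( min * 60 + ${sec:-0} ))
}

# Percentage reduction from an old to a new duration (integer, truncated).
percent_saved() {
    old=$(to_seconds "$1")
    new=$(to_seconds "$2")
    echo $(( (old - new) * 100 / old ))
}
```

With the figures above, percent_saved 3m20s 2m20s gives 30, percent_saved 77s 50s gives 35, and percent_saved 38s 26s gives 31.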
Again awesome job @ghaerr !

And here is the ELKS image to test:
elks-toolchain6.zip

@toncho11
Contributor

toncho11 commented Jan 2, 2025

If we were to remove all console messages we would probably gain a few seconds, but for now seeing what happens is more important. I think printing on the console is slow in general and on ELKS. This is just a thought, not a request to disable them.

@toncho11
Contributor

toncho11 commented Jan 2, 2025

Does as86 have code that verifies the input assembly? Is it possible to turn this on/off?

@ghaerr
Owner Author

ghaerr commented Jan 2, 2025

Does as86 have code that verifies the input assembly?

Do you mean does AS86 verify its input to see if it is correct? That always occurs.

I'm still looking for ways to speed the assembly up. It seems that most of the last speedup is actually occurring because of the removal of the -O option, although I am seeing some other strange speed-related behavior. For instance, when running "time make" vs "make", the speed of the entire operation is different. Specifically, the "Pass 1..." displays run quite a bit quicker sometimes. The really strange thing is that some builds happen faster with "time make", and others with just "make". I'm not sure if this is a QEMU issue, still tracking it down.

If we were to remove all console messages we would probably gain a few seconds

The console is nowhere near that slow, I don't think. Removing the -v option in CFLAGS will stop the display of the c86 version number and memory used. I plan on moving the AS86 "Pass" output to a -v option, but that actually ended up being more complicated. Eventually there won't be any extraneous messages.

This is with cprintf always there (original source).
And test with removed cprintf in both source and Makefile results in 26s instead of 38.

Yes, we could remove cprintf, but I've left it all in as this has been very useful for testing the CPP86 preprocessor (otherwise there aren't actually any #defines to process!). The chess.c program is useful for our timing tests since it's a larger .c file. When we move to having to preprocess more C library header files, that may slow things down. I may try to add that to the examples/ dir to get a more realistic example of general compilation times.

@toncho11
Contributor

toncho11 commented Jan 2, 2025

No problem with 86Box with or w/o time. 86Box is cycle accurate, while I think QEMU is not cycle accurate. So it is a QEMU issue I think. Also qemu is 386 and above?

The whole idea of using two programs one that generates an assembly and the second that compiles it looks heavy. The first program needs to generate a specific text format, next this is saved to HDD (takes time), next it must be loaded from HDD (takes time) and then it needs to be parsed by as86 and validated. Here we have a toolchain, but I always imagined a compiler where one program generates the .o directly from .c files.
So one optimization could be to skip the validation in as86 if we know that the generated assembly is always correct. OK. There is nothing 100% sure, but one can enable it once the final executable needs to be generated (similar to the optimizations) or in the case the generated program crashes.

@ghaerr
Owner Author

ghaerr commented Jan 2, 2025

while I think QEMU is not cycle accurate. So it is a QEMU issue I think. Also qemu is 386 and above?

QEMU is definitely not cycle accurate. It does emulate 386+, but that of course includes compatible real-mode 8086.
Thanks for testing on cycle-accurate 86box, I'll assume the differing speed issues are related to QEMU for the time being, since there's no difference on 86box.

The whole idea of using two programs one that generates an assembly and the second that compiles it looks heavy.

Yep. And don't forget the first program, CPP86, which pre-processes the .c file before compiling it. But producing assembly output simplifies the compiler greatly, and splitting each phase into its own executable lowers the memory requirements greatly. This is the very reason why most C compilers can't be made to run on ELKS - they do everything in one pass - way too large for us.

If you can find a C compiler that produces compatible .o files for an ELKS-runnable linker, let us know! So this is what we have for now, and I'm pretty happy with it, since compilation speed on ELKS is a much smaller issue than having a toolchain that actually runs on ELKS in the first place. We're very tight on RAM and it's somewhat amazing we've even got this running. I was actually thinking of writing a "historical" post about all the things that have had to be done right in order to get to this point.

There is an option to pipe the output into AS86, but that likely won't work well since that would require C86 and AS86 to be running at the same time, and we very likely don't have memory for that on any PC running real mode.
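As a rough illustration of why running the stages one after another keeps memory use down, here is a minimal host-side sketch of a driver that runs each tool to completion before starting the next, so only one stage is resident at a time. This is not the actual toolchain driver: `run_stage` and `run_pipeline` are hypothetical names, and the sketch assumes a POSIX system with `fork`, `execvp` and `waitpid`.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run one stage to completion.
 * Returns the stage's exit status, or -1 if it could not be run. */
int run_stage(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);             /* only reached if exec failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical helper: run the stages strictly in sequence and stop at
 * the first failure. Each stage has exited before the next one starts. */
int run_pipeline(char *const *stages[], int nstages)
{
    int i, rc;

    for (i = 0; i < nstages; i++) {
        rc = run_stage(stages[i]);
        if (rc != 0)
            return rc;
    }
    return 0;
}
```

Each `stages[i]` would hold the argv for one tool (CPP86, C86, AS86, LD86 in that order); the actual command-line arguments are left out here since they aren't given in this thread. A piped version would instead need two of the stages in memory simultaneously.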

There's an unbelievable amount of complexity under the hood to get all this working - and I actually like the ASM output, as it is very easy to see what the C86 compiler is generating (which brings up a whole other set of issues in itself). I'm working on a .o disassembler, since we don't actually have the capability (yet) to disassemble .o files.

I always imagined a compiler where one program generates the .o directly from .c files.

Try to find one that will run on ELKS! That's been the whole problem for two years now until @rafael2k finally succeeded with IMO a great selection of tools.

So one optimization could be to skip the validation in as86 if we know that the generated assembly is always correct.

There's no way to "turn off validation" since the whole function of AS86 is to turn assembly into object files. It has to recognize the large number of instruction names to do it. BTW, that's the reason why the "Init" startup took a while, which you previously pointed out - hundreds of mallocs, each then added to a hash table, which get increasingly slow to execute as the allocation list grows. The current memory allocator has to do a linear search of the entire allocation list to find the best fit each time an allocation is requested, which is taking too long. That's another item on the long list of enhancements for this project.
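To make the allocator cost concrete, here is a minimal sketch of a best-fit search over a singly linked allocation list, next to a first-fit search for comparison. The `struct block` layout and the function names are assumptions for illustration only, not the actual ELKS libc allocator, and first-fit is shown just as one possible alternative, not as the planned fix.

```c
#include <stddef.h>

/* Hypothetical allocation-list node; not the actual ELKS libc layout. */
struct block {
    size_t size;          /* usable bytes in this block */
    int in_use;           /* nonzero if currently allocated */
    struct block *next;   /* next block in the list */
};

/* Best-fit: must visit every block to be sure the smallest adequate
 * one was found, so each request costs O(n) in the list length. */
struct block *best_fit(struct block *head, size_t want, unsigned *visited)
{
    struct block *b, *best = NULL;

    for (b = head; b != NULL; b = b->next) {
        ++*visited;
        if (!b->in_use && b->size >= want &&
            (best == NULL || b->size < best->size))
            best = b;
    }
    return best;
}

/* First-fit: stops at the first free block that is large enough. */
struct block *first_fit(struct block *head, size_t want, unsigned *visited)
{
    struct block *b;

    for (b = head; b != NULL; b = b->next) {
        ++*visited;
        if (!b->in_use && b->size >= want)
            return b;
    }
    return NULL;
}
```

With hundreds of allocations made in a row, the best-fit walk is repeated over an ever longer list, so total startup work grows roughly with the square of the number of allocations. The `visited` counter is only there to make that cost observable.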

@toncho11
Contributor

toncho11 commented Jan 2, 2025

I see. There is a ton of work done already, and compilation speed should gradually improve in the months to come.

Thank you!

@ghaerr
Owner Author

ghaerr commented Jan 2, 2025

Yes - the current approach is to get something that actually works, then try to optimize it. I'm still very interested in comments regarding speed though; your last testing and comments sparked finding some big, previously unfound bugs. And NASM's speed, as reported in the early comments, proved so bad that we had to ditch NASM in favor of AS86. There might be some very fast assemblers out there, but we're still limited by having to produce Introl-format (AS86) .o files for our LD86 linker. The link phase isn't super fast, but in looking more at it, it now has to read the entire 77k C library during the process of linking, which makes things slower.

That all said, I think for 8088 systems, most of the time is being spent in the actual calculation of C conversion to ASM, and then a lot more seemingly in ASM conversion to .o, with not so much time spent in reading and writing disk files. That's helped because in many cases, depending on your buffer settings, the output files may not even get written to disk, but instead stay inside system buffers. So there are a lot of variables and tuning that ultimately will affect real-time throughput. I'm also hoping to find other ways to speed up the assembler, such as differing memory allocation algorithms. I'm still working on that.

@ghaerr
Owner Author

ghaerr commented Jan 2, 2025

This means that we can not go less than 26s whatever we compile.

I missed this comment. 26s is very slow. I wonder which portion of this is just reading make, cpp86, c86, as86 and ld86 from disk? I don't have a cycle accurate emulator, but that's almost 250k of executables before doing anything.

I also just found that the AS86 source says having a "very fast" memcmp function is needed for the hash table lookup of all the instructions. Ours is currently written in C rather than ASM, so it can be sped up. I might also be able to (temporarily) disable the 80386 and 8087 instructions from the AS86 opcode tables, that would probably help.

@ghaerr
Owner Author

ghaerr commented Jan 3, 2025

@toncho11, I decreased the size of the AS86 initial hash table by removing 386+ and floating point instructions which aren't used in our toolchain, and then wrote a fast, inlined memcmp for AS86 to speed up hash comparisons in ghaerr/8086-toolchain#29.
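For readers who want a picture of the hash-comparison idea, here is a minimal sketch of a chained hash table lookup that checks the stored length first and then uses a small byte-compare loop instead of calling the library memcmp. It is not the code from the pull request: `struct sym`, `NBUCKETS`, `hash_name`, `name_eq`, `sym_add` and `sym_lookup` are all hypothetical names, and the hash function is just a placeholder.

```c
#include <stddef.h>

#define NBUCKETS 16             /* illustrative size, not AS86's */

/* Hypothetical table entry for one instruction mnemonic. */
struct sym {
    const char *name;
    unsigned char len;          /* length of name, stored once at insert */
    struct sym *next;           /* next entry in the same bucket */
};

static struct sym *buckets[NBUCKETS];

/* Placeholder hash over exactly len bytes. */
static unsigned hash_name(const char *s, unsigned len)
{
    unsigned h = 0;

    while (len--)
        h = h * 31 + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Equality-only compare: small enough for the compiler to inline, and
 * it returns as soon as one byte differs. */
static int name_eq(const char *a, const char *b, unsigned len)
{
    while (len--)
        if (*a++ != *b++)
            return 0;
    return 1;
}

/* Insert an entry at the head of its bucket chain. */
void sym_add(struct sym *s)
{
    unsigned h = hash_name(s->name, s->len);

    s->next = buckets[h];
    buckets[h] = s;
}

/* Look up a name; the cheap length check filters most mismatches
 * before any bytes are compared. */
struct sym *sym_lookup(const char *name, unsigned len)
{
    struct sym *s;

    for (s = buckets[hash_name(name, len)]; s != NULL; s = s->next)
        if (s->len == len && name_eq(s->name, name, len))
            return s;
    return NULL;
}
```

Leaving unused mnemonics out of the table, as done here for the 386+ and floating point instructions, also means fewer `sym_add` calls at startup and shorter bucket chains to walk on each lookup.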

This seems to be working well, and the AS86 portion of the toolchain should be sped up again by quite a bit. Thanks for your comments! I agree that 2.5 minutes or 26 seconds is way too slow for the toolchain.

Let me know how much it speeds up execution on 86Box. We may need to start looking at individual timings for CPP86, C86 and LD86 as well, to see what else is really slow and see what can be done about it.

@toncho11
Contributor

toncho11 commented Jan 3, 2025

@ghaerr 86Box has a Mac OS version at https://github.com/86Box/86Box/releases/tag/v4.2.1, for both Arm-64 and x86-64.

I usually set "HD Controller" to "PC/XT XTIDE" and then select an ELKS HDD image in "Hard disks".

This is my config file: 86box.zip

You also need to download https://github.com/86box/roms/releases and save it as "roms" in the 86Box folder.

@toncho11
Contributor

toncho11 commented Jan 3, 2025

Time reduced from 2m20s to 1m56s.

elks-toolchain7.zip

@toncho11
Contributor

toncho11 commented Jan 3, 2025

So chess has gone from 2m17 seconds initially to 1m30 seconds. This is a total of 47 seconds improvement. Very good.
With NASM it was more than 30 minutes, maybe even 45 minutes :)

@ghaerr
Owner Author

ghaerr commented Jan 3, 2025

Thanks for your testing @toncho11, that has helped us move from NASM to AS86, and now to further decrease the assembly time. I'm not sure what else can be done to increase speed on AS86. I have a few other ideas, but I don't think they'll amount to as big a savings as we have got recently.

2m17 seconds initially to 1m30 seconds.

Is this for just AS86, or for the entire make? Maybe we can pinpoint other tools that are running slowly and try to speed them up.

@toncho11
Contributor

toncho11 commented Jan 3, 2025

Entire make.

@toncho11
Contributor

toncho11 commented Jan 5, 2025

Currently I can "edit" chess.c, which is already very good. There is enough memory. I tried chess.as, which is a very big file, and it failed. Maybe "edit" can be compiled to support bigger files? It is just to avoid future problems where we have a toolchain, but we can not edit the files needed to use it. It is just an idea for a future improvement.

@ghaerr
Owner Author

ghaerr commented Jan 6, 2025

Maybe if "edit" can be compiled to support bigger files?

Good idea, but looking at it, it's already built with a max default heap. So recompiling using OWC and far data still won't solve its problem. I'm not sure what to do about it, other than we may need to find a WYSIWYG editor that uses disk for holding edited contents when there's not enough RAM. Our current vi does that, and works with chess.as.
