Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Figure out the exact set of compiler flags used to produce Diablo.exe (1.09b) #111

Open
mewmew opened this issue Jul 1, 2018 · 33 comments

Comments

@mewmew
Copy link
Contributor

mewmew commented Jul 1, 2018

This issue tracks the effort to figure out the complier flags used to produce Diablo.exe (version 1.09b). It may be considered a subtask of #11, as given information about compiler flags, we can ensure that the same input source code produce the same output object code.

From what I can tell, it seems like /O1 is used rather than /O2. This is based on the padding between functions.

/O2 produces (given that #110 has been merged):

  401000:	e9 0b 00 00 00       	jmp    0x401010
  401005:	90                   	nop
  401006:	90                   	nop
  401007:	90                   	nop
  401008:	90                   	nop
  401009:	90                   	nop
  40100a:	90                   	nop
  40100b:	90                   	nop
  40100c:	90                   	nop
  40100d:	90                   	nop
  40100e:	90                   	nop
  40100f:	90                   	nop
  401010:	a1 28 b1 40 00       	mov    eax,ds:0x40b128
  401015:	a3 84 1a 41 00       	mov    ds:0x411a84,eax
  40101a:	c3                   	ret    
  40101b:	90                   	nop
  40101c:	90                   	nop
  40101d:	90                   	nop
  40101e:	90                   	nop
  40101f:	90                   	nop
  401020:	8b 44 24 04          	mov    eax,DWORD PTR [esp+0x4]
  401024:	85 c0                	test   eax,eax
  401026:	74 0f                	je     0x401037
  401028:	6a 00                	push   0x0
  40102a:	6a ff                	push   0xffffffff
  40102c:	68 40 d0 40 00       	push   0x40d040
  401031:	50                   	push   eax
  401032:	e8 69 0c 00 00       	call   0x401ca0
  401037:	c3                   	ret    

While /O1 produces:

  401000:	e9 00 00 00 00       	jmp    0x401005
  401005:	a1 28 c1 40 00       	mov    eax,ds:0x40c128
  40100a:	a3 84 2a 41 00       	mov    ds:0x412a84,eax
  40100f:	c3                   	ret    
  401010:	83 7c 24 04 00       	cmp    DWORD PTR [esp+0x4],0x0
  401015:	74 12                	je     0x401029
  401017:	6a 00                	push   0x0
  401019:	6a ff                	push   0xffffffff
  40101b:	68 40 e0 40 00       	push   0x40e040
  401020:	ff 74 24 10          	push   DWORD PTR [esp+0x10]
  401024:	e8 8f 09 00 00       	call   0x4019b8
  401029:	c3                   	ret    

This is to be compared to the original version of Diablo.exe (1.09b):

  401000:	e9 00 00 00 00       	jmp    0x401005
  401005:	a1 00 94 47 00       	mov    eax,ds:0x479400
  40100a:	a3 30 79 4b 00       	mov    ds:0x4b7930,eax
  40100f:	c3                   	ret    
  401010:	83 7c 24 04 00       	cmp    DWORD PTR [esp+0x4],0x0
  401015:	74 12                	je     0x401029
  401017:	6a 00                	push   0x0
  401019:	6a ff                	push   0xffffffff
  40101b:	68 b0 30 48 00       	push   0x4830b0
  401020:	ff 74 24 10          	push   DWORD PTR [esp+0x10]
  401024:	e8 8d 87 06 00       	call   0x4697b6
  401029:	c3                   	ret    
@heinermann
Copy link
Contributor

It's worth noting that individual compilation units can have different optimization settings (just to keep in mind).

Another comparison with Diablo.exe 1.09b:

I actually want to highlight in particular the line and byte ptr [ecx], 0. This is actually an assignment, but it was optimized into an and, which is only produced under O1.

Under O1:

  • *ptr = 0 becomes and byte ptr [ecx], 0, even though assignment was used
  • *ptr &= 0 becomes and byte ptr [ecx], 0

Under O2:

  • *ptr = 0 becomes mov byte ptr [eax], 0
  • *ptr &= 0 becomes mov byte ptr [eax], 0, even though AND was used

@ghost
Copy link

ghost commented Jul 1, 2018

The project now uses /O1 for both debug and retail builds. I figured that's what was used, since the output size is much closer (760 KB). The main reason I never changed it was in the early days optimizations broke the code entirely. It took a few months of fixing up before it was stable enough to change it.

@heinermann
Copy link
Contributor

heinermann commented Jul 1, 2018

You may also be able to find out other compiler flags by looking at the binary header (MZ/PE) and seeing what differs. Imports too probably.

@ghost
Copy link

ghost commented Jul 1, 2018

Good call, I noticed the original binary has about 2x as many import descriptors. Which is weird, since it there's only a few differences in the KERNEL32 imports. We'll have to look at that sometime. Also, the release build seems to scramble the code a little more, where as the debug build is closer to the original. Setting aside debug sections of course.

It could also be a good idea to "trace" the developers steps. We know the game was originally written in VC 4.20, but then upgraded to 5.10. So perhaps generating the project using 4.20's default settings could get us closer.

@mewmew
Copy link
Contributor Author

mewmew commented Jul 2, 2018

So perhaps generating the project using 4.20's default settings could get us closer

Sounds reasonable. Report back the compiler flags used :) I still don't have a Windows box to play with.

@seritools
Copy link
Contributor

seritools commented Jul 15, 2018

Warning: Big writeup ahead! :)

As @mewmew wrote up over here, more often than not, the compiler we currently use (5.0 SP3) generated slightly different instructions than the original 1.09b binary.

The first thing I checked is the linker versions. For that I grabbed every version ever, basically, and confirmed that the linker used must be from 5.0 SP3:

SP0: 5.00.7022
SP1: 5.02.7132
SP2: 5.02.7132
SP3: 5.10.7303

Diablo 1.09b PE Header: 5.10

While browsing for compiler option info, I stumbled over the #pragma optimize directive, allowing to toggle optimization settings on a per-function basis. Playing around with disabling size-based code and other optimizations, the generated instructions were even further away from the original, so I knew that was probably not what was happening.

For some reason, the compiler used by Blizzard was just more "clever" than the one we use...

After going off-track by checking the #pragma optimize directive, I went back to linker stuff. Just to make sure, I checked VS6's link.exe (starting from SP6, for some reason).

The output was:

C:\Program Files\Microsoft Visual Studio\VC98\Bin>link /?
Microsoft (R) Incremental Linker Version 6.00.8447
Copyright (C) Microsoft Corp 1992-1998. All rights reserved.

usage: LINK [options] [files] [@commandfile]

   options:

      /ALIGN:#
      /BASE:{address|@filename,key}
[...]
      /LINK50COMPAT

Looking at that options list it instantly clicked. See that /LINK50COMPAT? Googling it revealed that link.exe as well as lib.exe both have that flag in version 6. It seems that with VC++ 6, they changed the structure of lib files, so the lib files generated by lib.exe 5.x and lower are not compatible with link.exe 6.x and vice versa. Microsoft implemented that switch so that older library files can be used with VC++ 6, and code compiled by VC++ 6 can generate lib files compatible with the linker of VC++ 5.

So I just fired up VS6SP6, upgraded the project file, compiled, checked in IDA, and voilà:
image
(left is Diablo.exe 1.09b, right is devilution compiled/linked with VC++ 6 SP6)

The instructions are identical (at least for the function from PR #135)!

So, what I guess happened:

  • With the time, Blizzard upgraded to VS/VC++ 6
  • Hovever, something prevented them to use the VC++ 6 linker.
  • Thus, they compiled newer versions of Diablo with VC++ 6, generated the .lib files from the .obj files via VC++ 6's lib.exe with the /LINK50COMPAT, generating library files compatible with VC++5's linker
  • Then, they linked those with the linker of 5.0SP3, leading to the PE header containing 5.10 as linker version, but optimizations only possible with a V6 compiler!
  • It seems like they literally only used /O1 :D

The 1.09b .exe has a creation date of 2001-05-12. This means that the newest VC++6 version they could have used it Service Pack 5 (released April 2001). Service Pack 4 (released January 2001) would be a good contender as well.

Anyways, that was my journey today. :)

What's left to do:

  • Find out which service pack was used (only needed if we still find significant changes where there shouldn't be any)
  • Build a VC++ 6 workspace that does a custom build generating lib files compatible with link.exe v5.10, and call that linker to generate the output file.

In retrospective, all of this makes sense seeing that Storm.dll and the .SNP files are even linked with v6 of the linker! 🙂

@StephenCWills
Copy link
Member

That's way cool! Excellent work.

@fearedbliss
Copy link
Contributor

@seritools Wow that's a great read and find, the asm is exactly the same (excluding different memory locations)!

@ghost
Copy link

ghost commented Jul 15, 2018

Beautiful, now we just have to figure out how they automated everything 😉 It seems tedious to have used two different toolchains, so maybe there's another option we're missing or they modded their VC6 installation.

Awhile back I documented the versions they used for D2:

Version          Released   Patch
MSVC++ 6.0 SP 3  5/21/1999  1.00-1.03
MSVC++ 6.0 SP 4  6/15/2000  1.04-1.05
MSVC++ 6.0 SP 5  2/26/2001  1.06-1.10
MSVC++ 6.0 SP 6  3/29/2004  N/A
VS .NET 2003     ?/??/2003  1.11+

The linker for both SP 4 and 5 generates code exactly the same, I didn't test SP 6 since it was made long after those patches. We can assume that the D2 devs used their current setup to compile D1 at the time, so 1.09b would've been compiled with SP 5 and 1.08 SP 3.

@mewmew
Copy link
Contributor Author

mewmew commented Jul 15, 2018

@seritools Wow Dennis, that's great! Hats off. Very happy that you managed to pin this down.

@ghost
Copy link

ghost commented Jul 15, 2018

Do we get similar sizes with the optimization? I am curious what a bindiff would result in. It might be useless to know, but I think that it would tell us how close we are.

@seritools
Copy link
Contributor

seritools@6f5f573
(please excuse my crappy makefile skills)

Managed to link the exe with 5.10 while compiling with 6 :) However in this constellation, PDBs cannot be generated, since the linker exits mentioning that the pdb format is not supported. A release build is possible though, and reports the correct linker version.

For working at getting the code accurate I'd recommend just linking with v6 for now (as long as we don't see differences too big to tolerate). Generating release builds with full PDB info makes it a lot easier imho.

What do you think?

@seritools
Copy link
Contributor

seritools commented Jul 16, 2018

A few more observations:

  • SP5 is definitely ruled out: The release binary is way too big, and one jump in CheckSpell (.text:0045759A jz short loc_4575A0) is generated as a jnz, with a slightly different jump pattern, and a few other slight differences in the function beginning
  • SP3: The release binary (linked with VC6) is 744KB, only 4KB bigger than the target, and the jump is generated just like the original one. However, the function beginning is still different:
    image

@galaxyhaxz has noted that the VC6 linker starts the program code at 0x1000, while the VC5 linker starts at 0x400, which is a difference of 3072 bytes, so the resulting binary could be down to 741KB already!

I'm going to check SP2, SP1, SP0 next and try linking with VC5 instead to see if I can get that simple function to generate correctly and maybe generate a perfectly-sized binary :O

EDIT:

  • SP2: File size is way bigger at 756KB, ruling that one out. (It is not generating the correct function beginning either.)
  • SP0: File size is again down at 744KB. (It is not generating the correct function beginning either.)
  • SP1: Same size as SP0.
  • SP4: File size is again bigger at 756KB.

Checked the exact byte counts:

1.09: 757760 bytes

SP0: 761856 bytes (diff. 4096 bytes, at 3072 of which are because I currently link with VC6)
SP1: 761856 bytes
SP3: 761856 bytes

@seritools
Copy link
Contributor

seritools commented Jul 16, 2018

Alright, SP2, SP4, SP5 are excluded now, since all three of those generate binaries that are too big.

For some reason, with SP3, the filesize went down again. I reinstalled SP3 just to make sure, but it stayed the same.

So for now, VC6 SP0, SP1 or SP3 in conjunction with the linker from VC5 SP3 is the closest we can get.

Copied over the SP3 installation to my buildtools folder, then built it while linking with VC5SP3... well, I accidentally beat Blizzard 😆
image

So, what I gathered:

  • Programs compiled usually increase in 1024 byte chunks
  • VC6 SP3 + Linker from VC5SP3 seem to save just a few bytes too much, resulting in stripping a whole KB from the executable

@ghost
Copy link

ghost commented Jul 17, 2018

Awesome work!

I think I might try this, because I am not thinking the pdb info to be too useful at this point considering we literally can cross reference the binary. 1024 bytes are still a lot though, but it could be worse.

I am curious if we can get closer with pdb info .

Also @seritools for that binary that is really close, can you upload it ? I would like to do a bindiff on it.

@seritools
Copy link
Contributor

PDB info is nice while comparing functions, since you can just jump to the function in question in the new binary, without remembering adresses, and getting all the type info that we already have for the original binary as well. (Generating with pdb info results in a much bigger binary btw)

Since the binaries are rounded up to the next 1024bytes, maybe the original binary was just 739k + a few bytes? Every instruction we missed could be the one dropping us down to 439KB. :)

Like there was/is (already reported to @galaxyhaxz) a missing function NewCursor in devilution, which did nothing but call to SetCursor in the original binary. A couple of bytes "saved". Also every differing optimization contributes to that.

I'm gonna be at my pc tomorrow and will upload the binary then.

@seritools
Copy link
Contributor

And here is the exe :)
devilution.zip

@ghost
Copy link

ghost commented Jul 18, 2018

Comparing the size with @seritools binary:
size

                            .text    .rdata    .data    .rsrc        Total
Size change in Devilution:  +0x2400  -0x7600   +0x4E00  Identical    -0x400

So the Devilution binary actually has 0x2400 (9kb) more in the code section than the vanilla. This is surely due to many decompiled functions having excessive variables. Interesting though is that most data is stored in .data instead of .rdata. This is likely because many of these variables were declared as static. Devilution has a total of 0x2800 bytes less in the data section. I noticed the compiled binary has a much smaller CRT section than compiled plainly with VC5, which could attribute to that. It could also be that the compiler is discarding certain global variables or data that isn't used.

@seritools
Copy link
Contributor

Yeah, I only got the size way down when using the VC6 libs, the file was quite a bit bigger with the VC5 ones. (Tbh it was surpriting to me that the VC6 libs were working at all, since it supposedly has a different lib format, but maybe that's just not the case for the standard libs)

@nomdenom
Copy link
Contributor

.rdata is bigger in the original binary because the decompiled code is missing const annotations on the various data tables (arrays). const gets placed in a separate section.

@ghost
Copy link

ghost commented Aug 19, 2018

See issue #13. Many functions and data are declared as static but are missing that. It seems you're right that const was probably used as well.

@mewmew
Copy link
Contributor Author

mewmew commented Aug 20, 2018

At least information about static is present in the debug info from the SYM files; e.g. https://github.com/sanctuary/psx/blob/master/easy-as-pie/globals/global_0.cpp#L4522

@seritools
Copy link
Contributor

seritools commented Aug 28, 2018

https://github.com/diasurgical/devil-nightly/issues/15

We're pretty much done 🎉

In the end, Blizzard made sure to keep their software up to date it seems 😄

@seritools
Copy link
Contributor

We can probably be 100% sure now :) Found this by change while browsing reddit:

http://bytepointer.com/articles/the_microsoft_rich_header.htm

image

@AJenbo
Copy link
Member

AJenbo commented Oct 1, 2018

Woha that is exiting :D

But does that also say that the linker was VC6 SP3? (probably wasn't updated in the later versions)

@mewmew
Copy link
Contributor Author

mewmew commented Oct 1, 2018

We can probably be 100% sure now :) Found this by change while browsing reddit:

That is such a great find! Wow, what a buried treasure!!

@nomdenom
Copy link
Contributor

nomdenom commented Oct 1, 2018

Funny, I actually knew about the Rich header but had forgotten about it (my first written reference to it is from 2007)... well, good ol' brute-force search and a bit of luck still got us there 💪

@ghost
Copy link

ghost commented Oct 2, 2018

@AJenbo Linker was the same from SP3-SP6. However SP4/SP5 can generate different code if the Processor Pack is installed.

@seritools Once again, a great discovery! Oh, it's so great to be working with code from 2001. Could you imagine if Diablo 1 was released now with all those micro-optimizations and stripping the Rich header+file names? :P

After comparing the Rich header with 3 different tools, below is a list documenting each value in the header. Interesting is that "Utc12_2_C" is a C compiler and not C++. So it's possible all the files were treated as C even with .CPP extensions. Also, there doesn't appear to be any asm/MASM references, which I guess implies render.cpp was indeed a .CPP file with inlined calls.

Id  Build  Count  Name       Description
 0      0    155  Unknown    [---] Number of imported functions (old)
 1      0    229  Import0    [---] Number of imported functions
 6   1668      1  Cvtres500  [RES] VS97 (5.0) SP3 cvtres 5.00.1668
 2   7303     29  Linker510  [IMP] VS97 (5.0) SP3 link 5.10.7303
 3   7303      1  Cvtomf510        VS97 (5.0) SP3 cvtomf 5.10.7303
 4   8447      2  Linker600  [LNK] VC++ 6.0 SP3,SP4,SP5,SP6 link 6.00.8447
48   9044     72  Utc12_2_C  [---] VC++ 6.0 SP5 Processor Pack
19   9049     12  Linker512        Microsoft LINK 5.12.9049

@seritools
Copy link
Contributor

So it's possible all the files were treated as C even with .CPP extensions.

That was one of my very first assumptions about the project haha

@squidcc

This comment has been minimized.

@mewmew

This comment has been minimized.

@AJenbo
Copy link
Member

AJenbo commented Nov 19, 2018

Small update, @mewmew and I spend some time this weekend testing things out and by adding the /TC flag to the compiler we where able to get the RICH header to reflect Utc12_2_C instead of Utc12_2_CPP.

As /Tc did not appear to change thing we're fairly sure that the whole project needs to be written in C instead of C++.

I have started gradually rewriting the parts the aren't C compliant, there are a few thing that I don't know exactly how to deal with, but I hope to get it down to just a few issues.
First PR is here #465 (converting player.cpp), I expect subsequent PR's to be much smaller since a lot of the general stuff was handled in this one.

@AJenbo
Copy link
Member

AJenbo commented Dec 30, 2018

As a quick update @mewmew has figured out most of the pieces needed to produce a correct rich header. Also /Tc does work and we now have 54% of the projects files compiling as C :)

For details on what issues are blocking the remaning code from being compiled as C, can be seen here: #528

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants