Produce byte-identical executable using the same compiler #11

Closed
mewmew opened this issue Jun 11, 2018 · 18 comments
Comments

@mewmew
Contributor

mewmew commented Jun 11, 2018

This issue tracks a very ambitious goal of Devilution: the production of a byte-identical executable to the 1.09b original. To achieve this goal, the exact same compiler has to be used as was used to produce the original executable.

For diablo.exe version 1.09b, this corresponds to Visual C++ 5.10, and for the debug release diablo.exe version 1.00 (1996-12-21) this corresponds to Visual C++ 4.20 compiled in debug mode, based on PEiD output.

PEiD 109b

PEiD 100 dbg

Edit: reference discussion: https://github.com/galaxyhaxz/devilution/pull/10#issuecomment-396211436

@ghost

ghost commented Jun 11, 2018

Oh that's great! I totally forgot about PEiD. So now we know for sure it was 5.10. Thankfully VC 5/6 are very similar and still run on modern systems. I went ahead and made a chart for all versions so we have an easy reference. I'll download 5.10 this next week and configure the project accordingly.
SDK Versions

@fearedbliss
Contributor

fearedbliss commented Jun 11, 2018 via email

@fearedbliss
Contributor

fearedbliss commented Jun 11, 2018 via email

@mewmew
Contributor Author

mewmew commented Jun 11, 2018

I don't know much about this but from efforts in the Linux community (Debian and maybe others) to produce byte identical executables, you will need to remove any type of code that may cause the compiler to generate output that is variable. Things like timestamps used in the compiler and other things will need to be taken into account

Indeed. Timestamps would have to be removed and other parts that are variable from build to build as well. Good thing is those are easily identified.
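
As one illustration of neutralizing a build-dependent field, here is a minimal Python sketch that zeroes the `TimeDateStamp` in the COFF header of a PE image before comparing. The function name `zero_pe_timestamp` is made up for this example, and the sketch only covers this one field; other parts of a binary may carry timestamps too and would need similar treatment.

```python
import struct


def zero_pe_timestamp(image: bytes) -> bytes:
    """Return a copy of a PE image with the COFF TimeDateStamp set to 0."""
    if image[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew (file offset of the "PE\0\0" signature) is stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    # The COFF header follows the 4-byte signature:
    # Machine (2 bytes), NumberOfSections (2 bytes), then TimeDateStamp (4 bytes).
    stamp_offset = pe_offset + 4 + 4
    patched = bytearray(image)
    struct.pack_into("<I", patched, stamp_offset, 0)
    return bytes(patched)
```

Running both the original and the rebuilt executable through the same function means that a difference in build time alone no longer shows up in the comparison.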

So one approach would be to simply extract the contents of the .text section and dump that, then compare between the binaries. Of course, initially the output will be very different, and thus hashes can only be used as a final measurement. But before then we can use something like Hamming distance to get a score for how many edits have been made.
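
A rough Python sketch of that idea, assuming a well-formed PE file: `text_section` walks the section table to pull out the raw `.text` bytes, and `byte_distance` counts the positions at which two dumps differ. Both function names are hypothetical. Strict Hamming distance is only defined for equal-length inputs, so counting the length difference as extra mismatches is an assumption made here.

```python
import struct


def text_section(image: bytes, name: bytes = b".text") -> bytes:
    """Return the raw bytes of the named section of a PE image."""
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    coff = pe_offset + 4  # COFF header starts right after the signature
    (num_sections,) = struct.unpack_from("<H", image, coff + 2)
    (opt_size,) = struct.unpack_from("<H", image, coff + 16)
    # Section table follows the 20-byte COFF header and the optional header.
    table = coff + 20 + opt_size
    for i in range(num_sections):
        hdr = table + 40 * i  # each section header is 40 bytes
        sec_name = image[hdr:hdr + 8].rstrip(b"\x00")
        # SizeOfRawData at +16, PointerToRawData at +20.
        raw_size, raw_ptr = struct.unpack_from("<II", image, hdr + 16)
        if sec_name == name:
            return image[raw_ptr:raw_ptr + raw_size]
    raise ValueError("section not found")


def byte_distance(a: bytes, b: bytes) -> int:
    """Count differing byte positions; extra trailing bytes count as mismatches."""
    differing = sum(x != y for x, y in zip(a, b))
    return differing + abs(len(a) - len(b))
```

A score of 0 means the two `.text` dumps are identical; anything else gives a crude measure of how far apart the builds still are.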

We can also keep track of which offsets correspond to which function, and in this way check off one function at a time.
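
The per-function check-off could look something like the sketch below. The table layout (function name mapped to an offset in the original dump, an offset in the rebuilt dump, and a size) and the name `matching_functions` are assumptions for illustration only, not an existing Devilution tool.

```python
from typing import Dict, Tuple


def matching_functions(
    original: bytes,
    rebuilt: bytes,
    functions: Dict[str, Tuple[int, int, int]],
) -> Dict[str, bool]:
    """Map each function name to True if its bytes match in both dumps.

    `functions` maps a name to (offset_in_original, offset_in_rebuilt, size).
    """
    status = {}
    for name, (orig_off, new_off, size) in functions.items():
        orig_bytes = original[orig_off:orig_off + size]
        new_bytes = rebuilt[new_off:new_off + size]
        # A short slice means the table points past the end of a dump.
        status[name] = len(orig_bytes) == size and orig_bytes == new_bytes
    return status
```

Keeping separate offsets for the two dumps lets a function be checked off even while the surrounding layout still differs.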

Note, relative offsets to addresses will be variable if the output binary rearranges where code and data are stored, and also depending on the size of these parts.
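
To make that concrete, here is a small Python sketch of how an x86 near call (opcode `E8`) encodes its target as a displacement relative to the next instruction. The addresses are made-up examples; the point is only that moving either the call site or the target changes the emitted bytes even when the source code is identical.

```python
import struct


def encode_call(site: int, target: int) -> bytes:
    """Encode an x86 near call (E8 rel32) located at `site` that calls `target`."""
    # The displacement is relative to the address of the next instruction,
    # and the call instruction itself is 5 bytes long.
    rel32 = target - (site + 5)
    return b"\xe8" + struct.pack("<i", rel32)


# Same call site, target moved by 4 bytes: the instruction bytes differ.
before = encode_call(0x401000, 0x401100)
after = encode_call(0x401000, 0x401104)
print(before.hex(), after.hex())
```

So a function can be logically identical and still fail a byte comparison until everything it references lands at the same address as in the original.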

It most definitely won't be an easy challenge. But, at least to me, that's what makes it fun!

Also, this is not goal number one. That would be to fix crashes, improve stability, fix builds across the main platforms (Windows, Linux, Mac), etc. Rather, this can be thought of as an aspirational goal that Devilution may one day achieve, or perhaps more likely not. But I for one would definitely want to be part of making it happen!

Cheers,
/u

@ghost

ghost commented Jun 20, 2018

I personally don't think a byte-for-byte copy is possible.

You would literally need to have the exact code; any extra variables, optimizations, or anything else would completely throw it off.

Mewmew, I respect your energy :)

@janisozaur

There are already projects like https://github.com/pret/pokeruby or https://github.com/MimicYou/pokeredbeta that build hash-perfect recreations of their originals.

@mewmew
Contributor Author

mewmew commented Jun 20, 2018

There are already projects like https://github.com/pret/pokeruby or https://github.com/MimicYou/pokeredbeta that build hash-perfect recreations of their originals.

Wow, that is really cool! Thanks for pointing out these projects.

@gp-alex

gp-alex commented Jun 21, 2018

Hey, Diablo is already decompiled and refactored, the project is called The Hell and source code was hosted on Assembla at least in 2013

@mewmew
Contributor Author

mewmew commented Jun 21, 2018

Hey, Diablo is already decompiled and refactored, the project is called The Hell and source code was hosted on Assembla at least in 2013

@gp-alex That's great! Do you know where this source code is hosted?

@ghost

ghost commented Jun 21, 2018

You are correct, Hellfire was decompiled as early as 2006 IIRC, and The Hell released their sources a few years ago at the Khandurus network. However, it seems to have completely disappeared from the internet, same with The Dark/Khandurus.

The Hell 2 creator still has a copy. He reached out to me a few days ago and said he might pop in and make a contribution or two.

@ghost

ghost commented Jun 21, 2018

Just make sure this stays as original as possible. I have seen The Hell mod and I thought it was grindingly unbalanced and a weird distortion of what Diablo is.

@mewmew
Contributor Author

mewmew commented Jun 21, 2018

Just make sure this stays as original as possible. I have seen The Hell mod and I thought it was grindingly unbalanced and a weird distortion of what Diablo is.

No worries, this is tracked by #11.

@AJenbo
Member

AJenbo commented Jun 22, 2018

What compiler was used to generate the exe in the 0.2 release? A patch between it and Diablo.exe seems to indicate a 60% correlation.

@ghost

ghost commented Jun 22, 2018

I used VC++ 5.10 for all release builds. The GNU makefiles currently don't properly add the Icon and resource files.

@mewmew
Contributor Author

mewmew commented Jun 22, 2018

The GNU makefiles currently don't properly add the Icon and resource files.

Added an issue to track this: #48.

@seritools
Contributor

See this comment for a major update!

@ChaosMarc
Contributor

I think this issue can be closed. Its topic should be exactly the same as #111, which is newer.

PS: If I should stop looking for these housekeeping tasks, please tell me ;)

@mewmew
Contributor Author

mewmew commented Nov 27, 2018

PS: If I should stop looking for these housekeeping tasks, please tell me ;)

It's great you are looking for housekeeping tasks :) Please continue.

As for this issue, it is similar to but slightly different from #111. For instance, there are other issues to consider even when we have exactly the same compiler, such as timestamps being included in the binary. This issue mentions some of those aspects. However, I'd be fine with closing this issue, as the primary work now is to get bin perfect assembly, which is tracked by #111 and the bin exact milestone. Later on, when we want to figure out specific issues, such as how to handle timestamps, we can open new dedicated issues for those.

Closing for now. We can re-open at a later time, should we feel like it.

@mewmew mewmew closed this as completed Nov 27, 2018

7 participants