Some functions emitted as UNREACHABLE even when reachable/keep-unreachable-funcs is used #169

Closed
csnover opened this issue Feb 7, 2018 · 5 comments


csnover commented Feb 7, 2018

I’ve attempted a decompilation of an older Win32 DLL which was compiled using Visual C++ 4 and detected appropriately by RetDec. 553 of 4015 detected functions are being decompiled as UNREACHABLE. For example, this function (simple C++ object constructor):

; function: function_100532ef at 0x100532ef -- 0x100533cb
0x100532ef:   55                     	push ebp
0x100532f0:   8b ec                  	mov ebp, esp
0x100532f2:   6a ff                  	push -1
0x100532f4:   68 b3 33 05 10         	push 0x100533b3
0x100532f9:   64 a1 00 00 00 00      	mov eax, dword ptr fs:[0]
0x100532ff:   50                     	push eax
0x10053300:   64 89 25 00 00 00 00   	mov dword ptr fs:[0], esp
0x10053307:   83 ec 04               	sub esp, 4
0x1005330a:   53                     	push ebx
0x1005330b:   56                     	push esi
0x1005330c:   57                     	push edi
0x1005330d:   89 4d f0               	mov dword ptr [ebp - 0x10], ecx
0x10053310:   8b 4d f0               	mov ecx, dword ptr [ebp - 0x10]
0x10053313:   e8 80 ef 01 00         	call 0x10072298
0x10053318:   c7 45 fc 00 00 00 00   	mov dword ptr [ebp - 4], 0
0x1005331f:   8b 45 f0               	mov eax, dword ptr [ebp - 0x10]
0x10053322:   c7 00 e8 16 08 10      	mov dword ptr [eax], 0x100816e8
0x10053328:   8b 45 f0               	mov eax, dword ptr [ebp - 0x10]
0x1005332b:   c7 40 30 00 00 00 00   	mov dword ptr [eax + 0x30], 0
0x10053332:   8b 45 f0               	mov eax, dword ptr [ebp - 0x10]
0x10053335:   c7 40 20 00 00 00 00   	mov dword ptr [eax + 0x20], 0
0x1005333c:   8b 45 f0               	mov eax, dword ptr [ebp - 0x10]
0x1005333f:   c7 40 34 00 00 00 00   	mov dword ptr [eax + 0x34], 0
0x10053346:   8b 45 f0               	mov eax, dword ptr [ebp - 0x10]
0x10053349:   c7 40 24 00 00 00 00   	mov dword ptr [eax + 0x24], 0
0x10053350:   8b 45 f0               	mov eax, dword ptr [ebp - 0x10]
0x10053353:   c7 40 28 00 00 00 00   	mov dword ptr [eax + 0x28], 0
0x1005335a:   8b 45 f0               	mov eax, dword ptr [ebp - 0x10]
0x1005335d:   c7 40 2c 00 00 00 00   	mov dword ptr [eax + 0x2c], 0
0x10053364:   8b 45 f0               	mov eax, dword ptr [ebp - 0x10]
0x10053367:   c7 40 38 00 00 00 00   	mov dword ptr [eax + 0x38], 0
0x1005336e:   8b 45 f0               	mov eax, dword ptr [ebp - 0x10]
0x10053371:   c7 40 3c 00 00 00 00   	mov dword ptr [eax + 0x3c], 0
0x10053378:   8b 45 f0               	mov eax, dword ptr [ebp - 0x10]
0x1005337b:   c7 40 40 00 00 00 00   	mov dword ptr [eax + 0x40], 0
0x10053382:   8b 45 f0               	mov eax, dword ptr [ebp - 0x10]
0x10053385:   c7 40 44 00 00 00 00   	mov dword ptr [eax + 0x44], 0
0x1005338c:   8b 45 f0               	mov eax, dword ptr [ebp - 0x10]
0x1005338f:   c7 40 48 00 00 00 00   	mov dword ptr [eax + 0x48], 0
0x10053396:   e9 00 00 00 00         	jmp 0x1005339b
0x1005339b:   c7 45 fc ff ff ff ff   	mov dword ptr [ebp - 4], 0xffffffff
0x100533a2:   8b 45 f0               	mov eax, dword ptr [ebp - 0x10]
0x100533a5:   e9 13 00 00 00         	jmp 0x100533bd <function_100532ef+0xce>
0x100533aa:   8b 4d f0               	mov ecx, dword ptr [ebp - 0x10]
0x100533ad:   e8 66 ea 01 00         	call 0x10071e18
0x100533b2:   c3                     	ret 
0x100533b3:   b8 80 65 08 10         	mov eax, 0x10086580
0x100533b8:   e9 a3 f1 01 00         	jmp 0x10072560 <unknown_10072560>
0x100533bd:   8b 4d f4               	mov ecx, dword ptr [ebp - 0xc]
0x100533c0:   64 89 0d 00 00 00 00   	mov dword ptr fs:[0], ecx
0x100533c7:   5f                     	pop edi
0x100533c8:   5e                     	pop esi
0x100533c9:   5b                     	pop ebx
0x100533ca:   c9                     	leave 
0x100533cb:   c3                     	ret 

Currently decompiles to:

int32_t function_100532ef(void) {
    // entry
    abort();
    // UNREACHABLE
}
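
For comparison, a hand-written reading of the listing above suggests something closer to the following. This is only a sketch: the names `ctor_sketch`, `base_ctor_stub`, `OBJ_WORDS` and `VTABLE_ADDR` are made up, the call to 0x10072298 is assumed to be a base-class constructor and is stubbed out, the SEH frame setup via fs:[0] is omitted, and the object is modelled as an array of 32-bit words because only the store offsets are known.

```c
#include <stdint.h>

/* Hypothetical layout: the highest store in the listing is at offset 0x48,
   so the object is at least 0x4c bytes (19 dwords) long. */
enum { OBJ_WORDS = 0x4c / 4 };

/* Immediate written to [eax] in the listing, presumably the vtable address. */
#define VTABLE_ADDR 0x100816e8u

/* Stand-in for the call to 0x10072298 (assumed to be a base constructor). */
static void base_ctor_stub(uint32_t *self) { (void)self; }

/* Mirrors the visible stores: vtable pointer at offset 0, then the dwords
   at offsets 0x20 through 0x48 are cleared. Returns `this`, as eax does. */
uint32_t *ctor_sketch(uint32_t *self) {
    base_ctor_stub(self);
    self[0] = VTABLE_ADDR;
    for (unsigned off = 0x20; off <= 0x48; off += 4) {
        self[off / 4] = 0;
    }
    return self;
}
```

The listing performs the eleven stores in a different order (0x30, 0x20, 0x34, ...), but since they are independent the loop is equivalent.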

This could very well be the same bug as #41, but I am not sure, since RAII does not appear to be involved in all of these failures, so I decided to open a separate issue.

I ran RetDec with the command retdec-decompiler.sh -k file.dll. I also tried with the extra --backend-no-opts flag to rule out a backend optimiser causing the problem, with the same result.

Unfortunately I don’t have the rights to upload the original binary here, but I am willing to perform additional testing. Please let me know if there is any more information that I can provide, or if you can suggest a specific area of the RetDec code which I might look into to try to find or fix the root cause myself.

Thanks for this great software!

Git revision 1db0cb6
macOS 10.12.6, Apple Clang 900.0.39.2

Collaborator

PeterMatula commented Feb 8, 2018

This is very likely a case where LLVM works against us (see this comment). Our flow: binary -> decoding -> LLVM IR -> our optimizations -> LLVM optimizations -> llvmir2hll -> C. What likely happened here is that the LLVM IR that came out of our optimization passes has some dubious constructs in it (e.g. something that causes undefined behaviour). Because LLVM passes are designed primarily for compilation, they often discard such code. We need to either hack LLVM to not do it, prevent our passes from generating such code (not always possible with reversed assembly), or write some kind of sanitizer. I like the sanitizer solution, because it would allow us to solve this on our end, without hacking LLVM (we could use vanilla LLVM).

This is a major problem and I added it to our roadmap. We do not really need your binary, I think this is unfortunately happening quite often.

Contributor

palant commented Sep 10, 2018

I did some digging, documented in #41 - these two issues are certainly the same. I could trace this back to a TODO comment in RetDec. Maybe I'll manage to figure out the data structures involved to fix this properly.

Contributor

palant commented Sep 10, 2018

Pull request: #388

PeterMatula pushed a commit that referenced this issue Sep 20, 2018
* Fix #41, #169 - capstone2llvmir: Correctly import pointers with segment override

* Fixed typo, converting SS segment to the correct address space

* Handle both cases where segment override is relevant

* Made sure that bin2llvmir knows how to convert pointers with different address spaces

* Made sure that llvmir2hll can handle address space casts

* Made sure all LLVMIR2BIR converters support address space casts for all the relevant scenarios

* Added llvmir2hll test for address space cast handling
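
The idea behind the commit above can be sketched roughly as follows. This is not RetDec's code: the enum and function names are hypothetical, and the address-space numbers are the ones LLVM's X86 backend reserves for GS, FS and SS (256, 257, 258); whether RetDec uses the same numbers is an assumption. The point is that an access such as fs:[0] should not be imported as a load through a plain null pointer, which LLVM's optimizations are free to treat as undefined behaviour.

```c
/* Segment registers that can appear as an x86 segment override. */
typedef enum {
    SEG_NONE, SEG_CS, SEG_DS, SEG_ES, SEG_SS, SEG_FS, SEG_GS
} x86_segment;

/* Address-space numbers reserved by LLVM's X86 backend.
   Assumption: the lifter maps segment overrides onto these. */
enum { AS_DEFAULT = 0, AS_GS = 256, AS_FS = 257, AS_SS = 258 };

/* Pick the address space for a memory operand with the given override. */
unsigned address_space_for_segment(x86_segment seg) {
    switch (seg) {
    case SEG_GS: return AS_GS;
    case SEG_FS: return AS_FS;
    case SEG_SS: return AS_SS;
    default:     return AS_DEFAULT; /* treated as flat in this sketch */
    }
}

/* Offset 0 only looks like a null dereference to the optimizer when the
   pointer lives in the default address space. */
int is_plain_null(x86_segment seg, unsigned long offset) {
    return offset == 0 && address_space_for_segment(seg) == AS_DEFAULT;
}
```

With such a mapping, `mov eax, dword ptr fs:[0]` from the function prologue in the report would become a load from address space 257 rather than from a default-address-space null pointer.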
@PeterMatula
Collaborator

Comment related to this: #391 (comment).

@PeterMatula
Collaborator

Code from #391 was merged to master. This should hopefully fix the issue here. If you can/want to share the sample you tested on, I can add it to our regression tests to make sure this is (and stays) fixed.
