MIPS Extending supported instructions #393

Open
MagnificentS opened this issue Sep 12, 2018 · 13 comments
Comments

@MagnificentS

How do I extend the set of MIPS instructions that is currently supported?

@PeterMatula
Collaborator

Take a look at the capstone2llvmir library, the MIPS module in this case. To support more instructions, their semantics need to be modeled in LLVM IR, i.e. you write a routine that takes a Capstone instruction and generates an LLVM IR sequence with the same semantics. Also see this discussion. We do not want to have semantics for all possible instructions, only for those that can be represented in C reasonably well (simple and clear code). At the moment, all unhandled instructions are ignored, but we are already working on an implementation that will generate pseudo asm calls for them.

What kind of instructions are you interested in? Would pseudo asm calls be enough for you, or do you want complete semantics?
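
For readers new to the idea, the "one routine per instruction" shape described above can be pictured with a toy model. The sketch below is plain Python, not RetDec's C++ code: the `Insn` type, the `TRANSLATORS` table, and the emitted IR-like text (with registers shown as globals) are all hypothetical and only illustrate the structure. Unhandled instructions are ignored here, matching the behavior described at the time of this comment.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple


class Insn(NamedTuple):
    """Hypothetical stand-in for a decoded instruction (not Capstone's type)."""
    mnemonic: str
    operands: Tuple[str, ...]


def translate_addu(insn: Insn) -> List[str]:
    """Emit IR-like text for `addu rd, rs, rt`: rd = rs + rt (32-bit wrap)."""
    rd, rs, rt = insn.operands
    return [
        f"%lhs = load i32, i32* @{rs}",   # read first source register
        f"%rhs = load i32, i32* @{rt}",   # read second source register
        "%sum = add i32 %lhs, %rhs",      # the actual semantics
        f"store i32 %sum, i32* @{rd}",    # write destination register
    ]


# One routine per supported mnemonic (illustrative table, assumption).
TRANSLATORS: Dict[str, Callable[[Insn], List[str]]] = {
    "addu": translate_addu,
}


def translate(insn: Insn) -> List[str]:
    """Dispatch to a routine; unhandled instructions yield nothing."""
    routine = TRANSLATORS.get(insn.mnemonic)
    return routine(insn) if routine else []


if __name__ == "__main__":
    for line in translate(Insn("addu", ("v0", "a0", "a1"))):
        print(line)
```

Supporting a new instruction in this model means writing one more routine and registering it in the table.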

@MagnificentS
Author

I was thinking of PS2 MIPS. I could have sworn I saw a paper by you guys that used ps2dev (a homebrew kit) to build and decompile custom ELF files.

@s3rvac
Member

s3rvac commented Oct 6, 2018

You are right. For creating MIPS ELF binaries via GCC, we were using Minimalist PSP homebrew SDK. However, we were interested only in regular MIPS instructions and not in PS2-specific extensions.

@MagnificentS
Author

My memory is a bit muddled lol. Thanks

@PeterMatula
Collaborator

Here is the relevant map of instructions that we translate, and those we don't. I'm sure there are many that are simple enough to represent reasonably in LLVM IR and C. As I wrote above, the others should not be ignored; pseudo asm intrinsic calls should be generated for them instead. If you want, you can add semantics for more instructions and send a pull request. But as I also wrote, we are not interested in semantics for complex instructions, so we will not accept a PR that implements such instructions. We can discuss here specific instructions (or families) that could be added. And if you are up to it, you can add them, or I can look into it when I have the time.

@MagnificentS
Author

Wow thank you so much. I was going to start in a few weeks but this might help me get started sooner

@PeterMatula
Collaborator

I just merged the branch solving #115 and wrote some info about the translation process on our wiki: https://github.com/avast-tl/retdec/wiki/Capstone2LlvmIr.
The most important change is that unhandled assembly instructions are not ignored anymore. Calls to assembly pseudo functions are auto-generated based on info provided by Capstone. This does not have to be 100% precise, but it is better than nothing. If you have a sample that contains such instructions, try to decompile it with the current master and compare the result with the old output. It should be better now, but if you encounter any problems, please report them. I would be very interested to hear how it works on real binaries other than x86, since I did not test it on MIPS much.
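
As a rough illustration of the fallback described above, the sketch below builds a pseudo call from per-operand read/write information of the kind a disassembler can report. It is a plain Python model under stated assumptions: the `Operand` type, the `__asm_` name prefix, and the rule "a single written operand becomes the return value" are hypothetical choices for this example, not a description of RetDec's actual output.

```python
from typing import List, NamedTuple


class Operand(NamedTuple):
    """Hypothetical operand record: its text and whether it is written."""
    text: str
    written: bool


def pseudo_asm_call(mnemonic: str, operands: List[Operand]) -> str:
    """Render an unhandled instruction as a C-like pseudo function call."""
    writes = [op.text for op in operands if op.written]
    if len(writes) == 1:
        # One destination: pass the read operands, assign the result.
        reads = [op.text for op in operands if not op.written]
        return f"{writes[0]} = __asm_{mnemonic}({', '.join(reads)})"
    # No single destination: pass every operand as an argument.
    args = ", ".join(op.text for op in operands)
    return f"__asm_{mnemonic}({args})"


if __name__ == "__main__":
    # wsbh rd, rt swaps bytes within halfwords; treated as unhandled here.
    print(pseudo_asm_call("wsbh", [Operand("t0", True), Operand("t1", False)]))
    print(pseudo_asm_call("swl", [Operand("t0", False), Operand("3(a0)", False)]))
```

The decompiled output keeps the data flow (which values go in, which register comes out) even though the operation itself stays opaque.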

@nihilus

nihilus commented Dec 17, 2018

I see that you handle unaligned stores (swr/swl) but not unaligned loads (ldl/ldr). These are pretty common in code, and the missing support would definitely break the decompilation. Any thoughts on fixing it with an intrinsic or something like that?

@nihilus

nihilus commented Dec 20, 2018

Feel free to check this as it is based on Capstone as well: https://github.com/nihilus/snowman/tree/master/src/nc/arch/mips

@nihilus

nihilus commented Dec 20, 2018

For binaries to test against use https://github.com/nihilus/snowman-tests

@PeterMatula
Collaborator

Well, even swr/swl are translated using translatePseudoAsmFncOp0Op1() at the moment, which generates a generic pseudo call. That is not ideal. Since loads and stores are pretty important, I will add these instructions (swr/swl, ldl/ldr) to my todo list and try to write proper translation routines for them if possible.

Thanks for the links, I will look at them.

@PeterMatula
Collaborator

I went over that Snowman MIPS translation and added semantics for a few more MIPS instructions that we were missing. Now we should have pretty much everything that is in Snowman.

However, RetDec translates unaligned load/store instructions (MIPS_INS_LWL, MIPS_INS_LWR, MIPS_INS_SWL, MIPS_INS_SWR, MIPS_INS_LDL, MIPS_INS_LDR, MIPS_INS_SDL, MIPS_INS_SDR) using pseudo function calls (intrinsic-like functions) at the moment. Snowman has full semantics for these. I looked into it and:

  • It should be possible to translate these reasonably well using load/store instructions and and/or operations.
  • I'm not sure how beneficial this would be. RetDec would gain the knowledge that a load/store operation is going on, but the user would see potentially confusing bit operations.
  • In any case, this is not a priority at the moment, and since it would be quite time-consuming to model it correctly, I decided pseudo functions are good enough for now. If someone comes across a sample where this is a problem, please create a dedicated issue.
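
To make the first two points concrete, here is a minimal sketch of what "load plus and/or operations" means for the word-sized pair lwl/lwr. It is a plain Python model that assumes a little-endian target and 32-bit registers; the function names and the byte-string memory are hypothetical and this is not RetDec or Snowman code. It also shows the masks and shifts a user would end up seeing if the semantics were fully expanded.

```python
MASK32 = 0xFFFFFFFF


def load_aligned_word(mem: bytes, addr: int) -> int:
    """Read the aligned 32-bit little-endian word that contains addr."""
    base = addr & ~3
    return int.from_bytes(mem[base:base + 4], "little")


def lwr(rt: int, mem: bytes, addr: int) -> int:
    """Little-endian lwr: merge bytes addr..end-of-word into the low part of rt."""
    shift = 8 * (addr & 3)
    word = load_aligned_word(mem, addr)
    keep = ~(MASK32 >> shift) & MASK32      # high bits of rt left untouched
    return (rt & keep) | (word >> shift)


def lwl(rt: int, mem: bytes, addr: int) -> int:
    """Little-endian lwl: merge bytes start-of-word..addr into the high part of rt."""
    shift = 8 * (3 - (addr & 3))
    word = load_aligned_word(mem, addr)
    keep = (1 << shift) - 1                 # low bits of rt left untouched
    return (rt & keep) | ((word << shift) & MASK32)


def unaligned_load_word(mem: bytes, addr: int) -> int:
    """The usual compiler idiom: lwr rt, 0(addr); lwl rt, 3(addr)."""
    rt = lwr(0, mem, addr)
    return lwl(rt, mem, addr + 3)


if __name__ == "__main__":
    memory = bytes(range(0x10, 0x20))
    print(hex(unaligned_load_word(memory, 1)))
```

Each half is a load followed by a shift, a mask, and an or, which is accurate but arguably harder to read in decompiled C than a single pseudo call.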

Contributions extending the supported instructions are welcome, but keep in mind that we don't want to model instructions that are so complicated that their LLVM IR representation would be too complex.

@nihilus

nihilus commented Jan 7, 2019

It would be good to translate the pairs to a pseudo-function like "ulw" / "usw" for clarity and then implement that to start with.
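
One way to picture this suggestion is a small peephole pass that fuses an adjacent lwl/lwr pair into a single "ulw" pseudo operation. The sketch below is a plain Python model under several assumptions: a little-endian target (so lwl uses the offset 3 bytes above lwr), pairs that are directly adjacent, and a hypothetical `MemIns` record. It is not an existing RetDec pass.

```python
from typing import List, NamedTuple


class MemIns(NamedTuple):
    """Hypothetical memory instruction: mnemonic rt, offset(base)."""
    mnemonic: str
    rt: str
    offset: int
    base: str


def fuse_unaligned_loads(code: List[MemIns]) -> List[MemIns]:
    """Replace adjacent lwl/lwr pairs on the same word with one 'ulw'."""
    out: List[MemIns] = []
    i = 0
    while i < len(code):
        cur = code[i]
        nxt = code[i + 1] if i + 1 < len(code) else None
        if (
            nxt is not None
            and {cur.mnemonic, nxt.mnemonic} == {"lwl", "lwr"}
            and cur.rt == nxt.rt
            and cur.base == nxt.base
            and cur.rt != cur.base          # first half must not clobber the base
        ):
            left = cur if cur.mnemonic == "lwl" else nxt
            right = nxt if left is cur else cur
            if left.offset == right.offset + 3:   # little-endian pairing
                out.append(MemIns("ulw", right.rt, right.offset, right.base))
                i += 2
                continue
        out.append(cur)
        i += 1
    return out


if __name__ == "__main__":
    program = [
        MemIns("lwl", "t0", 3, "a0"),
        MemIns("lwr", "t0", 0, "a0"),
        MemIns("lw", "t1", 8, "a0"),
    ]
    for ins in fuse_unaligned_loads(program):
        print(f"{ins.mnemonic} {ins.rt}, {ins.offset}({ins.base})")
```

A matching "usw" fusion for swl/swr would follow the same pattern; pairs separated by other instructions would need a more careful analysis than this adjacent-only check.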
