Converting from SSA to UnSSA gives redefinitions of SSA variables #1508

Open
user1342234 opened this issue Feb 14, 2025 · 3 comments

@user1342234

Hello! A little background: I am currently using Miasm to deobfuscate virtualized functions. These functions contain many CMOVs and loop-based constant obfuscation. In my example below, I have been iteratively reconstructing the control flow graph one block at a time (without using dis_multiblock) and applying do_simplify_loop to the ircfg. As you know, do_simplify_loop eventually calls ssa_to_unssa, which produces a new ircfg after optimizations. For some reason, the new UnSSA'd graph (ircfg) now has R14.0 defined twice! (Note: the variable that ends up defined twice sometimes changes.)

In this SSA graph, the first block has R14.2=R14 and the last block has R14.1=(RDI.5 & 0x1) | (R14.0 ^ R8). Here, everything is correct.
[Image: the SSA graph]

After the UnSSA pass, we have our ircfg:
In the first block, R14.0=R14
But in the last block, R14.0=(RDI.4 & 0x1) | (R14.0 ^ R8.2)
So it's defined twice!
[Image: the ircfg after the UnSSA pass]

Because the variable is defined twice, an assertion fails in outofssa.py.
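
The duplicate write can be confirmed without reading the graph by eye. The helper below is a minimal sketch, not part of Miasm: `find_multiple_definitions` is a hypothetical name, and it assumes only that the argument maps location keys to blocks, that each block iterates over its assignment blocks, and that each assignment block iterates over its destinations like a dict. Passing `ircfg.blocks` to it relies on that same assumption about Miasm's containers. The sample data is a toy stand-in built from the two assignments quoted above.

```python
from collections import defaultdict


def find_multiple_definitions(blocks):
    """Return {destination: [location keys]} for destinations written more than once.

    `blocks` is assumed to map a location key to an iterable of dict-like
    assignment blocks (destination -> source expression).
    """
    writes = defaultdict(list)
    for loc_key, irblock in blocks.items():
        for assignblk in irblock:
            for dst in assignblk:
                writes[dst].append(loc_key)
    return {dst: locs for dst, locs in writes.items() if len(locs) > 1}


# Toy stand-in for the UnSSA'd ircfg described above (plain dicts and strings).
toy_blocks = {
    "loc_first": [{"R14.0": "R14"}],
    "loc_last": [{"R14.0": "(RDI.4 & 0x1) | (R14.0 ^ R8.2)"}],
}

duplicates = find_multiple_definitions(toy_blocks)
print(duplicates)
```

On a real graph the call would presumably be `find_multiple_definitions(ircfg.blocks)`, provided the containers behave as assumed.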

Here is my code for reproducing this issue, along with the SSA file (incorrect_ssa.dat) that produces the incorrect redefinitions of SSA variables when put through ssa_to_unssa.

from miasm.analysis.simplifier import IRCFGSimplifierSSA
from miasm.core.locationdb import LocationDB
from miasm.analysis.machine import Machine
import dill

if __name__ == '__main__':
    machine = Machine('x86_64')
    loc_db = LocationDB()
    lifter = machine.lifter_model_call(loc_db)
    simplifier = IRCFGSimplifierSSA(lifter)

    # Load the serialized SSA object that triggers the problem.
    with open("incorrect_ssa.dat", "rb") as fl:
        ssa = dill.load(fl)

    # Use the first block of the SSA graph as the head.
    loc_key_0 = list(ssa.graph.blocks.keys())[0]
    ircfg = simplifier.ssa_to_unssa(ssa, loc_key_0)

    print("Done")

incorrect_ssa -> https://filebin.net/12aiter0nj97sgbh

@user1342234
Author

user1342234 commented Feb 14, 2025

Just a side note: pickle was unable to dump the ircfg because of nbsi in cpu.py. The error was: _pickle.PicklingError: Can't pickle <class 'miasm.core.cpu.nbsi'>: attribute lookup nbsi on miasm.core.cpu failed
Also, dill doesn't save the ircfg correctly, but it still allows me to demonstrate the bug, so proceed with caution when analyzing incorrect_ssa.dat.
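
For context on that error: pickle stores a class by reference (module name plus class name) and looks the name up again when dumping, so a class that is not reachable under its own name at module level cannot be pickled. The sketch below reproduces this failure mode with the standard library only. It assumes, without having checked cpu.py, that nbsi is created dynamically in a similar way; `make_class` and `try_pickle` are hypothetical helpers.

```python
import pickle


def make_class():
    # type() names the class "nbsi", but nothing binds that name
    # as a module-level attribute, so pickle cannot find it again.
    return type("nbsi", (object,), {})


def try_pickle(obj):
    """Return the exception raised by pickle.dumps, or None on success."""
    try:
        pickle.dumps(obj)
        return None
    except (pickle.PicklingError, AttributeError) as exc:
        return exc


# Pickling an instance requires pickling its class by reference, which fails.
error = try_pickle(make_class()())
print(type(error).__name__, error)
```

Ordinary module-level classes pickle fine through the same helper, which is why the failure shows up only for objects that hold instances of such dynamically created classes.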

@serpilliere
Contributor

Hi @user1342234
It is normal that 'unssa' can produce multiple writes to the same register, even if it is R14.0.
When you run the un-SSA algorithm, any register can be selected as the representative of an equivalence class, and that can be R14.0.
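
To illustrate the idea, here is a toy sketch, not Miasm's implementation: variables connected by a phi node are merged into one class with a union-find, and every member is renamed to the class representative. The phi node `R14.0 = Phi(R14.2, R14.1)` is an assumption (the issue only quotes the two assignments), `toy_unssa` is a hypothetical name, and a real out-of-SSA pass also has to check live-range interference and insert copies, which is omitted here.

```python
import re


def find(parent, x):
    """Union-find lookup with path compression."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def toy_unssa(phis, assignments):
    """Rename every phi-connected variable to its class representative.

    phis:        list of (dst, [srcs]) pairs
    assignments: list of (dst, expression string) pairs
    """
    parent = {}
    for dst, srcs in phis:
        for var in [dst, *srcs]:
            parent.setdefault(var, var)
    for dst, srcs in phis:
        for src in srcs:
            ra, rb = find(parent, dst), find(parent, src)
            if ra != rb:
                # Arbitrary choice: the smallest name represents the class.
                parent[max(ra, rb)] = min(ra, rb)

    def rename(name):
        return find(parent, name) if name in parent else name

    ssa_name = re.compile(r"\b[A-Z0-9]+\.\d+\b")
    return [
        (rename(dst), ssa_name.sub(lambda m: rename(m.group(0)), expr))
        for dst, expr in assignments
    ]


phis = [("R14.0", ["R14.2", "R14.1"])]  # assumed phi at the loop head
assignments = [
    ("R14.2", "R14"),
    ("R14.1", "(RDI.5 & 0x1) | (R14.0 ^ R8)"),
]
result = toy_unssa(phis, assignments)
for dst, expr in result:
    print(dst, "=", expr)
```

After renaming, both assignments write R14.0, which is expected once the graph is no longer in SSA form.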

But if you rerun the SSA algorithm on such a graph, there may be interference between the new SSA registers and the old un-SSA'd registers.

A way to 'survive' this is to keep the metadata of the SSA'd registers (in the ssa class) and pass it to the future SSA pass.
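
As a rough picture of what carrying that metadata forward buys you, the toy below seeds a version counter with the names left over from the previous pass so that freshly created names cannot collide with them. `VersionAllocator` is hypothetical and is not Miasm's API; it models only the naming side of the problem and assumes names of the form `REG.index`.

```python
import re
from collections import defaultdict

SSA_NAME = re.compile(r"^(?P<base>\w+)\.(?P<index>\d+)$")


class VersionAllocator:
    """Hand out fresh `REG.index` names, skipping indices already in use."""

    def __init__(self, existing_names=()):
        self._next = defaultdict(int)
        for name in existing_names:
            match = SSA_NAME.match(name)
            if match:
                base, index = match["base"], int(match["index"])
                self._next[base] = max(self._next[base], index + 1)

    def fresh(self, base):
        index = self._next[base]
        self._next[base] = index + 1
        return f"{base}.{index}"


# Names that survived the un-SSA pass in the graph from this issue.
leftover = ["R14.0", "RDI.4", "R8.2"]

naive = VersionAllocator()
seeded = VersionAllocator(leftover)
print(naive.fresh("R14"))   # reuses a leftover name
print(seeded.fresh("R14"))  # starts above the leftover index
```

Without the seed, the first fresh name for R14 is the same R14.0 that is already present in the graph, which is the kind of interference described above.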

@user1342234
Author

Hi @serpilliere
Thanks for the response and clarification!
I just realized that I can use ssa.graph as the ircfg instead of repeatedly calling ssa_to_unssa. Is this a better way to iteratively reconstruct the CFG of the function? If so, what is the main purpose of ssa_to_unssa?
