-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Loader: ELF non-zero based PIC binaries support #67
Loader: ELF non-zero based PIC binaries support #67
Conversation
That's bad. Didn't even passed cle tests. |
The CLE tests passed. The problem is you have code quality regressions so linting failed. You should make your changes pass pylint with the pylintrc in the angr-dev repository. |
@amttr Can you please rename |
Travis failed to build, despite that everything goes ok on a local machine 'No matching distribution found for %s' % req
DistributionNotFound: No matching distribution found for identifier (from patcherex==1.1)
[!!] pip failure (-e patcherex). Check /tmp/setup-10851 for details, or read it here:
The command "( curl https://raw.githubusercontent.com/angr/angr-dev/$TRAVIS_BRANCH/tests/travis-install.sh | grep -v 404 || curl https://raw.githubusercontent.com/angr/angr-dev/master/tests/travis-install.sh ) | bash" failed and exited with 1 during . |
@amttr That is our CI issue. Someone did not remove I appreciate your effort in making this PR! |
@fish, that specific issue is ours, but the CI fails hardcore with this fix. Errors and failures across almost every repo. I'm re-trying the tests, but something seems broken for sure, @amttr :-( |
Are you referring to our GitLab CI? |
@zardus, unfortunately this PR is only the part of the solution, I'm working on angr fixes now, and as I saw angrop is also needed to be fixed. |
(Unlike someone), I'm not going to merge in such a high-stake PR without thorough testing! @amttr Take you time :-) |
Ok, one more thing: there is an inconsistency between PE and ELF formats such that PEs mostly use RVAs internally, which is more convenient in case of rebasing, but ELFs use LVAs. Eventually, it can lead to situations when CLE user doesn't know what type of address he have got from CLE, for example, when user requested some information indirectly from pefile/pyelftools via CLE. P.S.: As I understand by design CLE should provide only flat virtual addresses which can be applied directly to the P.P.S.: Think CLE should be fixed in such way that angr has no need to fix the addresses, it can be done by Regions initialization with already fixed base addresses, but maybe that's not correct in general. Will wait for some comments. |
I think that we should absolutely be providing a consistent API, preferably of fully rebased virtual addresses. My opinion is that it's completely ok to touch massive amounts of code in order to support this change. One API change that comes to mind is that several CLE objects have |
inb4: Currently I haven't found a simple way to make |
Tried to rebase, but there are many unrelated errors on testing, checked out with ./git_all.sh checkout wip/the_end_times got: ERROR: test_cfgaccurate.test_string_references
ERROR: test_fauxware.test_pickling('i386',)
ERROR: test_fauxware.test_pickling('x86_64',)
ERROR: test_fauxware.test_pickling('ppc',)
ERROR: test_fauxware.test_pickling('armel',)
ERROR: test_fauxware.test_pickling('mips',)
ERROR: test_identifier.test_comparison_identification
ERROR: test_vfg_path.test_vfg_paths
ERROR: test_oppologist.test_cromu_70
ERROR: test_unicorn.test_longinit_i386
ERROR: test_unicorn.test_longinit_x86_64
FAIL: test_inspect.test_inspect
FAIL: test_oppologist.test_fauxware_oppologist
FAIL: test_unicorn.test_stops
FAIL: test_unicorn.test_fauxware
FAIL: test_unicorn.test_fauxware_aggressive
FAIL: test_unicorn.test_unicorn_pickle
FAIL: test_unicorn.test_concrete_transmits
FAIL: test_veritesting.test_veritesting_b('x86_64',)
FAILED (errors=11, failures=8) Is it ok to rebase there? P.S.: I've fixed angrop in my fork, but it also based on master branch. Think it also should be rebased on wip/the_end_times. Not sure because tests are failing there. |
the end times branch is in a state of extreme flux, so yeah, tests are supposed to be failing. You can compare your branch to the current state of the testcases and try to fix stuff to get it back to here. |
Oh, it occured to me that your unicorn-related failures are probably because you haven't rebuilt angr. Just jump into angr and run |
I have pushed all the changes at:
Think its ready (for testing at least) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty good!!!!! Mostly just style stuff that needs to be fixed, with the exception of the comment on rebase()
cle/backends/__init__.py
Outdated
@@ -497,10 +528,14 @@ def set_arch(self, arch): | |||
self.memory = Clemory(arch) # Private virtual address space, without relocations | |||
|
|||
@property | |||
def image_base_delta(self): | |||
return - self.linked_base + self.mapped_base |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to blink a couple of times before I realized this wasn't a syntax error. Can you remove the space between the - and self.linked_base?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can replace with just self.mapped_base - self.linked_base
. I saw something like that in the codebase, just thought it's more clear.
cle/backends/__init__.py
Outdated
Rebase backend's regions to the new base where they were mapped by the loader | ||
""" | ||
if self.sections: | ||
self.sections._rebase(self.image_base_delta) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we throw an error of some sort if we try to rebase things twice? Currently it'll just keep adding the delta to each of the sections, which is... weird
cle/backends/elf.py
Outdated
self.mapped_base = self.linked_base = min( | ||
map(lambda x: x['p_vaddr'], | ||
filter(lambda x: x.header.p_type == 'PT_LOAD', self.reader.iter_segments())) or [0] | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Python's functional style is only really useful when you're using generators to prevent the whole list from ever being created. Can you break this expression into several more readable statements, like
seg_addrs = [x['p_vaddr'] for x in self.reader.iter_segments() if x.header.p_type == 'PT_LOAD']
self.mapped_base = min(seg_addrs) if seg_addrs else 0
Also take into account sections as well, if segments are not present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also take into account sections as well, if segments are not present.
I think there is a naming inconsistency between PE and ELF -- entities which are sections in PE are segments in case of ELF. There are also so called directories described in PE's optional header which seems equivalent to ELF sections.
Maybe I am wrong, but loadable sections (in a case of ELF) should be always a part of loadable segments, so I don't understand how I can take sections into account and think the code is correct.
If I am not then it is possible for valid ELF, that loadable sections exist without a loadable segment. Is it so? Because this is not in a case of valid PE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From specification:
To compute the base address, one determines the memory address associated with the lowest p_vaddr value for a PT_LOAD segment. One then obtains the base address by truncating the memory address to the nearest multiple of the maximum page size.
Depending on the kind of file being loaded into memory, the memory address might or might not match the p_vaddr values.
So resulting minimum should be only additionally aligned on a page size, which is actually the same what you're doing at ELF._load_segment_memory
:
def _load_segment_memory(self, seg):
# see https://code.woboq.org/userspace/glibc/elf/dl-load.c.html#1066
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm.. Looks like this is an additional issue: get_min_addr
/get_max_addr
rely on metadata which now is not consistent with Clemory state.
cle/backends/pe.py
Outdated
rebased_bytes = struct.pack('<I', rebased_value % 2**32) | ||
self.owner_obj.memory.write_bytes(self.dest_addr, rebased_bytes) | ||
rebased_value = AT.from_lva(org_value, self.owner_obj).to_mva() | ||
rebased_bytes = struct.pack('<I', rebased_value % 2 ** 32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think our current style is to not put spaces around the **
operator when it's just 2 to the power something. Makes it more readable imo but if you feel strongly about it it can stay it the codebase as a statement :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's ok. That was PyCharm, i'll fix that
self.characteristics = pe_section.Characteristics | ||
def __init__(self, name, offset, vaddr, size, chars): | ||
super(PESection, self).__init__(name, offset, vaddr, size) | ||
self.characteristics = chars |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you feel strongly about this change, could you please update ELFSection and ELFSegment to do the same thing? Right now they both just take readelf sections and parse them out into the constructors for the normal region arguments; this class was designed to mirror that. If there are complications with that for some reason, please revert this change. The interfaces should be the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think interfaces should be updated, because of vaddr
semantics for sections/segments, they are not the same for PE/ELF, but should be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, Mach-O's Section/Segment interfaces are already the same )
cle/loader.py
Outdated
@@ -119,8 +121,7 @@ def close(self): | |||
def __repr__(self): | |||
if self._main_binary_stream is None: | |||
return '<Loaded %s, maps [%#x:%#x]>' % (os.path.basename(self._main_binary_path), self.min_addr(), self.max_addr()) | |||
else: | |||
return '<Loaded from stream, maps [%#x:%#x]>' % (self.min_addr(), self.max_addr()) | |||
return '<Loaded from stream, maps [%#x:%#x]>' % (self.min_addr(), self.max_addr()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was this really necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disabled this particular pylint check (no-else-return
) as soon as I upgraded my pylint, so if pylint told you to do this, grab the new pylintrc from angr-dev!
I didn't complete interface fixing ) |
😬 |
I can push it with separate PR, i think |
I just realized that we broke angr:master with this, so I pushed these changes to cle:wip/the_end_times and then force-pushed to revert to before the merge on master. Can you give me a quick overview of what still needs to be done? It's possible I can knock it out faster. |
Only interface fixing is left. Tests should work without this. |
Just a heads up - I just merged a huge refactor to angr. I made sure to rebase it on top of your changes first, but if you've got any of these interface fixes in progress, you should rebase those on top of the current |
Watch these fixes please.
I think they should resolve an issue #65. Unfortunately, the tests from angr/angrop will not pass. There some work left, but not so much. Don't know if the solution will fit. Didn't find anything less invasive.