Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loader: ELF non-zero based PIC binaries support #67

Merged
merged 18 commits into from
Jul 14, 2017
Merged

Loader: ELF non-zero based PIC binaries support #67

merged 18 commits into from
Jul 14, 2017

Conversation

symflood
Copy link
Contributor

@symflood symflood commented Jun 19, 2017

Watch these fixes please.
I think they should resolve an issue #65. Unfortunately, the tests from angr/angrop will not pass. There some work left, but not so much. Don't know if the solution will fit. Didn't find anything less invasive.

@symflood
Copy link
Contributor Author

That's bad. Didn't even passed cle tests.

@rhelmot
Copy link
Member

rhelmot commented Jun 19, 2017

The CLE tests passed. The problem is you have code quality regressions so linting failed. You should make your changes pass pylint with the pylintrc in the angr-dev repository.

@ltfish
Copy link
Member

ltfish commented Jun 19, 2017

@amttr Can you please rename address.py to address_translator.py? Other than that and the linting fixes, the code looks perfect to me. Thank you :-)

@symflood
Copy link
Contributor Author

Travis failed to build, despite that everything goes ok on a local machine

    'No matching distribution found for %s' % req
DistributionNotFound: No matching distribution found for identifier (from patcherex==1.1)
[!!] pip failure (-e patcherex). Check /tmp/setup-10851 for details, or read it here:

The command "( curl https://raw.githubusercontent.com/angr/angr-dev/$TRAVIS_BRANCH/tests/travis-install.sh | grep -v 404 || curl https://raw.githubusercontent.com/angr/angr-dev/master/tests/travis-install.sh ) | bash" failed and exited with 1 during .

@ltfish
Copy link
Member

ltfish commented Jun 20, 2017

@amttr That is our CI issue. Someone did not remove identifier as a testing dependency for Patcherex. We will be able to merge this PR once our CI is fixed.

I appreciate your effort in making this PR!

@zardus
Copy link
Member

zardus commented Jun 22, 2017

@fish, that specific issue is ours, but the CI fails hardcore with this fix. Errors and failures across almost every repo. I'm re-trying the tests, but something seems broken for sure, @amttr :-(

@ltfish
Copy link
Member

ltfish commented Jun 22, 2017

but the CI fails hardcore with this fix

Are you referring to our GitLab CI?

@symflood
Copy link
Contributor Author

symflood commented Jun 22, 2017

@zardus, unfortunately this PR is only the part of the solution, I'm working on angr fixes now, and as I saw angrop is also needed to be fixed.
I've provided this incomplete PR just to show the additional interface for CLE to manipulate (translate) addresses inside binaries. As for now any project, which is using .rebase_addr or .requested_base directly, will fail. Because the semantics is changed. I am sorry)

@zardus
Copy link
Member

zardus commented Jun 22, 2017

@ltfish, I was referring to github. Look at the travis logs.

@amttr, no worries about the failures. I was just making sure @ltfish was aware of them, so he wouldn't merge it before it's ready :-)

@ltfish
Copy link
Member

ltfish commented Jun 22, 2017

(Unlike someone), I'm not going to merge in such a high-stake PR without thorough testing!

@amttr Take you time :-)

@symflood
Copy link
Contributor Author

symflood commented Jun 27, 2017

Ok, one more thing: there is an inconsistency between PE and ELF formats such that PEs mostly use RVAs internally, which is more convenient in case of rebasing, but ELFs use LVAs.
As an example - PE sections addresses (PE:Section Table (Section Headers):VirtualAddress) versus ELF segments addresses (ELF:Program Header:v_addr) and also entry points addresses.

Eventually, it can lead to situations when CLE user doesn't know what type of address he have got from CLE, for example, when user requested some information indirectly from pefile/pyelftools via CLE.
Is that correct?

P.S.: As I understand by design CLE should provide only flat virtual addresses which can be applied directly to the loader.memory without any modification, but the other side is that there are accesses to Region.min_addr/max_addr everywhere.

P.P.S.: Think CLE should be fixed in such way that angr has no need to fix the addresses, it can be done by Regions initialization with already fixed base addresses, but maybe that's not correct in general. Will wait for some comments.

@rhelmot
Copy link
Member

rhelmot commented Jun 27, 2017

I think that we should absolutely be providing a consistent API, preferably of fully rebased virtual addresses. My opinion is that it's completely ok to touch massive amounts of code in order to support this change. One API change that comes to mind is that several CLE objects have .addr and .rebased_addr, while it should probably be .local_addr (better name?) and .addr. Similarly, it would probably be the best for Region to have a backreference to the object that contains it, so that it can apply the base address to its API as well.

@symflood
Copy link
Contributor Author

symflood commented Jul 11, 2017

inb4: Currently I haven't found a simple way to make symbols <-> addresses mapping consistent with other API.

@symflood
Copy link
Contributor Author

symflood commented Jul 11, 2017

Tried to rebase, but there are many unrelated errors on testing, checked out with

./git_all.sh checkout wip/the_end_times

got:

ERROR: test_cfgaccurate.test_string_references
ERROR: test_fauxware.test_pickling('i386',)
ERROR: test_fauxware.test_pickling('x86_64',)
ERROR: test_fauxware.test_pickling('ppc',)
ERROR: test_fauxware.test_pickling('armel',)
ERROR: test_fauxware.test_pickling('mips',)
ERROR: test_identifier.test_comparison_identification
ERROR: test_vfg_path.test_vfg_paths
ERROR: test_oppologist.test_cromu_70
ERROR: test_unicorn.test_longinit_i386
ERROR: test_unicorn.test_longinit_x86_64
FAIL: test_inspect.test_inspect
FAIL: test_oppologist.test_fauxware_oppologist
FAIL: test_unicorn.test_stops
FAIL: test_unicorn.test_fauxware
FAIL: test_unicorn.test_fauxware_aggressive
FAIL: test_unicorn.test_unicorn_pickle
FAIL: test_unicorn.test_concrete_transmits
FAIL: test_veritesting.test_veritesting_b('x86_64',)
FAILED (errors=11, failures=8)

Is it ok to rebase there?

P.S.: I've fixed angrop in my fork, but it also based on master branch. Think it also should be rebased on wip/the_end_times. Not sure because tests are failing there.

@rhelmot
Copy link
Member

rhelmot commented Jul 11, 2017

the end times branch is in a state of extreme flux, so yeah, tests are supposed to be failing. You can compare your branch to the current state of the testcases and try to fix stuff to get it back to here.

@rhelmot
Copy link
Member

rhelmot commented Jul 11, 2017

Oh, it occured to me that your unicorn-related failures are probably because you haven't rebuilt angr. Just jump into angr and run python setup.py build.

@symflood
Copy link
Contributor Author

symflood commented Jul 12, 2017

I have pushed all the changes at:

  • angr/cle (master)
  • angr/angr (wip/the_end_times)
  • angr/angrop (wip/the_end_times)

Think its ready (for testing at least)

Copy link
Member

@rhelmot rhelmot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty good!!!!! Mostly just style stuff that needs to be fixed, with the exception of the comment on rebase()

@@ -497,10 +528,14 @@ def set_arch(self, arch):
self.memory = Clemory(arch) # Private virtual address space, without relocations

@property
def image_base_delta(self):
return - self.linked_base + self.mapped_base
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to blink a couple of times before I realized this wasn't a syntax error. Can you remove the space between the - and self.linked_base?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can replace with just self.mapped_base - self.linked_base. I saw something like that in the codebase, just thought it's more clear.

Rebase backend's regions to the new base where they were mapped by the loader
"""
if self.sections:
self.sections._rebase(self.image_base_delta)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we throw an error of some sort if we try to rebase things twice? Currently it'll just keep adding the delta to each of the sections, which is... weird

self.mapped_base = self.linked_base = min(
map(lambda x: x['p_vaddr'],
filter(lambda x: x.header.p_type == 'PT_LOAD', self.reader.iter_segments())) or [0]
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python's functional style is only really useful when you're using generators to prevent the whole list from ever being created. Can you break this expression into several more readable statements, like

seg_addrs = [x['p_vaddr'] for x in self.reader.iter_segments() if x.header.p_type == 'PT_LOAD']
self.mapped_base = min(seg_addrs) if seg_addrs else 0

Also take into account sections as well, if segments are not present.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also take into account sections as well, if segments are not present.

I think there is a naming inconsistency between PE and ELF -- entities which are sections in PE are segments in case of ELF. There are also so called directories described in PE's optional header which seems equivalent to ELF sections.

Maybe I am wrong, but loadable sections (in a case of ELF) should be always a part of loadable segments, so I don't understand how I can take sections into account and think the code is correct.
If I am not then it is possible for valid ELF, that loadable sections exist without a loadable segment. Is it so? Because this is not in a case of valid PE.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From specification:

To compute the base address, one determines the memory address associated with the lowest p_vaddr value for a PT_LOAD segment. One then obtains the base address by truncating the memory address to the nearest multiple of the maximum page size.
Depending on the kind of file being loaded into memory, the memory address might or might not match the p_vaddr values.

So resulting minimum should be only additionally aligned on a page size, which is actually the same what you're doing at ELF._load_segment_memory:

    def _load_segment_memory(self, seg):

        # see https://code.woboq.org/userspace/glibc/elf/dl-load.c.html#1066
        ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm.. Looks like this is an additional issue: get_min_addr/get_max_addr rely on metadata which now is not consistent with Clemory state.

rebased_bytes = struct.pack('<I', rebased_value % 2**32)
self.owner_obj.memory.write_bytes(self.dest_addr, rebased_bytes)
rebased_value = AT.from_lva(org_value, self.owner_obj).to_mva()
rebased_bytes = struct.pack('<I', rebased_value % 2 ** 32)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think our current style is to not put spaces around the ** operator when it's just 2 to the power something. Makes it more readable imo but if you feel strongly about it it can stay it the codebase as a statement :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's ok. That was PyCharm, i'll fix that

self.characteristics = pe_section.Characteristics
def __init__(self, name, offset, vaddr, size, chars):
super(PESection, self).__init__(name, offset, vaddr, size)
self.characteristics = chars
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you feel strongly about this change, could you please update ELFSection and ELFSegment to do the same thing? Right now they both just take readelf sections and parse them out into the constructors for the normal region arguments; this class was designed to mirror that. If there are complications with that for some reason, please revert this change. The interfaces should be the same.

Copy link
Contributor Author

@symflood symflood Jul 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think interfaces should be updated, because of vaddr semantics for sections/segments, they are not the same for PE/ELF, but should be.

Copy link
Contributor Author

@symflood symflood Jul 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, Mach-O's Section/Segment interfaces are already the same )

cle/loader.py Outdated
@@ -119,8 +121,7 @@ def close(self):
def __repr__(self):
if self._main_binary_stream is None:
return '<Loaded %s, maps [%#x:%#x]>' % (os.path.basename(self._main_binary_path), self.min_addr(), self.max_addr())
else:
return '<Loaded from stream, maps [%#x:%#x]>' % (self.min_addr(), self.max_addr())
return '<Loaded from stream, maps [%#x:%#x]>' % (self.min_addr(), self.max_addr())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was this really necessary

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disabled this particular pylint check (no-else-return) as soon as I upgraded my pylint, so if pylint told you to do this, grab the new pylintrc from angr-dev!

@rhelmot rhelmot merged commit e3c565b into angr:master Jul 14, 2017
@symflood
Copy link
Contributor Author

I didn't complete interface fixing )

@rhelmot
Copy link
Member

rhelmot commented Jul 14, 2017

😬

@symflood
Copy link
Contributor Author

I can push it with separate PR, i think

@rhelmot
Copy link
Member

rhelmot commented Jul 14, 2017

I just realized that we broke angr:master with this, so I pushed these changes to cle:wip/the_end_times and then force-pushed to revert to before the merge on master.

Can you give me a quick overview of what still needs to be done? It's possible I can knock it out faster.

@symflood
Copy link
Contributor Author

symflood commented Jul 14, 2017

Can you give me a quick overview of what still needs to be done?

Only interface fixing is left. Tests should work without this.

@rhelmot
Copy link
Member

rhelmot commented Jul 14, 2017

Just a heads up - I just merged a huge refactor to angr. I made sure to rebase it on top of your changes first, but if you've got any of these interface fixes in progress, you should rebase those on top of the current wip/the_end_times.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants