Fileinfo tool reports strange section names #451

Closed
olii opened this issue Jan 3, 2019 · 2 comments

olii commented Jan 3, 2019

fileinfo reports strange section names inside the unpacked binary from the original sample with hash
4383FE67FEC6EA6E44D2C7D075B9693610817EDC68E8B2A76B2246B53B9186A1. This seems to be related to the unpacked sample only.

Objdump fails on the unpacked sample with the following output:

$ objdump -h /tmp/4383FE67FEC6EA6E44D2C7D075B9693610817EDC68E8B2A76B2246B53B9186A1-n_nido4o-unpacked 
objdump: /tmp/4383FE67FEC6EA6E44D2C7D075B9693610817EDC68E8B2A76B2246B53B9186A1-n_nido4o-unpacked: bad string table size 808464432
objdump: /tmp/4383FE67FEC6EA6E44D2C7D075B9693610817EDC68E8B2A76B2246B53B9186A1-n_nido4o-unpacked: bad value

fileinfo reports zero-filled section names with huge sizes:
[screenshot of fileinfo output]

@metthal metthal assigned PeterMatula and unassigned metthal Jan 7, 2019
PeterMatula added a commit to avast/retdec-regression-tests that referenced this issue Jan 31, 2019

PeterMatula commented Jan 31, 2019

@s3rvac please note that this fix changes the detected section names in some samples. This might be relevant for sample clustering, if it is based on the section names themselves or on related hashes.

The old algorithm for PE section names (sketched in code below):

  1. Get the 8-byte name from the section header.
  2. If it is an offset into the string table (i.e. /<offset>), read 8 bytes from this offset.
  3. If there is a non-printable character in these 8 bytes, do not use the string table, i.e. the section name stays /<offset>, e.g. /106.
  4. Otherwise, read a null-terminated string of unlimited length from this offset and use it as the section name.
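
For illustration, a minimal C++ sketch of that old logic, assuming the 8-byte header name and the raw COFF string table have already been read into std::string objects; the function and parameter names are mine, not RetDec's actual code:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Old behavior (sketch): probe only the first 8 bytes at the string-table
// offset for printability, then read until the null byte with no length cap.
std::string resolveSectionNameOld(const std::string& headerName,   // 8-byte name from the section header
                                  const std::string& stringTable)  // raw COFF string table
{
    // Ordinary names are stored directly in the header.
    if (headerName.size() < 2 || headerName[0] != '/')
        return headerName;

    // "/<decimal offset>" points into the string table
    // (assumes a well-formed decimal offset).
    std::size_t offset = std::stoull(headerName.substr(1));
    if (offset >= stringTable.size())
        return headerName;

    // Step 3: check only the first 8 bytes for non-printable characters.
    for (std::size_t i = offset; i < offset + 8 && i < stringTable.size(); ++i)
    {
        unsigned char c = static_cast<unsigned char>(stringTable[i]);
        if (c == '\0')
            break;
        if (!std::isprint(c))
            return headerName; // keep "/<offset>", e.g. "/106"
    }

    // Step 4: read a null-terminated string of unlimited length -- the source
    // of the huge names reported in this issue.
    std::size_t end = stringTable.find('\0', offset);
    return stringTable.substr(offset, end - offset); // substr clamps if no null byte is found
}
```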

This is obviously not ideal:

  • There is no safety mechanism against huge strings, like the case reported here.
  • Checking only the first 8 bytes for non-printable characters and then reading until the null character (including any non-printable characters beyond those 8 bytes) is probably an oversight. It rejects or accepts names by chance, depending on the position of the first non-printable character.

Solutions:

  • Length: always limit the maximum length of the read string. The limit is set to 1024 characters; see the discussion below.
  • Non-printable characters: several possible solutions:
    • Read whatever is at the offset, including non-printable characters, until the null byte or the maximum length. Not used because it would increase the amount of garbage in many samples.
    • Like the first solution, but filter out non-printable characters; this behavior was observed in some tools. Not used because I think it is confusing: it is not apparent how the section name was created.
    • Strict solution: use the string from the string table only if all its characters are printable and it is not too long (over 1024 characters). This is the implemented solution (sketched below); it keeps garbage out of our output.
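
A minimal sketch of the implemented strict variant, under the same assumptions as the sketch above (the constant and function names are mine, not RetDec's):

```cpp
#include <cctype>
#include <cstddef>
#include <string>

constexpr std::size_t kMaxSectionNameLength = 1024; // limit discussed below

// Strict behavior (sketch): use the string-table name only if every character
// up to the terminating null is printable and the length stays within the cap;
// otherwise fall back to the "/<offset>" form from the header.
std::string resolveSectionNameStrict(const std::string& headerName,
                                     const std::string& stringTable)
{
    if (headerName.size() < 2 || headerName[0] != '/')
        return headerName;

    std::size_t offset = std::stoull(headerName.substr(1));
    if (offset >= stringTable.size())
        return headerName;

    std::string name;
    for (std::size_t i = offset; i < stringTable.size(); ++i)
    {
        unsigned char c = static_cast<unsigned char>(stringTable[i]);
        if (c == '\0')
            return name; // printable and within the limit: accept it
        if (!std::isprint(c) || name.size() >= kMaxSectionNameLength)
            return headerName; // garbage or oversized: keep "/<offset>"
        name.push_back(static_cast<char>(c));
    }
    return headerName; // no terminating null byte in the string table
}
```

Under this scheme, oversized names like the ones in the reported sample are rejected by the length cap and reported as /<offset> instead.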

On section name length

We collected section names from a huge number of samples and analyzed them.
Results: section-names.zip
Structure:

  • sorted name lengths (with the number of names of that length)
    • name @ hash of one sample containing this name

Analysis:

  • The longest name that makes sense is 418 characters long: a compiler can create a section for every function, and if the function name is mangled, the section name can get quite long.
  • A maximum length of 1024 characters should therefore be a reasonable limit.
  • There are a lot of non-printable garbage names. I don't think it makes sense to use them. If we removed the 8-byte printability test and read all strings, the situation would be much worse.

The only problem that comes to my mind is a situation where two samples have the same garbage name in the string table but at different offsets. In such a case, clustering might previously have grouped them together based on the identical garbage, but now it will not, because the reported names differ (e.g. /105 vs. /237). I have no idea whether this might be a problem.

PeterMatula added a commit that referenced this issue Feb 1, 2019

s3rvac commented Feb 5, 2019

@PeterMatula Thank you for the fix and explanation 👍

PeterMatula added a commit to avast/retdec-regression-tests that referenced this issue Feb 21, 2019