RFC/WIP: Support DAP disassemble request #627

Closed
wants to merge 2 commits into from

Contributor

@puremourning puremourning commented Jan 25, 2022

Opening this up as a 'request for comments'. I had a quick go at implementing the dap disassemble request in CodeLLDB as I wanted something reliable to test Vimspector's disassemble view with.

Clearly this is a prototype. Would you be interested in a proper patch to support the DAP disassemble request?


CodeLLDB currently supports a custom disassembly view and provides
disassembly as "source" when debugging into objects with no source line
info.

DAP now also has a `disassemble` request which, given a memory reference
from the stack trace, produces a set number of instructions from that
address.

This API is awkward and annoying, but it's simple to implement based on
the existing DisassembledRange.

WIP.
 - we don't return the exact number of instructions
 - we don't populate a lot of the optional fields
 - my-first-rust(TM)
 - no tests yet
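
For readers unfamiliar with the request, here is a rough sketch of what a client sends and what one entry of the reply looks like. The field names are the ones defined by the DAP specification; the `seq` number, addresses, and instruction text are made-up values for illustration only.

```python
import json

# A DAP `disassemble` request as a client might send it.
# Field names follow the DAP specification; all values are illustrative.
request = {
    "seq": 12,
    "type": "request",
    "command": "disassemble",
    "arguments": {
        "memoryReference": "0x100003f60",  # e.g. a frame's instructionPointerReference
        "offset": 0,                       # byte offset applied to memoryReference
        "instructionOffset": -50,          # instructions to step back before starting
        "instructionCount": 100,           # the adapter must return exactly this many
        "resolveSymbols": True,
    },
}

# One entry of the response's `instructions` array. Only `address` and
# `instruction` are required; everything else is optional.
instruction = {
    "address": "0x100003f60",
    "instructionBytes": "55",
    "instruction": "pushq %rbp",
}

wire = json.dumps(request)  # what actually goes over the wire (minus the header)
```

The negative `instructionOffset` is the part that causes most of the difficulty discussed below.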
@vadimcn
Owner

vadimcn commented Jan 25, 2022

Thanks! Yes, I'd like to implement native DAP disassembly support at some point, and gave it a try a while back, in fact. However, I was not able to satisfactorily resolve the question of how to handle disassembling backwards, and then I got busy, so that stuff is on hold for now. If you'd like to think about it, here's my branch.

@puremourning
Contributor Author

Thanks, I’ll take a look.

@puremourning
Contributor Author

puremourning commented Feb 3, 2022

> not able to satisfactorily resolve the question of how to handle disassembling backwards

I'm not sure I fully followed this. Are you referring to something like a negative instructionOffset in the disassemble request? I can see how that can be tricky, especially on Intel/CISC systems with variable-length instructions.

One idea springs to mind:

  • map the current address to a source line
  • disassemble from the previous source line's load_addr up to the current one, count the instructions
  • repeat until we have enough, or the source's symbol (function) changes... or something
  • if we get to the end and more were requested, pad with NOPs

This seems like it might be possible in theory. I need to look at the API to see whether it works in practice, though. I think it might be possible by using the SBCompileUnit directly. WDYT?


That aside, for now, I took your branch and added source line info to the disassembled instructions, and that seems to work with my (extremely limited) client implementation. I'll try and dig through the LLDB API to see if there's anything we can do about negative instruction offsets, but would just bailing out and not supporting that be an option?

@vadimcn
Owner

vadimcn commented Feb 4, 2022

> Are you referring to something like a negative instructionOffset in the disassemble request?

Yes, that.

> but would just bailing out and not supporting that be an option?

Don't think so. First and foremost, this is a VSCode extension, and VSCode's implementation of disassembly view uses negative offsets extensively.

> disassemble from the previous source line's load_addr up to the current one, count the instructions

This will likely break in release builds: the optimizer may rearrange instructions such that they are no longer in line order. Also, disassembling must be able to function without any debug info whatsoever.

I can think of two methods:

  1. If the binary has debug info, find which function the current PC address is in, then disassemble starting from the beginning of that function until PC is reached. I'm not sure what corner cases there are; one that comes to mind is that functions don't have to occupy a contiguous range. For example, profile-guided optimization may split a function into hot and cold parts and put these in different code sections.
  2. If the current PC does not belong to any function, one can simply start disassembling at PC-x bytes, then check whether any invalid instructions were encountered and whether PC ended up at the beginning of an instruction. If either check fails, try again at PC-x-1, and so on.
    It's probably still possible to start in the middle of an instruction and get a bogus but valid-looking instruction stream between PC-x and PC, though the likelihood of that goes down the larger x is.
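
To make method 1 concrete, here is a minimal sketch using a toy variable-length ISA. Everything in it is hypothetical: the `LENGTHS` encoding, the `func_starts` list standing in for debug info, and the function names are assumptions for illustration, not LLDB API.

```python
from bisect import bisect_right

# Toy variable-length ISA (hypothetical): the opcode byte determines the length.
LENGTHS = {0x01: 1, 0x02: 2, 0x03: 3}

def decode(mem, start, end):
    """Linear sweep over mem[start:end). Returns the instruction start
    addresses, or None if an opcode is invalid or an instruction would
    run past `end` (meaning `end` is not an instruction boundary)."""
    addrs, pc = [], start
    while pc < end:
        length = LENGTHS.get(mem[pc])
        if length is None or pc + length > end:
            return None
        addrs.append(pc)
        pc += length
    return addrs

def instructions_before(mem, func_starts, pc, count):
    """Walk forward from the start of the function containing `pc` and keep
    the last `count` instruction addresses that precede `pc`.
    `func_starts` is a sorted list of function start addresses."""
    i = bisect_right(func_starts, pc) - 1
    if i < 0:
        return None  # pc precedes every known function
    addrs = decode(mem, func_starts[i], pc)
    if addrs is None:
        return None
    return addrs[len(addrs) - min(count, len(addrs)):]

# Real instruction boundaries in this buffer are 0, 2, 5, 6, 8.
mem = bytes([0x02, 0x01, 0x03, 0x01, 0x02, 0x01, 0x02, 0x03, 0x01])
previous = instructions_before(mem, [0, 6], 5, 1)  # [2]
```

Note that this sketch assumes each function is one contiguous range, which is exactly the corner case mentioned in item 1, and it can never return instructions from before the function's start.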

I expect that a robust implementation will require quite a bit of research and experimentation.

...I bet there is a blog post or a mailing list discussion somewhere on the internet which has all the tips and tricks, because the problem is definitely not new. However so far I've been unsuccessful in locating it 🤷‍♂️

@micwoj92

This branch has conflicts that must be resolved

@puremourning
Contributor Author

I still have this on my TODO list by the way. I notice that vscode-cpptools seems to support a negative offset so it might be possible to reverse engineer what they do and pick it up again. Just need that "free" time people keep talking about :)

@puremourning
Contributor Author

OK, so this is what MIEngine does:

        private async Task<DisasmInstruction[]> VerifyDisassembly(DisasmInstruction[] instructions, ulong startAddress, ulong endAddress, ulong targetAddress)
        {
            if (startAddress > targetAddress || targetAddress > endAddress)
            {
                return instructions;
            }
            var originalInstructions = instructions;
            int count = 0;
            while (instructions != null && (instructions.Length == 0 || Array.Find(instructions, (i)=>i.Addr == targetAddress) == null) && count < _process.MaxInstructionSize)
            {
                count++;
                startAddress--;         // back up one byte
                instructions = await Disassemble(_process, startAddress, endAddress); // try again
            }
            return instructions == null ? originalInstructions : instructions;
        }

So basically:

  • try to disassemble MaxSizeOfOneInstruction * instructionCount+1 bytes, starting at address - MaxSizeOfOneInstruction * -instructionOffset
  • if that results in a set of addresses that does not include an instruction starting at address (presumably, because it's invalid), then:
    • add 1 to the number of bytes in the range
    • decrement the start address by 1 byte
    • and repeat.

I don't love it, but I also don't hate it. What do you think?
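
For anyone who wants to play with the idea, the retry loop above can be approximated in a few lines of Python. This is only a sketch against a toy ISA: `LENGTHS`, `MAX_INSTRUCTION_SIZE`, and `disassemble` are hypothetical stand-ins for the real disassembler, and the `start > 0` guard is my addition to keep the toy buffer in bounds.

```python
# Toy variable-length ISA (hypothetical): opcode byte -> instruction length.
LENGTHS = {0x01: 1, 0x02: 2, 0x03: 3}
MAX_INSTRUCTION_SIZE = 3

def disassemble(mem, start, end):
    """Linear sweep from `start` until `end` is reached. Returns a list of
    (address, length) pairs, or None if an invalid opcode is hit."""
    out, pc = [], start
    while pc < end:
        length = LENGTHS.get(mem[pc])
        if length is None:
            return None
        out.append((pc, length))
        pc += length
    return out

def verify_disassembly(mem, start, end, target):
    """Back `start` up one byte at a time until the decoded stream contains
    an instruction that begins exactly at `target`, or we give up."""
    original = disassemble(mem, start, end)
    if start > target or target > end:
        return original
    insns, tries = original, 0
    while (insns is not None
           and all(addr != target for addr, _ in insns)
           and tries < MAX_INSTRUCTION_SIZE
           and start > 0):
        tries += 1
        start -= 1  # back up one byte and try again
        insns = disassemble(mem, start, end)
    return original if insns is None else insns

# Real instruction boundaries in this buffer are 0, 2, 5, 6, 8.
mem = bytes([0x02, 0x01, 0x03, 0x01, 0x02, 0x01, 0x02, 0x03, 0x01])
result = verify_disassembly(mem, 4, 9, 5)
addresses = [addr for addr, _ in result]  # [2, 5, 6, 8]
```

Starting at byte 4 decodes to addresses 4, 6, 8 (no instruction at 5), byte 3 gives 3, 4, 6, 8, and only at byte 2 does the stream line up with the target.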


FWIW this is what they do to calculate the "MaxSizeOfOneInstruction", which was my next question :)

        public void SetTargetArch(TargetArchitecture arch)
        {
            switch (arch)
            {
                case TargetArchitecture.ARM:
                    MaxInstructionSize = 4;
                    Is64BitArch = false;
                    break;

                case TargetArchitecture.ARM64:
                    MaxInstructionSize = 8;
                    Is64BitArch = true;
                    break;

                case TargetArchitecture.X86:
                    MaxInstructionSize = 20;
                    Is64BitArch = false;
                    break;

                case TargetArchitecture.X64:
                    MaxInstructionSize = 26;
                    Is64BitArch = true;
                    break;

                case TargetArchitecture.Mips:
                    MaxInstructionSize = 4;
                    Is64BitArch = false;
                    break;

                default:
                    throw new ArgumentOutOfRangeException(nameof(arch));
            }
        }

@puremourning
Contributor Author

Well, believe it or not, it works.

I'll tidy it up a bit and push a new PR.

@vadimcn
Owner

vadimcn commented Oct 7, 2022

What happens if startAddress lands in the middle of an instruction, such that the trailing bytes just happen to encode a valid instruction?

@puremourning
Contributor Author

puremourning commented Oct 7, 2022

If the start address happens to be mid-instruction and that resolves to a valid instruction then one of a few things might happen:

  1. After reading the first "bogus" instruction, the stream is no longer interpretable, and the likelihood of a valid instruction appearing in the stream at the exact requested base address is very low, so we would reject it and move back a byte.
  2. After reading the first "bogus" instruction, the "new" interpretation happens to end on the same byte location as the next valid instruction in the stream. We would then return one bogus instruction followed by N valid (correct) instructions. In all likelihood, this bogus instruction would then be chopped off the front, because we must return the exact number of requested instructions and, since we seek backwards by M * the maximum instruction size, we always overshoot and have to re-centre the result.
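
A tiny experiment illustrates both outcomes. The ISA below is a made-up toy (the `LENGTHS` table and the byte values are assumptions chosen so that operand bytes also look like opcodes), so it says nothing about how often each case occurs on real x86 code.

```python
LENGTHS = {0x01: 1, 0x02: 2, 0x03: 3}  # toy ISA (hypothetical): opcode -> length

def starts(mem, start):
    """Instruction start addresses from a linear sweep beginning at `start`;
    the sweep stops at the first invalid or truncated instruction."""
    out, pc = [], start
    while pc < len(mem) and mem[pc] in LENGTHS and pc + LENGTHS[mem[pc]] <= len(mem):
        out.append(pc)
        pc += LENGTHS[mem[pc]]
    return out

mem = bytes([0x02, 0x01, 0x03, 0x01, 0x02, 0x01, 0x02, 0x03, 0x01])
truth = starts(mem, 0)  # the real instruction boundaries
base = 5                # requested base address, a real boundary

case1 = starts(mem, 4)  # begins inside the 3-byte instruction at address 2
case2 = starts(mem, 1)  # begins inside the 2-byte instruction at address 0

rejected = base not in case1                  # outcome 1: base missing, back up a byte
bogus = [a for a in case2 if a not in truth]  # outcome 2: bogus prefix, then real code
```

In the second case the sweep produces one bogus instruction and then falls back into step with the real stream, which is the situation where trimming the front of the list hides the damage.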

I need to craft some careful test cases around this. Sorry if the above explanation is not very clear. My WIP commit message is below and the change is here - it's still WIP and the code is terrible, but hopefully you get the idea:

Disassembly for negative instruction offsets

For a negative instruction offset, we have a challenge: what _byte_
position should we start disassembling at? For ARM this seems
fairly simple (all instructions are 4 bytes), but is complicated by
Thumb, which uses a mix of 2- and 4-byte instructions. X64 on the other
hand has technically unlimited instruction size (though in practice 15
bytes is the maximum).

We therefore can't just assume that we can offset the base address by
some fixed number of bytes and get the exact number of instructions we
want. Instead, we have to attempt to find a valid address, then
re-center the resulting instruction list around the requested base
address.

The way this works is as follows:

* If the instruction offset is positive or zero, LLDB gives us a
  specific call to read a set number of instructions, so we use that,
  padding with invalids if we underflow.
* Otherwise, for a negative instruction offset:
  1. Guess a start address as base_address + instruction_offset * 16
     (instruction_offset is negative, so this moves backwards)
  2. Disassemble from there for instruction_count * 16 bytes
  3. Check to see if the resulting set of instructions contains an
     instruction whose address matches our base_address. If not, move 1
     byte further back and try again. Do this up to 16 times and we
     should find an address which is the start of an instruction
     (assuming we're actually still in a code segment...)
  4. Pad or truncate the start of the instruction list so that the
     base_address instruction is at the expected location in the list.
* Slice and pad the disassembled instructions so that we have exactly
  instruction_count entries, as required by the protocol.
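
The pad-and-slice step at the end of the commit message can be sketched as follows. This is not the code from the branch: `recenter`, the `(address, text)` tuples, and the `INVALID` placeholder are hypothetical names used only to show the index arithmetic.

```python
INVALID = (None, "<invalid>")  # hypothetical placeholder for unavailable instructions

def recenter(insns, base_address, instruction_offset, instruction_count):
    """insns: list of (address, text) sorted by address and containing an
    entry at base_address. Returns exactly instruction_count entries,
    positioned so the base_address instruction lands at index
    -instruction_offset whenever that index is inside the result."""
    base_index = next(i for i, (addr, _) in enumerate(insns) if addr == base_address)
    first = base_index + instruction_offset  # negative means not enough history
    last = first + instruction_count
    pad_front = min(instruction_count, max(0, -first))
    window = insns[max(0, first):max(0, last)]
    pad_back = instruction_count - pad_front - len(window)
    return [INVALID] * pad_front + window + [INVALID] * pad_back

# The first entry plays the role of a bogus instruction from a misaligned start.
insns = [(1, "bogus"), (2, "a"), (5, "b"), (6, "c"), (8, "d")]
trimmed = recenter(insns, 5, -1, 3)  # [(2, "a"), (5, "b"), (6, "c")]
padded = recenter(insns, 5, -3, 5)   # one INVALID entry, then addresses 1, 2, 5, 6
```

Because the backward guess overshoots, the extra entries at the front (including any bogus one) are usually sliced away; padding only appears when fewer instructions were decoded than requested.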

@eloparco

eloparco commented Dec 4, 2022

Hello, I was taking a look at these changes to enable disassemble requests.
I was wondering, what's the difference between read_instructions() (used for positive offsets) and read_memory() + get_instructions() (used for negative offsets)?
Couldn't we use read_instructions() in both cases?
Sorry for the dumb question. Thanks

@puremourning
Contributor Author

puremourning commented Dec 4, 2022

LLDB API doesn’t provide a way to do a negative offset read. This is also more complex due to the variable length of instructions in x86 (hence the read memory gymnastics)

see explanation here #627 (comment)

@puremourning puremourning deleted the dap-disassembly branch December 4, 2022 19:13
@eloparco

eloparco commented Dec 5, 2022

I'm writing a custom extension and your implementation has been a good guide!

Still, I'm having problems when VS Code asks for a large offset (e.g. -200) for my small program and disassemble_byte_range() attempts to read memory outside the current stack frame. When that happens, a few initial instructions (outside the current stack frame) are read, but then it exits early (https://github.com/vadimcn/vscode-lldb/blob/master/adapter/src/disassembly.rs#L256) without retrieving the instructions from the current stack frame.

Is there anything I'm missing?

@puremourning
Contributor Author

Could be a bug. Please can you raise an issue with steps to repro using codelldb and I can take a look.

@eloparco

eloparco commented Dec 6, 2022

Actually, in codelldb it works fine. That's why I was wondering where that case is handled in codelldb code.

I'm trying to do something similar but using the VS Code embedded Open Disassembly View. I implemented logic similar to what you've done, but I'm running into problems because, after applying the negative offset requested by VS Code, I end up outside the current stack frame (i.e. disassemble_byte_range() returns some instructions outside the current stack frame).

I need to add a check on the start address but I didn't find any easy way to retrieve the stack frame start address from lldb.

@puremourning
Contributor Author

The code for handling the disassemble request is here https://github.com/vadimcn/vscode-lldb/blob/master/adapter/src/debug_session.rs#L1134. I don’t know anything about vscode.

@eloparco

eloparco commented Dec 6, 2022

Yes, that's what I was already looking at and using as a reference.
Maybe in your case you receive valid offsets, so my issue (i.e. an offset that results in reading memory outside the current stack frame) doesn't show up. I'll dig into it.

@puremourning
Contributor Author

I'm really struggling to understand what you're asking for. If you think the above code doesn't work in some scenario, I'm happy to look into that. I can assure you that I tested negative offsets that go outside the definition of the current "function". Even outside the binary image. I'm not sure what "stack frame" per se has to do with it. Disassembly is just taking a chunk of memory and trying to interpret that byte stream as instructions. Often the memory isn't actually instructions and you get various forms of invalid (or NOP) instruction instead. The idea of the above code is that it tries to determine a valid start address by heuristically disassembling various bytes (up to one instruction width back from the calculated start address) and looking to see if it "looks" valid. The stack is really not involved unless the code location happens to be very close to where the stack is in memory.

@eloparco

eloparco commented Dec 8, 2022

Your implementation works perfectly, I was having a problem on my side. Thanks for the reply!

4 participants