Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix assembling instructions with unknown/don't care context bits #7195

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

plucia-mitre
Copy link
Contributor

Without this change, if unspecified context bits are provided to the assembler they are defaulted to 0 and the resulting context is used to filter for valid assembly instructions. After this change unspecified bits are kept as unspecified through the assembly process possibly providing additional valid assembly results.

This change is needed to help support Pickled Canary's support for context.

Submitting on behalf of a colleague: Will R

Without this change, if unspecified context bits are provided to the assembler they are defaulted to 0 and the resulting context is used to filter for valid assembly instructions. After this change unspecified bits are kept as unspecified through the assembly process possibly providing more valid assembly results.
@plucia-mitre plucia-mitre marked this pull request as draft November 15, 2024 18:06
@ryanmkurtz ryanmkurtz added the Status: Triage Information is being gathered label Nov 15, 2024
@plucia-mitre
Copy link
Contributor Author

We've just pushed some updated test cases which illustrate (at least some of) the current limitations of this implementation.

The proposed solution in this pull request currently may produce unexpected results when the assembler resolves ^instruction phases/root recursion. The restore instruction in MIPS16 illustrates this issue, as it may be encoded with 16-bits or extended to 32-bits depending on the operands. This extension/prefix is implemented with an ^instruction phase across the board in mips16.sinc. For example, with restore 0x1b8,ra,s0-s1, we only expect to receive 32-bit encodings from the assembler (due to the value of the literal operand).

However, when the context provided to the assembler only has ISA_MODE and RELP set to one (all other context bits unknown), half of the encoding is lost. The result is 0x6477, when we expect something akin to 0xf0306477. Additionally, the same 0xf0306477 disassembles to restore 0x38,ra,s0-s1, which is not what we gave the assembler. We suspect this issue has to do with the implementation and usage of the context transition graph required for pure recursive resolution (AssemblyContextGraph.java). It appears that this class was designed for complete input contexts (see computeOptimalApplications()), and would require some alterations to support unknown context bits.

We have started to experiment with possible fixes, but we'd greatly appreciate any tips about the right way to proceed.

@nsadeveloper789
Copy link
Contributor

It's been a while since I've read through this code, but yes, the troubles you're having around ^instruction sound very familiar. If I could have my way, I'd ban ^instruction (or any production of the form A->A) and require Slaspec authors to specify an alternative start symbol, but alas.

I think part of the core issue is that context bits get (ab)used for all sorts of interesting purposes, and believe me, I'm no stranger to abusing context bits.... That said, I suspect what you intend to get by leaving the context unspecified is "all possible encodings that might appear in a binary." Though that context may change wildly during disassembly of a single instruction, there's actually only so many possible input contexts when starting an instruction. We have a notion of "default context" and "global sets". The latter is also sometimes called "context commits." When the disassembler starts, it uses the default context for the processor as input. Some instructions in the Slaspec may use the "globalset" function in its context transitions. Those will place context changes that persist at a given address. When the disassembler encounters that address it applies the change to the input context for that instruction. So, what I might suggest:

  1. Scrape all the .pspec files that go with the same .slaspec file (you may have to "go up" to the .ldefs files). I think you can avoid scraping the actual files by just querying the DefaultLanguageService, and accessing the various attributes on each language returned. You should then be able to collect all the possible "default contexts." For that part, I think you can use ProgramContextImpl.
  2. This part may not be necessary, but perhaps scrape for possible globalset-based transitions that could change the input context. It looks like ContextCommit is a subclass of ContextChange, and those are accessible via Constructor.getContextChanges(). Consider whether you must be cognizant of the target address. To try to get a handle on how its fields are interpreted, examine SleighParserContext.addCommit and applyCommits.
  3. Once you have those, you should be able to collect all the possible input contexts. You'll probably just have to provide each one of them, fully defined, in a separate call to the assembler. It's maybe possible you can compress the set of contexts by leaving some bits undefined. Or maybe you can short-circuit the exhaustion in (2) above by just saying "if there's a globalset that can affect this bit, I'll just leave it undefined." In this way, only select bits are undefined. Maybe that would improve things?

Hope this helps.

@plucia-mitre
Copy link
Contributor Author

We gave your suggestions a try; here’s an overview of our scraping attempts and the issues we encountered:

Initially, we attempted to use possibleVals() from AssemblyPatternBlock to generate all the possible input contexts based on a given default context. Unsurprisingly, this resulted in an enormous number of possible input contexts, as the default context may contain many unknown bits (which seemingly correspond to context variables left unset in the .pspec). This issue is exacerbated by the requirement for contextreg to be at least 32-bits long. Additionally, some unknown bits may be for purely local context variables, while others could belong to context variables passed to globalset (the ones we care about). It appears there is no guarantee that a context variable that's globalset will have a default value specified in the .pspec.

We improved this approach by scraping for context variables used in globalset directives and ignoring context variables that were only set locally. While this significantly reduced the number of possible input contexts (generated by possibleVals()) for some languages, the number was still quite large for languages that make liberal use of context. In some cases, context variables passed to globalset were many bits wide, which exponentially increased the number of possible combinations (most of which were invalid). We have also observed abuse of extra wide context variables and globalset to propagate encoding fields from one instruction to another.

To mitigate these issues, we began scraping the bit pattern section of each constructor to check the constraints of context variables we know get globalset. Our goal is to eliminate the globalset context variables that do not affect constructor selection, and to determine every valid value for the variables that do (not every possible value). However, we are not certain that scraping a language in this way is proper or sufficient.

While our scraping approach appears to work for trivial cases (e.g., ARM/Thumb), an approach that doesn't scrape or call resolveTree() multiple times would likely be superior.

There's a remaining TODO regarding how to best ensure that AssemblyResolutionResults have the proper context. There's a simple function to force this at the bottom of WildSleighAssembler but we'd appreciate any thoughts regarding how to ensure the context isn't lost in the first place, or where makes the most sense to save & restore it.

@plucia-mitre plucia-mitre marked this pull request as ready for review January 2, 2025 20:17
Copy link
Contributor

@nsadeveloper789 nsadeveloper789 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm good with everything except the setContext method being added. My recommendations below are pretty thorough, but perhaps a bit prescriptive. I'll probably take a hard line on withContext and withContextBuilder, but I'll try to be flexible with the other stuff.

public AssemblyResolutionResults resolveTree(
AssemblyParseResult parse, Address at, AssemblyPatternBlock ctx) {

if (inputContexts == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the scraping should be done by WildSleighAssemblerBuilder. Adding a mutable field could break thread safety, not that it matters, but conventionally, this is definitely a builder step. I can appreciate the lazy computation, but building already takes some time, so I don't think that's necessary.


assertTrue("Expect to have one valid encoding", allValidEncodings.size() == 1);
// In this case, wildcard assembler now returns identical encodings with different contexts
assertTrue("Expect to have at least one valid encoding", allValidEncodings.size() > 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pedantic, but if the English says "at least one", the code should say >= 1.

* @param that the other block
* @return the new combined block
*/
public AssemblyPatternBlock combinePrecedence(AssemblyPatternBlock that) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

combinePrecedence works. I might suggest assign as a shorter name, though.

* Write mask bits from context commit to mask array of block
*
* <p>
* This is used when scraping for valid input contexts to determine which context variables are
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tag this with @implNote instead of <p>.

@@ -57,7 +57,7 @@ public class DefaultAssemblyResolvedPatterns extends AbstractAssemblyResolution

protected final Constructor cons;
protected final AssemblyPatternBlock ins;
protected final AssemblyPatternBlock ctx;
protected AssemblyPatternBlock ctx;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to retain the immutability of this object. Please put final back. See comment below re/ setContext.

AssemblyResolutionResults results = super.resolveTree(parse, at, combinedCtx);

setContexts(results, combinedCtx);
allResults.absorb(results);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This call to absorb becomes redundant if you take the suggestion below re/ absorbWithContext.

}

/**
* Get set of all all valid input contexts that affect constructor selection.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove duplicate all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't own this package, so this'll need additional review by others. I don't anticipate issues, but it's possible after I Approve this, I'll come back with additional comments from internal review.

/**
* Get set of all all valid input contexts that affect constructor selection.
*
* <p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if <p> is necessary before <ol>....

AssemblyPatternBlock defaultCtx = new AssemblyDefaultContext(language).getDefault();

// Erase the values for posterity; we don't care about them at this point
System.arraycopy(new byte[defaultCtx.getVals().length], 0, defaultCtx.getVals(), 0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider Arrays.fill instead, to avoid allocating a temporary array of 0s.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Triage Information is being gathered
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants