-
Notifications
You must be signed in to change notification settings - Fork 6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Distinct YulHexStringLiterals from common Yul string literals while reproducing the source code #11324
Comments
P.S.: You can also look at the original source and produced source (and the logic, if you would prefer) in the referenced package. |
Both the Solidity and the Yul AST lose information when they are generated from source and we do not see the reverse operation (generating source from AST) as a use-case of the AST. Since you lose information anyway, is it that important to re-generate the correct kind of string? You also won't be able to determine if certain characters were escaped in the original string, for example. The internal Yul AST does not contain the distinction anymore because it is not needed for code-generation, so I would be hesitant to add it. |
@chriseth The "correct kind of string" is the exact wording for what was asked for. So far, there were no issues with code reproduction from AST, as all of the nodes were containing all the necessary compatible information to produce source that would result the same bytecode. It does not really matter if some details are changed while they are still readable and producing same results. With current Yul AST I was not able to achieve this. I, sure, understand that AST is an abstract representation that loses some details (whitespace and some other, that may be expressed differently), however it is also widely used to analyze, instrument and modify code. It additionally serves for the purpose to indicate certain meta-details of the code structure. These are the nice features of compiler-provided AST. Would these arguments be enough motivation to keep such AST quality level? |
@chriseth I think the problem is that the Yul AST seems ambiguous in this particular case. Perhaps if you clarify what the semantics of the string inside the Is it always just the string itself utf-8 encoded? Or does it change depending on whether the original string was a normal string or a hex string? In the case of normal Regarding your second point about seeing generating source from AST as a use-case for the AST, I guess we (https://github.com/consensys/scribble) are the first consumer that use it in that way. However I think the exact semantics of the |
I currently cannot see any different behaviour about the AST output in the Solidity case. For Solidity Literal has the fields For Yul literals, the mechanism to detect whether it is valid utf8 is missing, so the string just contains (hex) escapes in that case. Or is this what you meant by "different behaviour"? So as far as I can see, neither in the Solidity case nor in the Yul case is there any different encoding of the value depending on whether or not the original literal was specified as a hex literal. In the Solidity case, the |
Let me clarify something, that I probably missed in the starting post (and sorry if it caused too much confusion). Lets add few code lines to the reproduction sample script ( // console.log(JSON.stringify(parsed, undefined, 4)); // Comment compiler output
const literal = parsed.sources[fileName]
.ast // SourceUnit
.nodes[1] // ContractDefinition
.nodes[1] // FunctionDefinition (testAssemblyHexLiterals)
.body // Block
.statements[0] // InlineAssembly
.AST // YulBlock
.statements[1] // YulVariableDeclaration (let b := hex"112233445566778899aabbccddeeff6677889900")
.value; // YulLiteral
console.log(literal);
console.log("Original value", literal.value);
const bytes = new TextEncoder().encode(literal.value);
console.log("Decoded bytes", bytes);
console.log("Decoded bytes (as HEX)", Array.from(bytes, (byte) => byte.toString(16)));
const hex = Array.from(bytes, (byte) => byte.toString(16).padStart(2, "0")).join("");
console.log("HEX string", hex); The output:
While in the original input source has If there would be an original value from source and the Thanks in advance. |
Guys, would some of you be able to look into a post above and explain the behavior? |
This is certainly weird. I think we should at least add a hex value, so that we do not get into some json coding mess. |
It seems that the fix is part of Solidity 0.8.5 release. I will check and close the issue in near future. Thanks. |
Issue is fixed. Closing. |
Description
Since Solidity 0.8.4 release there is a support for
YulHexStringLiteral
s (let x := hex"11"
, the commit for reference). While making attempts to reproduce the original source from compiler-provided AST, I met two issues with new literals:hex""
ones, as there are no markers in Yul AST. It is hard to decide if Yul string literal should produce the source code as common string orhex""
notation. In Solidity AST there is a"kind": "hexString"
and/or"hexValue"
entries that was helping with that.hex""
literals from JSON output and get the original bytes, that was supplied in input source. See common Solidity hex string literal and Yul hex string literal examples forhex"112233445566778899aabbccddeeff6677889900"
(link):I understand that the intention was "to support syntax at all", but the AST is also "kind of" important and I guess "producing source from AST" is somehow the expected usage. I believe it may look like a feature request, but the other part that the source code is not producible from AST itself properly looks like a bug. Anyway, the Solidity team should decide if this should be a bugfix or feature implementation. If this would be a feature, can the team suggest a temporary workaround?
Environment
Steps to Reproduce
package.json
sample.sol
index.js
Run command
npm install # run once prior to execution to setup dependencies node index.js
Regards and thank you for your time.
The text was updated successfully, but these errors were encountered: