Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[osh] Fix ${unset_variable@Q} wrongly producing '' #2186

Merged
merged 8 commits into from
Dec 11, 2024

Conversation

akinomyoga
Copy link
Collaborator

@akinomyoga akinomyoga commented Dec 8, 2024

#2185 (comment)

The current master produces '' for unset variables with the construct ${var@Q}.

$ bash -c 'echo ${unset_variable@Q}'

$ bin/osh -c 'echo ${unset_variable@Q}'
''
$

This PR fixes ${var@Q} so that it produces no quoted words for unset variables.

Note: There was a test case with OK osh STDOUT that explicitly requires '' for unset variables, but I removed the OK section so that it requires the behavior to match Bash.

@andychu
Copy link
Contributor

andychu commented Dec 8, 2024

OK, this is probably one of those judgement calls I made, and we should probably roll it back to be more compatible with bash

However I also wonder if bash ever documented that empty string produces '', while unset produces nothing

would bash ever change this?


I guess it sorta makes sense if you want to do

echo a ${arg@Q} b

in that case empty string can be useful


I am wondering if there is any rationale for having a shopt for this -- maybe not, as that just increases the test matrix / complexity

It might be useful for us to document it in doc/ref/chap-word-lang.md and so forth, which is linked from doc/ref/toc-osh.md

@andychu
Copy link
Contributor

andychu commented Dec 8, 2024

OK I looked at the code more ...

The thing that bugs me about this is that it is clearly a special case ? we have a special boolean only for the undef case

I guess the problem is we do val = _EvalBracketOp(val) , so we lose the fact if it's Undef

There is this vsub_state thing that is meant to track booleans across recursive calls, let me see


I remember this word eval code is very hairy (and I think it is also the biggest file in the bash codebase)


I think this change is a good idea, but I want to see if we can do it by transforming val into Undef, and if it applies to any other ops ??

@akinomyoga
Copy link
Collaborator Author

It might be useful for us to document it in doc/ref/chap-word-lang.md and so forth, which is linked from doc/ref/toc-osh.md

I added commit a898e4a.

OK I looked at the code more ...

The thing that bugs me about this is that it is clearly a special case ? we have a special boolean only for the undef case

Actually, there are many boolean switches as you mentioned below for vsub_state.

I guess the problem is we do val = _EvalBracketOp(val) , so we lose the fact if it's Undef

Yes. I actually think we should call _EvalBracketOp(val) in each branch if necessary.

There is this vsub_state thing that is meant to track booleans across recursive calls, let me see

I think this change is a good idea, but I want to see if we can do it by transforming val into Undef, and if it applies to any other ops ??

I can try that later, though it might be in the next weekend if it turns out to require larger changes.

@andychu
Copy link
Contributor

andychu commented Dec 8, 2024

Yeah I think perhaps calling _EvalBracketOp in each branch is the right thing!

Also there is this old idea

I realized the code can be made simpler and faster by doing some stuff at parse time, i.e. to "reorder" the ops inside ${}

But I guess we shouldn't bother with that yet, for this relatively small change


(although maybe making those decisions faster is not that big a win, because our profiles tend to show GC and memory allocation as the slower things. We are creating many List<T> objects too)

@akinomyoga
Copy link
Collaborator Author

akinomyoga commented Dec 9, 2024

I tried to refactor the calls of self._EvalBracketOp(...), but there are questions.

Importing in .asdl does not work in the Python version

If I naively modify VarSubState in core/runtime.asdl to include Token as a member, the generated VarSubState.CreateNull() fails in the Python version because of the missing identifier Token. The C++ version works fine because Token is imported at the beginning of core/runtime.asdl:

# import from frontend/syntax.asdl
use frontend syntax {
loc Token
expr word command

The problem is that, at the beginning of the generated _devbuild/gen/runtime_asdl.py, the import's are conditioned by if TYPE_CHECKING:

if TYPE_CHECKING:
  from _devbuild.gen.syntax_asdl import loc_t, Token, expr_t, word_t, command_t, CompoundWord, DoubleQuoted, ArgList, re_t, redir_loc_t, proc_sig_t, Func
  from _devbuild.gen.value_asdl import value_t, Obj

Even with core/runtime.asdl in the master branch, the same error happens when one tries to call AssignArg.CreateNull(). The problem has not happened so far just because AssignArg.CreateNull() hasn't been used in the codebase.

I think the generation of _devbuild/gen/runtime_asdl.py should be fixed by either removing the condition if TYPE_CHECKING or modifying the generated CreateNull() (if it's possible).

Array attributes

$ bash -c 'a=(1 2 3); printf "<%s>\\n" "${a[@]@a}"'
<a>
<a>
<a>
$ bin/osh -c 'a=(1 2 3); printf "<%s>\\n" "${a[@]@a}"'
<a>

The behavior of OSH differs from that of Bash.

The current OSH behavior makes the processing of variable substitutions non-trivial because one needs to know a[@] should be treated as a single entity at the point of processing a[@]; i.e., a[@] and @a are not orthogonal in the current OSH behavior. Do I need to keep the current OSH behavior?

Namerefs for array elements do not work

$ bash -c 'declare -n ref="a[@]"; a=(1 2 3); printf "<%s>\\n" "$ref"'
<1>
<2>
<3>
$ bash -c 'declare -n ref="a[0]"; a=(1 2 3); printf "<%s>\\n" "$ref"'
<1>
$ bash -c 'declare -n ref="a"; a=(1 2 3); printf "<%s>\\n" "$ref"'
<1>
$ bin/osh -c 'declare -n ref="a[@]"; a=(1 2 3); printf "<%s>\\n" "$ref"'
<a[@]>
$ bin/osh -c 'declare -n ref="a[0]"; a=(1 2 3); printf "<%s>\\n" "$ref"'
<a[0]>
$ bin/osh -c 'declare -n ref="a"; a=(1 2 3); printf "<%s>\\n" "$ref"'
<1>

OSH just returns the string for the name references of the form a[...]. What is the intended behavior in OSH? In particular, the first case is tricky, and we need to reconsider the array processing in variable substitutions if we would support it.

@andychu
Copy link
Contributor

andychu commented Dec 10, 2024

  • Hmm let me check out the ASDL issue tomorrow ...
  • The array divergence seems like a bug in OSH ?
  • nameref could also be a bug ... I think the only other shell that has it is mksh, and we should probably agree with bash

I only did the minimal things to make a few shell scripts work, and make additional tests pass

In general if something does not work for nameref, ${x@....}, or arrays, and there is no test coverage, then it's probably an unintentional difference


However I still want to think about the bash behavior here ... Thanks for documenting it in OSH

The problem is that I am not confident bash doesn't change this behavior later

If you turn on compare_shells: bash in spec/assoc, which is bash 5.2, then there are number of differences

i.e. bash has changed over time and presumably broken existing scripts

@andychu
Copy link
Contributor

andychu commented Dec 10, 2024

namerefs are another feature of bash that I don't really think is well-specified or documented

It seems like mostly an accident of the implementation to me

@p @q etc. are all under topic 'op-format', linked in doc/ref/toc-osh
@andychu
Copy link
Contributor

andychu commented Dec 11, 2024

OK I reproduced the ASDL bug you're talking about, thank you for noticing that!

@andychu
Copy link
Contributor

andychu commented Dec 11, 2024

OK I fixed the ASDL bug

951ca37

Python 2 has an issue where sometimes you have to do cast('Token', x) instead of cast(Token, x) ! We were inconsistent about the generated code

@andychu
Copy link
Contributor

andychu commented Dec 11, 2024

OK I simplified the logic like this:

  • add has_nullary_op to VarSubState

then change the test to

           if not vsub_state.has_test_op and not vsub_state.has_nullary_op:
                val = self._EmptyStrOrError(val, part.token)

and then for @Q and @P, handle case(value_e.Undef) in the normal way, producing empty string


I also added a test to show that some other ops in bash behave the same way, like @A, so I guess it's consistent


Thanks for finding this issue!

@andychu andychu merged commit 37c6b8d into soil-staging Dec 11, 2024
18 checks passed
@akinomyoga akinomyoga deleted the unsetvar-Q-transformation branch December 11, 2024 20:27
@akinomyoga
Copy link
Collaborator Author

OK I fixed the ASDL bug

951ca37

Thank you!


then change the test to

           if not vsub_state.has_test_op and not vsub_state.has_nullary_op:
                val = self._EmptyStrOrError(val, part.token)

and then for @Q and @P, handle case(value_e.Undef) in the normal way, producing empty string

This change breaks osh -uc "${undef@P}" and osh -uc "${undef@Q}". The nounset check is skipped.

@andychu
Copy link
Contributor

andychu commented Dec 11, 2024

Ahh we are missing test coverage for -u ? OK thank you, I will look at it

@akinomyoga
Copy link
Collaborator Author

As I mentioned, in my local repository, I already have commits to fix everything by refactoring the calls of _EvalBracketOp.

@andychu
Copy link
Contributor

andychu commented Dec 11, 2024

Ah OK, yes definitely feel free to fix it along with the other issues!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants