andychu edited this page May 1, 2019 · 73 revisions

Blogged about many of these:

  • parsing bash is undecidable -- arrays vs. associative arrays with "${a[i+1]}"
  • word splitting as a hack for lack of arrays
  • ${} language ambiguity with ${####} and ${x///}, etc.
  • exec {fd}< input.txt is a terrible syntax for fd = open('input.txt')
  • test builtin ambiguity
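A minimal sketch of the "${a[i+1]}" problem, assuming bash 4 or later (needed for declare -A); the array names are arbitrary. The same text is an arithmetic index for an indexed array but a literal string key for an associative array, so how it parses depends on a type that is only known at runtime:

```bash
#!/usr/bin/env bash
# Same expression text, two meanings depending on the runtime type of the array.
i=1
declare -a indexed=(zero one two)   # indexed array: the subscript is arithmetic
declare -A assoc=([i+1]=literal)    # associative array: the subscript is a string key

indexed_result="${indexed[i+1]}"    # i+1 is evaluated as arithmetic, giving index 2
assoc_result="${assoc[i+1]}"        # "i+1" is looked up as a literal key

echo "$indexed_result"   # two
echo "$assoc_result"     # literal
```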

Other:

  • programming errors are confused with runtime errors:
    • trying to assign to or unset a readonly variable just causes a status 1, which can be ignored. Need errexit to make it a hard failure.
  • dynamic scope
  • errexit problems -- subshell/command sub, local -- two different issues
  • getopts builtin is implemented in all shells, but OPTIND is a global variable with wildly diverging behavior. There's no reliable way to tell when it should be reset, because getopts is called in a loop. This is a fundamental design flaw.
    • it also sets the global OPTARG and the variable named by its second argument
  • issue #3, arithmetic parsing at runtime
    • this is actually ShellShock-like behavior, not just in bash but in all ksh derivatives!
  • eval and echo shouldn't implicitly join multiple args -- this is a confusion of strings and arrays
    • someone at RC was confused about this
  • trap shouldn't take a string to be eval'd? Why not the name of a function?
  • multiple expression languages per type, which leads to WTFs
    • (( a = b )) assigns the value of variable b to variable a
    • (( a == b )) compares the values of variables a and b
    • [[ a = b ]] compares the literal strings a and b, like [[ 'a' == 'b' ]]
    • [[ a == b ]] is the same string comparison, like [[ 'a' == 'b' ]]
  • undefined variables evaluate to 0 in arithmetic context
  • multiple += operators
    • a+=b appends the string b to a, vs. (( a += b )), which adds the values of a and b
  • type-compat.test.sh -- horrible runtime parsing of array declarations pointed out by Nix devs
    • there is a fundamental redundancy between literals like a=() and declare -a myarray=()
  • runtime globbing -- it shouldn't happen after variable substitution. Otherwise you can end up globbing untrusted data.
    • TODO: OSH needs a fix for this
  • $* "$*" $@ "$@" are not orthogonal. You never need unquoted $* or $@. "$*" joins the arguments with the first character of IFS.
  • hacky syntax rules
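    • here doc EOF vs 'EOF' / "EOF" / \EOF -- quoting any part of the delimiter turns off expansion in the body. This is a very hacky rule; it seems to be whatever was easiest to implement.
    • getopts: a leading : in the option spec to select silent error handling is hacky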
  • read shouldn't return 1 when the input lacks a trailing newline -- it still modifies the variable
  • [[ foo.py == *.py ]] shouldn't do globbing, should be a different operator
  • bash WTF: a different lex state for [[ foo =~ (.*) ]] -- no quotes needed, in fact no quotes allowed!
    • the ( ) | chars are special
  • arrays:
    • ${myarray} is the same as ${myarray[0]}
    • ${mystr[@]} is silently allowed
    • decay to strings on equality -- [[ "${a[@]}" == "${b[@]}" ]] doesn't work
    • until bash 4.4, empty arrays couldn't be used with set -u: "${a[@]}" on an empty array was an unbound variable error
      • this is a fundamental confusion between unset variables and empty arrays. It's also present in mksh.
  • extended glob
    • overloading of * in *(a*|b)
    • bash specific: 'shopt -s extglob; echo @(a|b)' gives a syntax error, but if you change the ; to a newline, it doesn't. It does dynamic parsing!!!
    • ambiguity of [[ !(a == a) ]] -- is it a negation of an equality test, or an extended glob? See doc/osh-manual.md.
    • use case: matching *.py without *_test.py with extended glob: echo */!(*_test).py
      • this syntax is confusing! not at all like regexes!
      • I guess !(*_test) is like a negative lookahead and then .* ?
  • argument parsing: set -eou pipefail is a very confusing syntax to parse. And what do set -oo or set -ee mean?
  • Too many sublanguages, most of them fully recursive:
    • command
    • word
    • arithmetic
    • [[, and then at runtime test / [
    • brace expansion -- this is recursive
    • glob -- non-recursive, but extended glob is recursive
    • regular expressions -- recursive
  • IFS is used with two different algorithms: splitting a line for read, and "splicing" an unquoted word into an argv array. POSIX says they are related, but in practice they seem different. At the very least, one supports backslash escaping (read without -r) and the other doesn't. Or you can look at it a different way: one supports quotes AND backslashes; the other supports just backslashes.
  • two different syntaxes for octal C escapes: echo -e '\0377' and echo $'\377'. FWIW C uses the latter form (no leading zero needed), and so does Python.
  • string variables with hidden structure
    • the first char of $PS4 is treated differently
    • characters in $IFS are treated differently, depending on whether they're whitespace or not.
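Several of the items above can be reproduced in a few lines each. This is a rough sketch, assuming bash run without set -e or set -u; the helper names (f, g, h, count_args) are hypothetical:

```bash
#!/usr/bin/env bash
# Dynamic scope: g() sees its caller's local x, not the global one.
x=global
g() { echo "$x"; }
f() { local x=from_f; g; }
scope_result=$(f)                 # from_f

# `local` masks the exit status of a command substitution.
h() {
  local v=$(false)                # the status of `local` (0) replaces that of `false`
  echo "$?"
}
masked_status=$(h)                # 0

# Undefined variables are silently 0 in arithmetic context.
unset undefined_var
arith_result=$(( undefined_var + 1 ))   # 1

# read returns 1 when the input lacks a trailing newline, yet still fills the variable.
read_out=$(printf 'abc' | { read -r line; echo "$? $line"; })   # "1 abc"

# "$*" joins the arguments into one word; "$@" keeps them separate.
count_args() { echo "$#"; }
set -- 'a b' c
star_count=$(count_args "$*")     # 1
at_count=$(count_args "$@")       # 2

# ${arr} silently means ${arr[0]}.
arr=(first second)
scalar_view=${arr}                # first
```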

Word Language

  • word elision is confusing and can result in command elision, e.g. $(true). From help-bash@.
  • Double quotes within double quotes is an awkward syntax, but sometimes necessary: echo "${x:-"a*b"}"
  • a single-quoted argument inside a double-quoted ${} substitution is treated differently depending on the operator
    • "${x:-'default'}" -- single quotes are literals
    • "${x#'glob'}" and "${x//'glob'}" -- single quotes are processed by the shell
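A sketch of word elision and of the two treatments of single quotes, assuming bash without set -u; count_args is a hypothetical helper that prints its argument count:

```bash
#!/usr/bin/env bash
count_args() { echo "$#"; }

# Word elision: an unquoted empty expansion disappears entirely...
empty=
elided=$(count_args $empty)        # 0
kept=$(count_args "$empty")        # 1

# ...and a command made only of an empty expansion disappears too.
$(true)                            # nothing runs; the status is that of the command sub
elision_status=$?                  # 0

# Single quotes inside a double-quoted ${x:-...} are literal characters.
unset x
default_val="${x:-'default'}"      # 'default' (the quotes are part of the value)

# Single quotes inside a double-quoted ${x#...} quote the pattern.
y='*.py'
quoted_pat="${y#'*'}"              # .py   (strips a literal '*')
unquoted_pat="${y#*}"              # *.py  (the shortest match of the glob * is empty)
```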

Glob WTFs

  • [[] matches a single left bracket, which is easy to confuse with the start of [[:alpha:]]. Users should write [\[] instead.
  • Likewise, []] (a single right bracket) should be written [\]].
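The bracket rules can be checked with case patterns. A minimal bash sketch; classify and classify_escaped are hypothetical helpers, and the second one uses the escaped spellings recommended in the list:

```bash
#!/usr/bin/env bash
# Each pattern below is a bracket expression.
classify() {
  case $1 in
    [[])         echo left-bracket ;;   # matches "[": looks like the start of [[:class:]]
    []])         echo right-bracket ;;  # matches "]": the first ] is a member, not the close
    [[:alpha:]]) echo alpha ;;          # a POSIX character class inside a bracket expression
    *)           echo other ;;
  esac
}

# The same two patterns, spelled with backslash escapes.
classify_escaped() {
  case $1 in
    [\[]) echo left-bracket ;;
    [\]]) echo right-bracket ;;
    *)    echo other ;;
  esac
}
```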

Bash-Specific

  • The stack doesn't line up! BASH_SOURCE is off by one from FUNCNAME and BASH_LINENO. This is documented but makes no sense! Sort of like the parsing of regexes after =~.
  • History substitution syntax is ill-defined, with hacks to avoid conflict with ${!indirect}, !(foo|bar), etc.
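The bash manual's rule is that ${FUNCNAME[i]} was called from file ${BASH_SOURCE[i+1]} at line ${BASH_LINENO[i]}. A small sketch that writes a throwaway script with mktemp, so that BASH_SOURCE holds a real path:

```bash
#!/usr/bash/env bash
# Write a throwaway script so that BASH_SOURCE contains a real file name.
script=$(mktemp)
cat > "$script" <<'EOF'
f() {
  # f was called from file BASH_SOURCE[1] at line BASH_LINENO[0]: the indices differ by one.
  echo "${FUNCNAME[0]} called from ${BASH_SOURCE[1]}:${BASH_LINENO[0]}"
}
f
EOF

trace=$(bash "$script")   # f is called on line 5 of the generated script
rm -f "$script"
echo "$trace"             # f called from /tmp/...:5
```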

Categories

TODO: organize the criticisms in these categories:

  • syntactic puns: the same character is used to mean different things
  • opposite problem: different characters/conventions are used to mean the same thing (negation, etc.)
    • (( a == b )) vs [[ a == b ]] (although they differ slightly)
  • sloppiness with types: string, array, undefined vs. empty
  • dynamic parsing -- confusing data and code.
    • arithmetic inside strings: s=1+2; [[ $s -eq 3 ]]
    • echo -e '\n' and printf '\n' "\n" vs. $'\n'
    • local, declare, etc. and array syntax (type-compat.test.sh)
    • shopt -s extglob changes the parsing algorithm, and it doesn't work on the same line!!!
      • bash -c 'shopt -s extglob; echo @(a|b)'
  • lack of error checking / invalid input.
    • echo -e '\x' is NUL in mksh and zsh, but \x in bash. It's a syntax error in C. Shell generally has the "keep going" mindset of JavaScript/PHP/Perl, which makes it hard to use.
    • likewise with \1 -- should be a syntax error. Or even \d should be \\d.
    • TODO: maybe strict-backslash can handle this?
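The arithmetic-inside-strings item means that [[ ... -eq ... ]] and the [ builtin disagree on the same input. A minimal bash sketch:

```bash
#!/usr/bin/env bash
s=1+2   # an ordinary string

# [[ ... -eq ... ]] evaluates its operands as arithmetic, so the string "1+2" equals 3.
if [[ $s -eq 3 ]]; then dbl_bracket=equal; else dbl_bracket=not-equal; fi

# The [ builtin wants literal integers and rejects the same string with status 2.
single_bracket_status=0
[ "$s" -eq 3 ] 2>/dev/null || single_bracket_status=$?

echo "$dbl_bracket $single_bracket_status"   # equal 2
```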

Too Many Escaping Algorithms

Escaping constructs: \, 'single quotes', "double quotes", and $'C-style strings'

  • translating an arbitrary CompoundWord to glob() or fnmatch() input, which allows \ escaping but not double quoting
  • translating an arbitrary CompoundWord to regcomp() input, where characters like [ are special too
  • respecting \ escapes in read without -r
  • \n outside of double quotes evaluates to n. Inside double quotes, it's \n (which is the same as the behavior inside single quotes). Note that neither evaluates to a newline! That only happens with $'\n'.
  • The quoting of $(command subs) is different than that of backticks, e.g. with respect to double quotes and other backticks. This is very confusing and shell behaviors diverge once you have 2 or 3 levels of quoting.
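The \n rules, and read's backslash handling, can be seen by measuring the results. A sketch, assuming bash:

```bash
#!/usr/bin/env bash
unquoted=\n        # the backslash quotes the n: the value is just "n"
double="\n"        # n isn't special in double quotes, so the backslash stays (2 chars)
single='\n'        # single quotes keep both chars as well
c_style=$'\n'      # only $'...' produces an actual newline (1 char)
echo "${#unquoted} ${#double} ${#single} ${#c_style}"   # 1 2 2 1

# read without -r treats \ as an escape; read -r keeps it.
without_r=$(printf '%s\n' 'a\tb' | { read line; printf '%s' "$line"; })     # atb
with_r=$(printf '%s\n' 'a\tb' | { read -r line; printf '%s' "$line"; })     # a\tb
```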

Too Many Lexer Modes

  • BASH_REGEX and REGEX_CHARS lexer modes. This is orthogonal to the regcomp() algorithm
    • Pathological example: [[ foo =~ [ab]<>(foo|bar) ]] ???
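A minimal sketch of the =~ lexer mode in bash: unquoted, ( ) and | are regex operators; quoted, the same characters are literal text. Putting the regex in a variable is a common workaround, not something the shell requires:

```bash
#!/usr/bin/env bash
# Unquoted: ( ) | are regex operators, with no quotes needed.
if [[ foobar =~ ^(foo|bar)+$ ]]; then unquoted=match; else unquoted=no-match; fi

# Quoted: the whole thing becomes a literal substring to search for.
if [[ foobar =~ "^(foo|bar)+$" ]]; then quoted=match; else quoted=no-match; fi

# Workaround: store the regex in a variable and expand it unquoted.
re='^(foo|bar)+$'
if [[ foobar =~ $re ]]; then via_var=match; else via_var=no-match; fi

echo "$unquoted $quoted $via_var"   # match no-match match
```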

Too Many Methods of Negation

  • Different leading char for flag: set -e vs set +e, declare -a vs. declare +a
  • Different flags: shopt -s vs shopt -u
  • An Extra Flag:
    • export vs. export -n -- remove the export bit
  • Different builtin: alias and unalias are opposites
    • set and unset aren't opposites! One sets options and argv. The other unsets variables.
  • capitalization: echo -e vs echo -E
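Three of these negation conventions side by side, as a rough bash sketch; MY_VAR is a hypothetical variable name:

```bash
#!/usr/bin/env bash
# Leading - vs +: turn errexit on, check $-, then turn it off again.
set -e;  [[ $- == *e* ]] && errexit_on=yes
set +e;  [[ $- == *e* ]] || errexit_off=yes

# Different flags: shopt -s vs shopt -u.
shopt -s nullglob;  shopt -q nullglob && nullglob_on=yes
shopt -u nullglob;  shopt -q nullglob || nullglob_off=yes

# An extra flag: export -n removes the export bit but keeps the value.
export MY_VAR=value
export -n MY_VAR
child_sees=$(bash -c 'echo "${MY_VAR-unset}"')   # unset: no longer in the environment
echo "parent still has MY_VAR=$MY_VAR"
```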

Too Many Methods of Showing Internal State

  • No args: set prints variables and functions
    • readonly, export -- prints vars with those properties
  • -p arg:
    • declare -p
    • shopt -p -- prints shopt options; shopt -po prints set -o options
    • alias -p
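A sketch of the different ways to dump state in bash. The exact output format differs between bash versions (for example, how declare -p quotes array values), so the comments show typical output only; the names are arbitrary:

```bash
#!/usr/bin/env bash
arr=(x y)
declare -r RO=1
alias ll='ls -l'
myfunc() { :; }

set_out=$(set)                    # all variables, then function definitions (outside POSIX mode)
readonly_out=$(readonly)          # readonly variables, e.g. declare -r RO="1"
declare_out=$(declare -p arr)     # e.g. declare -a arr=([0]="x" [1]="y")
alias_out=$(alias -p)             # alias ll='ls -l'
shopt_out=$(shopt -p nullglob)    # shopt -u nullglob
set_o_out=$(shopt -po noglob)     # set +o noglob

printf '%s\n' "$declare_out" "$alias_out" "$shopt_out" "$set_o_out"
```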

Duplicated External Builtins

  • test -- no reason for this other than speed?
  • time -- because it should be a block? But you could do this with a more general mechanism
  • kill -- for job specs
  • printf -- don't see a reason for this
  • getopts -- tighter integration, because we want to mutate shell variables. Doesn't behave like a builtin, but has the syntax of one.
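A quick way to see how bash resolves these names, assuming bash; type -P reports an external program of the same name if one is on $PATH. Note that time shows up as a reserved word, not a builtin:

```bash
#!/usr/bin/env bash
# Report how bash resolves each name; a builtin wins over a same-named file on $PATH.
for name in test printf kill getopts time; do
  printf '%-8s %s\n' "$name" "$(type -t "$name")"
done

# An external printf usually exists too (the path varies by system).
external_printf=$(type -P printf) || external_printf='(none on $PATH)'
echo "external printf: $external_printf"
```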

Too many Single Letters

  • all the flags: read -n, echo -n, etc.
  • not shell, but a common pattern: date +%m vs date +%M (month vs. minute) -- I can never remember which is which. I don't know what + means either.
  • tar xvzf foo.tar.gz can just be tar -x -v -z < foo.tar.gz
    • or tar --verbose --extract --gzip < foo.tar.gz

See Unix Tools

Builtins that Take Variable Names

A questionable pattern? These builtins don't behave like external commands, because they mutate shell variables.

  • read varname
  • getopts SPEC varname
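getopts and read both write to variables named by their arguments, and getopts also mutates OPTIND and OPTARG, which is the global-state problem noted under getopts earlier. A sketch of the common workaround of declaring OPTIND local; parse_args and result are hypothetical names:

```bash
#!/usr/bin/env bash
# Hypothetical helper. getopts writes to three variables: the one we name (opt),
# OPTARG, and OPTIND. Declaring OPTIND local resets it on every call.
parse_args() {
  local OPTIND=1 opt OPTARG
  local verbose=0 out=none
  while getopts 'vo:' opt; do
    case $opt in
      v) verbose=1 ;;
      o) out=$OPTARG ;;
      *) return 2 ;;
    esac
  done
  shift $((OPTIND - 1))
  result="verbose=$verbose out=$out rest=$*"   # global, so no subshell is needed
}

parse_args -v -o file.txt a b; first=$result
parse_args -o other.txt c;     second=$result   # without the reset, OPTIND would start at 4

# read also assigns to the variable whose name it is given.
read -r name <<< 'world'
```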