-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Output pretty error if possible #399
Conversation
Error position and hint are now included by default if present, colored if the stderr refers to a terminal, overridden by EDGEDB_PRETTY_ERROR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks amazing. I didn't thoroughly review the printing logic though, cursory it looks OK. I'd change the way we define valid valued for ENV vars to enums like in server
Also: * Extracted color impl * Recover from error formatting failure
return rv.getvalue() | ||
|
||
|
||
def _unicode_width(text): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this is an approximation that will overestimate the length of strings using combining characters that don't normalize away.
This means we can get stuff like
cannot redefine property 'x' of object type 'std::FreeObject' as scalar type 'std::str'
┌─ query:2:26
│
2 │ select { x := 1 } { x := 'f̷͈͎͒̕ǫ̴̏͌ö̶̱̘' };
│ ^^^^^^^^^^^^^^^^ error
https://zalgo.org/ has a great generator for pathological unicode strings.
Probably this is almost always fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CLI's error formatter does seem to handle this one right. I'm not sure exactly what algorithm it uses.
I think to some extent anything we do is an approximation because it depends on assumptions about how the terminal will render it.
Unicode does have a notion of "grapheme clusters", and there are python libraries for handling them (https://github.com/alvinlindstam/grapheme). I'm not sure that this is the best approach either, though, since the width of those clusters can vary and I don't know if there is good way to guess that well. (Consider 🇨🇦
, which depending on your terminal will show as either a flag or the letters CA, but either way will probably take two characters, but is only one grapheme cluster.)
Maybe the best approach on our side would be basically just to drop combining characters.
There is a potentially infinitely deep rabbit hole here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rust uses de facto standard in Rust: https://crates.io/crates/unicode-width
Which uses Unicode Standard Annex 11 for calculating width.
Note: unicode width != number of graphemes. As width of a grapheme can be two chars.
And you're right that in generic case it's impossible to calculate width. Because some characters can have width not just depending on the terminal but also on the specific font (if there is a character for a specific set of emojis). unicode-width
's frontpage gives an example.
Dropping Zero-width Joiners
- I've tested three terminals I have around (xfce4, alacritty, wezterm), and all of them do match width with unicode width, even though some of them do not render some of the characters (i.e. flags are only supported in wezterm). Disclaimer: alacritty and wezterm are implemented in Rust so this explains it a bit, although xfce4 is not and has much longer history.
- Copying from any of the three terminal's I've tried does discard zero-width joiner by itself
- Rendering in Firefox on my machine gives invalid width regardless of if there is a zero-width joiner (i.e. it somehow draws emojis 1.5 of character width 🤷 )
- Rendering in Chrome on my machine gives good results when zero-width joiner is dropped (like normally copying from terminal), but doesn't if I somehow manage to keep that character (i.e. piping output to
xsel -b
manually).
Note: zalgo-generated stuff looks like plain foo
in all three terminals (although those unicode tricks are present in the output)
The summary of this: dropping zero-width joiners is safe to do, although looks unnecessary in my setup. If that's not true for some popular terminals or on other systems it could be done.
Fixup
For my examples this function works:
def unicode_width(s):
return sum(0 if unicodedata.category(c) in ('Mn', 'Cf') else
2 if unicodedata.east_asian_width(c) == "W" else 1
for c in s)
But I'm not sure if this is good enough. The libraries doing this (rustic unicode-width, and pythonic urwid) have their own width tables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in #404
Error position and hint are now included by default if present, overridden by EDGEDB_ERROR_HINT (enabled/disabled). The output is aslo colored if the stderr refers to a terminal, overriden by EDGEDB_COLOR_OUTPUT (auto/enabled/disabled).
Changes ======= * Output pretty error if possible (#399) (by @fantix in a2bec18 for #399) * Codegen: allow providing a path after --file (#400) (by @fantix in 6bce57e for #400) Fixes ===== * Handle ErrorResponse in ping (#393) (by @fantix in 8b28947 for #393) * Disallow None in elements of array argument (by @fantix in 26fb6d8) Docs ==== * Remove references to unix-domain sockets (by @quinchs in 4b8bec6)
Error position and hint are now included by default if present, overridden by
EDGEDB_ERROR_HINT
(enabled/disabled).The output is aslo colored if the stderr refers to a terminal, overriden by
EDGEDB_COLOR_OUTPUT
(auto/enabled/disabled).Fixes #395