Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Output pretty error if possible #399

Merged
merged 5 commits into from
Nov 18, 2022
Merged

Output pretty error if possible #399

merged 5 commits into from
Nov 18, 2022

Conversation

fantix
Copy link
Member

@fantix fantix commented Nov 17, 2022

Error position and hint are now included by default if present, overridden by EDGEDB_ERROR_HINT (enabled/disabled).

The output is aslo colored if the stderr refers to a terminal, overriden by EDGEDB_COLOR_OUTPUT (auto/enabled/disabled).

Fixes #395

c.query("select $0")

image

c.query("select ('\033[95m嘿嘿嘿', 1 < '\033[95m哈哈😁lol');")

image

c.query("""\
select (
    '\033[95m哈哈哈哈哈哈哈😁', '\033[95m嘿嘿嘿' < (
        2, 3, 4,
    ), 345
);
""")

image

Error position and hint are now included by default if present, colored
if the stderr refers to a terminal, overridden by EDGEDB_PRETTY_ERROR.
edgedb/errors/_base.py Outdated Show resolved Hide resolved
Copy link
Member

@1st1 1st1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks amazing. I didn't thoroughly review the printing logic though, cursory it looks OK. I'd change the way we define valid valued for ENV vars to enums like in server

Also:

* Extracted color impl
* Recover from error formatting failure
@fantix fantix merged commit 22e9daf into master Nov 18, 2022
@fantix fantix deleted the pretty-error branch November 18, 2022 17:38
return rv.getvalue()


def _unicode_width(text):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is an approximation that will overestimate the length of strings using combining characters that don't normalize away.
This means we can get stuff like

cannot redefine property 'x' of object type 'std::FreeObject' as scalar type 'std::str'
  ┌─ query:2:26
  │ 
2 │   select { x := 1 } { x := 'f̷͈͎͒̕ǫ̴̏͌ö̶̱̘' };
  │                            ^^^^^^^^^^^^^^^^ error

https://zalgo.org/ has a great generator for pathological unicode strings.

Probably this is almost always fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CLI's error formatter does seem to handle this one right. I'm not sure exactly what algorithm it uses.
I think to some extent anything we do is an approximation because it depends on assumptions about how the terminal will render it.

Unicode does have a notion of "grapheme clusters", and there are python libraries for handling them (https://github.com/alvinlindstam/grapheme). I'm not sure that this is the best approach either, though, since the width of those clusters can vary and I don't know if there is good way to guess that well. (Consider 🇨🇦, which depending on your terminal will show as either a flag or the letters CA, but either way will probably take two characters, but is only one grapheme cluster.)

Maybe the best approach on our side would be basically just to drop combining characters.

There is a potentially infinitely deep rabbit hole here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust uses de facto standard in Rust: https://crates.io/crates/unicode-width
Which uses Unicode Standard Annex 11 for calculating width.

Note: unicode width != number of graphemes. As width of a grapheme can be two chars.

And you're right that in generic case it's impossible to calculate width. Because some characters can have width not just depending on the terminal but also on the specific font (if there is a character for a specific set of emojis). unicode-width's frontpage gives an example.

Dropping Zero-width Joiners

  1. I've tested three terminals I have around (xfce4, alacritty, wezterm), and all of them do match width with unicode width, even though some of them do not render some of the characters (i.e. flags are only supported in wezterm). Disclaimer: alacritty and wezterm are implemented in Rust so this explains it a bit, although xfce4 is not and has much longer history.
  2. Copying from any of the three terminal's I've tried does discard zero-width joiner by itself
  3. Rendering in Firefox on my machine gives invalid width regardless of if there is a zero-width joiner (i.e. it somehow draws emojis 1.5 of character width 🤷 )
  4. Rendering in Chrome on my machine gives good results when zero-width joiner is dropped (like normally copying from terminal), but doesn't if I somehow manage to keep that character (i.e. piping output to xsel -b manually).

Note: zalgo-generated stuff looks like plain foo in all three terminals (although those unicode tricks are present in the output)

The summary of this: dropping zero-width joiners is safe to do, although looks unnecessary in my setup. If that's not true for some popular terminals or on other systems it could be done.

Fixup

For my examples this function works:

def unicode_width(s):
    return sum(0 if unicodedata.category(c) in ('Mn', 'Cf') else
               2 if unicodedata.east_asian_width(c) == "W" else 1
        for c in s)

But I'm not sure if this is good enough. The libraries doing this (rustic unicode-width, and pythonic urwid) have their own width tables.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in #404

fantix added a commit that referenced this pull request Nov 23, 2022
Error position and hint are now included by default if present,
overridden by EDGEDB_ERROR_HINT (enabled/disabled).

The output is aslo colored if the stderr refers to a terminal,
overriden by EDGEDB_COLOR_OUTPUT (auto/enabled/disabled).
fantix added a commit that referenced this pull request Nov 23, 2022
Changes
=======

* Output pretty error if possible (#399)
  (by @fantix in a2bec18 for #399)

* Codegen: allow providing a path after --file (#400)
  (by @fantix in 6bce57e for #400)

Fixes
=====

* Handle ErrorResponse in ping (#393)
  (by @fantix in 8b28947 for #393)

* Disallow None in elements of array argument
  (by @fantix in 26fb6d8)

Docs
====

* Remove references to unix-domain sockets
  (by @quinchs in 4b8bec6)
@aljazerzen aljazerzen mentioned this pull request Feb 23, 2024
This was referenced Aug 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Python client not showing the exact location of the syntax error
4 participants