feat: builtin descriptions & examples #2178
Conversation
The approach I've taken is consistent with similar tools in our field; in fact, many of these are copied from those tools (Polars, Spark, DuckDB, ...). I'm open to making the necessary changes, but given the volume of comments it would be helpful to prioritize the most critical ones. If there are specific changes you feel strongly about, could you make PR suggestions for those? Suggestions are much easier to apply for small changes like this.
I mean, I don't think this makes them good or helpful, and I think we (maybe?) have the opportunity to help and make using our interface easier. If having help text for our methods is important, then having it be clear and useful should also be important.
I feel moderately strongly about it: while some of my comments were stylistic, many point out unclear or confusing semantics, or situations where the string is actively misleading. Improving things is going to take one of us a few hours of editing. Maybe the best path forward is:
What I was suggesting is using the PR suggestions. I also don't think we want overly verbose examples & descriptions in the catalogs. I do think the "extended" examples are helpful, but IMO this is very readable from the command line, and gives you at least a starting point on what the function is and what it does.

Anything more than that, such as what I have for the table functions (but want to find a way to slim down), isn't as easy to grok.

> select function_name, example, description from glare_catalog.functions where function_type = 'table' order by function_name asc;
┌────────────────────┬─────────────────────────────────────────────────────────────────────────────────────────┬────────────────────────────────────────────────────────────┐
│ function_name │ example │ description │
│ ── │ ── │ ── │
│ Utf8 │ Utf8 │ Utf8 │
╞════════════════════╪═════════════════════════════════════════════════════════════════════════════════════════╪════════════════════════════════════════════════════════════╡
│ csv_scan │ ↵ │ ↵ │
│ │ -- Read a relative path.↵ │ Syntax:↵ │
│ │ SELECT * FROM csv_scan('./my_data.csv');↵ │ -- Single url or path.↵ │
│ │ -- Read all csv files in a directory.↵ │ csv_scan(<url>)↵ │
│ │ SELECT * FROM csv_scan('./directory_of_data/*.csv');↵ │ -- Multiple urls or paths.↵ │
│ │ -- Read csv files from multiple directories.↵ │ csv_scan([<url>])↵ │
│ │ SELECT * FROM csv_scan('./**/*.csv');↵ │ -- Using a cloud credentials object.↵ │
│ │ -- Read multiple explicitly provided files.↵ │ csv_scan(<url>, <credential_object>)↵ │
│ │ SELECT * FROM csv_scan(['./directory_of_data/1.csv', './directory_of_data/2.csv']);↵ │ -- Required named argument for S3 buckets.↵ │
│ │ │ csv_scan(<url>, <credentials_object>, region =>↵ │
│ │ │ '<aws_region>')↵ │
│ │ │ -- Pass S3 credentials using named arguments.↵ │
│ │ │ csv_scan(<url>, access_key_id => '<aws_access_key_id>',↵ │
│ │ │ secret_access_key => '<aws_secret_access_key>', region =>↵ │
│ │ │ '<aws_region>')↵ │
│ │ │ -- Pass GCS credentials using named arguments.↵ │
│ │ │ csv_scan(<url>, service_account_key =>↵ │
│ │ │ '<gcp_service_account_key>' │
│ delta_scan │ NULL │ NULL │
│ read_bigquery │ NULL │ NULL │
│ read_excel │ NULL │ NULL │
│ read_mongodb │ NULL │ NULL │
│ read_mysql │ NULL │ NULL │
│ read_postgres │ NULL │ NULL │
│ read_snowflake │ NULL │ NULL │
└────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────┴────────────────────────────────────────────────────────────┘
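For readers outside the GlareDB codebase, the shape of the catalog query above can be sketched against a plain in-memory SQLite database. The table and column names (`functions`, `function_name`, `function_type`, `example`, `description`) mirror the query output above, but this is an illustrative stand-in with made-up rows, not GlareDB's actual catalog implementation.

```python
import sqlite3

# Build a toy stand-in for glare_catalog.functions in memory.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE functions (
        function_name TEXT,
        function_type TEXT,
        example       TEXT,
        description   TEXT
    )
    """
)

# Hypothetical rows: csv_scan gets a short example/description,
# the others are left NULL, as in the output above.
conn.executemany(
    "INSERT INTO functions VALUES (?, ?, ?, ?)",
    [
        ("csv_scan", "table",
         "SELECT * FROM csv_scan('./my_data.csv');",
         "Scan one or more CSV files as a table."),
        ("delta_scan", "table", None, None),
        ("read_postgres", "table", None, None),
    ],
)

# The same query shape used above: list table functions alphabetically.
rows = conn.execute(
    "SELECT function_name, example, description FROM functions "
    "WHERE function_type = 'table' ORDER BY function_name ASC"
).fetchall()

for name, example, description in rows:
    print(name, example, description)
```

The point of the sketch is only that "entry point" docs live in an ordinary queryable table, so any SQL client can discover them without leaving the shell.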
> I like the idea of the suggested ...

Maybe something like:
@tychoish I don't mean to be argumentative or combative. I do agree that we should really strive to have good documentation; I think we are just a bit misaligned on the manifestation of that. I'm of the mind that the table fields should be concise & serve as an "entry point" (the ones addressed in this PR). Anything beyond that (such as what you'd find in the function reference) should not be part of the catalog.

I do really like how Snowflake has their functions documented. They provide examples, outputs, syntax & more. Doing something like that via ...
LGTM modulo a few minor comments.
Closes #2213.

Some context: after finishing up the docs in #2178, I realized that we were missing **all** of our UDFs — the `kdl_*` functions and all of our Postgres-specific functions. It wasn't really obvious why, either: they were in a separate part of the codebase, hidden away. This PR moves all of the UDFs out of `sqlexec` and puts them into `sqlbuiltins` alongside our UDTFs (user-defined table functions).

Co-authored-by: Sean Smith <scsmithr@gmail.com>
closes #2177.
Still need to finish adding examples and descriptions to all builtin functions, but this gives us a simple example & brief description for each builtin function.
example: