document Mosaic #1012
docs/lib/nyc-taxi.parquet
Can we use a data loader to make this, or at least include the script to make it alongside this cached output?
Absolutely. I just didn't want to document a data loader as part of this page, which has enough matter already.
It's not a hugely complicated data loader, except it needs to install the binary duckdb.
export TMPDIR="docs/.observablehq/cache"
export PATH=$TMPDIR:$PATH
duckdb :memory: << EOF
-- Load spatial extension
INSTALL spatial; LOAD spatial;
-- Project, following the example at https://github.com/duckdb/duckdb_spatial
CREATE TEMP TABLE rides AS SELECT
pickup_datetime::TIMESTAMP AS datetime,
ST_Transform(ST_Point(pickup_latitude, pickup_longitude), 'EPSG:4326', 'EPSG:32118') AS pick,
ST_Transform(ST_Point(dropoff_latitude, dropoff_longitude), 'EPSG:4326', 'EPSG:32118') AS drop
FROM 'https://uwdata.github.io/mosaic-datasets/data/nyc-rides-2010.parquet';
-- Write output parquet file
COPY (SELECT
HOUR(datetime) + MINUTE(datetime) / 60 AS time,
ST_X(pick)::INTEGER AS px, -- extract pickup x-coord
ST_Y(pick)::INTEGER AS py, -- extract pickup y-coord
ST_X(drop)::INTEGER AS dx, -- extract dropoff x-coord
ST_Y(drop)::INTEGER AS dy -- extract dropoff y-coord
FROM rides
ORDER BY 2,3,4,5,1 -- optimize output size by sorting
) TO '$TMPDIR/trips.parquet' (COMPRESSION 'ZSTD', row_group_size 10000000);
EOF
cat $TMPDIR/trips.parquet >&1 # Write output to stdout
rm $TMPDIR/trips.parquet # Clean up
I think having the data loader checked in, even if it's not functional and not documented, is a big help!
This looks great. Thanks for including it! I agree that the data loader is an important component of the example, even if non-functional in this particular deployment.
I've checked in a simple version that does not try to work in CI; it expects duckdb to be on the $PATH.
Note: it's not active, only here for reference. To make this data loader work in CI, you have to install the proper duckdb binary on the $PATH, and I'd recommend using a dedicated $TMPDIR rather than writing to the root folder.
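For reference, the two preconditions mentioned above (duckdb on the $PATH, a dedicated scratch directory) could be checked up front with something like the sketch below. This is only an illustration, not the script that was checked in: `require_cmd` and `WORKDIR` are hypothetical names, and the directory comes from `mktemp -d` rather than from the cache folder used earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of the PR): fail early if a command is missing.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "error: '$1' not found on \$PATH" >&2
    return 1
  fi
}

# Dedicated scratch directory instead of the repository root; removed on exit.
WORKDIR="$(mktemp -d)"
trap 'rm -rf "$WORKDIR"' EXIT

if require_cmd duckdb; then
  echo "duckdb found; the loader could write to $WORKDIR/trips.parquet"
else
  echo "skipping: install duckdb to run the loader" >&2
fi
```

The `trap ... EXIT` also replaces the manual `rm` at the end of the loader, so the temporary file is cleaned up even if duckdb fails midway.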
@@ -0,0 +1,25 @@
duckdb :memory: << EOF
Should this have a shebang? One of these probably…
- duckdb :memory: << EOF
+ #!/usr/bin/env bash
+ duckdb :memory: << EOF

- duckdb :memory: << EOF
+ #!/bin/sh
+ duckdb :memory: << EOF
I don't think so; aren't shebangs only used for executable files?
Ah, so even if you have a shebang, it’s ignored because we explicitly run it with sh? If it’s not ignored, we should add one just to document our expectations around how the script is interpreted.
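Yes, when you invoke sh explicitly it treats the shebang line as an ordinary comment and interprets the rest of the file directly. The shebang only takes effect when the file is executed by itself: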
❯ chmod +x ./test.py
❯ cat ./test.py
#! /usr/bin/env python3
print("hello, world")
❯ bash ./test.py
./test.py: line 2: syntax error near unexpected token `"hello, world"'
./test.py: line 2: `print("hello, world")'
❯ ./test.py
hello, world
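The same behavior can be reproduced without Python. The sketch below is an illustration only; it assumes `/bin/cat` exists and that the temporary directory allows executing files. The shebang deliberately names `cat` rather than a shell, so the two ways of running the file produce visibly different output. `demo_dir` and `script` are made-up names.

```shell
# Scratch directory for the demo (removed at the end)
demo_dir="$(mktemp -d)"
script="$demo_dir/demo.sh"

# Line 1 is a shebang pointing at /bin/cat (assumed to exist), line 2 is shell code
printf '%s\n' '#!/bin/cat' 'echo "run by sh"' > "$script"
chmod +x "$script"

# Explicit interpreter: sh reads line 1 as a comment and runs line 2
via_sh="$(sh "$script")"

# Direct execution: the kernel honors the shebang and runs /bin/cat on the file
direct="$("$script")"

printf 'sh demo.sh -> %s\n' "$via_sh"
printf './demo.sh  -> %s\n' "$direct"

rm -rf "$demo_dir"
```

Run through `sh`, the file behaves like a shell script; executed directly, `cat` simply prints its two lines. That matches the point above: a shebang in the data loader would document intent, but it has no effect as long as the loader is started with an explicit `sh`.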
Thanks for educating me. 😄
Co-authored-by: Mike Bostock <mbostock@gmail.com>
Waiting for #1015 to land and then we’ll update this.
related #1011
cc: @jheer