Mosaic vgplot #1015

Merged
merged 17 commits into from
Mar 8, 2024

mbostock
Member

@mbostock mbostock commented Mar 7, 2024

Fixes #1011. Pretty awesome how concise it is! TODO (done): live-reload of vg when the SQL front matter is edited.

Screenshot 2024-03-07 at 12 37 55 PM
---
sql:
  gaia: ./gaia-sample.parquet
---

# Hello, vgplot

```js echo
vg.plot(vg.raster(vg.from("gaia"), {x: "ra", y: "dec", fill: "density"}))
```

Fil and others added 4 commits March 7, 2024 10:27
Note: it's not active, only here for reference. To make this data loader work in CI, you have to install the proper duckdb binary on the $PATH, and I'd recommend using a dedicated $TMPDIR rather than writing to the root folder.
Co-authored-by: Mike Bostock <mbostock@gmail.com>
@mbostock mbostock requested a review from Fil March 7, 2024 20:38
@domoritz

domoritz commented Mar 7, 2024

Super cool to see this coming natively! Btw, I found that the u and v fields in Gaia are a bit nicer than ra and dec since they align the Milky Way nicely.

@mbostock
Member Author

mbostock commented Mar 7, 2024

Thanks @domoritz! It looks like our gaia-sample.parquet doesn’t include the u and v fields.

I can’t figure out how to write SQL expressions in the vgplot specification. I just wanted to try showing -ra instead of ra. Here are some approaches that I tried that didn’t work. First I tried just passing a SQL expression directly to x:

vg.plot(vg.raster(vg.from("gaia"), {x: "-ra", y: "dec", fill: "density"}))

Then I tried using a vg.sql tagged template literal as shown here [1]:

vg.plot(vg.raster(vg.from("gaia"), {x: vg.sql`-ra`, y: "dec", fill: "density"}))

I also tried vg.from(table).select:

vg.plot(vg.raster(vg.from("gaia").select({u: "-ra", v: "dec"}), {x: "u", y: "v", fill: "density"}))

And vg.Query.select(…).from(table):

vg.plot(vg.raster(vg.Query.select({u: "ra", v: "dec"}).from("gaia"), {x: "u", y: "v", fill: "density"}))

Is there a way to pass SQL expressions to channels? Is this a bug in how we’ve configured Mosaic here?

@mbostock
Member Author

mbostock commented Mar 7, 2024

Maybe the problem is that I’m passing in a DuckDB instance that already has the tables defined, and so Mosaic doesn’t know which tables exist? (Though vg.from("gaia") seems to work fine…) Like the documentation says to run:

vg.coordinator().exec(vg.loadCSV("stocks", "stock-data.csv"));

but what if the DuckDB instance I pass to wasmConnector already has tables loaded (because I want to use that DuckDB instance directly, too, not just within Mosaic)? Do I need to tell the coordinator about those existing tables?

@jheer

jheer commented Mar 7, 2024

Mosaic can work with tables loaded externally to Mosaic. That is not an issue. You just need to reference extant table and column names.

For custom SQL, we currently don't do any extra parsing. As a result we need to explicitly indicate what is a column reference in order to appropriately track dependencies, pre-load stats, etc.

This should work for you:

{ x: vg.sql`-${vg.column("ra")}`, ... }

Admittedly the syntax could be nicer and more ergonomic here!
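
To illustrate why the explicit column wrapper matters, here is a minimal sketch of a tagged template that records column dependencies without parsing SQL. The `column` and `sql` functions below are hypothetical stand-ins written for this example; they are not Mosaic's implementation and their return shapes are assumptions.

```js
// Hypothetical stand-ins, NOT the Mosaic implementation: they only show how
// wrapping a name in column(...) exposes dependencies without parsing SQL.
function column(name) {
  return {type: "column", name};
}

function sql(strings, ...values) {
  const columns = [];
  let text = strings[0];
  values.forEach((value, i) => {
    if (value && value.type === "column") {
      columns.push(value.name); // record the dependency
      text += `"${value.name}"`; // emit a quoted identifier
    } else {
      text += String(value); // anything else is interpolated as-is
    }
    text += strings[i + 1];
  });
  return {text, columns};
}

const negated = sql`-${column("ra")}`; // the dependency on "ra" is visible
const opaque = sql`-ra`; // just a string; no dependency can be recovered
console.log(negated.columns, opaque.columns);
```

With the wrapper, a consumer can look up statistics for `ra` before running the query; with the plain string it has nothing to look up.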

@mbostock
Member Author

mbostock commented Mar 7, 2024

Perhaps this is a bug in Mosaic with vg.raster? Because vg.dot works fine:

Screenshot 2024-03-07 at 2 07 16 PM
vg.plot(vg.dot(vg.from("gaia"), {x: vg.sql`-ra`, y: "dec", r: 0.5, fill: "currentColor"}))

@jheer

jheer commented Mar 7, 2024

> Perhaps this is a bug in Mosaic with vg.raster? Because vg.dot works fine.

The difference here is that raster needs to know column dependencies to request column-level summary statistics (e.g., to inform the binning process). Dot does not, so it simply passes the SQL string through. Obviously we'd prefer to make this process smoother down the road. I'll look into more robust dependency handling / column stats lookup when I can.

@mbostock
Member Author

mbostock commented Mar 7, 2024

Thanks for the suggestion, @jheer. That one evaluates but unfortunately I get zeroes for everything:

Screenshot 2024-03-07 at 2 10 40 PM
vg.plot(vg.raster(vg.from("gaia"), {x: vg.sql`-${vg.column("ra")}`, y: "dec", fill: "density", pixelSize: 1}))

However, it works correctly if I use 360 - ra instead…

Screenshot 2024-03-07 at 2 10 23 PM
vg.plot(vg.raster(vg.from("gaia"), {x: vg.sql`360 - ${vg.column("ra")}`, y: "dec", fill: "density", pixelSize: 1}))

Maybe the raster mark is inferring the wrong x-domain when given an expression? It seems to think that the domain is [0, 360] even when it’s [-360, 0].

@mbostock
Member Author

mbostock commented Mar 7, 2024

Okay, confirmed that setting the x-domain explicitly fixes it. 🙏

Screenshot 2024-03-07 at 2 13 27 PM
vg.plot(vg.raster(vg.from("gaia"), {x: vg.sql`-${vg.column("ra")}`, y: "dec", fill: "density", pixelSize: 1}), vg.xDomain([-360, 0]))
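
The symptom is consistent with binning against the raw column's extent rather than the expression's. The sketch below reproduces that behaviour in plain JavaScript; `binIndex` and `countBinned` are hypothetical helpers and the sample values are made up, so this shows the hypothesis, not how Mosaic actually bins.

```js
// Hypothetical helpers, not Mosaic code: assign a value to one of `bins`
// equal-width bins over `domain`, or -1 if it falls outside the domain.
function binIndex(value, [lo, hi], bins) {
  if (value < lo || value > hi) return -1;
  return Math.min(bins - 1, Math.floor(((value - lo) / (hi - lo)) * bins));
}

function countBinned(values, domain, bins) {
  const counts = new Array(bins).fill(0);
  for (const v of values) {
    const i = binIndex(v, domain, bins);
    if (i >= 0) counts[i]++;
  }
  return counts;
}

const ra = [10, 100, 200, 350]; // made-up right ascensions in degrees
const negRa = ra.map((d) => -d);

// -ra binned against the raw column's domain: every value is out of range.
const wrong = countBinned(negRa, [0, 360], 4);
// -ra binned against the expression's own domain: every value lands in a bin.
const right = countBinned(negRa, [-360, 0], 4);
// 360 - ra happens to stay inside [0, 360], so the raw domain still works.
const shifted = countBinned(ra.map((d) => 360 - d), [0, 360], 4);
console.log({wrong, right, shifted});
```

Under this reading, `360 - ra` works only because its range coincides with the range of `ra`, and setting the domain explicitly sidesteps the inference.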

@jheer

jheer commented Mar 7, 2024

> Maybe the raster mark is inferring the wrong x-domain when given an expression? It seems to think that the domain is [0, 360] even when it’s [-360, 0].

Yes, that is also what I think, and will be part of what I'll look into as noted above. Thanks for your time digging in and flagging this.

@mbostock mbostock mentioned this pull request Mar 7, 2024
@mbostock mbostock marked this pull request as ready for review March 8, 2024 00:55
@mbostock
Member Author

mbostock commented Mar 8, 2024

Okay, this is ready!

There’s some followup work to address duplicate library loads:

  • @uwdata/mosaic-core depends on apache-arrow 15
  • @uwdata/mosaic-core depends on @duckdb/duckdb-wasm 1.28.1 (pre-release)
  • @duckdb/duckdb-wasm 1.28.1 depends on apache-arrow 14
  • @observablehq/duckdb depends on @duckdb/duckdb-wasm 1.28.0 (stable)
  • @duckdb/duckdb-wasm 1.28.0 depends on apache-arrow 13

As a result, we get three (3!) versions of apache-arrow loaded, and two (2!) versions of duckdb-wasm loaded.

Also, only two versions (14 and 15) of apache-arrow are preloaded, and only one version of duckdb-wasm (1.28.1) is preloaded. 😞 This is another bug…

Ultimately, I think we’ll have to rewrite the imports that jsDelivr generates to match what we resolve locally. And I think we’ll have to disregard semantic versioning in the case of apache-arrow (as they don’t use semantic versioning) to ensure that we always get a consistent version of apache-arrow.

These are pre-existing problems, though, so shouldn’t be tied to this PR.
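
The duplicate counts can be checked mechanically from the dependency list in this comment. The sketch below transcribes those five edges by hand and groups versions per package; the `versionsByPackage` helper and the edge format are illustrative only and do not reflect how the framework resolves imports.

```js
// Edges transcribed from the list above: [dependent, dependency, version].
const edges = [
  ["@uwdata/mosaic-core", "apache-arrow", "15"],
  ["@uwdata/mosaic-core", "@duckdb/duckdb-wasm", "1.28.1"],
  ["@duckdb/duckdb-wasm@1.28.1", "apache-arrow", "14"],
  ["@observablehq/duckdb", "@duckdb/duckdb-wasm", "1.28.0"],
  ["@duckdb/duckdb-wasm@1.28.0", "apache-arrow", "13"]
];

// Collect the distinct versions requested for each dependency.
function versionsByPackage(edges) {
  const versions = new Map();
  for (const [, name, version] of edges) {
    if (!versions.has(name)) versions.set(name, new Set());
    versions.get(name).add(version);
  }
  return versions;
}

const loaded = versionsByPackage(edges);
for (const [name, versions] of loaded) {
  if (versions.size > 1) console.log(`${name}: ${[...versions].sort().join(", ")}`);
}
```

Run against these edges, it reports three versions of apache-arrow and two of @duckdb/duckdb-wasm, matching the counts above.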

@jheer

jheer commented Mar 8, 2024

The next version of DuckDB-WASM uses Arrow 15, so hopefully all the dependencies line up soon: https://github.com/duckdb/duckdb-wasm/blob/main/packages/duckdb-wasm/package.json

@Fil
Contributor

Fil commented Mar 8, 2024

If u and v are the natural earth projection, you can compute them in the client; you just need to make the chart's fenced code block depend on the result of the projection.

---
sql:
  gaia: ./gaia-sample.parquet
---

```js echo
projected && vg.plot(
  vg.raster(vg.from("gaiauv"), {x: "u", y: "v", fill: "density"}),
  vg.width(800), vg.height(400) // aspect ratio
)
```

```sql id=projected
-- compute u and v with natural earth projection
CREATE TABLE gaiauv AS (

WITH prep AS (
  SELECT
    radians((-ra + 540) % 360 - 180) AS lambda,
    radians(dec) AS phi,
    asin(sqrt(3)/2 * sin(phi)) AS t,
    t^2 AS t2,
    t2^3 AS t6,
    *
  FROM gaia
  WHERE parallax BETWEEN -5 AND 20
)

SELECT
  (1.340264 * lambda * cos(t)) / (sqrt(3)/2 * (1.340264 + (-0.081106 * 3 * t2) + (t6 * (0.000893 * 7 + 0.003796 * 9 * t2)))) AS u,
  t * (1.340264 + (-0.081106 * t2) + (t6 * (0.000893 + 0.003796 * t2))) AS v
FROM prep

)
```
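
To spot-check individual rows, the same formulas can be mirrored for a single point in JavaScript. This is a sketch that copies the constants and expressions from the SQL above; the function name `project` is made up here, and the `parallax` filter is not reproduced.

```js
// Mirrors the SQL above for one (ra, dec) pair given in degrees.
// `project` is a hypothetical name used only in this sketch.
function project(ra, dec) {
  const radians = (degrees) => (degrees * Math.PI) / 180;
  // Flip the longitude and wrap it to [-180, 180), as in the prep CTE.
  const lambda = radians(((-ra + 540) % 360) - 180);
  const phi = radians(dec);
  const t = Math.asin((Math.sqrt(3) / 2) * Math.sin(phi));
  const t2 = t ** 2;
  const t6 = t2 ** 3;
  const u =
    (1.340264 * lambda * Math.cos(t)) /
    ((Math.sqrt(3) / 2) *
      (1.340264 + -0.081106 * 3 * t2 + t6 * (0.000893 * 7 + 0.003796 * 9 * t2)));
  const v = t * (1.340264 + -0.081106 * t2 + t6 * (0.000893 + 0.003796 * t2));
  return {u, v};
}

console.log(project(0, 0), project(90, 45));
```

Comparing a few values against `SELECT u, v FROM gaiauv` would be one way to confirm that the table was built as intended.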

Capture d’écran 2024-03-08 à 08 11 15

@domoritz

domoritz commented Mar 8, 2024

No, they are not a projection but a rotation so that the Milky Way is along the horizontal.

@Fil
Contributor

Fil commented Mar 8, 2024

Then we'll have to use the rotation matrix J2000 (https://observablehq.com/@fil/galactic-rotations). 😓

@domoritz

domoritz commented Mar 8, 2024

Oh, that's a nice notebook. I tried to do the rotation in SQL some time ago but gave up and just used the correctly rotated variables instead. It'd be cool to see how the rotation can be done in SQL.
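
As a rough sketch of what such a rotation involves, the JavaScript below converts equatorial (ra, dec) to galactic (l, b) with a 3×3 matrix product. The matrix is the commonly cited J2000 equatorial-to-galactic matrix from the Hipparcos catalogue, entered here from memory, so treat its digits as an assumption to verify. Whether the Gaia u and v fields equal these l and b values is also an assumption.

```js
// Commonly cited J2000 equatorial-to-galactic rotation matrix (Hipparcos
// catalogue values); the digits are an assumption to verify before use.
const EQ_TO_GAL = [
  [-0.0548755604, -0.8734370902, -0.4838350155],
  [+0.4941094279, -0.4448296300, +0.7469822445],
  [-0.8676661490, -0.1980763734, +0.4559837762]
];

// Convert equatorial coordinates in degrees to galactic longitude and latitude.
function equatorialToGalactic(ra, dec) {
  const rad = Math.PI / 180;
  const a = ra * rad;
  const d = dec * rad;
  // Unit vector on the celestial sphere in the equatorial frame.
  const v = [Math.cos(d) * Math.cos(a), Math.cos(d) * Math.sin(a), Math.sin(d)];
  // Rotate into the galactic frame: one dot product per matrix row.
  const [x, y, z] = EQ_TO_GAL.map((row) => row[0] * v[0] + row[1] * v[1] + row[2] * v[2]);
  const l = (Math.atan2(y, x) / rad + 360) % 360; // longitude in [0, 360)
  const b = Math.asin(Math.max(-1, Math.min(1, z))) / rad; // clamp guards rounding
  return {l, b};
}

console.log(equatorialToGalactic(266.40499, -28.93617)); // near the galactic centre
```

Each step is only trigonometry and three dot products, so the same expressions could in principle be written as SQL column expressions, though that is not attempted here.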

@mbostock mbostock enabled auto-merge (squash) March 8, 2024 16:57
@mbostock mbostock changed the title from "vgplot" to "Mosaic vgplot" Mar 8, 2024
@mbostock mbostock merged commit 646055a into main Mar 8, 2024
4 checks passed
@mbostock mbostock deleted the mbostock/vgplot branch March 8, 2024 17:07

Successfully merging this pull request may close these issues.

Better support for Mosaic?
4 participants