Refactor crate structure #241

Merged
merged 18 commits into from
Feb 10, 2023

Conversation

@jonmmease (Collaborator) commented Feb 10, 2023

This PR includes a significant refactor of the VegaFusion crate structure. Here are the new crates:

Crates

vegafusion-common

A subset of the functionality of vegafusion-core was extracted into vegafusion-common. This includes the error struct and base data structures. This is now a dependency of vegafusion-core and the new vegafusion-dataframe crates.

vegafusion-dataframe

The DataFrame and Connection traits have been pulled out into a vegafusion-dataframe crate. vegafusion-runtime depends on this crate rather than on vegafusion-sql, so the runtime has no hard dependency on SqlDataFrame and SqlConnection.
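To illustrate the dependency direction, here is a minimal sketch of code written only against traits. The trait methods, `VecDataFrame`, `InMemoryConnection`, and `count_rows` are all hypothetical simplifications made up for this example; the real traits in vegafusion-dataframe are richer and are not reproduced here.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical, simplified stand-ins for the DataFrame and Connection traits.
// They are NOT the real vegafusion-dataframe definitions.
trait DataFrame {
    fn row_count(&self) -> usize;
}

trait Connection {
    // Look up a table by name and return it as a trait object.
    fn scan_table(&self, name: &str) -> Option<Arc<dyn DataFrame>>;
}

// A toy in-memory implementation, playing the role a SQL-backed
// implementation would play in a separate crate.
struct VecDataFrame {
    rows: Vec<i64>,
}

impl DataFrame for VecDataFrame {
    fn row_count(&self) -> usize {
        self.rows.len()
    }
}

struct InMemoryConnection {
    tables: HashMap<String, Vec<i64>>,
}

impl Connection for InMemoryConnection {
    fn scan_table(&self, name: &str) -> Option<Arc<dyn DataFrame>> {
        self.tables.get(name).map(|rows| {
            Arc::new(VecDataFrame { rows: rows.clone() }) as Arc<dyn DataFrame>
        })
    }
}

// "Runtime-side" code: it names only the traits, never a concrete type,
// so it compiles without knowing which implementation is plugged in.
fn count_rows(conn: &dyn Connection, table: &str) -> Option<usize> {
    conn.scan_table(table).map(|df| df.row_count())
}
```

Because `count_rows` accepts `&dyn Connection`, swapping in a different implementation requires no change to the calling code, which is the property the crate split is meant to provide.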

vegafusion-sql

This crate provides the SqlConnection and SqlDataFrame structs, which implement the Connection and DataFrame traits from the vegafusion-dataframe crate using SQL. The functionality for generating SQL strings across dialects is always available in the crate. Optional support for evaluating the queries is enabled by feature flags with a -conn suffix.

The datafusion-conn and sqlite-conn feature flags are currently supported.
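As a rough sketch of how this kind of gating typically looks in Rust, the example below keeps SQL string generation unconditional and checks the two feature names at compile time. Only the feature names come from this PR; `select_all_sql` and `enabled_connections` are hypothetical helpers, not functions from vegafusion-sql.

```rust
// Always compiled: generating a SQL string needs no execution backend.
// (Hypothetical helper; real dialect handling is more involved.)
fn select_all_sql(table: &str) -> String {
    format!("SELECT * FROM \"{}\"", table)
}

// Report which execution backends were compiled in.
// `cfg!` evaluates to a compile-time boolean for the named Cargo feature.
fn enabled_connections() -> Vec<&'static str> {
    let mut names = Vec::new();
    if cfg!(feature = "datafusion-conn") {
        names.push("datafusion");
    }
    if cfg!(feature = "sqlite-conn") {
        names.push("sqlite");
    }
    names
}

// Code that needs a heavy dependency would instead sit behind an attribute,
// so it is removed entirely when the feature is off.
#[cfg(feature = "sqlite-conn")]
#[allow(dead_code)]
fn sqlite_only_helper() -> &'static str {
    "only compiled with the sqlite-conn feature"
}
```

With neither feature enabled, `enabled_connections()` returns an empty list while `select_all_sql` still works, mirroring the split between always-available SQL generation and optional query evaluation.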

vegafusion-datafusion-udfs

This crate contains the definitions of the DataFusion UDFs that are used to implement select Vega expression functions and transforms. These UDFs are used in two places.

  • The DataFusionConnection provided by vegafusion-sql adds these UDFs to its SessionContext so that they are available for use in SQL queries.
  • The vegafusion-runtime crate uses these UDFs for the evaluation of signal expressions and for simplifying expressions passed to the filter and formula transforms. Note: Even when a non-DataFusion Connection is used, DataFusion is still used for signal evaluation and expression simplification.

vegafusion-runtime

This crate was renamed from vegafusion-rt-datafusion to vegafusion-runtime. Along with this, the TaskGraphRuntime struct was renamed to VegaFusionRuntime.

Other refactoring

The VegaFusionRuntime struct now accepts an Arc<dyn Connection> in the constructor. This defines the connection that the runtime uses by default. Currently vegafusion-python-embed sets this to DataFusionConnection.
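The constructor pattern can be sketched as follows. `ToyRuntime`, `ToyDataFusionConnection`, `ToySqliteConnection`, and the single-method `Connection` trait are hypothetical stand-ins for illustration; the actual VegaFusionRuntime constructor signature and trait methods may differ.

```rust
use std::sync::Arc;

// Hypothetical, simplified Connection trait (not the real one).
trait Connection: Send + Sync {
    fn id(&self) -> String;
}

// Toy connection types standing in for concrete implementations.
struct ToyDataFusionConnection;

impl Connection for ToyDataFusionConnection {
    fn id(&self) -> String {
        "datafusion".to_string()
    }
}

struct ToySqliteConnection;

impl Connection for ToySqliteConnection {
    fn id(&self) -> String {
        "sqlite".to_string()
    }
}

// Toy runtime that stores its default connection as a shared trait object.
struct ToyRuntime {
    default_conn: Arc<dyn Connection>,
}

impl ToyRuntime {
    // The caller chooses the connection; the runtime never names a concrete type.
    fn new(default_conn: Arc<dyn Connection>) -> Self {
        Self { default_conn }
    }

    fn default_connection_id(&self) -> String {
        self.default_conn.id()
    }
}
```

A caller would write `ToyRuntime::new(Arc::new(ToyDataFusionConnection))` to select the default, and could pass `ToySqliteConnection` instead without touching the runtime code.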

This change made it possible to remove the dependency on datafusion-core from vegafusion-runtime. Now, the only time we depend on all of datafusion is when the datafusion-conn feature flag is enabled on the vegafusion-sql crate.

Motivation

This work lays the foundation for several exciting possibilities:

  • Non-DataFusion SQL dialects and execution connections can be added to vegafusion-sql, and then used in vegafusion-runtime with no changes to that crate.
  • vegafusion-runtime should now compile to WASM (since there is no top-level datafusion dependency, which was a difficulty in the past). Once the DuckDB SQL dialect is supported, it should be possible to compile the runtime to WASM and use DuckDB WASM for execution.
  • Non-SQL implementations of the DataFrame and Connection traits can be developed, e.g. vegafusion-polars, vegafusion-substrait, etc.

@jonmmease jonmmease merged commit 7e3feea into main Feb 10, 2023