
Introduction to Figurl

Overview

[Figure: figurl overview]

Figurl is a platform for creating and sharing interactive visualizations. From a Python script, users can create interactive, browser-based figures that can be shared immediately by copy-pasting a link. The data needed for the visualization is uploaded to the Kachery cloud and then retrieved by the browser, where it is rendered by a custom visualization plugin. Each domain-specific visualization plugin is a static HTML bundle built with ReactJS/TypeScript, and it is embedded into the main figurl web application within an HTML iframe.

In addition to managing the flow of data from the cloud to the visualization, Figurl also handles advanced capabilities such as user login for curation, GitHub integration, and lazy loading of additional data objects requested by the visualization plugin.

Examples

Example: Static plot

The following simple example uses Altair, a Python wrapper around the Vega-Lite visualization grammar. Any Altair chart can be turned into a figurl figure.

```python
import figurl as fig

import altair as alt
from vega_datasets import data

# Create an Altair chart
# This one comes from the Altair Example Gallery
stocks = data.stocks()

chart = alt.Chart(stocks).mark_line().encode(
  x='date:T',
  y='price',
  color='symbol'
).interactive(bind_y=False)

# Create and print the figURL
url = fig.Altair(chart).url(label='stocks chart')
print(url)

# Output:
# https://figurl.org/f?v=gs://figurl/vegalite-2&d=sha1://0369af9f1a54a5a410f99e63cb08b6b899d1c92f&label=stocks%20chart
```

Here's the example script and a link to the output figure.

Note that this script can be run from anywhere, and the output URL is shareable and archivable.

Example: Report

A figurl report consists of a collection of sections that are laid out vertically in a scrollable figure. Sections can contain markdown text, static plots, and more advanced interactive figures.

This report is the output of this example script and is self-explanatory.

Here's an example report that documents the results of a benchmarking script for the FINUFFT project. See finufft-benchmark.

Example: SortingView

The real power of figurl lies in domain-specific custom visualizations.

SortingView is built on figurl and allows users to view, curate, and share results of electrophysiological spike sorting in the browser. Here is an example figure that displays the output of spike sorting on a small simulated ephys dataset. Click the buttons in the upper-left corner to launch the various synchronized views. You can drag tabs between the top and bottom view areas.

This visualization also facilitates manual curation: labeling and merging of neural units.

Example: Multi-trial spike train viewer

Here's another domain-specific figure in the area of electrophysiology and spike sorting.

See multitrial-raster. Here is a sample output figure. This visualization presents spike trains for hundreds of units over hundreds of trials. You can use the slider controls at the bottom to either view all units at once and slice through the trials, or view all the trials at once and slice through the units.

Example: Multi-panel timeseries

Here's an example of multiple timeseries widgets that are stacked vertically. These are zoomable and synchronized. The top panel is a spike raster plot. The bottom panel uses a live backend to compute data on demand depending on the zoom activity of the user.

(Not publicly accessible at this time)

Example: Animal track animation

Animation of animal position on a track over time.

(Not publicly accessible at this time)

Example: VolumeView

VolumeView is a figurl plugin for visualizing 3D volumetric data, vector fields, and surfaces.

Here are some example output figures:

Example: Tiled image

figurl-tiled-image allows interactive visualization of very large images in a multi-scale, zoomable tiled display (Google Maps style) using deck.gl. You can view a stack of images and interactively toggle between the layers.

Here's a zoomable fractal image.
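To give a feel for how a multi-scale display scales, the sketch below counts the levels and tiles in a generic Google Maps-style image pyramid, in which each level halves the resolution of the one below it until the whole image fits in a single tile. This is an assumption about the general technique, not a description of how figurl-tiled-image builds its tiles internally; `pyramid_levels` is a hypothetical helper, and the image size is made up for illustration.

```python
import math


def pyramid_levels(height: int, width: int, tile_size: int) -> list[tuple[int, int, int]]:
    """Return (level_height, level_width, num_tiles) for each level, full resolution first.

    Assumes a generic pyramid (hypothetical, for illustration) that halves the
    resolution at each level, rounding up, until the image fits in one
    tile_size x tile_size tile.
    """
    levels = []
    h, w = height, width
    while True:
        # Number of tiles needed to cover this level
        tiles = math.ceil(h / tile_size) * math.ceil(w / tile_size)
        levels.append((h, w, tiles))
        if h <= tile_size and w <= tile_size:
            break  # the coarsest level fits in a single tile
        h, w = math.ceil(h / 2), math.ceil(w / 2)
    return levels


# A hypothetical 40,000 x 30,000 pixel image with tile_size=4096
for level, (h, w, tiles) in enumerate(pyramid_levels(40_000, 30_000, tile_size=4096)):
    print(f"level {level}: {h} x {w} px, {tiles} tile(s)")
```

The point of the design is that the browser only ever fetches the few tiles covering the current view at the current zoom level, no matter how large the full-resolution image is.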

Basic usage:

From Numpy array:

```python
import numpy as np
from figurl_tiled_image import TiledImage

array1 = ...  # color image: numpy array of shape [N1 x N2 x 3], dtype uint8
array2 = ...  # color image: numpy array of shape [N1 x N2 x 3], dtype uint8

X = TiledImage(tile_size=4096)
X.add_layer('layer 1', array1)
X.add_layer('layer 2', array2)
url = X.url(label='Numpy example')
print(url)
```

From image file:

```python
import pyvips
from figurl_tiled_image import TiledImage

filename1 = '/path/to/some/image1.png'  # substitute the path to your image
image1 = pyvips.Image.new_from_file(filename1)

filename2 = '/path/to/some/image2.png'  # substitute the path to your image
image2 = pyvips.Image.new_from_file(filename2)

X = TiledImage(tile_size=4096)
X.add_layer('layer 1', image1)
X.add_layer('layer 2', image2)
url = X.url(label='Example')
print(url)
```

Example: Preview raw ephys traces

The above TiledImage plugin can be used with SpikeInterface to generate zoomable views of raw ephys data at various stages of preprocessing. This is useful for quality control and for inspecting the effects of filtering, denoising, etc. See these notes.

This figure represents Neuropixels 2.0 data after three different preprocessing steps (centering, filtering, and referencing).

Here's an example of 60 seconds of Neuropixels raw data (384 channels). This is more than 1 GB of data, loaded efficiently into the browser as zoomable tiles (Google Maps style).
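The "more than 1 GB" figure follows from simple arithmetic. The sketch below assumes a sampling rate of 30 kHz (the Neuropixels action-potential band) and 16-bit (2-byte) integer samples; neither value is stated in the text above, and `raw_size_bytes` is a hypothetical helper.

```python
def raw_size_bytes(duration_s: float, sampling_rate_hz: float,
                   num_channels: int, bytes_per_sample: int) -> float:
    """Size of an uncompressed multichannel recording, in bytes."""
    return duration_s * sampling_rate_hz * num_channels * bytes_per_sample


# Assumed: 30 kHz sampling and int16 samples (2 bytes each)
size = raw_size_bytes(duration_s=60, sampling_rate_hz=30_000,
                      num_channels=384, bytes_per_sample=2)
print(f"{size / 1e9:.2f} GB")  # 1.38 GB
```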

Advantages and discussion

Shareable links

A shareable link is one of the most convenient and reliable ways to share information. Because URLs are short strings, they can be sent by email or text message, stored in databases, or shared anywhere else. Links also take up far less storage space than conventional figures, don't require special programs to open, and can be opened from any location with internet access. Moreover, if the content is stored in a reliable archive, URLs can even be included in publications or kept in permanent storage.

Traditional software programs used for scientific visualization generally produce files rather than URLs. These files are either in well-known, non-interactive formats (such as PDF, PNG, or SVG) or in formats specific to a particular application, which must then be installed and configured. In some cases, no files are created at all, and the software simply interacts with an existing process. This makes it difficult to use the software in various locations and settings; for example, some plotting libraries work only in Jupyter notebooks and need to be adapted for other environments. Cloud-based processes, such as those running in continuous integration systems, also have limited interaction with external systems, and the visualization files they generate are not always easy to obtain or open.

Besides the flexibility and ease of use of figURL links, web-based visualization programs offer several advantages over desktop tools. The primary benefit is that web browsers are available on all desktop and mobile operating systems. In addition, web applications can be updated without any action from the user. Other advantages include authentication, collaboration, and integration with cloud systems. While desktop software has some advantages in terms of access to local data, in most cases browser-based and cloud-based visualization tools are the most practical for users and offer the best opportunities for collaboration and integration.

Content-addressable storage

Figurl stores data in kachery-cloud, which uses content-addressable URIs to locate files. Here is an example of a URI that points to a chunk of data in JSON format:

sha1://21df8ad1fd24b9d9ad112b70de5cd5f7cd67d2a8

The string of characters is a content hash that uniquely identifies the underlying file, like a fingerprint. This relies on the assumption that no two distinct files have the same content hash. The URI therefore points to the file by its content rather than by its location. This is important for creating figURLs because we may want to move data around or change how it is accessed without invalidating URLs that have already been distributed or stored in a database. Here is a link to a figure that uses the above file as the d parameter in the query string:

https://www.figurl.org/f?v=gs://figurl/figurl-report&d=sha1://21df8ad1fd24b9d9ad112b70de5cd5f7cd67d2a8&label=FINUFFT%20benchmark
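To illustrate content addressing, the sketch below derives a `sha1://` URI from a file's bytes using Python's standard `hashlib`. The helper `sha1_uri` is hypothetical and hashes exactly the raw bytes it is given; how kachery-cloud serializes a JSON object before hashing it is not covered here, so this is not guaranteed to reproduce the URIs kachery-cloud generates.

```python
import hashlib


def sha1_uri(content: bytes) -> str:
    """Return a content-addressed URI of the form sha1://<40 hex digits>."""
    return "sha1://" + hashlib.sha1(content).hexdigest()


data = b'{"x": [1, 2, 3]}'
copy_elsewhere = bytes(data)  # the same bytes, e.g. stored in a different bucket

# Identical content yields an identical URI, regardless of where it is stored
print(sha1_uri(data) == sha1_uri(copy_elsewhere))  # True
# Changing a single byte produces a completely different URI
print(sha1_uri(data) == sha1_uri(b'{"x": [1, 2, 4]}'))  # False
```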

It is also possible to retrieve that chunk of data directly from Kachery cloud using the Python or command-line interface:

```bash
kachery-cloud-cat sha1://21df8ad1fd24b9d9ad112b70de5cd5f7cd67d2a8
```

```python
import kachery_cloud as kcl

a = kcl.load_json('sha1://21df8ad1fd24b9d9ad112b70de5cd5f7cd67d2a8')
print(a)
```

Since the URI does not depend on where the file is stored, we can change where and how the data is kept without invalidating the link. For instance, if we decide to publish the visualization, we could transfer the data to a long-term storage system, and the link would stay the same.

Another advantage of content hashes is that they make it possible to verify the integrity of the data. When the file is downloaded, figurl (and the Kachery client) can check that the hash of its content matches the URI.
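The two properties described above, location independence and integrity checking, can be sketched together. In the example below, in-memory dictionaries stand in for storage locations (say, a default bucket and a long-term archive); `STORES`, `fetch`, and the location names are hypothetical and are not part of the kachery-cloud API. The resolver looks up the same URI in each location and skips any copy whose hash does not match.

```python
import hashlib


def sha1_uri(content: bytes) -> str:
    """Return a content-addressed URI of the form sha1://<40 hex digits>."""
    return "sha1://" + hashlib.sha1(content).hexdigest()


# Hypothetical storage locations, each mapping a content URI to bytes
payload = b'{"label": "example"}'
uri = sha1_uri(payload)
STORES = {
    "default-bucket": {},       # the data was moved out of here...
    "archive": {uri: payload},  # ...and into a long-term archive
}


def fetch(uri: str) -> bytes:
    """Return the content for `uri` from the first location holding a valid copy."""
    for store in STORES.values():
        content = store.get(uri)
        if content is None:
            continue  # not stored in this location
        if sha1_uri(content) != uri:
            continue  # corrupted or mislabeled copy: reject it and keep looking
        return content
    raise KeyError(f"no valid copy of {uri} in any location")


print(fetch(uri) == payload)  # True: same link, new location
```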

TODO: talk about why we don't use IPFS

Visualization plugins

The figurl web app (https://figurl.org) pairs the data object specified by the d query parameter of the figURL with the visualization plugin specified by the v query parameter. The visualization plugin is a static HTML bundle containing all the HTML and JavaScript files compiled from the ReactJS/TypeScript application. You can think of it as a binary executable that is downloaded and executed by the web browser. The figurl web app loads the plugin into an embedded iframe and manages the interaction between the plugin and the kachery-cloud network (authentication, file downloads, etc.).
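The d and v parameters are ordinary URL query parameters. As a sketch using only Python's standard `urllib.parse`, the figURL from the Altair example can be taken apart and reassembled. `build_figurl` is a hypothetical helper for illustration; in practice the figurl package builds the URL for you via `.url()`.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

url = ("https://figurl.org/f?v=gs://figurl/vegalite-2"
       "&d=sha1://0369af9f1a54a5a410f99e63cb08b6b899d1c92f&label=stocks%20chart")

# Extract the query string and decode its parameters (one value per key)
params = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
print(params["v"])      # gs://figurl/vegalite-2 (the visualization plugin)
print(params["d"])      # the sha1:// URI of the data object
print(params["label"])  # stocks chart


def build_figurl(v: str, d: str, label: str) -> str:
    """Assemble a figURL, leaving ':' and '/' unescaped and encoding spaces as %20."""
    query = urlencode({"v": v, "d": d, "label": label}, safe=":/", quote_via=quote)
    return "https://figurl.org/f?" + query


print(build_figurl(**params) == url)  # True
```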

Usually the visualization plugin is hosted in a cloud storage bucket. For example, the plugin for the Altair plot in the basic example is located at gs://figurl/vegalite-2 in a Google Cloud Storage bucket (note the -2 at the end of the URI). If we want to update the visualization in ways that do not affect existing links (improving the layout, adding features, etc.), the new HTML bundle can be uploaded to the same location. However, if the changes would break existing links (e.g., changes to the data format), the version number should be incremented, the new bundle uploaded to gs://figurl/vegalite-3, and all future figURLs pointed to the new location.
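The versioning convention can be expressed as a small helper. `next_plugin_version` is hypothetical and assumes the plugin URI ends in `-<integer>`, as `gs://figurl/vegalite-2` does. Only a breaking change should send new figURLs to the incremented location; existing figURLs keep pointing at the old one, which is why the old bundle must stay in place.

```python
import re


def next_plugin_version(plugin_uri: str) -> str:
    """Increment the trailing -N version suffix of a plugin URI (hypothetical helper)."""
    match = re.fullmatch(r"(.*-)(\d+)", plugin_uri)
    if match is None:
        raise ValueError(f"no -N version suffix in {plugin_uri!r}")
    prefix, version = match.groups()
    return f"{prefix}{int(version) + 1}"


print(next_plugin_version("gs://figurl/vegalite-2"))  # gs://figurl/vegalite-3
```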

Visualization plugins are simply static websites embedded in the parent figurl.org web app. This is a big simplification compared with traditional websites, which usually require a running server that provides an API. The real work is performed by the parent figurl.org web application. This design allows us to store visualization plugins in storage buckets for long-term availability, which is crucial for keeping figURLs valid even as the plugins are updated and improved over time.

Creating a visualization plugin

TODO: This section needs to be written. Contact us for more information on creating your own Figurl visualization plugin.

Using your own cloud storage

By default, your data files will be stored using our cloud resources, and they are not guaranteed to be available forever. You can also configure figurl to use your own storage buckets by creating a Kachery zone.