1.1.0rc4 hangs after repeated calls to pre_transform_datasets #268

Closed
jonmmease opened this issue Mar 20, 2023 · 1 comment · Fixed by #269
Labels
bug Something isn't working

Comments

@jonmmease
Collaborator

Ran into a regression in pre_transform_datasets in 1.1.0rc4 (introduced after 1.1.0rc3, I believe). Here's a repro:

import vegafusion as vf
import pandas as pd
import json
print(vf.__version__)  # '1.1.0-rc4'
movies = pd.read_json("https://raw.githubusercontent.com/vega/vega-datasets/main/data/movies.json")
spec = json.loads(r""" 
{
  "$schema": "https://vega.github.io/schema/vega/v5.json",
  "data": [
    {
      "name": "interval_intervalselection__store"
    },
    {
      "name": "legend_pointselection__store"
    },
    {
      "name": "pivot_hover_32f2e9aa_f08a_4fb5_aa8a_ab3f2cc94a1d_store"
    },
    {
      "name": "movies_clean",
      "url": "vegafusion+dataset://movies_clean"
    },
    {
      "name": "data_0",
      "source": "movies_clean",
      "transform": [
        {
          "type": "formula",
          "expr": "toDate(datum[\"Release Date\"])",
          "as": "Release Date"
        }
      ]
    },
    {
      "name": "data_2",
      "source": "data_0",
      "transform": [
        {
          "field": "Release Date",
          "type": "timeunit",
          "units": [
            "year"
          ],
          "as": [
            "year_Release Date",
            "year_Release Date_end"
          ]
        }
      ]
    },
    {
      "name": "data_3",
      "source": "data_2",
      "transform": [
        {
          "type": "filter",
          "expr": "!length(data(\"interval_intervalselection__store\")) || vlSelectionTest(\"interval_intervalselection__store\", datum)"
        },
        {
          "type": "filter",
          "expr": "time('1986-11-09T18:28:05.617') <= time(datum[\"year_Release Date\"]) && time(datum[\"year_Release Date\"]) <= time('2001-09-16T22:23:39.144')"
        },
        {
          "type": "filter",
          "expr": "!length(data(\"legend_pointselection__store\")) || vlSelectionTest(\"legend_pointselection__store\", datum)"
        }
      ]
    }
  ]
}
""")
for i in range(20):
    print(f"pre_transform_datasets: {i}")
    vf.runtime.pre_transform_datasets(spec, ["data_3"], "UTC", inline_datasets=dict(movies_clean=movies))
    print("done")
pre_transform_datasets: 0
done
pre_transform_datasets: 1
done
pre_transform_datasets: 2
done
pre_transform_datasets: 3
done
pre_transform_datasets: 4
done
pre_transform_datasets: 5
done
pre_transform_datasets: 6
done
pre_transform_datasets: 7

The calls to pre_transform_datasets are very quick up until pre_transform_datasets: 7; then the process hangs indefinitely.

With 1.1.0rc3, the loop completes in a couple of seconds without issue.

Of the changes between 1.1.0rc3 and 1.1.0rc4, these two look like the only PRs that touched relevant code:

So I'm going to try reverting each of these locally to narrow down what caused the regression.

jonmmease added the bug label Mar 20, 2023
@jonmmease
Collaborator Author

Hmm, looks like #264 introduced the regression

jonmmease added a commit that referenced this issue Mar 20, 2023
This works around #268 by copying the input PyArrow table through its IPC bytes representation. It also allows us to properly hash the input PyArrow table, which lets the cache work correctly.
jonmmease added a commit that referenced this issue Mar 20, 2023
* work around #268 and fix table fingerprint

This works around #268 by copying the input PyArrow table through its IPC bytes representation. It also allows us to properly hash the input PyArrow table, which lets the cache work correctly.

* Use to_pyarrow instead of ipc bytes for output of pre_transform_datasets