Convert Arrow <--> Parquet, and hence Awkward <--> Parquet. #343

jpivarski · 2020-07-16T20:03:21Z

No description provided.

jpivarski · 2020-07-17T01:15:43Z

@martindurant As expected, Parquet support was rather easy: I just had to copy the old code. Awkward PartitionedArrays translate into row groups, top-level RecordArrays translate into record batches, and any other kind of array is presented as a record batch with one field whose name is the empty string. (This convention is also used when reading back: if there's only one field and its name is the empty string, we read back into a non-record array.)
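To make the empty-field-name convention concrete, here is a minimal plain-Python sketch. It is not the code in this PR and uses neither Awkward nor pyarrow: `wrap_for_parquet` and `unwrap_from_parquet` are hypothetical helpers, and plain lists and dicts stand in for arrays and record batches.

```python
# Illustration only: hypothetical helpers, not Awkward Array or pyarrow functions.
# A "record batch" is modeled as a dict mapping field names to columns.

def wrap_for_parquet(data):
    """Turn a list of rows into a mapping of field name -> column."""
    if data and all(isinstance(row, dict) for row in data):
        # Top-level records: one column per field.
        fields = list(data[0])
        return {name: [row[name] for row in data] for name in fields}
    # Any other kind of data: a single column named "" (the empty string).
    return {"": list(data)}


def unwrap_from_parquet(columns):
    """Invert wrap_for_parquet using the same convention."""
    if list(columns) == [""]:
        # Exactly one field, named "": return the bare, non-record data.
        return list(columns[""])
    names = list(columns)
    length = len(columns[names[0]]) if names else 0
    return [{name: columns[name][i] for name in names} for i in range(length)]
```

One consequence of the convention, at least in this sketch, is that a genuine record with a single field named `""` cannot be distinguished from wrapped non-record data.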

I also tested it on some ancient samples I made for OAMap, but even after all these years, most of the data structures are not supported by pyarrow:

https://github.com/scikit-hep/awkward-1.0/blob/97b165e53666d13a88da247ea18d577bfdf85761/tests/test_0341-parquet-reader-writer.py#L105-L246

To mitigate this, I added an explode_records option to ak.to_parquet. The transformation isn't lossy, but after reading the file back, the result has to be passed through ak.zip to recover the original structure. Not ideal.
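As a rough illustration of why this kind of transformation can be undone, here is a plain-Python sketch. It assumes that "exploding" means rewriting lists of records as a record of lists, which this thread does not spell out. `explode_events` and `zip_events` are hypothetical names; `zip_events` only mimics the role ak.zip would play and is not its implementation.

```python
# Illustration only: hypothetical helpers operating on plain Python lists.

def explode_events(events, fields):
    """Rewrite a list of lists of records as one jagged column per field."""
    # The field names are passed explicitly because empty events carry none.
    return {
        name: [[record[name] for record in event] for event in events]
        for name in fields
    }


def zip_events(columns):
    """Recombine per-field jagged columns into lists of records."""
    names = list(columns)
    n_events = len(columns[names[0]])
    return [
        [
            dict(zip(names, values))
            for values in zip(*(columns[name][i] for name in names))
        ]
        for i in range(n_events)
    ]


events = [
    [{"pt": 1.0, "eta": 0.1}, {"pt": 2.0, "eta": 0.2}],
    [],
    [{"pt": 3.0, "eta": 0.3}],
]
exploded = explode_events(events, ["pt", "eta"])
restored = zip_events(exploded)
```

In this sketch, every field's column has the same list lengths per event, so zipping them back together recovers the original records, including the empty event. The cost is the extra step after reading.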

But hey, we have both eager and lazy reading now!

Convert Arrow <--> Parquet, and hence Awkward <--> Parquet.

fcf4036

This was linked to issues Jul 16, 2020

Copy the OAMap Parquet samples and make awkward.arrow.fromparquet tests #172

Closed

Dasky thoughts for awkward #284

Closed

Does awkward want to promote itself yet as a general nested processor? #303

Closed

jpivarski removed a link to an issue Jul 16, 2020

Dasky thoughts for awkward #284

Closed

jpivarski added 3 commits July 16, 2020 19:30

Completely implemented Parquet reading and writing, with tests of eagerness and laziness. Added 'metadata' on ak.Array and ak.Record (for VirtualArray cache).

e8ee755

Add old OAMap Parquet samples and remember to pytest.importorskip.

ea21381

Add an 'explode_records' option to make it easier to write structured data.

97b165e

jpivarski merged commit f5d3282 into master Jul 17, 2020

jpivarski deleted the jpivarski/parquet-reader-writer branch July 17, 2020 01:54

jonmmease mentioned this pull request Aug 10, 2020

SpatialPandas design and features holoviz/spatialpandas#1

Open

60 tasks
