We have a copy of the parquet-hadoop file ParquetMetadataConverter, with local modifications. We patch it on top of the real one to get different behavior, presumably to work around some issue in our own reader (since nobody else seems to need to hack this file to make Parquet work in Java).
Figure out what the issue is, fix it, and remove the need for the hack.
Relates to #806 (likely a prerequisite, but not certain whether it is enough).
Trying to read back a file written with LZ4 results in:
org.apache.parquet.io.ParquetDecodingException: could not read page in col [store_and_fwd_flag] optional binary store_and_fwd_flag (STRING) as the dictionary was missing for encoding RLE_DICTIONARY
at io.deephaven.parquet.ColumnPageReaderImpl.getDataReader(ColumnPageReaderImpl.java:760)
at io.deephaven.parquet.ColumnPageReaderImpl.readPageV1(ColumnPageReaderImpl.java:333)
at io.deephaven.parquet.ColumnPageReaderImpl.readDataPage(ColumnPageReaderImpl.java:201)
at io.deephaven.parquet.ColumnPageReaderImpl.materialize(ColumnPageReaderImpl.java:75)
at io.deephaven.db.v2.locations.parquet.topage.ToPage.getResult(ToPage.java:52)
at io.deephaven.db.v2.locations.parquet.topage.ToPage.toPage(ToPage.java:77)
at io.deephaven.db.v2.locations.parquet.ColumnChunkPageStore.toPage(ColumnChunkPageStore.java:158)
at io.deephaven.db.v2.locations.parquet.VariablePageSizeColumnChunkPageStore.getPage(VariablePageSizeColumnChunkPageStore.java:111)
at io.deephaven.db.v2.locations.parquet.VariablePageSizeColumnChunkPageStore.getPageContaining(VariablePageSizeColumnChunkPageStore.java:150)
at io.deephaven.db.v2.locations.parquet.VariablePageSizeColumnChunkPageStore.getPageContaining(VariablePageSizeColumnChunkPageStore.java:17)
at io.deephaven.db.v2.sources.chunk.page.PageStore.fillChunk(PageStore.java:67)
at io.deephaven.db.v2.sources.regioned.ParquetColumnRegionBase.fillChunk(ParquetColumnRegionBase.java:50)
at io.deephaven.db.v2.sources.regioned.DeferredColumnRegionBase.fillChunk(DeferredColumnRegionBase.java:71)
at io.deephaven.db.v2.sources.chunk.page.PageStore.fillChunk(PageStore.java:71)
at io.deephaven.db.v2.sources.regioned.RegionedColumnSourceBase.fillChunk(RegionedColumnSourceBase.java:31)
at io.deephaven.db.v2.sources.regioned.RegionedColumnSourceObject$AsValues.fillChunk(RegionedColumnSourceObject.java:37)
at io.deephaven.db.v2.remote.ConstructSnapshot.getSnapshotDataAsChunk(ConstructSnapshot.java:1365)
at io.deephaven.db.v2.remote.ConstructSnapshot.serializeAllTable(ConstructSnapshot.java:1285)
at io.deephaven.db.v2.remote.ConstructSnapshot.lambda$constructBackplaneSnapshotInPositionSpace$2(ConstructSnapshot.java:575)
at io.deephaven.db.v2.remote.ConstructSnapshot.callDataSnapshotFunction(ConstructSnapshot.java:1045)
at io.deephaven.db.v2.remote.ConstructSnapshot.callDataSnapshotFunction(ConstructSnapshot.java:977)
at io.deephaven.db.v2.remote.ConstructSnapshot.constructBackplaneSnapshotInPositionSpace(ConstructSnapshot.java:578)
at io.deephaven.grpc_api.barrage.BarrageMessageProducer.getSnapshot(BarrageMessageProducer.java:1524)
at io.deephaven.grpc_api.barrage.BarrageMessageProducer.updateSubscriptionsSnapshotAndPropagate(BarrageMessageProducer.java:926)
at io.deephaven.grpc_api.barrage.BarrageMessageProducer.access$1400(BarrageMessageProducer.java:89)
at io.deephaven.grpc_api.barrage.BarrageMessageProducer$UpdatePropagationJob.run(BarrageMessageProducer.java:790)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at io.deephaven.grpc_api.runner.DeephavenApiServerModule$ThreadFactory.lambda$newThread$0(DeephavenApiServerModule.java:143)
at java.lang.Thread.run(Thread.java:748)
The code I used:
t = readTable('/data/eth_v2_p1_cBROTLI.parquet') # From the deephaven-core-parquet-examples repo
writeTable(t, 'LZ4', '/data/t_LZ4.parquet')
tmore = readTable('/data/t_LZ4.parquet')
jcferretti changed the title from "Remove the hack that makes the ParquetHadoop module necessary and remove the module." to "Remove the hacked ParquetMetadataConverter.java and the need for the ParquetHadoop module." on Aug 9, 2021
--
I tried this; it did not go well. I removed the code and the module in branch:
https://github.com/jcferretti/deephaven-core/tree/cfs-parquethadoop-module-removal-0
Reading back a file written with LZ4 fails with the same ParquetDecodingException and stack trace as above (missing dictionary for encoding RLE_DICTIONARY).