[VL] Deprecate memory over-acquiring #7384

zhztheplayer
Member

@zhztheplayer zhztheplayer commented Sep 29, 2024

As Velox's spill-to-disk support is becoming more stable, we will start removing memory over-acquiring (introduced in #2321) to improve Gluten's memory utilization.

This is part one: set the over-acquire ratio to 0.
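
For readers unfamiliar with the mechanism, here is a minimal Python sketch of what an over-acquire ratio could look like. It assumes that over-acquiring means briefly reserving an extra `ratio * used` bytes on top of each successful reservation and releasing it right away, so that pressure (and spilling) shows up before memory is actually exhausted. The class and method names (`Pool`, `OverAcquire`, `borrow`, `repay`) are hypothetical and are not Gluten's actual API.

```python
class Pool:
    """Hypothetical fixed-capacity memory pool (not Gluten's real class)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.peak = 0

    def borrow(self, size):
        # Grant as much as fits; the caller compares granted vs. requested.
        granted = min(size, self.capacity - self.used)
        self.used += granted
        self.peak = max(self.peak, self.used)
        return granted

    def repay(self, size):
        freed = min(size, self.used)
        self.used -= freed
        return freed


class OverAcquire:
    """Assumed behaviour: after each full grant, briefly hold ratio * used extra."""

    def __init__(self, pool, ratio):
        self.pool = pool
        self.ratio = ratio
        self.over_peak = 0  # largest temporary extra reservation seen

    def borrow(self, size):
        granted = self.pool.borrow(size)
        if granted < size:
            return granted
        extra = int(self.pool.used * self.ratio)
        over = self.pool.borrow(extra)  # temporary headroom probe
        self.over_peak = max(self.over_peak, over)
        self.pool.repay(over)  # released immediately
        return granted


MIB = 1024 * 1024
with_ratio = OverAcquire(Pool(100 * MIB), ratio=0.5)
with_ratio.borrow(60 * MIB)
no_ratio = OverAcquire(Pool(100 * MIB), ratio=0.0)
no_ratio.borrow(60 * MIB)
```

In this toy model, a ratio of 0.5 pushes the pool's peak to 90 MiB for a 60 MiB reservation, while a ratio of 0 turns the wrapper into a plain pass-through with a 60 MiB peak. That difference is the utilization cost this change aims to remove.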

@github-actions github-actions bot added the CORE works for Gluten Core label Sep 29, 2024

Run Gluten Clickhouse CI

@zhztheplayer
Member Author

The Q97 OOM test fails when over-acquire is turned off; this needs investigation.

link:
https://github.com/apache/incubator-gluten/actions/runs/11089412008/job/30810524875?pr=7384

log:

2024-09-29T04:29:00.8126993Z 24/09/29 04:29:00 ERROR ManagedReservationListener: Error reserving memory from target
2024-09-29T04:29:00.8132417Z org.apache.gluten.memory.memtarget.ThrowOnOomMemoryTarget$OutOfMemoryException: Not enough spark off-heap execution memory. Acquired: 8.0 MiB, granted: 6.2 MiB. Try tweaking config option spark.memory.offHeap.size to get larger space to run this application (if spark.gluten.memory.dynamic.offHeap.sizing.enabled is not enabled). 
2024-09-29T04:29:00.8135383Z Current config settings: 
2024-09-29T04:29:00.8135979Z 	spark.gluten.memory.offHeap.size.in.bytes=4.0 GiB
2024-09-29T04:29:00.8136832Z 	spark.gluten.memory.task.offHeap.size.in.bytes=341.3 MiB
2024-09-29T04:29:00.8137860Z 	spark.gluten.memory.conservative.task.offHeap.size.in.bytes=307.2 MiB
2024-09-29T04:29:00.8138747Z 	spark.memory.offHeap.enabled=true
2024-09-29T04:29:00.8139489Z 	spark.gluten.memory.dynamic.offHeap.sizing.enabled=false
2024-09-29T04:29:00.8140211Z Memory consumer stats: 
2024-09-29T04:29:00.8141333Z 	Task.77:                                                  Current used bytes: 301.0 MiB, peak bytes:        N/A
2024-09-29T04:29:00.8142636Z 	\- Gluten.Tree.407:                                       Current used bytes: 301.0 MiB, peak bytes:  307.2 MiB
2024-09-29T04:29:00.8143961Z 	   \- root.407, 307.2 MiB:                                Current used bytes: 301.0 MiB, peak bytes:  307.2 MiB
2024-09-29T04:29:00.8145291Z 	      +- WholeStageIterator.414:                          Current used bytes: 293.0 MiB, peak bytes:  299.2 MiB
2024-09-29T04:29:00.8146560Z 	      |  \- single:                                       Current used bytes: 293.0 MiB, peak bytes:  296.0 MiB
2024-09-29T04:29:00.8147717Z 	      |     +- root:                                      Current used bytes: 169.0 MiB, peak bytes:  294.0 MiB
2024-09-29T04:29:00.8148988Z 	      |     |  +- task.Gluten_Stage_33_TID_77_VTID_175:   Current used bytes: 169.0 MiB, peak bytes:  294.0 MiB
2024-09-29T04:29:00.8150335Z 	      |     |  |  +- node.10:                             Current used bytes: 100.4 MiB, peak bytes:  120.0 MiB
2024-09-29T04:29:00.8151529Z 	      |     |  |  |  +- op.10.4.0.HashBuild:              Current used bytes: 100.2 MiB, peak bytes:  100.2 MiB
2024-09-29T04:29:00.8152831Z 	      |     |  |  |  \- op.10.3.0.HashProbe:              Current used bytes: 137.3 KiB, peak bytes:  201.5 KiB
2024-09-29T04:29:00.8154072Z 	      |     |  |  +- node.14:                             Current used bytes:  36.6 MiB, peak bytes:   56.0 MiB
2024-09-29T04:29:00.8155289Z 	      |     |  |  |  +- op.14.3.0.HashProbe:              Current used bytes:  18.3 MiB, peak bytes:   30.3 MiB
2024-09-29T04:29:00.8156574Z 	      |     |  |  |  \- op.14.5.0.HashBuild:              Current used bytes:  18.2 MiB, peak bytes:   18.3 MiB
2024-09-29T04:29:00.8157785Z 	      |     |  |  +- node.3:                              Current used bytes:  14.0 MiB, peak bytes:  104.0 MiB
2024-09-29T04:29:00.8159040Z 	      |     |  |  |  +- op.3.2.0.HashBuild:               Current used bytes:  13.6 MiB, peak bytes:  100.2 MiB
2024-09-29T04:29:00.8160321Z 	      |     |  |  |  \- op.3.1.0.HashProbe:               Current used bytes: 385.1 KiB, peak bytes:    9.8 MiB
2024-09-29T04:29:00.8161887Z 	      |     |  |  +- node.17:                             Current used bytes:  12.1 MiB, peak bytes:   29.0 MiB
2024-09-29T04:29:00.8163140Z 	      |     |  |  |  +- op.17.3.0.HashBuild:              Current used bytes:  12.1 MiB, peak bytes:   12.1 MiB
2024-09-29T04:29:00.8164386Z 	      |     |  |  |  \- op.17.0.0.HashProbe:              Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8253532Z 	      |     |  |  +- node.6:                              Current used bytes:   6.0 MiB, peak bytes:   20.0 MiB
2024-09-29T04:29:00.8254756Z 	      |     |  |  |  +- op.6.1.0.HashBuild:               Current used bytes:   6.0 MiB, peak bytes:   18.3 MiB
2024-09-29T04:29:00.8256130Z 	      |     |  |  |  \- op.6.0.0.HashProbe:               Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8257439Z 	      |     |  |  +- node.20:                             Current used bytes:   4.0 KiB, peak bytes:   27.0 MiB
2024-09-29T04:29:00.8258745Z 	      |     |  |  |  +- op.20.6.0.HashBuild:              Current used bytes:   4.0 KiB, peak bytes:   64.0 KiB
2024-09-29T04:29:00.8260045Z 	      |     |  |  |  \- op.20.0.0.HashProbe:              Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8261267Z 	      |     |  |  +- node.27:                             Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8262579Z 	      |     |  |  |  \- op.27.8.0.ValueStream:            Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8263876Z 	      |     |  |  +- node.25:                             Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8265184Z 	      |     |  |  |  \- op.25.0.0.FilterProject:          Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8266804Z 	      |     |  |  +- node.26:                             Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8268167Z 	      |     |  |  |  \- op.26.0.0.FilterProject:          Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8269540Z 	      |     |  |  +- node.24:                             Current used bytes:     0.0 B, peak bytes:   28.0 MiB
2024-09-29T04:29:00.8270849Z 	      |     |  |  |  +- op.24.7.0.HashBuild:              Current used bytes:     0.0 B, peak bytes:    3.9 MiB
2024-09-29T04:29:00.8272229Z 	      |     |  |  |  \- op.24.0.0.HashProbe:              Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8273527Z 	      |     |  |  +- node.29:                             Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8274863Z 	      |     |  |  |  \- op.29.0.0.FilterProject:          Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8276194Z 	      |     |  |  +- node.5:                              Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8277487Z 	      |     |  |  |  \- op.5.1.0.FilterProject:           Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8278802Z 	      |     |  |  +- node.21:                             Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8280127Z 	      |     |  |  |  \- op.21.0.0.FilterProject:          Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8281456Z 	      |     |  |  +- node.28:                             Current used bytes:     0.0 B, peak bytes:   28.0 MiB
2024-09-29T04:29:00.8356259Z 	      |     |  |  |  +- op.28.8.0.HashBuild:              Current used bytes:     0.0 B, peak bytes:   68.0 KiB
2024-09-29T04:29:00.8357871Z 	      |     |  |  |  \- op.28.0.0.HashProbe:              Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8359245Z 	      |     |  |  +- node.11:                             Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8360604Z 	      |     |  |  |  \- op.11.3.0.FilterProject:          Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8362294Z 	      |     |  |  +- node.30:                             Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8363630Z 	      |     |  |  |  \- op.30.0.0.FilterProject:          Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8368225Z 	      |     |  |  +- node.8:                              Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8369675Z 	      |     |  |  |  \- op.8.3.0.ValueStream:             Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8371042Z 	      |     |  |  +- node.16:                             Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8372405Z 	      |     |  |  |  \- op.16.3.0.FilterProject:          Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8373745Z 	      |     |  |  +- node.22:                             Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8375110Z 	      |     |  |  |  \- op.22.0.0.FilterProject:          Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8376447Z 	      |     |  |  +- node.7:                              Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8377772Z 	      |     |  |  |  \- op.7.0.0.FilterProject:           Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8379075Z 	      |     |  |  +- node.13:                             Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8380353Z 	      |     |  |  |  \- op.13.5.0.ValueStream:            Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8381962Z 	      |     |  |  +- node.0:                              Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8383218Z 	      |     |  |  |  \- op.0.0.0.ValueStream:             Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8384366Z 	      |     |  |  +- node.4:                              Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8385609Z 	      |     |  |  |  \- op.4.1.0.FilterProject:           Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8386907Z 	      |     |  |  +- node.23:                             Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8388155Z 	      |     |  |  |  \- op.23.7.0.ValueStream:            Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8389404Z 	      |     |  |  +- node.33:                             Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8390700Z 	      |     |  |  |  \- op.33.0.0.PartialAggregation:     Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8392067Z 	      |     |  |  +- node.9:                              Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8393327Z 	      |     |  |  |  \- op.9.4.0.ValueStream:             Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8394560Z 	      |     |  |  +- node.2:                              Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8395764Z 	      |     |  |  |  \- op.2.2.0.ValueStream:             Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8396976Z 	      |     |  |  +- node.19:                             Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8398193Z 	      |     |  |  |  \- op.19.6.0.ValueStream:            Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8399422Z 	      |     |  |  +- node.32:                             Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8400647Z 	      |     |  |  |  \- op.32.0.0.Aggregation:            Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8401865Z 	      |     |  |  +- node.12:                             Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8403368Z 	      |     |  |  |  \- op.12.3.0.FilterProject:          Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8404588Z 	      |     |  |  +- node.1:                              Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8690253Z 	      |     |  |  |  \- op.1.1.0.ValueStream:             Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8691671Z 	      |     |  |  +- node.31:                             Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8693027Z 	      |     |  |  |  \- op.31.0.0.Aggregation:            Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8694411Z 	      |     |  |  +- node.18:                             Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8695758Z 	      |     |  |  |  \- op.18.0.0.FilterProject:          Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8697154Z 	      |     |  |  \- node.15:                             Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8698515Z 	      |     |  |     \- op.15.3.0.FilterProject:          Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8699898Z 	      |     |  \- default_leaf:                           Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8719955Z 	      |     \- gluten::MemoryAllocator:                   Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8721443Z 	      +- ArrowContextInstance.58:                         Current used bytes:   8.0 MiB, peak bytes:    8.0 MiB
2024-09-29T04:29:00.8723293Z 	      +- ShuffleWriter.154:                               Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8724542Z 	      |  \- single:                                       Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8825268Z 	      |     +- root:                                      Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8828471Z 	      |     |  \- default_leaf:                           Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8831759Z 	      |     \- gluten::MemoryAllocator:                   Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8835086Z 	      +- IndicatorVectorBase#init.169:                    Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8839844Z 	      |  \- single:                                       Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8845746Z 	      |     +- gluten::MemoryAllocator:                   Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8846946Z 	      |     \- root:                                      Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8859187Z 	      |        \- default_leaf:                           Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8860565Z 	      +- IteratorMetrics.173.OverAcquire.0:               Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8862019Z 	      +- ShuffleWriter.154.OverAcquire.0:                 Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8863419Z 	      +- BuildSideRelation#deserialized.29:               Current used bytes:     0.0 B, peak bytes:    8.0 MiB
2024-09-29T04:29:00.8864579Z 	      |  \- single:                                       Current used bytes:     0.0 B, peak bytes:    8.0 MiB
2024-09-29T04:29:00.8865602Z 	      |     +- root:                                      Current used bytes:     0.0 B, peak bytes: 1024.0 KiB
2024-09-29T04:29:00.8866665Z 	      |     |  \- default_leaf:                           Current used bytes:     0.0 B, peak bytes:  192.0 KiB
2024-09-29T04:29:00.8868373Z 	      |     \- gluten::MemoryAllocator:                   Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8869714Z 	      +- VeloxBatchResizer.159:                           Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.8870906Z 	      |  \- single:                                       Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.9005856Z 	      |     +- gluten::MemoryAllocator:                   Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.9007417Z 	      |     \- root:                                      Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.9008609Z 	      |        \- default_leaf:                           Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.9010107Z 	      +- BuildSideRelation#deserialized.29.OverAcquire.0: Current used bytes:     0.0 B, peak bytes:    2.4 MiB
2024-09-29T04:29:00.9011799Z 	      +- IndicatorVectorBase#init.169.OverAcquire.0:      Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.9013268Z 	      +- ShuffleReader.30:                                Current used bytes:     0.0 B, peak bytes:    8.0 MiB
2024-09-29T04:29:00.9014516Z 	      |  \- single:                                       Current used bytes:     0.0 B, peak bytes:    8.0 MiB
2024-09-29T04:29:00.9015692Z 	      |     +- root:                                      Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.9016849Z 	      |     |  \- default_leaf:                           Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.9018265Z 	      |     \- gluten::MemoryAllocator:                   Current used bytes:     0.0 B, peak bytes:  130.9 KiB
2024-09-29T04:29:00.9019999Z 	      +- ShuffleReader.30.OverAcquire.0:                  Current used bytes:     0.0 B, peak bytes:    2.4 MiB
2024-09-29T04:29:00.9021443Z 	      +- ArrowContextInstance.62:                         Current used bytes:     0.0 B, peak bytes:    8.0 MiB
2024-09-29T04:29:00.9022906Z 	      +- VeloxBatchResizer.159.OverAcquire.0:             Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.9024329Z 	      +- IteratorMetrics.173:                             Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.9025565Z 	      |  \- single:                                       Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.9026712Z 	      |     +- root:                                      Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.9027910Z 	      |     |  \- default_leaf:                           Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.9029266Z 	      |     \- gluten::MemoryAllocator:                   Current used bytes:     0.0 B, peak bytes:      0.0 B
2024-09-29T04:29:00.9030759Z 	      \- WholeStageIterator.414.OverAcquire.0:            Current used bytes:     0.0 B, peak bytes:   82.5 MiB
2024-09-29T04:29:00.9031603Z 
2024-09-29T04:29:00.9032360Z 	at org.apache.gluten.memory.memtarget.ThrowOnOomMemoryTarget.borrow(ThrowOnOomMemoryTarget.java:105)
2024-09-29T04:29:00.9034140Z 	at org.apache.gluten.memory.listener.ManagedReservationListener.reserve(ManagedReservationListener.java:49)
2024-09-29T04:29:00.9035784Z 	at org.apache.gluten.vectorized.ColumnarBatchOutIterator.nativeHasNext(Native Method)
2024-09-29T04:29:00.9037366Z 	at org.apache.gluten.vectorized.ColumnarBatchOutIterator.hasNext0(ColumnarBatchOutIterator.java:57)
2024-09-29T04:29:00.9038827Z 	at org.apache.gluten.iterator.ClosableIterator.hasNext(ClosableIterator.java:36)
2024-09-29T04:29:00.9040085Z 	at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
2024-09-29T04:29:00.9041512Z 	at org.apache.gluten.iterator.IteratorsV1$InvocationFlowProtection.hasNext(IteratorsV1.scala:159)
2024-09-29T04:29:00.9042997Z 	at org.apache.gluten.iterator.IteratorsV1$IteratorCompleter.hasNext(IteratorsV1.scala:71)
2024-09-29T04:29:00.9044618Z 	at org.apache.gluten.iterator.IteratorsV1$PayloadCloser.hasNext(IteratorsV1.scala:37)
2024-09-29T04:29:00.9046168Z 	at org.apache.gluten.iterator.IteratorsV1$LifeTimeAccumulator.hasNext(IteratorsV1.scala:100)
2024-09-29T04:29:00.9047583Z 	at org.apache.gluten.iterator.IteratorsV1$ReadTimeAccumulator.hasNext(IteratorsV1.scala:127)
2024-09-29T04:29:00.9048796Z 	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
2024-09-29T04:29:00.9049917Z 	at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:32)
2024-09-29T04:29:00.9051351Z 	at org.apache.gluten.vectorized.ColumnarBatchInIterator.hasNext(ColumnarBatchInIterator.java:36)
2024-09-29T04:29:00.9052922Z 	at org.apache.gluten.vectorized.ColumnarBatchOutIterator.nativeHasNext(Native Method)
2024-09-29T04:29:00.9054511Z 	at org.apache.gluten.vectorized.ColumnarBatchOutIterator.hasNext0(ColumnarBatchOutIterator.java:57)
2024-09-29T04:29:00.9056020Z 	at org.apache.gluten.iterator.ClosableIterator.hasNext(ClosableIterator.java:36)
2024-09-29T04:29:00.9057297Z 	at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
2024-09-29T04:29:00.9058625Z 	at org.apache.gluten.iterator.IteratorsV1$ReadTimeAccumulator.hasNext(IteratorsV1.scala:127)
2024-09-29T04:29:00.9060014Z 	at org.apache.gluten.iterator.IteratorsV1$PayloadCloser.hasNext(IteratorsV1.scala:37)
2024-09-29T04:29:00.9061387Z 	at org.apache.gluten.iterator.IteratorsV1$IteratorCompleter.hasNext(IteratorsV1.scala:71)
2024-09-29T04:29:00.9062548Z 	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
2024-09-29T04:29:00.9063525Z 	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
2024-09-29T04:29:00.9064789Z 	at org.apache.spark.shuffle.ColumnarShuffleWriter.internalWrite(ColumnarShuffleWriter.scala:125)
2024-09-29T04:29:00.9066509Z 	at org.apache.spark.shuffle.ColumnarShuffleWriter.write(ColumnarShuffleWriter.scala:244)
2024-09-29T04:29:00.9067971Z 	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
2024-09-29T04:29:00.9069326Z 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
2024-09-29T04:29:00.9070565Z 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
2024-09-29T04:29:00.9071600Z 	at org.apache.spark.scheduler.Task.run(Task.scala:131)
2024-09-29T04:29:00.9072625Z 	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
2024-09-29T04:29:00.9073634Z 	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
2024-09-29T04:29:00.9074257Z 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
2024-09-29T04:29:00.9074961Z 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
2024-09-29T04:29:00.9075710Z 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
2024-09-29T04:29:00.9076262Z 	at java.lang.Thread.run(Thread.java:750)
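
The figures in the log line up with a simple headroom calculation. The sketch below only redoes that arithmetic with values copied from the log: the 307.2 MiB shown on `root.407` (which appears to be its cap and equals the conservative per-task off-heap size), the 301.0 MiB currently in use, and the 8.0 MiB request. The helper name `headroom_mib` is made up for illustration.

```python
def headroom_mib(capacity_mib, used_mib):
    """Memory still available under a cap, rounded to the log's one-decimal precision."""
    return round(capacity_mib - used_mib, 1)


capacity = 307.2   # value shown on root.407; assumed to be the per-task cap
used = 301.0       # current used bytes reported for Task.77
requested = 8.0    # "Acquired: 8.0 MiB" in the exception message

available = headroom_mib(capacity, used)
granted = min(requested, available)
shortfall = round(requested - granted, 1)
print(f"available={available} MiB, granted={granted} MiB, short by {shortfall} MiB")
```

The computed 6.2 MiB of headroom matches the "granted: 6.2 MiB" in the exception message, so the 8.0 MiB request falls 1.8 MiB short of the task's limit.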

@zhztheplayer zhztheplayer marked this pull request as draft September 30, 2024 02:13
@FelixYBW
Contributor

FelixYBW commented Oct 2, 2024

The Q97 OOM test fails when over-acquire is turned off; this needs investigation.

link: https://github.com/apache/incubator-gluten/actions/runs/11089412008/job/30810524875?pr=7384

The root cause is likely that Velox estimates and reserves memory ahead of the actual allocation, but the reserved size is too small.
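
To make that hypothesis concrete, here is a toy Python model of an estimate-then-allocate pattern. It is only an illustration, under the assumption that an operator reserves an estimated amount up front and later needs more than it reserved. `Budget`, `reserve`, `run_operator`, and the sizes are hypothetical and do not reflect Velox's actual reservation logic.

```python
class Budget:
    """Hypothetical per-task memory budget with no over-acquire headroom."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.reserved = 0

    def reserve(self, size):
        # Succeeds only if the whole request fits under the cap.
        if self.reserved + size > self.capacity:
            return False
        self.reserved += size
        return True


def run_operator(budget, estimated, actual):
    """Reserve the estimate first, then try to grow to the actual need."""
    if not budget.reserve(estimated):
        # The shortage is visible before any work starts, so spilling is still an option.
        return "spill-before-start"
    shortfall = max(0, actual - estimated)
    if shortfall and not budget.reserve(shortfall):
        # The underestimate only surfaces at allocation time.
        return "oom-mid-allocation"
    return "ok"


accurate = run_operator(Budget(capacity=100), estimated=120, actual=120)
too_small = run_operator(Budget(capacity=100), estimated=90, actual=120)
```

With an accurate estimate the shortage is detected at reservation time, whereas an estimate that is too small passes the reservation step and fails later, which is the kind of late failure the log above shows.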
