[Fix](parquet-reader) Fix parquet reader crash in set_dict(). #40643

Merged on Sep 13, 2024

Contributor

@kaka11chen kaka11chen commented Sep 11, 2024

Proposed changes

Issue

*** is nereids: 1 ***
tablet id: 4
Abort at 1725864966 (unix time) try "date -d @1725864966" if you are using GNU date ***
*** Set a breakpoint in static void __GI_abort() to debug ***
PC: @ 0x7f007fb4090a04
*** SIGSEGV (address not mapped to object 0xa0fa868a41d6) received by PID 404737 (TID 274135 OR 0x7ece29df700) from PID 1755584205; stack trace: ***
#0 __GI_raise
#1 __GI_abort
#2 sig_handler
#3 _sigaction
#4 JVM_handle_linux_signal
#5 _sigaction
#6 doris::vectorized::ByteArrayDictDecoder::set_dict(std::unique_ptr<unsigned char[], std::default_delete<unsigned char[]>> &&, int, unsigned long)
at /mnt/disk1/yy/git/enterprise-core/be/src/vec/exec/format/parquet/byte_array_dict_decoder.cpp:41
#7 doris::vectorized::ColumnChunkReader::_decode_dict_page() at /mnt/disk1/yy/git/enterprise-core/be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:258
#8 doris::vectorized::ColumnChunkReader::next_page() at /mnt/disk1/yy/git/enterprise-core/be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:105
#9 doris::vectorized::ParquetColumnReader::_read_column_data(doris::vectorized::Block*, bool*) at /mnt/disk1/yy/git/enterprise-core/be/src/vec/exec/format/parquet/vparquet_column_reader.cpp:508
#10 doris::vectorized::ScalarColumnReader::_next_value(doris::vectorized::IColumn*, unsigned long, unsigned long*, bool*) at /mnt/disk1/yy/git/enterprise-core/be/src/vec/exec/format/parquet/vparquet_column_reader.cpp:699
#11 doris::vectorized::RowGroupReader::_read_column_data(doris::vectorized::Block*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> &, std::vector<doris::vectorized::ColumnSelectVector>*, unsigned long, unsigned long*, bool*) at /mnt/disk1/yy/git/enterprise-core/be/src/vec/exec/format/parquet/vparquet_group_reader.cpp:425
#12 doris::vectorized::RowGroupReader::get_next_block(doris::vectorized::Block*, bool*) at /mnt/disk1/yy/git/enterprise-core/be/src/vec/exec/format/parquet/vparquet_group_reader.cpp:311
#13 doris::vectorized::ParquetReader::get_next(doris::vectorized::Block*, bool*) at /mnt/disk1/yy/git/enterprise-core/be/src/vec/exec/format/parquet/vparquet_reader.cpp:533
#14 doris::vectorized::VFileScanner::_get_next_reader_block(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /mnt/disk1/yy/git/enterprise-core/be/src/vec/exec/scan/vfile_scanner.cpp:368
#15 doris::vectorized::VFileScanner::_get_block_impl(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /mnt/disk1/yy/git/enterprise-core/be/src/vec/exec/scan/vfile_scanner.cpp:411
#16 doris::vectorized::VScanner::get_block(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /mnt/disk1/yy/git/enterprise-core/be/src/vec/exec/scan/vscanner.cpp:431
#17 doris::vectorized::VScanner::get_block(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /mnt/disk1/yy/git/enterprise-core/be/src/vec/exec/scan/vscanner.cpp:96
#18 doris::vectorized::ScannerScheduler::submit(doris::vectorized::ScannerContext*, std::shared_ptr<doris::vectorized::ScanTask>) at /mnt/disk1/yy/git/enterprise-core/be/src/vec/exec/scan/scanner_context.cpp:96
#19 doris::Thread::supervise_thread(void*) at /mnt/disk1/yy/git/enterprise-core/be/src/util/thread.cpp:499
#20 start_thread
#21 clone in /lib64/libc.so.6

Solution

It is not yet known why the Parquet dictionary page is null in this case, which is what causes the crash in set_dict(). This PR adds a defensive check so that the reader no longer crashes on a null dictionary page.
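
The patch itself is not shown in this conversation. Purely as an illustration, a defensive check of this kind might look like the sketch below. `SketchByteArrayDictDecoder` and `DecodeStatus` are hypothetical stand-ins, not the Doris classes, and the error messages are made up. The only format detail assumed is the Parquet PLAIN layout for BYTE_ARRAY values: a 4-byte little-endian length followed by that many bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical status type for this sketch; Doris has its own Status class.
struct DecodeStatus {
    bool ok;
    std::string message;
};

// Hypothetical, simplified decoder; not the real ByteArrayDictDecoder.
class SketchByteArrayDictDecoder {
public:
    // Takes ownership of the dictionary page and indexes its entries.
    // Returns an error instead of dereferencing a null or truncated buffer.
    DecodeStatus set_dict(std::unique_ptr<uint8_t[]> dict, int32_t length, size_t num_values) {
        if (dict == nullptr || length < 0) {
            return {false, "dictionary page is null or has a negative length"};
        }
        const size_t total = static_cast<size_t>(length);
        size_t offset = 0;
        std::vector<std::string_view> entries;
        for (size_t i = 0; i < num_values; ++i) {
            // Each entry starts with a 4-byte little-endian length prefix.
            if (total - offset < 4) {
                return {false, "dictionary page ends inside a length prefix"};
            }
            const uint8_t* p = dict.get() + offset;
            const uint32_t len = static_cast<uint32_t>(p[0]) |
                                 (static_cast<uint32_t>(p[1]) << 8) |
                                 (static_cast<uint32_t>(p[2]) << 16) |
                                 (static_cast<uint32_t>(p[3]) << 24);
            offset += 4;
            if (total - offset < len) {
                return {false, "dictionary entry overruns the page"};
            }
            entries.emplace_back(reinterpret_cast<const char*>(dict.get() + offset), len);
            offset += len;
        }
        // Commit only after the whole page validated, so a failed call
        // leaves any previously loaded dictionary untouched.
        _dict = std::move(dict);
        _entries = std::move(entries);
        return {true, ""};
    }

    size_t size() const { return _entries.size(); }
    std::string_view at(size_t i) const { return _entries.at(i); }

private:
    std::unique_ptr<uint8_t[]> _dict;        // owns the page bytes
    std::vector<std::string_view> _entries;  // views into _dict
};
```

With a check like this, a caller such as `_decode_dict_page()` could propagate the error up to the scanner and fail the query instead of taking down the whole BE process with a SIGSEGV.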

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@kaka11chen
Contributor Author

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.82% (9394/25510)
Line Coverage: 28.23% (77509/274543)
Region Coverage: 27.63% (40007/144815)
Branch Coverage: 24.25% (20349/83904)
Coverage Report: http://coverage.selectdb-in.cc/coverage/ae40e4e582de9bd74629df77087d899c0da35f73_ae40e4e582de9bd74629df77087d899c0da35f73/report/index.html

@doris-robot

TPC-H: Total hot run time: 38088 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ae40e4e582de9bd74629df77087d899c0da35f73, data reload: false

------ Round 1 ----------------------------------
q1	17638	4386	4339	4339
q2	2011	190	190	190
q3	11680	962	1100	962
q4	10513	719	741	719
q5	7744	2847	2795	2795
q6	224	135	135	135
q7	961	622	603	603
q8	9376	2092	2067	2067
q9	7135	6562	6567	6562
q10	6993	2227	2213	2213
q11	460	252	257	252
q12	405	223	228	223
q13	17772	3082	3047	3047
q14	270	236	243	236
q15	545	505	471	471
q16	520	432	430	430
q17	984	695	625	625
q18	7283	6910	6802	6802
q19	1400	974	1030	974
q20	672	333	335	333
q21	3922	3091	3120	3091
q22	1110	1024	1019	1019
Total cold run time: 109618 ms
Total hot run time: 38088 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4374	4241	4300	4241
q2	377	282	285	282
q3	2894	2676	2655	2655
q4	1894	1652	1676	1652
q5	5622	5723	5718	5718
q6	227	140	148	140
q7	2252	1785	1838	1785
q8	3302	3447	3395	3395
q9	8976	8887	8883	8883
q10	3569	3387	3408	3387
q11	589	527	546	527
q12	842	641	661	641
q13	15238	3243	3255	3243
q14	337	282	296	282
q15	524	497	508	497
q16	555	509	508	508
q17	1852	1566	1566	1566
q18	8286	7909	7915	7909
q19	1705	1477	1567	1477
q20	2198	1919	1940	1919
q21	5714	5548	5526	5526
q22	1123	1017	1037	1017
Total cold run time: 72450 ms
Total hot run time: 57250 ms

@doris-robot

TPC-DS: Total hot run time: 197465 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ae40e4e582de9bd74629df77087d899c0da35f73, data reload: false

query1	1278	876	880	876
query2	6348	1905	1894	1894
query3	10638	3989	3952	3952
query4	60036	29301	23305	23305
query5	5001	485	483	483
query6	388	158	158	158
query7	5632	303	293	293
query8	319	226	227	226
query9	7758	2537	2532	2532
query10	450	279	275	275
query11	17516	15090	15547	15090
query12	159	101	101	101
query13	1479	397	388	388
query14	10471	7184	7314	7184
query15	200	182	177	177
query16	6668	438	509	438
query17	1082	579	571	571
query18	1551	331	295	295
query19	195	151	146	146
query20	116	109	112	109
query21	208	104	107	104
query22	4791	4677	4630	4630
query23	34326	33477	34475	33477
query24	5995	2844	2883	2844
query25	514	423	423	423
query26	600	153	154	153
query27	1593	277	277	277
query28	3787	2065	2031	2031
query29	630	406	405	405
query30	226	158	155	155
query31	920	731	775	731
query32	70	51	50	50
query33	410	279	275	275
query34	881	469	468	468
query35	880	718	739	718
query36	1046	932	943	932
query37	142	81	86	81
query38	4083	3960	3989	3960
query39	1470	1421	1384	1384
query40	199	113	111	111
query41	46	47	44	44
query42	114	95	92	92
query43	499	463	471	463
query44	1097	759	744	744
query45	193	164	166	164
query46	1096	719	777	719
query47	1893	1789	1851	1789
query48	368	297	295	295
query49	764	436	475	436
query50	812	414	428	414
query51	6995	6911	6767	6767
query52	99	85	87	85
query53	251	173	174	173
query54	572	454	455	454
query55	75	75	75	75
query56	277	256	249	249
query57	1212	1107	1102	1102
query58	218	237	230	230
query59	2952	2714	2762	2714
query60	295	261	266	261
query61	103	95	100	95
query62	751	644	672	644
query63	210	186	182	182
query64	1357	715	688	688
query65	3252	3208	3178	3178
query66	613	334	340	334
query67	15935	15389	15416	15389
query68	1439	845	849	845
query69	424	318	320	318
query70	1170	1108	1131	1108
query71	350	340	333	333
query72	4618	3436	3457	3436
query73	592	575	581	575
query74	9062	8990	8955	8955
query75	3052	2899	2980	2899
query76	969	838	835	835
query77	413	391	398	391
query78	9340	9532	9168	9168
query79	884	861	855	855
query80	780	823	846	823
query81	444	255	260	255
query82	260	258	264	258
query83	185	189	188	188
query84	188	104	105	104
query85	610	395	375	375
query86	304	285	298	285
query87	4368	4256	4320	4256
query88	4261	4158	4129	4129
query89	364	359	367	359
query90	820	309	308	308
query91	123	128	125	125
query92	79	74	78	74
query93	904	893	888	888
query94	402	384	357	357
query95	426	412	463	412
query96	470	470	471	470
query97	3106	3136	3098	3098
query98	231	234	217	217
query99	1307	1297	1284	1284
Total cold run time: 294050 ms
Total hot run time: 197465 ms

@doris-robot

ClickBench: Total hot run time: 32.07 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit ae40e4e582de9bd74629df77087d899c0da35f73, data reload: false

query1	0.05	0.05	0.04
query2	0.07	0.04	0.04
query3	0.23	0.06	0.06
query4	1.67	0.07	0.07
query5	0.49	0.50	0.51
query6	1.14	0.73	0.73
query7	0.02	0.02	0.01
query8	0.05	0.05	0.04
query9	0.55	0.47	0.49
query10	0.54	0.54	0.54
query11	0.15	0.11	0.11
query12	0.15	0.13	0.12
query13	0.61	0.59	0.58
query14	1.40	1.45	1.43
query15	0.85	0.82	0.83
query16	0.37	0.38	0.36
query17	1.03	0.96	1.04
query18	0.21	0.20	0.21
query19	1.88	1.84	1.81
query20	0.01	0.01	0.01
query21	15.39	0.66	0.65
query22	3.86	6.92	2.52
query23	18.32	1.37	1.30
query24	2.11	0.22	0.21
query25	0.16	0.08	0.08
query26	0.27	0.19	0.17
query27	0.07	0.08	0.07
query28	13.28	1.02	0.99
query29	12.60	3.48	3.43
query30	0.24	0.05	0.05
query31	2.87	0.41	0.39
query32	3.26	0.47	0.47
query33	2.96	2.96	3.02
query34	16.93	4.43	4.53
query35	4.49	4.45	4.40
query36	0.67	0.48	0.48
query37	0.18	0.16	0.15
query38	0.16	0.15	0.14
query39	0.05	0.04	0.03
query40	0.16	0.12	0.13
query41	0.09	0.05	0.04
query42	0.05	0.05	0.05
query43	0.04	0.04	0.04
Total cold run time: 109.68 s
Total hot run time: 32.07 s

Contributor

@morningman morningman left a comment

LGTM

@github-actions github-actions bot added the approved label (indicates a PR has been approved by one committer) on Sep 11, 2024
Contributor

PR approved by at least one committer and no changes requested.

Contributor

PR approved by anyone and no changes requested.

@kaka11chen kaka11chen marked this pull request as ready for review September 12, 2024 00:39
@morningman morningman merged commit e40275a into apache:master Sep 13, 2024
25 of 30 checks passed
kaka11chen added a commit to kaka11chen/doris that referenced this pull request Sep 20, 2024
yiguolei pushed a commit that referenced this pull request Sep 21, 2024
morningman pushed a commit that referenced this pull request Sep 26, 2024
morningman pushed a commit to morningman/doris that referenced this pull request Nov 21, 2024
Labels: approved (indicates a PR has been approved by one committer), dev/2.1.7-merged, dev/3.0.2-merged, reviewed
5 participants