branch-3.0: [bugfix](memtable) arena is freed early and will cause use after free #46997 #47006

Merged: 1 commit, Jan 15, 2025


github-actions[bot]
Contributor

Cherry-picked from #46997

…#46997)

### What problem does this PR solve?

Related PR: #40912

Problem Summary:

Do not reset _arena in MemTable::to_block(): ~MemTable() still uses it
when releasing the agg places, so resetting it early makes the destructor
read freed arena memory.
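
The lifetime rule behind this fix can be sketched with toy types. This is a minimal illustration, not the Doris code: `ToyArena`, `ToyAggState`, `ToyMemTable`, and `run_demo` are hypothetical names, and the real `Arena` and aggregate-function classes are more involved. The point it shows is that objects constructed inside arena memory must be destroyed before the arena is released, so `to_block()` leaves the arena alone and the destructor runs the state destructors first.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

// Toy stand-ins for illustration only; these are NOT the Doris classes.
struct ToyArena {
    // Hands out raw storage that stays valid until the arena is destroyed.
    void* alloc(std::size_t bytes) {
        chunks.push_back(std::make_unique<unsigned char[]>(bytes));
        return chunks.back().get();
    }
    std::vector<std::unique_ptr<unsigned char[]>> chunks;
};

// An aggregate state with a non-trivial destructor, living in arena memory.
struct ToyAggState {
    explicit ToyAggState(int* counter) : destroyed(counter) {}
    ~ToyAggState() { ++*destroyed; }  // reads a member stored in the arena
    std::vector<unsigned long> values;
    int* destroyed;
};

class ToyMemTable {
public:
    explicit ToyMemTable(int* destroyed_counter)
            : _arena(std::make_unique<ToyArena>()), _destroyed(destroyed_counter) {}

    void insert(unsigned long v) {
        void* place = _arena->alloc(sizeof(ToyAggState));
        auto* state = new (place) ToyAggState(_destroyed);  // placement new
        state->values.push_back(v);
        _places.push_back(state);
    }

    // Copies the data out. It deliberately does NOT call _arena.reset():
    // the states in _places still live in arena memory.
    std::vector<unsigned long> to_block() {
        std::vector<unsigned long> out;
        for (const ToyAggState* s : _places) {
            out.insert(out.end(), s->values.begin(), s->values.end());
        }
        return out;
    }

    bool arena_alive() const { return _arena != nullptr; }

    ~ToyMemTable() {
        // Destroy the states while their storage is still valid.
        for (ToyAggState* s : _places) {
            s->~ToyAggState();
        }
        // _arena is released afterwards, when the members are destroyed.
    }

private:
    std::unique_ptr<ToyArena> _arena;
    std::vector<ToyAggState*> _places;
    int* _destroyed;
};

// Returns how many states were destroyed by the time the table is gone.
int run_demo() {
    int destroyed = 0;
    {
        ToyMemTable table(&destroyed);
        table.insert(1);
        table.insert(2);
        table.insert(3);
        std::vector<unsigned long> block = table.to_block();
        assert(block.size() == 3);
        assert(table.arena_alive());  // still needed by the destructor
    }
    return destroyed;
}
```

If `to_block()` in this sketch reset `_arena`, the loop in `~ToyMemTable()` would run destructors on freed storage, which is the same shape as the report that follows.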

Fix the following use-after-free, reported by AddressSanitizer.

Use (read in ~MemTable()):

==3628099==ERROR: AddressSanitizer: heap-use-after-free on address 0x52100381be60 at pc 0x5648f30893f8 bp 0x7f8842433310 sp 0x7f8842433308
READ of size 8 at 0x52100381be60 thread T4767 (wg_flush_broker)
#0 0x5648f30893f7 in phmap::priv::raw_hash_set<phmap::priv::FlatHashSetPolicy<unsigned long>, phmap::Hash<unsigned long>, phmap::EqualTo<unsigned long>, std::allocator<unsigned long>>::destroy_slots() doris/thirdparty/installed/include/parallel_hashmap/phmap.h:1992:14
#1 0x5648f30936f6 in phmap::priv::raw_hash_set<phmap::priv::FlatHashSetPolicy<unsigned long>, phmap::Hash<unsigned long>, phmap::EqualTo<unsigned long>, std::allocator<unsigned long>>::~raw_hash_set() doris/thirdparty/installed/include/parallel_hashmap/phmap.h:1236:23
#2 0x5648f3089276 in phmap::flat_hash_set<unsigned long, phmap::Hash<unsigned long>, phmap::EqualTo<unsigned long>, std::allocator<unsigned long>>::~flat_hash_set() doris/thirdparty/installed/include/parallel_hashmap/phmap.h:4577:7
#3 0x5648f308922a in doris::BitmapValue::~BitmapValue() doris/be/src/util/bitmap_value.h:824:7
#4 0x56490d319fa6 in doris::vectorized::AggregateFunctionBitmapData<doris::vectorized::AggregateFunctionBitmapUnionOp>::~AggregateFunctionBitmapData() doris/be/src/vec/aggregate_functions/aggregate_function_bitmap.h:127:8
#5 0x56490d49636a in doris::vectorized::IAggregateFunctionDataHelper<doris::vectorized::AggregateFunctionBitmapData<doris::vectorized::AggregateFunctionBitmapUnionOp>, doris::vectorized::AggregateFunctionBitmapOp<doris::vectorized::AggregateFunctionBitmapUnionOp>>::destroy(char*) const doris/be/src/vec/aggregate_functions/aggregate_function.h:563:92
#6 0x5648f68376e9 in doris::MemTable::~MemTable() doris/be/src/olap/memtable.cpp:159:27
Free (arena reset in MemTable::_to_block()):

0x52100381be60 is located 352 bytes inside of 4096-byte region [0x52100381bd00,0x52100381cd00)
freed by thread T4767 (wg_flush_broker) here:
#0 0x5648f2f3ee46 in free (doris/output/be/lib/doris_be+0x57418e46) (BuildId: 298b9c91a1ec8fe0)
#1 0x5648f3080dfc in DefaultMemoryAllocator::free(void*) doris/be/src/vec/common/allocator.h:108:41
#2 0x5648f3080b3f in Allocator<false, false, false, DefaultMemoryAllocator>::free(void*, unsigned long) doris/be/src/vec/common/allocator.h:323:13
#3 0x5648f30b6dee in doris::vectorized::Arena::Chunk::~Chunk() doris/be/src/vec/common/arena.h:77:31
#4 0x5648f30b6d1f in doris::vectorized::Arena::~Arena() doris/be/src/vec/common/arena.h:151:16
#5 0x5648f30b695a in std::default_delete<doris::vectorized::Arena>::operator()(doris::vectorized::Arena*) const env/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/unique_ptr.h:99:2
#6 0x5648f30b67c8 in std::__uniq_ptr_impl<doris::vectorized::Arena, std::default_delete<doris::vectorized::Arena>>::reset(doris::vectorized::Arena*) env/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/unique_ptr.h:211:4
#7 0x5648f30b5d8c in std::unique_ptr<doris::vectorized::Arena, std::default_delete<doris::vectorized::Arena>>::reset(doris::vectorized::Arena*) env/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/unique_ptr.h:509:7
#8 0x5648f684253b in doris::MemTable::_to_block(std::unique_ptr<doris::vectorized::Block, std::default_delete<doris::vectorized::Block>>*) doris/be/src/olap/memtable.cpp:522:12
#9 0x5648f6842ac5 in doris::MemTable::to_block(std::unique_ptr<doris::vectorized::Block, std::default_delete<doris::vectorized::Block>>*) doris/be/src/olap/memtable.cpp:528:5
#10 0x5648f6907a72 in doris::FlushToken::_do_flush_memtable(doris::MemTable*, int, long*) doris/be/src/olap/memtable_flush_executor.cpp:144:9
#11 0x5648f690932c in doris::FlushToken::_flush_memtable(std::shared_ptr<doris::MemTable>, int, long) doris/be/src/olap/memtable_flush_executor.cpp:183:16
#12 0x5648f6915d18 in doris::MemtableFlushTask::run() doris/be/src/olap/memtable_flush_executor.cpp:60:20
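
The address arithmetic in the report can be checked directly. The snippet below is only a sanity check of the numbers AddressSanitizer printed; the constant and function names are made up for the illustration. It confirms that the faulting address lies 352 bytes into the freed 4096-byte region, i.e. the read in `~MemTable()` landed inside the arena chunk that `_to_block()` had already released.

```cpp
#include <cassert>
#include <cstdint>

// Values copied from the AddressSanitizer report above.
constexpr std::uint64_t kFaultAddr = 0x52100381be60ULL;
constexpr std::uint64_t kRegionBegin = 0x52100381bd00ULL;
constexpr std::uint64_t kRegionEnd = 0x52100381cd00ULL;  // exclusive bound

// Offset of an address from the start of a region.
constexpr std::uint64_t offset_in_region(std::uint64_t addr, std::uint64_t begin) {
    return addr - begin;
}

// True if addr falls inside the half-open range [begin, end).
constexpr bool in_region(std::uint64_t addr, std::uint64_t begin, std::uint64_t end) {
    return begin <= addr && addr < end;
}

static_assert(kRegionEnd - kRegionBegin == 4096, "region size matches the report");
static_assert(offset_in_region(kFaultAddr, kRegionBegin) == 352, "offset matches the report");
static_assert(in_region(kFaultAddr, kRegionBegin, kRegionEnd), "read hits the freed chunk");
```
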
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring reopened this Jan 15, 2025
@hello-stephen
Contributor

run buildall

@doris-robot

TPC-H: Total hot run time: 41421 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 8aa5d66044b7460635d015e8483fb479dd6f9258, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17576	7605	7404	7404
q2	2076	177	173	173
q3	10715	1101	1193	1101
q4	10560	777	764	764
q5	7754	2978	2853	2853
q6	247	150	147	147
q7	1005	612	601	601
q8	9350	1996	2055	1996
q9	6694	6416	6491	6416
q10	7074	2293	2359	2293
q11	467	261	270	261
q12	430	214	213	213
q13	17786	3010	3009	3009
q14	252	211	207	207
q15	563	516	524	516
q16	687	624	625	624
q17	1010	554	574	554
q18	7402	6701	6657	6657
q19	1395	1219	1046	1046
q20	481	216	201	201
q21	4125	3428	3388	3388
q22	1098	1001	997	997
Total cold run time: 108747 ms
Total hot run time: 41421 ms
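
The two totals can be reproduced from the rows. The sketch below assumes, since the bot output does not label its columns, that the first number per query is the cold run, the next two are hot runs, and the last is the faster of the two hot runs; the struct and function names are hypothetical. Under that reading, the cold total is the sum of the first column and the hot total is the sum of the per-query best hot run.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Column meanings are an assumption; the bot output has no header row.
struct QueryTiming {
    int cold_ms;
    int hot1_ms;
    int hot2_ms;
};

// Best (smallest) of the two hot runs for one query.
int best_hot_ms(const QueryTiming& t) {
    return std::min(t.hot1_ms, t.hot2_ms);
}

int total_cold_ms(const std::vector<QueryTiming>& rows) {
    int total = 0;
    for (const QueryTiming& t : rows) total += t.cold_ms;
    return total;
}

int total_best_hot_ms(const std::vector<QueryTiming>& rows) {
    int total = 0;
    for (const QueryTiming& t : rows) total += best_hot_ms(t);
    return total;
}

// Round 1 rows q1..q22, first three numeric columns copied from the table.
std::vector<QueryTiming> round1_rows() {
    return {
        {17576, 7605, 7404}, {2076, 177, 173},    {10715, 1101, 1193},
        {10560, 777, 764},   {7754, 2978, 2853},  {247, 150, 147},
        {1005, 612, 601},    {9350, 1996, 2055},  {6694, 6416, 6491},
        {7074, 2293, 2359},  {467, 261, 270},     {430, 214, 213},
        {17786, 3010, 3009}, {252, 211, 207},     {563, 516, 524},
        {687, 624, 625},     {1010, 554, 574},    {7402, 6701, 6657},
        {1395, 1219, 1046},  {481, 216, 201},     {4125, 3428, 3388},
        {1098, 1001, 997},
    };
}
```

With these rows, `total_cold_ms(round1_rows())` and `total_best_hot_ms(round1_rows())` give the two reported Round 1 totals.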

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	7404	7365	7500	7365
q2	332	242	234	234
q3	3093	2936	3035	2936
q4	2172	1828	1867	1828
q5	5775	5787	5814	5787
q6	240	146	148	146
q7	2293	1813	1783	1783
q8	3421	3555	3498	3498
q9	9039	9032	8948	8948
q10	3590	3562	3564	3562
q11	615	502	486	486
q12	841	608	622	608
q13	9272	3175	3184	3175
q14	305	265	271	265
q15	591	538	514	514
q16	738	661	658	658
q17	1915	1689	1638	1638
q18	8303	7651	7493	7493
q19	1823	1637	1698	1637
q20	2093	1880	1863	1863
q21	5685	5509	5410	5410
q22	1136	1006	1033	1006
Total cold run time: 70676 ms
Total hot run time: 60840 ms

@doris-robot

TPC-DS: Total hot run time: 197648 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 8aa5d66044b7460635d015e8483fb479dd6f9258, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	1336	899	913	899
query2	6237	2093	2064	2064
query3	10799	4345	4200	4200
query4	66329	29290	23475	23475
query5	4956	450	446	446
query6	432	183	176	176
query7	5641	310	321	310
query8	305	225	219	219
query9	9280	2674	2666	2666
query10	469	275	273	273
query11	17466	15208	15864	15208
query12	165	104	110	104
query13	1563	460	426	426
query14	10819	7581	7521	7521
query15	200	182	181	181
query16	7211	448	503	448
query17	1101	602	610	602
query18	2114	325	343	325
query19	209	159	163	159
query20	121	109	116	109
query21	208	113	110	110
query22	4840	4548	4437	4437
query23	34627	34246	34391	34246
query24	6100	2899	2978	2899
query25	544	439	412	412
query26	658	174	170	170
query27	1919	353	354	353
query28	4127	2487	2443	2443
query29	716	469	480	469
query30	233	163	161	161
query31	1008	825	825	825
query32	69	54	53	53
query33	477	296	292	292
query34	916	524	521	521
query35	850	739	719	719
query36	1084	958	972	958
query37	116	77	75	75
query38	4048	4004	3996	3996
query39	1520	1495	1465	1465
query40	208	105	104	104
query41	48	49	46	46
query42	110	102	104	102
query43	533	509	490	490
query44	1189	841	834	834
query45	185	176	171	171
query46	1155	726	720	720
query47	2011	1909	1921	1909
query48	482	382	391	382
query49	723	404	388	388
query50	857	438	429	429
query51	7370	7242	6947	6947
query52	95	87	85	85
query53	253	176	183	176
query54	550	462	466	462
query55	78	78	80	78
query56	261	243	243	243
query57	1254	1122	1108	1108
query58	223	205	210	205
query59	3215	3088	2886	2886
query60	276	246	242	242
query61	114	107	108	107
query62	780	678	666	666
query63	216	190	189	189
query64	1386	674	641	641
query65	3269	3152	3215	3152
query66	711	311	302	302
query67	15793	15656	15749	15656
query68	4139	600	581	581
query69	438	283	264	264
query70	1223	1093	1134	1093
query71	402	260	259	259
query72	6440	4067	4013	4013
query73	754	347	353	347
query74	10016	9078	8998	8998
query75	3351	2633	2625	2625
query76	1808	1033	1008	1008
query77	563	294	307	294
query78	10590	9698	9540	9540
query79	1105	610	589	589
query80	833	417	441	417
query81	498	241	238	238
query82	603	122	118	118
query83	170	152	140	140
query84	280	76	83	76
query85	834	290	300	290
query86	350	311	302	302
query87	4553	4321	4359	4321
query88	3500	2404	2349	2349
query89	411	287	284	284
query90	2057	184	183	183
query91	180	149	151	149
query92	65	51	49	49
query93	1089	536	541	536
query94	841	285	300	285
query95	365	259	260	259
query96	613	276	282	276
query97	3306	3216	3137	3137
query98	215	212	194	194
query99	1600	1310	1299	1299
Total cold run time: 317350 ms
Total hot run time: 197648 ms

@doris-robot

ClickBench: Total hot run time: 33.45 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 8aa5d66044b7460635d015e8483fb479dd6f9258, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.03	0.04	0.03
query2	0.07	0.03	0.03
query3	0.24	0.07	0.06
query4	1.68	0.10	0.10
query5	0.52	0.51	0.53
query6	1.15	0.73	0.74
query7	0.02	0.02	0.02
query8	0.06	0.03	0.03
query9	0.57	0.49	0.51
query10	0.57	0.56	0.57
query11	0.13	0.10	0.11
query12	0.14	0.12	0.11
query13	0.61	0.60	0.58
query14	2.93	2.93	3.02
query15	0.91	0.85	0.83
query16	0.40	0.37	0.39
query17	1.02	1.05	1.02
query18	0.23	0.22	0.23
query19	1.85	1.80	1.99
query20	0.02	0.01	0.00
query21	15.36	0.58	0.58
query22	2.75	1.80	1.82
query23	17.01	1.00	0.78
query24	3.08	1.99	1.50
query25	0.24	0.10	0.15
query26	0.54	0.14	0.14
query27	0.03	0.04	0.04
query28	9.35	1.11	1.08
query29	12.59	3.31	3.31
query30	0.25	0.06	0.06
query31	2.86	0.39	0.39
query32	3.25	0.46	0.46
query33	3.00	3.00	3.04
query34	16.71	4.47	4.44
query35	4.60	4.48	4.44
query36	0.67	0.51	0.51
query37	0.10	0.06	0.06
query38	0.04	0.03	0.04
query39	0.04	0.03	0.02
query40	0.15	0.12	0.12
query41	0.08	0.02	0.03
query42	0.04	0.02	0.02
query43	0.03	0.03	0.03
Total cold run time: 105.92 s
Total hot run time: 33.45 s

@yiguolei yiguolei merged commit e7f095e into branch-3.0 Jan 15, 2025
21 of 22 checks passed
@github-actions github-actions bot deleted the auto-pick-46997-branch-3.0 branch January 15, 2025 06:41
4 participants