Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix](move-memtable) fix schema use-after-free in delta writer v2 #30254

Merged
merged 4 commits into from
Jan 23, 2024

Conversation

kaijchen
Copy link
Contributor

Proposed changes

DeltaWriterV2 are shared among sinks (OlapTableSinks).
After #30098, sinks who complete first will not wait for stream close.
They will also not waiting for DeltaWriterV2 close.
Use-after-free may happen because DeltaWriterV2 and its components are referencing TupleDescriptor and SlotDescriptor by raw pointer,

Since TupleDescriptor and SlotDescriptor are part of OlapTableSchemaParam from a sink.
This PR changes OlapTableSchemaParam passed to DeltaWriterV2 to be a shared pointer.

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@kaijchen
Copy link
Contributor Author

run buildall

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

@@ -70,7 +70,8 @@ inline std::ostream& operator<<(std::ostream& ostr, const TabletStream& tablet_s
return ostr;
}

Status TabletStream::init(OlapTableSchemaParam* schema, int64_t index_id, int64_t partition_id) {
Status TabletStream::init(std::shared_ptr<OlapTableSchemaParam> schema, int64_t index_id,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: method 'init' can be made static [readability-convert-member-functions-to-static]

be/src/runtime/load_stream.h:49:

-     Status init(std::shared_ptr<OlapTableSchemaParam> schema, int64_t index_id, int64_t partition_id);
+     static Status init(std::shared_ptr<OlapTableSchemaParam> schema, int64_t index_id, int64_t partition_id);

@doris-robot
Copy link

TPC-H: Total hot run time: 39136 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e83d9fcb30a0051f6ee990a7ec0cd33463431411, data reload: false

------ Round 1 ----------------------------------
q1	17654	5414	5424	5414
q2	2036	142	129	129
q3	10747	1173	1154	1154
q4	10289	782	795	782
q5	7762	3159	3131	3131
q6	202	123	118	118
q7	867	507	490	490
q8	9231	1941	1965	1941
q9	7280	6397	6382	6382
q10	8245	3045	3031	3031
q11	419	213	206	206
q12	355	189	195	189
q13	18001	3356	3330	3330
q14	242	204	217	204
q15	549	512	503	503
q16	479	396	369	369
q17	946	509	480	480
q18	7534	6914	6871	6871
q19	4720	1417	1425	1417
q20	569	314	295	295
q21	2839	2406	2494	2406
q22	339	296	294	294
Total cold run time: 111305 ms
Total hot run time: 39136 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5428	5337	5277	5277
q2	323	218	202	202
q3	3301	3231	3207	3207
q4	2076	2102	2068	2068
q5	6048	5916	5989	5916
q6	202	114	117	114
q7	2330	1906	1835	1835
q8	3237	3389	3390	3389
q9	9026	8817	8788	8788
q10	3874	3758	3812	3758
q11	570	486	470	470
q12	804	638	618	618
q13	16934	3122	3129	3122
q14	283	261	262	261
q15	544	501	509	501
q16	520	479	459	459
q17	1842	1869	1834	1834
q18	9425	10853	9454	9454
q19	25099	1557	1531	1531
q20	4585	1949	1905	1905
q21	15527	5414	5502	5414
q22	2964	561	487	487
Total cold run time: 114942 ms
Total hot run time: 60610 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 176150 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e83d9fcb30a0051f6ee990a7ec0cd33463431411, data reload: false

query1	940	342	336	336
query2	6550	2036	1978	1978
query3	6991	203	202	202
query4	31656	22210	22003	22003
query5	4451	377	375	375
query6	255	155	159	155
query7	4608	262	262	262
query8	226	176	182	176
query9	8318	2541	2526	2526
query10	409	224	246	224
query11	17032	15457	15654	15457
query12	121	66	65	65
query13	1681	367	387	367
query14	10549	7138	6939	6939
query15	216	175	185	175
query16	5787	256	254	254
query17	942	473	475	473
query18	1787	253	252	252
query19	173	132	135	132
query20	66	69	72	69
query21	197	139	128	128
query22	4921	4920	4526	4526
query23	31587	30709	30705	30705
query24	11488	2837	2782	2782
query25	573	302	308	302
query26	1570	143	141	141
query27	3239	290	280	280
query28	7076	1839	1823	1823
query29	1517	626	614	614
query30	277	143	133	133
query31	929	727	762	727
query32	75	49	47	47
query33	700	215	212	212
query34	1124	450	459	450
query35	884	782	742	742
query36	1382	1196	1221	1196
query37	96	56	62	56
query38	3342	3220	3230	3220
query39	1311	1272	1274	1272
query40	204	87	86	86
query41	38	35	35	35
query42	87	86	85	85
query43	539	499	458	458
query44	1078	683	695	683
query45	193	183	173	173
query46	1034	659	663	659
query47	1660	1606	1562	1562
query48	424	308	315	308
query49	1138	288	287	287
query50	682	312	314	312
query51	5277	5193	5150	5150
query52	88	84	75	75
query53	333	258	258	258
query54	250	178	196	178
query55	83	76	78	76
query56	172	168	167	167
query57	1008	935	935	935
query58	185	174	159	159
query59	2968	2881	2821	2821
query60	204	178	189	178
query61	79	83	84	83
query62	602	370	365	365
query63	281	269	280	269
query64	4989	1787	1754	1754
query65	3342	3259	3222	3222
query66	1303	320	323	320
query67	15758	14940	15122	14940
query68	11515	549	516	516
query69	587	307	297	297
query70	1644	1447	1482	1447
query71	485	212	212	212
query72	4993	2806	2811	2806
query73	2157	313	307	307
query74	6956	6357	6338	6338
query75	4622	2380	2271	2271
query76	6327	1060	1030	1030
query77	658	232	234	232
query78	9124	8870	8782	8782
query79	1015	518	492	492
query80	567	323	308	308
query81	471	204	201	201
query82	195	77	78	77
query83	140	118	117	117
query84	268	67	66	66
query85	1052	356	331	331
query86	395	354	385	354
query87	3509	3311	3328	3311
query88	2734	2186	2170	2170
query89	438	370	377	370
query90	1951	190	192	190
query91	152	123	128	123
query92	50	46	44	44
query93	1186	424	417	417
query94	1272	159	156	156
query95	498	471	446	446
query96	628	317	319	317
query97	4304	4150	4171	4150
query98	202	193	198	193
query99	1055	706	705	705
Total cold run time: 290195 ms
Total hot run time: 176150 ms

@doris-robot
Copy link

ClickBench: Total hot run time: 30.88 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit e83d9fcb30a0051f6ee990a7ec0cd33463431411, data reload: false

query1	0.03	0.02	0.02
query2	0.06	0.02	0.03
query3	0.22	0.04	0.05
query4	1.70	0.06	0.06
query5	0.53	0.52	0.53
query6	1.27	0.67	0.65
query7	0.02	0.01	0.01
query8	0.04	0.02	0.02
query9	0.55	0.48	0.50
query10	0.55	0.56	0.56
query11	0.11	0.09	0.09
query12	0.11	0.10	0.09
query13	0.61	0.61	0.61
query14	0.77	0.80	0.81
query15	0.79	0.78	0.77
query16	0.36	0.37	0.37
query17	1.04	1.04	1.02
query18	0.25	0.24	0.26
query19	1.88	1.80	1.76
query20	0.00	0.01	0.01
query21	15.42	0.58	0.58
query22	2.43	1.85	2.23
query23	17.31	0.80	0.79
query24	2.40	0.92	1.10
query25	0.54	0.23	0.16
query26	0.43	0.13	0.14
query27	0.06	0.05	0.05
query28	11.59	0.76	0.75
query29	12.60	3.09	3.21
query30	0.62	0.51	0.51
query31	2.79	0.35	0.35
query32	3.37	0.48	0.48
query33	3.24	3.24	3.26
query34	16.10	4.34	4.23
query35	4.27	4.32	4.31
query36	1.12	1.07	1.06
query37	0.06	0.04	0.04
query38	0.03	0.03	0.02
query39	0.02	0.01	0.01
query40	0.15	0.13	0.13
query41	0.07	0.02	0.01
query42	0.02	0.02	0.01
query43	0.03	0.02	0.02
Total cold run time: 105.56 s
Total hot run time: 30.88 s

@doris-robot
Copy link

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit e83d9fcb30a0051f6ee990a7ec0cd33463431411 with default session variables
Stream load json:         19 seconds loaded 2358488459 Bytes, about 118 MB/s
Stream load orc:          58 seconds loaded 1101869774 Bytes, about 18 MB/s
Stream load parquet:      31 seconds loaded 861443392 Bytes, about 26 MB/s
Insert into select:       14.6 seconds inserted 10000000 Rows, about 684K ops/s

yiguolei
yiguolei previously approved these changes Jan 23, 2024
Copy link
Contributor

@yiguolei yiguolei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jan 23, 2024
Copy link
Contributor

PR approved by at least one committer and no changes requested.

Copy link
Contributor

PR approved by anyone and no changes requested.

@kaijchen
Copy link
Contributor Author

run buildall

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Jan 23, 2024
Copy link
Contributor

@liaoxin01 liaoxin01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

@@ -70,7 +70,8 @@ inline std::ostream& operator<<(std::ostream& ostr, const TabletStream& tablet_s
return ostr;
}

Status TabletStream::init(OlapTableSchemaParam* schema, int64_t index_id, int64_t partition_id) {
Status TabletStream::init(std::shared_ptr<OlapTableSchemaParam> schema, int64_t index_id,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: method 'init' can be made static [readability-convert-member-functions-to-static]

be/src/runtime/load_stream.h:49:

-     Status init(std::shared_ptr<OlapTableSchemaParam> schema, int64_t index_id,
+     static Status init(std::shared_ptr<OlapTableSchemaParam> schema, int64_t index_id,

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 36.70% (8663/23607)
Line Coverage: 28.71% (70718/246287)
Region Coverage: 27.59% (36441/132064)
Branch Coverage: 24.34% (18657/76666)
Coverage Report: http://coverage.selectdb-in.cc/coverage/c3368476c54a9ca9d0933534832662c85acfcf5b_c3368476c54a9ca9d0933534832662c85acfcf5b/report/index.html

@doris-robot
Copy link

TPC-H: Total hot run time: 38773 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c3368476c54a9ca9d0933534832662c85acfcf5b, data reload: false

------ Round 1 ----------------------------------
q1	17660	5469	5293	5293
q2	2038	136	129	129
q3	10757	1161	1171	1161
q4	10327	795	837	795
q5	7766	3130	3119	3119
q6	196	122	125	122
q7	853	491	482	482
q8	9239	1970	1948	1948
q9	7263	6390	6362	6362
q10	8268	3080	3032	3032
q11	429	197	220	197
q12	358	190	191	190
q13	17997	3320	3338	3320
q14	240	206	216	206
q15	554	516	506	506
q16	471	375	400	375
q17	947	501	509	501
q18	7388	7069	6798	6798
q19	3263	1363	1282	1282
q20	606	304	307	304
q21	2808	2442	2358	2358
q22	358	308	293	293
Total cold run time: 109786 ms
Total hot run time: 38773 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5504	5325	5446	5325
q2	324	215	210	210
q3	3321	3214	3230	3214
q4	2122	2042	2031	2031
q5	5997	5938	5775	5775
q6	198	115	121	115
q7	2275	1816	1914	1816
q8	3226	3395	3373	3373
q9	8944	8822	8773	8773
q10	3989	3814	3865	3814
q11	549	437	453	437
q12	810	625	617	617
q13	16994	3158	3147	3147
q14	282	254	269	254
q15	553	500	510	500
q16	500	487	478	478
q17	2261	1755	1858	1755
q18	9467	11346	9621	9621
q19	20345	1601	1539	1539
q20	4608	1949	1905	1905
q21	14854	5498	5422	5422
q22	1541	554	540	540
Total cold run time: 108664 ms
Total hot run time: 60661 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 186254 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c3368476c54a9ca9d0933534832662c85acfcf5b, data reload: false

query1	928	333	336	333
query2	6566	1913	1925	1913
query3	6695	205	199	199
query4	32163	22238	22013	22013
query5	4455	383	375	375
query6	254	160	153	153
query7	4646	272	261	261
query8	229	180	177	177
query9	8323	2573	2534	2534
query10	417	228	240	228
query11	16431	15486	15500	15486
query12	126	73	68	68
query13	1670	377	379	377
query14	10578	7092	6999	6999
query15	224	176	186	176
query16	5784	256	256	256
query17	944	476	471	471
query18	1782	264	251	251
query19	184	133	132	132
query20	71	77	70	70
query21	185	135	131	131
query22	4960	4868	4547	4547
query23	31427	30757	30959	30757
query24	5408	2840	2864	2840
query25	403	311	312	311
query26	777	147	150	147
query27	2255	291	295	291
query28	3887	1841	1845	1841
query29	927	643	647	643
query30	201	139	135	135
query31	924	736	757	736
query32	67	53	49	49
query33	357	211	218	211
query34	819	458	467	458
query35	874	748	800	748
query36	1326	1259	1193	1193
query37	91	63	60	60
query38	3378	3238	3233	3233
query39	1320	1265	1244	1244
query40	195	90	82	82
query41	37	35	34	34
query42	90	81	86	81
query43	513	481	482	481
query44	1052	689	687	687
query45	193	180	172	172
query46	1046	657	666	657
query47	1632	1550	1559	1550
query48	396	299	318	299
query49	604	287	287	287
query50	689	311	317	311
query51	5311	5187	5166	5166
query52	96	79	81	79
query53	320	275	258	258
query54	217	184	192	184
query55	82	77	83	77
query56	181	162	177	162
query57	1015	901	952	901
query58	184	158	159	158
query59	2900	2726	2742	2726
query60	208	184	189	184
query61	83	81	80	80
query62	475	359	333	333
query63	285	271	254	254
query64	2543	1751	1767	1751
query65	3340	3276	3258	3258
query66	967	329	316	316
query67	15345	14915	15046	14915
query68	11680	547	523	523
query69	600	304	300	300
query70	1689	1504	1497	1497
query71	10462	10201	10212	10201
query72	5022	2846	2845	2845
query73	2267	314	314	314
query74	7865	6412	6447	6412
query75	4998	2375	2301	2301
query76	6485	999	958	958
query77	682	238	238	238
query78	9786	8806	8564	8564
query79	1018	510	507	507
query80	623	334	312	312
query81	472	210	211	210
query82	210	82	87	82
query83	145	123	122	122
query84	274	69	71	69
query85	1068	339	328	328
query86	396	383	413	383
query87	3514	3350	3300	3300
query88	3294	2184	2196	2184
query89	436	359	350	350
query90	2290	186	186	186
query91	155	131	131	131
query92	53	47	44	44
query93	1575	459	454	454
query94	1417	161	160	160
query95	508	473	449	449
query96	615	325	313	313
query97	4250	4127	4202	4127
query98	201	196	183	183
query99	961	724	717	717
Total cold run time: 287066 ms
Total hot run time: 186254 ms

@doris-robot
Copy link

ClickBench: Total hot run time: 31.23 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit c3368476c54a9ca9d0933534832662c85acfcf5b, data reload: false

query1	0.03	0.02	0.02
query2	0.06	0.02	0.03
query3	0.22	0.05	0.05
query4	1.67	0.11	0.11
query5	0.53	0.53	0.53
query6	1.26	0.63	0.62
query7	0.02	0.02	0.01
query8	0.04	0.03	0.03
query9	0.53	0.51	0.51
query10	0.55	0.55	0.56
query11	0.12	0.09	0.09
query12	0.11	0.09	0.09
query13	0.60	0.59	0.60
query14	0.79	0.81	0.82
query15	0.79	0.78	0.78
query16	0.36	0.36	0.37
query17	1.01	0.95	1.01
query18	0.23	0.26	0.25
query19	1.88	1.79	1.78
query20	0.01	0.01	0.01
query21	15.39	0.55	0.55
query22	2.66	2.37	1.98
query23	17.02	0.83	0.76
query24	2.74	1.60	1.32
query25	0.39	0.36	0.09
query26	0.56	0.13	0.13
query27	0.06	0.06	0.06
query28	10.70	0.79	0.76
query29	12.52	3.27	3.21
query30	0.58	0.53	0.49
query31	2.78	0.33	0.34
query32	3.36	0.48	0.49
query33	3.24	3.21	3.19
query34	15.89	4.40	4.23
query35	4.34	4.29	4.20
query36	1.12	1.06	1.08
query37	0.08	0.05	0.05
query38	0.04	0.03	0.02
query39	0.02	0.01	0.02
query40	0.18	0.12	0.15
query41	0.07	0.02	0.01
query42	0.03	0.01	0.02
query43	0.02	0.01	0.02
Total cold run time: 104.6 s
Total hot run time: 31.23 s

@doris-robot
Copy link

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit c3368476c54a9ca9d0933534832662c85acfcf5b with default session variables
Stream load json:         20 seconds loaded 2358488459 Bytes, about 112 MB/s
Stream load orc:          58 seconds loaded 1101869774 Bytes, about 18 MB/s
Stream load parquet:      32 seconds loaded 861443392 Bytes, about 25 MB/s
Insert into select:       14.9 seconds inserted 10000000 Rows, about 671K ops/s

@liaoxin01
Copy link
Contributor

run p0

Copy link
Contributor

@dataroaring dataroaring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dataroaring dataroaring merged commit 0f8e30b into apache:master Jan 23, 2024
27 of 28 checks passed
@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jan 23, 2024
Copy link
Contributor

PR approved by at least one committer and no changes requested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by one committer. dev/2.1.0 dev/3.0.0-merged reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants