[fix](hive) fix block decompressor bug #45289

Merged (1 commit) on Dec 13, 2024


@suxiaogang223 (Contributor) commented Dec 11, 2024

What problem does this PR solve?

Problem Summary:
In the block decompressor, when the remaining input is shorter than 4 bytes (the size of the large-block header), the decompressor should set more_input_bytes to request more input instead of reporting an error.
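For context, here is a minimal Python sketch of the behavior described above. It assumes the Hadoop block framing written by Hadoop's `BlockCompressorStream`: each large block starts with a 4-byte big-endian uncompressed length, followed by small blocks that are each prefixed with a 4-byte big-endian compressed length. `zlib` stands in for Snappy/LZ4, which are not in the Python standard library, and `decompress_blocks`, `Result`, `frame`, and `stream` are hypothetical names for illustration, not Doris's C++ API.

```python
import struct
import zlib
from dataclasses import dataclass

HEADER = 4  # every length prefix is a big-endian uint32


@dataclass
class Result:
    consumed: int          # input bytes fully decoded (whole large blocks only)
    output: bytes          # decompressed bytes produced by this call
    more_input_bytes: int  # minimum extra input needed; 0 means nothing pending


def decompress_blocks(buf: bytes) -> Result:
    pos, out = 0, bytearray()
    while pos < len(buf):
        block_start = pos  # on a short read, roll back to the large-block start
        if len(buf) - pos < HEADER:
            # The case this PR addresses: a truncated large-block header means
            # "need more input", not "corrupt data".
            return Result(block_start, bytes(out), HEADER - (len(buf) - pos))
        (raw_len,) = struct.unpack_from(">I", buf, pos)
        pos += HEADER
        produced = bytearray()
        while len(produced) < raw_len:
            if len(buf) - pos < HEADER:  # truncated small-block header
                return Result(block_start, bytes(out), HEADER - (len(buf) - pos))
            (comp_len,) = struct.unpack_from(">I", buf, pos)
            pos += HEADER
            if len(buf) - pos < comp_len:  # truncated compressed payload
                return Result(block_start, bytes(out), comp_len - (len(buf) - pos))
            produced += zlib.decompress(buf[pos:pos + comp_len])
            pos += comp_len
        if len(produced) != raw_len:
            raise ValueError("corrupt block: decompressed size mismatch")
        out += produced
    return Result(pos, bytes(out), 0)


def frame(chunks):
    """Hypothetical encoder: wrap a list of raw chunks into one large block."""
    body = b"".join(struct.pack(">I", len(c)) + c for c in map(zlib.compress, chunks))
    return struct.pack(">I", sum(len(c) for c in chunks)) + body


def stream(data: bytes, step: int) -> bytes:
    """Feed `data` in `step`-byte pieces, keeping unconsumed bytes for the next call."""
    pending, out = b"", bytearray()
    for i in range(0, len(data), step):
        pending += data[i:i + step]
        result = decompress_blocks(pending)
        out += result.output
        pending = pending[result.consumed:]
    return bytes(out)


data = frame([b"hello ", b"world"]) + frame([b"!"])
print(decompress_blocks(b"\x00\x00"))  # Result(consumed=0, output=b'', more_input_bytes=2)
print(stream(data, 3))                 # b'hello world!'
```

In this sketch a short input returns `more_input_bytes > 0` and rolls back to the start of the incomplete large block, so the caller can append more data and retry; treating the 2-byte input as an error instead would abort any read whose buffer boundary happens to split a header.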

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (ideally including the specific error message) and how it was fixed.
  2. Which behaviors were modified: what the previous behavior was, what it is now, why it was modified, and what impact it might have.
  3. What features were added and why.
  4. Which code was refactored and why.
  5. Which functions were optimized and what the difference is before and after the optimization.

clang-tidy review says "All clean, LGTM! 👍"

@wm1581066 added the usercase (Important user case type label) and dev/2.1.x labels Dec 11, 2024
@suxiaogang223 (Contributor, Author)

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 38.75% (10103/26073)
Line Coverage: 29.69% (84770/285526)
Region Coverage: 28.75% (43503/151325)
Branch Coverage: 25.31% (22105/87330)
Coverage Report: http://coverage.selectdb-in.cc/coverage/5acbc1a3ef442b52f760f213defd8e3865a909e0_5acbc1a3ef442b52f760f213defd8e3865a909e0/report/index.html

@doris-robot

TPC-H: Total hot run time: 39839 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 5acbc1a3ef442b52f760f213defd8e3865a909e0, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17576	7454	7266	7266
q2	2042	169	162	162
q3	10650	1092	1177	1092
q4	10574	761	789	761
q5	7621	2716	2616	2616
q6	237	153	151	151
q7	974	653	606	606
q8	9450	1892	1932	1892
q9	7237	6437	6469	6437
q10	7020	2294	2305	2294
q11	459	268	250	250
q12	430	231	231	231
q13	17799	2995	3023	2995
q14	248	203	206	203
q15	565	526	525	525
q16	644	563	574	563
q17	966	524	550	524
q18	7255	6710	6615	6615
q19	1357	1040	892	892
q20	462	184	183	183
q21	4007	3381	3276	3276
q22	377	325	305	305
Total cold run time: 107950 ms
Total hot run time: 39839 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	7205	7208	7189	7189
q2	324	233	229	229
q3	2893	2823	2991	2823
q4	2060	1857	1892	1857
q5	5671	5633	5640	5633
q6	226	142	150	142
q7	2229	1837	1808	1808
q8	3412	3508	3498	3498
q9	8904	9050	8939	8939
q10	3582	3528	3528	3528
q11	619	539	507	507
q12	819	619	591	591
q13	11742	3225	3158	3158
q14	294	291	276	276
q15	575	527	517	517
q16	690	621	650	621
q17	1873	1638	1610	1610
q18	8366	7738	7678	7678
q19	1739	1550	1604	1550
q20	2111	1889	1870	1870
q21	5628	5487	5445	5445
q22	608	591	552	552
Total cold run time: 71570 ms
Total hot run time: 60021 ms
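The totals in these bot reports appear to be derived from the per-query rows: total cold run time is the sum of the first timing column, and total hot run time is the sum of the last column, which is the faster of the two hot runs. This rule is inferred from the numbers, not taken from the tpch-tools scripts; a short Python check against Round 1:

```python
# Round 1 rows copied from the TPC-H report: query, cold, hot 1, hot 2, best hot (ms).
ROUND1 = """\
q1 17576 7454 7266 7266
q2 2042 169 162 162
q3 10650 1092 1177 1092
q4 10574 761 789 761
q5 7621 2716 2616 2616
q6 237 153 151 151
q7 974 653 606 606
q8 9450 1892 1932 1892
q9 7237 6437 6469 6437
q10 7020 2294 2305 2294
q11 459 268 250 250
q12 430 231 231 231
q13 17799 2995 3023 2995
q14 248 203 206 203
q15 565 526 525 525
q16 644 563 574 563
q17 966 524 550 524
q18 7255 6710 6615 6615
q19 1357 1040 892 892
q20 462 184 183 183
q21 4007 3381 3276 3276
q22 377 325 305 305
"""


def totals(table: str) -> tuple:
    """Return (total cold ms, total hot ms) under the inferred summing rule."""
    cold = hot = 0
    for line in table.splitlines():
        fields = line.split()
        c, h1, h2, best = map(int, fields[1:])
        # The "best" column should always be the faster of the two hot runs.
        if best != min(h1, h2):
            raise ValueError(f"unexpected best column for {fields[0]}")
        cold += c
        hot += best
    return cold, hot


print(totals(ROUND1))  # (107950, 39839)
```

Both sums match the reported "Total cold run time: 107950 ms" and "Total hot run time: 39839 ms" for Round 1.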

@doris-robot

TPC-DS: Total hot run time: 195498 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 5acbc1a3ef442b52f760f213defd8e3865a909e0, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	1261	994	934	934
query2	6243	2135	2126	2126
query3	10975	4505	4144	4144
query4	66579	29136	23445	23445
query5	4970	456	464	456
query6	415	190	186	186
query7	5691	318	297	297
query8	314	221	228	221
query9	9399	2720	2704	2704
query10	452	247	245	245
query11	17535	15129	15707	15129
query12	158	107	105	105
query13	1581	453	450	450
query14	9827	7467	6819	6819
query15	225	189	185	185
query16	7074	455	494	455
query17	1039	565	557	557
query18	1936	297	302	297
query19	218	156	146	146
query20	117	113	108	108
query21	204	96	102	96
query22	4711	4413	4586	4413
query23	34517	33919	33864	33864
query24	5434	2555	2613	2555
query25	526	427	421	421
query26	643	195	150	150
query27	1748	295	284	284
query28	4280	2487	2475	2475
query29	689	433	406	406
query30	207	150	150	150
query31	991	838	853	838
query32	71	57	56	56
query33	415	316	285	285
query34	940	533	529	529
query35	858	796	761	761
query36	1116	958	982	958
query37	124	73	76	73
query38	4369	4279	4325	4279
query39	1479	1453	1446	1446
query40	203	98	97	97
query41	51	41	42	41
query42	121	104	107	104
query43	553	499	515	499
query44	1212	867	863	863
query45	192	181	180	180
query46	1181	739	724	724
query47	2017	1922	1893	1893
query48	440	328	322	322
query49	746	402	388	388
query50	841	392	378	378
query51	7410	7142	7196	7142
query52	95	89	86	86
query53	252	173	173	173
query54	536	395	395	395
query55	77	76	83	76
query56	248	233	226	226
query57	1269	1117	1110	1110
query58	209	211	210	210
query59	3361	3188	2922	2922
query60	265	253	236	236
query61	108	106	107	106
query62	803	667	686	667
query63	209	180	191	180
query64	1380	653	648	648
query65	3270	3183	3176	3176
query66	700	311	295	295
query67	15922	15661	15531	15531
query68	4139	567	557	557
query69	437	250	250	250
query70	1093	1140	1161	1140
query71	352	252	270	252
query72	6378	4182	4051	4051
query73	770	361	365	361
query74	10250	8954	8900	8900
query75	3387	2778	2644	2644
query76	1830	1100	1154	1100
query77	462	283	279	279
query78	10411	9436	9378	9378
query79	2014	583	589	583
query80	1389	418	440	418
query81	516	227	228	227
query82	1297	123	115	115
query83	170	143	141	141
query84	284	71	67	67
query85	1016	303	316	303
query86	408	272	298	272
query87	4672	4512	4462	4462
query88	3593	2217	2183	2183
query89	414	302	290	290
query90	1943	186	181	181
query91	142	102	112	102
query92	63	47	51	47
query93	2776	553	555	553
query94	889	302	280	280
query95	353	247	236	236
query96	628	277	274	274
query97	2865	2678	2642	2642
query98	220	196	196	196
query99	1645	1314	1298	1298
Total cold run time: 319808 ms
Total hot run time: 195498 ms

@doris-robot

ClickBench: Total hot run time: 32.73 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 5acbc1a3ef442b52f760f213defd8e3865a909e0, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.03	0.03	0.04
query2	0.06	0.06	0.03
query3	0.23	0.07	0.07
query4	1.62	0.10	0.11
query5	0.42	0.42	0.42
query6	1.16	0.65	0.65
query7	0.02	0.01	0.02
query8	0.04	0.04	0.04
query9	0.58	0.50	0.50
query10	0.55	0.57	0.56
query11	0.14	0.10	0.10
query12	0.13	0.11	0.10
query13	0.61	0.60	0.60
query14	2.72	2.70	2.79
query15	0.90	0.83	0.82
query16	0.38	0.39	0.38
query17	1.06	1.06	1.04
query18	0.22	0.22	0.20
query19	1.94	1.87	2.00
query20	0.02	0.01	0.01
query21	15.36	0.59	0.59
query22	2.79	1.89	1.70
query23	17.43	0.97	0.87
query24	3.16	1.07	2.29
query25	0.32	0.22	0.12
query26	0.51	0.13	0.12
query27	0.04	0.05	0.05
query28	9.85	1.11	1.08
query29	12.57	3.26	3.25
query30	0.25	0.06	0.06
query31	2.86	0.39	0.37
query32	3.27	0.46	0.46
query33	3.00	3.03	3.02
query34	17.08	4.45	4.44
query35	4.52	4.48	4.49
query36	0.66	0.51	0.53
query37	0.09	0.06	0.06
query38	0.04	0.03	0.04
query39	0.03	0.02	0.02
query40	0.17	0.13	0.13
query41	0.07	0.03	0.02
query42	0.04	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 106.98 s
Total hot run time: 32.73 s

@morningman left a comment

LGTM

@github-actions bot added the approved label (Indicates a PR has been approved by one committer.) Dec 12, 2024

PR approved by at least one committer and no changes requested.


PR approved by anyone and no changes requested.

@morningman merged commit c7fefc3 into apache:master Dec 13, 2024
36 of 39 checks passed
github-actions bot pushed a commit that referenced this pull request Dec 13, 2024
### What problem does this PR solve?

Problem Summary:
In the block decompressor, when the remaining input is shorter
than 4 bytes (the size of the large-block header), set
more_input_bytes instead of reporting an error.
morningman pushed a commit that referenced this pull request Dec 15, 2024
Cherry-picked from #45289

Co-authored-by: Socrates <suyiteng@selectdb.com>
hubgeter pushed a commit to hubgeter/doris that referenced this pull request Jan 15, 2025
### What problem does this PR solve?

Problem Summary:
In the block decompressor, when the remaining input is shorter
than 4 bytes (the size of the large-block header), set
more_input_bytes instead of reporting an error.

Co-authored-by: Socrates <suyiteng@selectdb.com>
@yiguolei mentioned this pull request Jan 19, 2025
@suxiaogang223 deleted the fix_snappy branch March 11, 2025 04:15
Labels: approved (Indicates a PR has been approved by one committer.), dev/2.1.8-merged, dev/3.0.4-merged, reviewed, usercase (Important user case type label)
6 participants