[C++][Parquet] Some benchmarks report incorrect metrics #44081

Closed
pitrou opened this issue Sep 12, 2024 · 1 comment

pitrou commented Sep 12, 2024

Describe the bug, including details regarding any error messages, version, and platform.

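For example, in the boolean read and write benchmarks below, `bytes_per_second` and `items_per_second` report the same underlying value and differ only by the unit prefix (Mi vs. M, Gi vs. G):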

------------------------------------------------------------------------------------------------
Benchmark                                      Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------------
BM_WriteColumn<false,BooleanType>       33919859 ns     33893957 ns           21 bytes_per_second=295.038Mi/s items_per_second=309.37M/s
BM_WriteColumn<true,BooleanType>       155647232 ns    155603108 ns            4 bytes_per_second=64.2661Mi/s items_per_second=67.3879M/s
BM_ReadColumn<false,BooleanType>/-1/0    6586119 ns      6582773 ns          106 bytes_per_second=1.48351Gi/s items_per_second=1.59291G/s
BM_ReadColumn<false,BooleanType>/1/20   19042489 ns     19035007 ns           36 bytes_per_second=525.348Mi/s items_per_second=550.867M/s
BM_ReadColumn<true,BooleanType>/-1/1    59521534 ns     59415433 ns           11 bytes_per_second=168.306Mi/s items_per_second=176.482M/s
BM_ReadColumn<true,BooleanType>/5/10    47425596 ns     47282945 ns           15 bytes_per_second=211.493Mi/s items_per_second=221.766M/s

Component(s)

Benchmarking, C++, Parquet

@pitrou pitrou self-assigned this Sep 12, 2024
@pitrou pitrou changed the title from "[C++][Parquet] Some benchmarks report incorrect bytes/s. metric" to "[C++][Parquet] Some benchmarks report incorrect metrics" Sep 12, 2024
pitrou added a commit to pitrou/arrow that referenced this issue Sep 12, 2024
…reader-writer-benchmark

1. items/sec and bytes/sec were set to the same value in some benchmarks
2. bytes/sec was incorrectly computed for boolean columns
pitrou added a commit that referenced this issue Sep 12, 2024
…-writer-benchmark (#44082)

### Rationale for this change

1. items/sec and bytes/sec were set to the same value in some benchmarks
2. bytes/sec was incorrectly computed for boolean columns
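
The two points above reduce to simple arithmetic: the item rate is the number of values per second, and the byte rate is that count multiplied by the width of one value. The following is a minimal sketch, assuming fixed-width values and counting each boolean as one bit (which is what the corrected boolean numbers in this PR are consistent with). The `rates` function and the `VALUE_BITS` table are illustrative names, not part of the Arrow benchmark code.

```python
# Illustrative model only; these names do not come from the Arrow code base.
# Assumed width of one value, in bits, for each column type in the tables.
VALUE_BITS = {
    "Int32Type": 32,
    "Int64Type": 64,
    "DoubleType": 64,
    "BooleanType": 1,  # assumption: booleans are counted as bit-packed
}


def rates(type_name, num_values, seconds):
    """Return (bytes_per_second, items_per_second) for one timed run."""
    total_bits = VALUE_BITS[type_name] * num_values
    num_bytes = (total_bits + 7) // 8  # round up to whole bytes
    return num_bytes / seconds, num_values / seconds


if __name__ == "__main__":
    for name in VALUE_BITS:
        bytes_rate, items_rate = rates(name, 8_000_000, 0.25)
        print(f"{name}: {bytes_rate:.0f} bytes/s, {items_rate:.0f} items/s")
```

Under this model the two counters can only coincide for a one-byte type, so identical values for `Int32Type` or `BooleanType` indicate that one counter was set from the other.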

### What changes are included in this PR?

Fix parquet-arrow-reader-writer-benchmark to report correct metrics.

#### Example (column writing)

Before:
```
--------------------------------------------------------------------------------------------------------------------
Benchmark                                                          Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------------------------
BM_WriteColumn<false,Int32Type>                             43138428 ns     43118609 ns           15 bytes_per_second=927.674Mi/s items_per_second=972.736M/s
BM_WriteColumn<true,Int32Type>                             150528627 ns    150480597 ns            5 bytes_per_second=265.815Mi/s items_per_second=278.727M/s
BM_WriteColumn<false,Int64Type>                             49243514 ns     49214955 ns           14 bytes_per_second=1.58742Gi/s items_per_second=1.70448G/s
BM_WriteColumn<true,Int64Type>                             151526550 ns    151472832 ns            5 bytes_per_second=528.148Mi/s items_per_second=553.803M/s
BM_WriteColumn<false,DoubleType>                            59101372 ns     59068058 ns           12 bytes_per_second=1.32263Gi/s items_per_second=1.42016G/s
BM_WriteColumn<true,DoubleType>                            159944872 ns    159895095 ns            4 bytes_per_second=500.328Mi/s items_per_second=524.632M/s
BM_WriteColumn<false,BooleanType>                           32855604 ns     32845322 ns           21 bytes_per_second=304.457Mi/s items_per_second=319.247M/s
BM_WriteColumn<true,BooleanType>                           150566118 ns    150528329 ns            5 bytes_per_second=66.4327Mi/s items_per_second=69.6597M/s
```
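
After:
```
--------------------------------------------------------------------------------------------------------------------
Benchmark                                                          Time             CPU   Iterations UserCounters...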
--------------------------------------------------------------------------------------------------------------------
BM_WriteColumn<false,Int32Type>                             43919180 ns     43895926 ns           16 bytes_per_second=911.246Mi/s items_per_second=238.878M/s
BM_WriteColumn<true,Int32Type>                             153981290 ns    153929841 ns            5 bytes_per_second=259.859Mi/s items_per_second=68.1204M/s
BM_WriteColumn<false,Int64Type>                             49906105 ns     49860098 ns           14 bytes_per_second=1.56688Gi/s items_per_second=210.304M/s
BM_WriteColumn<true,Int64Type>                             154273499 ns    154202319 ns            5 bytes_per_second=518.799Mi/s items_per_second=68M/s
BM_WriteColumn<false,DoubleType>                            59789490 ns     59733498 ns           12 bytes_per_second=1.30789Gi/s items_per_second=175.542M/s
BM_WriteColumn<true,DoubleType>                            161235860 ns    161169670 ns            4 bytes_per_second=496.371Mi/s items_per_second=65.0604M/s
BM_WriteColumn<false,BooleanType>                           32962097 ns     32950864 ns           21 bytes_per_second=37.9353Mi/s items_per_second=318.224M/s
BM_WriteColumn<true,BooleanType>                           154103499 ns    154052873 ns            5 bytes_per_second=8.1141Mi/s items_per_second=68.066M/s
```
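
One way to read the writing tables is to divide the byte rate by the item rate, which gives the number of bytes attributed to each value. The sketch below does this for a few rows copied from the tables above. It assumes the unit prefixes are binary for bytes (Mi, Gi) and decimal for items (M, G), as printed. `parse_rate` and `bytes_per_item` are hypothetical helpers written for this illustration.

```python
# Unit suffixes as they appear in the benchmark output above.
UNITS = {"Mi/s": 2**20, "Gi/s": 2**30, "M/s": 10**6, "G/s": 10**9}


def parse_rate(text):
    """Convert a printed rate such as '911.246Mi/s' to a plain number per second."""
    # Try longer suffixes first so 'Mi/s' is not mistaken for 'M/s'.
    for suffix in sorted(UNITS, key=len, reverse=True):
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * UNITS[suffix]
    raise ValueError(f"unrecognized rate: {text!r}")


def bytes_per_item(bytes_rate, items_rate):
    """Bytes attributed to each value, from the two printed counters."""
    return parse_rate(bytes_rate) / parse_rate(items_rate)


# (label, bytes_per_second, items_per_second), copied from the tables above.
ROWS = [
    ("before Int32Type", "927.674Mi/s", "972.736M/s"),
    ("after  Int32Type", "911.246Mi/s", "238.878M/s"),
    ("after  Int64Type", "1.56688Gi/s", "210.304M/s"),
    ("before BooleanType", "304.457Mi/s", "319.247M/s"),
    ("after  BooleanType", "37.9353Mi/s", "318.224M/s"),
]

if __name__ == "__main__":
    for label, b, i in ROWS:
        print(f"{label}: {bytes_per_item(b, i):.3f} bytes per item")
```

The "before" rows come out at about 1 byte per item regardless of type, while the "after" rows give about 4 for `Int32Type`, 8 for `Int64Type`, and 0.125 for `BooleanType`.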

#### Example (column reading)

Before:
```
---------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                 Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------------------------------
BM_ReadColumn<false,BooleanType>/-1/0                               6456731 ns      6453510 ns          108 bytes_per_second=1.51323Gi/s items_per_second=1.62482G/s
BM_ReadColumn<false,BooleanType>/1/20                              19012505 ns     19006068 ns           36 bytes_per_second=526.148Mi/s items_per_second=551.706M/s
BM_ReadColumn<true,BooleanType>/-1/1                               58365426 ns     58251529 ns           12 bytes_per_second=171.669Mi/s items_per_second=180.008M/s
BM_ReadColumn<true,BooleanType>/5/10                               46498966 ns     46442191 ns           15 bytes_per_second=215.321Mi/s items_per_second=225.781M/s

BM_ReadIndividualRowGroups                                         29617575 ns     29600557 ns           24 bytes_per_second=2.63931Gi/s items_per_second=2.83394G/s
BM_ReadMultipleRowGroups                                           47416980 ns     47288951 ns           15 bytes_per_second=1.65208Gi/s items_per_second=1.7739G/s
BM_ReadMultipleRowGroupsGenerator                                  29741012 ns     29722112 ns           24 bytes_per_second=2.62851Gi/s items_per_second=2.82235G/s
```

After:
```
---------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                 Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------------------------------
BM_ReadColumn<false,BooleanType>/-1/0                               6438249 ns      6435159 ns          109 bytes_per_second=194.245Mi/s items_per_second=1.62945G/s
BM_ReadColumn<false,BooleanType>/1/20                              19427495 ns     19419378 ns           37 bytes_per_second=64.3687Mi/s items_per_second=539.964M/s
BM_ReadColumn<true,BooleanType>/-1/1                               58342877 ns     58298236 ns           12 bytes_per_second=21.4415Mi/s items_per_second=179.864M/s
BM_ReadColumn<true,BooleanType>/5/10                               46591584 ns     46532288 ns           15 bytes_per_second=26.8631Mi/s items_per_second=225.344M/s

BM_ReadIndividualRowGroups                                         30039049 ns     30021676 ns           23 bytes_per_second=2.60229Gi/s items_per_second=349.273M/s
BM_ReadMultipleRowGroups                                           47877663 ns     47650438 ns           15 bytes_per_second=1.63954Gi/s items_per_second=220.056M/s
BM_ReadMultipleRowGroupsGenerator                                  30377987 ns     30360019 ns           23 bytes_per_second=2.57329Gi/s items_per_second=345.381M/s
```
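
A second cross-check on the reading tables is to multiply `items_per_second` by the per-iteration CPU time, which recovers the number of items counted per iteration. The sketch below assumes the rate counters are computed against the CPU column (the numbers above are consistent with that) and uses rows copied from the tables. `items_per_iteration` is a hypothetical helper, not a benchmark API.

```python
def items_per_iteration(cpu_ns, items_per_second):
    """Items counted in one iteration, from CPU time (ns) and the item rate."""
    return items_per_second * cpu_ns * 1e-9


# (label, CPU time in ns, items_per_second), copied from the tables above.
ROWS = [
    ("before BM_ReadIndividualRowGroups", 29600557, 2.83394e9),
    ("after  BM_ReadIndividualRowGroups", 30021676, 349.273e6),
    ("after  BM_ReadMultipleRowGroups", 47650438, 220.056e6),
    ("after  BM_ReadColumn<false,BooleanType>/-1/0", 6435159, 1.62945e9),
]

if __name__ == "__main__":
    for label, cpu_ns, rate in ROWS:
        mebi_items = items_per_iteration(cpu_ns, rate) / 2**20
        print(f"{label}: {mebi_items:.2f} Mi items per iteration")
```

With the "after" numbers every row works out to about 10 Mi items per iteration, whereas the "before" row-group number gives about 80 Mi, which suggests the old item counter was actually holding the byte count for 8-byte values.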

### Are these changes tested?

Manually by running benchmarks.

### Are there any user-facing changes?

No, but the reported metric values change, which breaks historical comparisons in continuous benchmarking.
* GitHub Issue: #44081

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
@pitrou pitrou added this to the 18.0.0 milestone Sep 12, 2024

pitrou commented Sep 12, 2024

Issue resolved by pull request 44082
#44082

@pitrou pitrou closed this as completed Sep 12, 2024
khwilson pushed a commit to khwilson/arrow that referenced this issue Sep 14, 2024
…reader-writer-benchmark (apache#44082)
