Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize FlatVector<T>::copyRanges #6566

Closed

Conversation

mbasmanova
Copy link
Contributor

@mbasmanova mbasmanova commented Sep 14, 2023

FlatVector::copyRanges is slow when there are many small ranges. I noticed this while investigating slowness of array_constructor which used created 1-row ranges: #5958 (comment)

FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }

This change optimizes FlatVector::copyRanges using code from copy(source, targetIndex, sourceIndex, count) which copies one range. This change also replaces the latter call with a call to copyRanges, hence, the overall amount of code is about the same.

An earlier change optimized array_constructor to not use copyRanges for primitive types, but it is still used by Array/MapVector::copyRanges, although, these do not create 1-row ranges. Still the array_constructor benchmark shows some wins for arrays and maps.

Before: 

array_constructor_ARRAY_nullfree##2_const                  54.31ms     18.41
array_constructor_ARRAY_nulls##1                           33.50ms     29.85
array_constructor_ARRAY_nulls##2                           59.05ms     16.93
array_constructor_ARRAY_nulls##3                           88.36ms     11.32
array_constructor_ARRAY_nulls##2_null                      74.53ms     13.42
array_constructor_ARRAY_nulls##2_const                    102.54ms      9.75

After:

array_constructor_ARRAY_nullfree##2_const                  41.36ms     24.18
array_constructor_ARRAY_nulls##1                           30.51ms     32.78
array_constructor_ARRAY_nulls##2                           55.13ms     18.14
array_constructor_ARRAY_nulls##3                           77.93ms     12.83
array_constructor_ARRAY_nulls##2_null                      68.84ms     14.53
array_constructor_ARRAY_nulls##2_const                     83.91ms     11.92

Before:

array_constructor_MAP_nullfree##2_const                    67.44ms     14.83
array_constructor_MAP_nulls##1                             37.00ms     27.03
array_constructor_MAP_nulls##2                             67.76ms     14.76
array_constructor_MAP_nulls##3                            100.88ms      9.91
array_constructor_MAP_nulls##2_null                        84.22ms     11.87
array_constructor_MAP_nulls##2_const                      122.55ms      8.16

After:

array_constructor_MAP_nullfree##2_const                    49.94ms     20.03
array_constructor_MAP_nulls##1                             34.34ms     29.12
array_constructor_MAP_nulls##2                             55.23ms     18.11
array_constructor_MAP_nulls##3                             82.64ms     12.10
array_constructor_MAP_nulls##2_null                        70.74ms     14.14
array_constructor_MAP_nulls##2_const                       88.13ms     11.35

@netlify
Copy link

netlify bot commented Sep 14, 2023

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 585bcbf
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/65088f9f142d340008b58e3a

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 14, 2023
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D49269500

mbasmanova added a commit to mbasmanova/velox-1 that referenced this pull request Sep 14, 2023
Summary:

array_constructor is very slow: facebookincubator#5958 (comment)

array_constructor uses BaseVector::copyRanges, which is somewhat fast for arrays and maps, but very slow for primitive types:

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

FlatVector<T>::copy(source, rows, toSourceRow) is faster.

Switching from copyRanges to copy speeds up array_constructor for primitive types and structs significantly. Yet, this change makes arrays and maps slower.

The slowness is due to ArrayVector and MapVector not having implementation for copy(source, rows, toSourceRow). They rely on BaseVector::copy to translate rows + toSourceRow to ranges. This extra processing causes perf regression.

Hence, we use copy for primitive types and structs of these and copyRanges for everything else.

We also optimize FlatVector::copyRanges (which is used by Array/MapVector::copyRanges).

```
Before:

array_constructor_ARRAY_nullfree#facebookincubator#1                        16.80ms     59.53
array_constructor_ARRAY_nullfree#facebookincubator#2                        27.02ms     37.01
array_constructor_ARRAY_nullfree#facebookincubator#3                        38.03ms     26.30
array_constructor_ARRAY_nullfree##2_null                   52.86ms     18.92
array_constructor_ARRAY_nullfree##2_const                  54.97ms     18.19
array_constructor_ARRAY_nulls#facebookincubator#1                           30.61ms     32.66
array_constructor_ARRAY_nulls#facebookincubator#2                           55.01ms     18.18
array_constructor_ARRAY_nulls#facebookincubator#3                           80.69ms     12.39
array_constructor_ARRAY_nulls##2_null                      69.10ms     14.47
array_constructor_ARRAY_nulls##2_const                    103.85ms      9.63


After:

array_constructor_ARRAY_nullfree#facebookincubator#1                        15.43ms     64.80
array_constructor_ARRAY_nullfree#facebookincubator#2                        24.50ms     40.81
array_constructor_ARRAY_nullfree#facebookincubator#3                        35.12ms     28.47
array_constructor_ARRAY_nullfree##2_null                   54.52ms     18.34
array_constructor_ARRAY_nullfree##2_const                  43.28ms     23.10
array_constructor_ARRAY_nulls#facebookincubator#1                           28.60ms     34.96
array_constructor_ARRAY_nulls#facebookincubator#2                           50.82ms     19.68
array_constructor_ARRAY_nulls#facebookincubator#3                           70.31ms     14.22
array_constructor_ARRAY_nulls##2_null                      64.43ms     15.52
array_constructor_ARRAY_nulls##2_const                     80.71ms     12.39


Before:

array_constructor_INTEGER_nullfree#facebookincubator#1                      19.72ms     50.71
array_constructor_INTEGER_nullfree#facebookincubator#2                      34.51ms     28.97
array_constructor_INTEGER_nullfree#facebookincubator#3                      47.95ms     20.86
array_constructor_INTEGER_nullfree##2_null                 58.68ms     17.04
array_constructor_INTEGER_nullfree##2_const                45.15ms     22.15
array_constructor_INTEGER_nulls#facebookincubator#1                         29.99ms     33.34
array_constructor_INTEGER_nulls#facebookincubator#2                         55.32ms     18.08
array_constructor_INTEGER_nulls#facebookincubator#3                         78.53ms     12.73
array_constructor_INTEGER_nulls##2_null                    72.24ms     13.84
array_constructor_INTEGER_nulls##2_const                   71.13ms     14.06


After:

array_constructor_INTEGER_nullfree#facebookincubator#1                       3.49ms    286.59
array_constructor_INTEGER_nullfree#facebookincubator#2                       7.91ms    126.46
array_constructor_INTEGER_nullfree#facebookincubator#3                      11.99ms     83.41
array_constructor_INTEGER_nullfree##2_null                 12.57ms     79.55
array_constructor_INTEGER_nullfree##2_const                11.03ms     90.67
array_constructor_INTEGER_nulls#facebookincubator#1                          4.37ms    228.97
array_constructor_INTEGER_nulls#facebookincubator#2                          9.99ms    100.14
array_constructor_INTEGER_nulls#facebookincubator#3                         14.79ms     67.60
array_constructor_INTEGER_nulls##2_null                    12.21ms     81.92
array_constructor_INTEGER_nulls##2_const                   12.64ms     79.12


Before:

array_constructor_MAP_nullfree#facebookincubator#1                          17.34ms     57.65
array_constructor_MAP_nullfree#facebookincubator#2                          29.84ms     33.51
array_constructor_MAP_nullfree#facebookincubator#3                          41.51ms     24.09
array_constructor_MAP_nullfree##2_null                     56.57ms     17.68
array_constructor_MAP_nullfree##2_const                    71.68ms     13.95
array_constructor_MAP_nulls#facebookincubator#1                             36.22ms     27.61
array_constructor_MAP_nulls#facebookincubator#2                             68.18ms     14.67
array_constructor_MAP_nulls#facebookincubator#3                             95.12ms     10.51
array_constructor_MAP_nulls##2_null                        86.42ms     11.57
array_constructor_MAP_nulls##2_const                      120.10ms      8.33


After:

array_constructor_MAP_nullfree#facebookincubator#1                          17.38ms     57.53
array_constructor_MAP_nullfree#facebookincubator#2                          29.41ms     34.00
array_constructor_MAP_nullfree#facebookincubator#3                          38.30ms     26.11
array_constructor_MAP_nullfree##2_null                     58.52ms     17.09
array_constructor_MAP_nullfree##2_const                    48.62ms     20.57
array_constructor_MAP_nulls#facebookincubator#1                             30.60ms     32.68
array_constructor_MAP_nulls#facebookincubator#2                             53.94ms     18.54
array_constructor_MAP_nulls#facebookincubator#3                             86.48ms     11.56
array_constructor_MAP_nulls##2_null                        69.53ms     14.38
array_constructor_MAP_nulls##2_const                       87.56ms     11.42


Before:

array_constructor_ROW_nullfree#facebookincubator#1                          33.88ms     29.52
array_constructor_ROW_nullfree#facebookincubator#2                          62.00ms     16.13
array_constructor_ROW_nullfree#facebookincubator#3                          89.54ms     11.17
array_constructor_ROW_nullfree##2_null                     78.46ms     12.75
array_constructor_ROW_nullfree##2_const                    95.53ms     10.47
array_constructor_ROW_nulls#facebookincubator#1                             44.11ms     22.67
array_constructor_ROW_nulls#facebookincubator#2                            115.43ms      8.66
array_constructor_ROW_nulls#facebookincubator#3                            173.61ms      5.76
array_constructor_ROW_nulls##2_null                       130.40ms      7.67
array_constructor_ROW_nulls##2_const                      169.97ms      5.88

After:

array_constructor_ROW_nullfree#facebookincubator#1                           5.64ms    177.44
array_constructor_ROW_nullfree#facebookincubator#2                          14.40ms     69.44
array_constructor_ROW_nullfree#facebookincubator#3                          21.46ms     46.59
array_constructor_ROW_nullfree##2_null                     19.14ms     52.26
array_constructor_ROW_nullfree##2_const                    18.60ms     53.77
array_constructor_ROW_nulls#facebookincubator#1                             10.97ms     91.18
array_constructor_ROW_nulls#facebookincubator#2                             18.29ms     54.67
array_constructor_ROW_nulls#facebookincubator#3                             28.57ms     35.01
array_constructor_ROW_nulls##2_null                        25.10ms     39.84
array_constructor_ROW_nulls##2_const                       24.55ms     40.74
```

Differential Revision: D49269500
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D49269500

mbasmanova added a commit to mbasmanova/velox-1 that referenced this pull request Sep 14, 2023
Summary:

array_constructor is very slow: facebookincubator#5958 (comment)

array_constructor uses BaseVector::copyRanges, which is somewhat fast for arrays and maps, but very slow for primitive types:

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

FlatVector<T>::copy(source, rows, toSourceRow) is faster.

Switching from copyRanges to copy speeds up array_constructor for primitive types and structs significantly. Yet, this change makes arrays and maps slower.

The slowness is due to ArrayVector and MapVector not having implementation for copy(source, rows, toSourceRow). They rely on BaseVector::copy to translate rows + toSourceRow to ranges. This extra processing causes perf regression.

Hence, we use copy for primitive types and structs of these and copyRanges for everything else.

We also optimize FlatVector::copyRanges (which is used by Array/MapVector::copyRanges).

```
Before:

array_constructor_ARRAY_nullfree#facebookincubator#1                        16.80ms     59.53
array_constructor_ARRAY_nullfree#facebookincubator#2                        27.02ms     37.01
array_constructor_ARRAY_nullfree#facebookincubator#3                        38.03ms     26.30
array_constructor_ARRAY_nullfree##2_null                   52.86ms     18.92
array_constructor_ARRAY_nullfree##2_const                  54.97ms     18.19
array_constructor_ARRAY_nulls#facebookincubator#1                           30.61ms     32.66
array_constructor_ARRAY_nulls#facebookincubator#2                           55.01ms     18.18
array_constructor_ARRAY_nulls#facebookincubator#3                           80.69ms     12.39
array_constructor_ARRAY_nulls##2_null                      69.10ms     14.47
array_constructor_ARRAY_nulls##2_const                    103.85ms      9.63


After:

array_constructor_ARRAY_nullfree#facebookincubator#1                        15.43ms     64.80
array_constructor_ARRAY_nullfree#facebookincubator#2                        24.50ms     40.81
array_constructor_ARRAY_nullfree#facebookincubator#3                        35.12ms     28.47
array_constructor_ARRAY_nullfree##2_null                   54.52ms     18.34
array_constructor_ARRAY_nullfree##2_const                  43.28ms     23.10
array_constructor_ARRAY_nulls#facebookincubator#1                           28.60ms     34.96
array_constructor_ARRAY_nulls#facebookincubator#2                           50.82ms     19.68
array_constructor_ARRAY_nulls#facebookincubator#3                           70.31ms     14.22
array_constructor_ARRAY_nulls##2_null                      64.43ms     15.52
array_constructor_ARRAY_nulls##2_const                     80.71ms     12.39


Before:

array_constructor_INTEGER_nullfree#facebookincubator#1                      19.72ms     50.71
array_constructor_INTEGER_nullfree#facebookincubator#2                      34.51ms     28.97
array_constructor_INTEGER_nullfree#facebookincubator#3                      47.95ms     20.86
array_constructor_INTEGER_nullfree##2_null                 58.68ms     17.04
array_constructor_INTEGER_nullfree##2_const                45.15ms     22.15
array_constructor_INTEGER_nulls#facebookincubator#1                         29.99ms     33.34
array_constructor_INTEGER_nulls#facebookincubator#2                         55.32ms     18.08
array_constructor_INTEGER_nulls#facebookincubator#3                         78.53ms     12.73
array_constructor_INTEGER_nulls##2_null                    72.24ms     13.84
array_constructor_INTEGER_nulls##2_const                   71.13ms     14.06


After:

array_constructor_INTEGER_nullfree#facebookincubator#1                       3.49ms    286.59
array_constructor_INTEGER_nullfree#facebookincubator#2                       7.91ms    126.46
array_constructor_INTEGER_nullfree#facebookincubator#3                      11.99ms     83.41
array_constructor_INTEGER_nullfree##2_null                 12.57ms     79.55
array_constructor_INTEGER_nullfree##2_const                11.03ms     90.67
array_constructor_INTEGER_nulls#facebookincubator#1                          4.37ms    228.97
array_constructor_INTEGER_nulls#facebookincubator#2                          9.99ms    100.14
array_constructor_INTEGER_nulls#facebookincubator#3                         14.79ms     67.60
array_constructor_INTEGER_nulls##2_null                    12.21ms     81.92
array_constructor_INTEGER_nulls##2_const                   12.64ms     79.12


Before:

array_constructor_MAP_nullfree#facebookincubator#1                          17.34ms     57.65
array_constructor_MAP_nullfree#facebookincubator#2                          29.84ms     33.51
array_constructor_MAP_nullfree#facebookincubator#3                          41.51ms     24.09
array_constructor_MAP_nullfree##2_null                     56.57ms     17.68
array_constructor_MAP_nullfree##2_const                    71.68ms     13.95
array_constructor_MAP_nulls#facebookincubator#1                             36.22ms     27.61
array_constructor_MAP_nulls#facebookincubator#2                             68.18ms     14.67
array_constructor_MAP_nulls#facebookincubator#3                             95.12ms     10.51
array_constructor_MAP_nulls##2_null                        86.42ms     11.57
array_constructor_MAP_nulls##2_const                      120.10ms      8.33


After:

array_constructor_MAP_nullfree#facebookincubator#1                          17.38ms     57.53
array_constructor_MAP_nullfree#facebookincubator#2                          29.41ms     34.00
array_constructor_MAP_nullfree#facebookincubator#3                          38.30ms     26.11
array_constructor_MAP_nullfree##2_null                     58.52ms     17.09
array_constructor_MAP_nullfree##2_const                    48.62ms     20.57
array_constructor_MAP_nulls#facebookincubator#1                             30.60ms     32.68
array_constructor_MAP_nulls#facebookincubator#2                             53.94ms     18.54
array_constructor_MAP_nulls#facebookincubator#3                             86.48ms     11.56
array_constructor_MAP_nulls##2_null                        69.53ms     14.38
array_constructor_MAP_nulls##2_const                       87.56ms     11.42


Before:

array_constructor_ROW_nullfree#facebookincubator#1                          33.88ms     29.52
array_constructor_ROW_nullfree#facebookincubator#2                          62.00ms     16.13
array_constructor_ROW_nullfree#facebookincubator#3                          89.54ms     11.17
array_constructor_ROW_nullfree##2_null                     78.46ms     12.75
array_constructor_ROW_nullfree##2_const                    95.53ms     10.47
array_constructor_ROW_nulls#facebookincubator#1                             44.11ms     22.67
array_constructor_ROW_nulls#facebookincubator#2                            115.43ms      8.66
array_constructor_ROW_nulls#facebookincubator#3                            173.61ms      5.76
array_constructor_ROW_nulls##2_null                       130.40ms      7.67
array_constructor_ROW_nulls##2_const                      169.97ms      5.88

After:

array_constructor_ROW_nullfree#facebookincubator#1                           5.64ms    177.44
array_constructor_ROW_nullfree#facebookincubator#2                          14.40ms     69.44
array_constructor_ROW_nullfree#facebookincubator#3                          21.46ms     46.59
array_constructor_ROW_nullfree##2_null                     19.14ms     52.26
array_constructor_ROW_nullfree##2_const                    18.60ms     53.77
array_constructor_ROW_nulls#facebookincubator#1                             10.97ms     91.18
array_constructor_ROW_nulls#facebookincubator#2                             18.29ms     54.67
array_constructor_ROW_nulls#facebookincubator#3                             28.57ms     35.01
array_constructor_ROW_nulls##2_null                        25.10ms     39.84
array_constructor_ROW_nulls##2_const                       24.55ms     40.74
```

Differential Revision: D49269500
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D49269500

@mbasmanova mbasmanova changed the title Optimize array_constructor Optimize array_constructor - v1 Sep 14, 2023
mbasmanova added a commit to mbasmanova/velox-1 that referenced this pull request Sep 14, 2023
Summary:

array_constructor is very slow: facebookincubator#5958 (comment)

array_constructor uses BaseVector::copyRanges, which is somewhat fast for arrays and maps, but very slow for primitive types:

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

FlatVector<T>::copy(source, rows, toSourceRow) is faster.

Switching from copyRanges to copy speeds up array_constructor for primitive types and structs significantly. Yet, this change makes arrays and maps slower.

The slowness is due to ArrayVector and MapVector not having implementation for copy(source, rows, toSourceRow). They rely on BaseVector::copy to translate rows + toSourceRow to ranges. This extra processing causes perf regression.

Hence, we use copy for primitive types and structs of these and copyRanges for everything else.

We also optimize FlatVector::copyRanges (which is used by Array/MapVector::copyRanges).

```
Before:

array_constructor_ARRAY_nullfree#facebookincubator#1                        16.80ms     59.53
array_constructor_ARRAY_nullfree#facebookincubator#2                        27.02ms     37.01
array_constructor_ARRAY_nullfree#facebookincubator#3                        38.03ms     26.30
array_constructor_ARRAY_nullfree##2_null                   52.86ms     18.92
array_constructor_ARRAY_nullfree##2_const                  54.97ms     18.19
array_constructor_ARRAY_nulls#facebookincubator#1                           30.61ms     32.66
array_constructor_ARRAY_nulls#facebookincubator#2                           55.01ms     18.18
array_constructor_ARRAY_nulls#facebookincubator#3                           80.69ms     12.39
array_constructor_ARRAY_nulls##2_null                      69.10ms     14.47
array_constructor_ARRAY_nulls##2_const                    103.85ms      9.63


After:

array_constructor_ARRAY_nullfree#facebookincubator#1                        15.43ms     64.80
array_constructor_ARRAY_nullfree#facebookincubator#2                        24.50ms     40.81
array_constructor_ARRAY_nullfree#facebookincubator#3                        35.12ms     28.47
array_constructor_ARRAY_nullfree##2_null                   54.52ms     18.34
array_constructor_ARRAY_nullfree##2_const                  43.28ms     23.10
array_constructor_ARRAY_nulls#facebookincubator#1                           28.60ms     34.96
array_constructor_ARRAY_nulls#facebookincubator#2                           50.82ms     19.68
array_constructor_ARRAY_nulls#facebookincubator#3                           70.31ms     14.22
array_constructor_ARRAY_nulls##2_null                      64.43ms     15.52
array_constructor_ARRAY_nulls##2_const                     80.71ms     12.39


Before:

array_constructor_INTEGER_nullfree#facebookincubator#1                      19.72ms     50.71
array_constructor_INTEGER_nullfree#facebookincubator#2                      34.51ms     28.97
array_constructor_INTEGER_nullfree#facebookincubator#3                      47.95ms     20.86
array_constructor_INTEGER_nullfree##2_null                 58.68ms     17.04
array_constructor_INTEGER_nullfree##2_const                45.15ms     22.15
array_constructor_INTEGER_nulls#facebookincubator#1                         29.99ms     33.34
array_constructor_INTEGER_nulls#facebookincubator#2                         55.32ms     18.08
array_constructor_INTEGER_nulls#facebookincubator#3                         78.53ms     12.73
array_constructor_INTEGER_nulls##2_null                    72.24ms     13.84
array_constructor_INTEGER_nulls##2_const                   71.13ms     14.06


After:

array_constructor_INTEGER_nullfree#facebookincubator#1                       3.49ms    286.59
array_constructor_INTEGER_nullfree#facebookincubator#2                       7.91ms    126.46
array_constructor_INTEGER_nullfree#facebookincubator#3                      11.99ms     83.41
array_constructor_INTEGER_nullfree##2_null                 12.57ms     79.55
array_constructor_INTEGER_nullfree##2_const                11.03ms     90.67
array_constructor_INTEGER_nulls#facebookincubator#1                          4.37ms    228.97
array_constructor_INTEGER_nulls#facebookincubator#2                          9.99ms    100.14
array_constructor_INTEGER_nulls#facebookincubator#3                         14.79ms     67.60
array_constructor_INTEGER_nulls##2_null                    12.21ms     81.92
array_constructor_INTEGER_nulls##2_const                   12.64ms     79.12


Before:

array_constructor_MAP_nullfree#facebookincubator#1                          17.34ms     57.65
array_constructor_MAP_nullfree#facebookincubator#2                          29.84ms     33.51
array_constructor_MAP_nullfree#facebookincubator#3                          41.51ms     24.09
array_constructor_MAP_nullfree##2_null                     56.57ms     17.68
array_constructor_MAP_nullfree##2_const                    71.68ms     13.95
array_constructor_MAP_nulls#facebookincubator#1                             36.22ms     27.61
array_constructor_MAP_nulls#facebookincubator#2                             68.18ms     14.67
array_constructor_MAP_nulls#facebookincubator#3                             95.12ms     10.51
array_constructor_MAP_nulls##2_null                        86.42ms     11.57
array_constructor_MAP_nulls##2_const                      120.10ms      8.33


After:

array_constructor_MAP_nullfree#facebookincubator#1                          17.38ms     57.53
array_constructor_MAP_nullfree#facebookincubator#2                          29.41ms     34.00
array_constructor_MAP_nullfree#facebookincubator#3                          38.30ms     26.11
array_constructor_MAP_nullfree##2_null                     58.52ms     17.09
array_constructor_MAP_nullfree##2_const                    48.62ms     20.57
array_constructor_MAP_nulls#facebookincubator#1                             30.60ms     32.68
array_constructor_MAP_nulls#facebookincubator#2                             53.94ms     18.54
array_constructor_MAP_nulls#facebookincubator#3                             86.48ms     11.56
array_constructor_MAP_nulls##2_null                        69.53ms     14.38
array_constructor_MAP_nulls##2_const                       87.56ms     11.42


Before:

array_constructor_ROW_nullfree#facebookincubator#1                          33.88ms     29.52
array_constructor_ROW_nullfree#facebookincubator#2                          62.00ms     16.13
array_constructor_ROW_nullfree#facebookincubator#3                          89.54ms     11.17
array_constructor_ROW_nullfree##2_null                     78.46ms     12.75
array_constructor_ROW_nullfree##2_const                    95.53ms     10.47
array_constructor_ROW_nulls#facebookincubator#1                             44.11ms     22.67
array_constructor_ROW_nulls#facebookincubator#2                            115.43ms      8.66
array_constructor_ROW_nulls#facebookincubator#3                            173.61ms      5.76
array_constructor_ROW_nulls##2_null                       130.40ms      7.67
array_constructor_ROW_nulls##2_const                      169.97ms      5.88

After:

array_constructor_ROW_nullfree#facebookincubator#1                           5.64ms    177.44
array_constructor_ROW_nullfree#facebookincubator#2                          14.40ms     69.44
array_constructor_ROW_nullfree#facebookincubator#3                          21.46ms     46.59
array_constructor_ROW_nullfree##2_null                     19.14ms     52.26
array_constructor_ROW_nullfree##2_const                    18.60ms     53.77
array_constructor_ROW_nulls#facebookincubator#1                             10.97ms     91.18
array_constructor_ROW_nulls#facebookincubator#2                             18.29ms     54.67
array_constructor_ROW_nulls#facebookincubator#3                             28.57ms     35.01
array_constructor_ROW_nulls##2_null                        25.10ms     39.84
array_constructor_ROW_nulls##2_const                       24.55ms     40.74
```

Differential Revision: D49269500
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D49269500

mbasmanova added a commit to mbasmanova/velox-1 that referenced this pull request Sep 14, 2023
Summary:

array_constructor is very slow: facebookincubator#5958 (comment)

array_constructor uses BaseVector::copyRanges, which is somewhat fast for arrays and maps, but very slow for primitive types:

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

FlatVector<T>::copy(source, rows, toSourceRow) is faster.

Switching from copyRanges to copy speeds up array_constructor for primitive types and structs significantly. Yet, this change makes arrays and maps slower.

The slowness is due to ArrayVector and MapVector not having implementation for copy(source, rows, toSourceRow). They rely on BaseVector::copy to translate rows + toSourceRow to ranges. This extra processing causes perf regression.

Hence, we use copy for primitive types and structs of these and copyRanges for everything else.

We also optimize FlatVector::copyRanges (which is used by Array/MapVector::copyRanges).

```
Before:

array_constructor_ARRAY_nullfree#facebookincubator#1                        16.80ms     59.53
array_constructor_ARRAY_nullfree#facebookincubator#2                        27.02ms     37.01
array_constructor_ARRAY_nullfree#facebookincubator#3                        38.03ms     26.30
array_constructor_ARRAY_nullfree##2_null                   52.86ms     18.92
array_constructor_ARRAY_nullfree##2_const                  54.97ms     18.19
array_constructor_ARRAY_nulls#facebookincubator#1                           30.61ms     32.66
array_constructor_ARRAY_nulls#facebookincubator#2                           55.01ms     18.18
array_constructor_ARRAY_nulls#facebookincubator#3                           80.69ms     12.39
array_constructor_ARRAY_nulls##2_null                      69.10ms     14.47
array_constructor_ARRAY_nulls##2_const                    103.85ms      9.63


After:

array_constructor_ARRAY_nullfree#facebookincubator#1                        15.43ms     64.80
array_constructor_ARRAY_nullfree#facebookincubator#2                        24.50ms     40.81
array_constructor_ARRAY_nullfree#facebookincubator#3                        35.12ms     28.47
array_constructor_ARRAY_nullfree##2_null                   54.52ms     18.34
array_constructor_ARRAY_nullfree##2_const                  43.28ms     23.10
array_constructor_ARRAY_nulls#facebookincubator#1                           28.60ms     34.96
array_constructor_ARRAY_nulls#facebookincubator#2                           50.82ms     19.68
array_constructor_ARRAY_nulls#facebookincubator#3                           70.31ms     14.22
array_constructor_ARRAY_nulls##2_null                      64.43ms     15.52
array_constructor_ARRAY_nulls##2_const                     80.71ms     12.39


Before:

array_constructor_INTEGER_nullfree#facebookincubator#1                      19.72ms     50.71
array_constructor_INTEGER_nullfree#facebookincubator#2                      34.51ms     28.97
array_constructor_INTEGER_nullfree#facebookincubator#3                      47.95ms     20.86
array_constructor_INTEGER_nullfree##2_null                 58.68ms     17.04
array_constructor_INTEGER_nullfree##2_const                45.15ms     22.15
array_constructor_INTEGER_nulls#facebookincubator#1                         29.99ms     33.34
array_constructor_INTEGER_nulls#facebookincubator#2                         55.32ms     18.08
array_constructor_INTEGER_nulls#facebookincubator#3                         78.53ms     12.73
array_constructor_INTEGER_nulls##2_null                    72.24ms     13.84
array_constructor_INTEGER_nulls##2_const                   71.13ms     14.06


After:

array_constructor_INTEGER_nullfree#facebookincubator#1                       3.49ms    286.59
array_constructor_INTEGER_nullfree#facebookincubator#2                       7.91ms    126.46
array_constructor_INTEGER_nullfree#facebookincubator#3                      11.99ms     83.41
array_constructor_INTEGER_nullfree##2_null                 12.57ms     79.55
array_constructor_INTEGER_nullfree##2_const                11.03ms     90.67
array_constructor_INTEGER_nulls#facebookincubator#1                          4.37ms    228.97
array_constructor_INTEGER_nulls#facebookincubator#2                          9.99ms    100.14
array_constructor_INTEGER_nulls#facebookincubator#3                         14.79ms     67.60
array_constructor_INTEGER_nulls##2_null                    12.21ms     81.92
array_constructor_INTEGER_nulls##2_const                   12.64ms     79.12


Before:

array_constructor_MAP_nullfree#facebookincubator#1                          17.34ms     57.65
array_constructor_MAP_nullfree#facebookincubator#2                          29.84ms     33.51
array_constructor_MAP_nullfree#facebookincubator#3                          41.51ms     24.09
array_constructor_MAP_nullfree##2_null                     56.57ms     17.68
array_constructor_MAP_nullfree##2_const                    71.68ms     13.95
array_constructor_MAP_nulls#facebookincubator#1                             36.22ms     27.61
array_constructor_MAP_nulls#facebookincubator#2                             68.18ms     14.67
array_constructor_MAP_nulls#facebookincubator#3                             95.12ms     10.51
array_constructor_MAP_nulls##2_null                        86.42ms     11.57
array_constructor_MAP_nulls##2_const                      120.10ms      8.33


After:

array_constructor_MAP_nullfree#facebookincubator#1                          17.38ms     57.53
array_constructor_MAP_nullfree#facebookincubator#2                          29.41ms     34.00
array_constructor_MAP_nullfree#facebookincubator#3                          38.30ms     26.11
array_constructor_MAP_nullfree##2_null                     58.52ms     17.09
array_constructor_MAP_nullfree##2_const                    48.62ms     20.57
array_constructor_MAP_nulls#facebookincubator#1                             30.60ms     32.68
array_constructor_MAP_nulls#facebookincubator#2                             53.94ms     18.54
array_constructor_MAP_nulls#facebookincubator#3                             86.48ms     11.56
array_constructor_MAP_nulls##2_null                        69.53ms     14.38
array_constructor_MAP_nulls##2_const                       87.56ms     11.42


Before:

array_constructor_ROW_nullfree#facebookincubator#1                          33.88ms     29.52
array_constructor_ROW_nullfree#facebookincubator#2                          62.00ms     16.13
array_constructor_ROW_nullfree#facebookincubator#3                          89.54ms     11.17
array_constructor_ROW_nullfree##2_null                     78.46ms     12.75
array_constructor_ROW_nullfree##2_const                    95.53ms     10.47
array_constructor_ROW_nulls#facebookincubator#1                             44.11ms     22.67
array_constructor_ROW_nulls#facebookincubator#2                            115.43ms      8.66
array_constructor_ROW_nulls#facebookincubator#3                            173.61ms      5.76
array_constructor_ROW_nulls##2_null                       130.40ms      7.67
array_constructor_ROW_nulls##2_const                      169.97ms      5.88

After:

array_constructor_ROW_nullfree#facebookincubator#1                           5.64ms    177.44
array_constructor_ROW_nullfree#facebookincubator#2                          14.40ms     69.44
array_constructor_ROW_nullfree#facebookincubator#3                          21.46ms     46.59
array_constructor_ROW_nullfree##2_null                     19.14ms     52.26
array_constructor_ROW_nullfree##2_const                    18.60ms     53.77
array_constructor_ROW_nulls#facebookincubator#1                             10.97ms     91.18
array_constructor_ROW_nulls#facebookincubator#2                             18.29ms     54.67
array_constructor_ROW_nulls#facebookincubator#3                             28.57ms     35.01
array_constructor_ROW_nulls##2_null                        25.10ms     39.84
array_constructor_ROW_nulls##2_const                       24.55ms     40.74
```

Differential Revision: D49269500
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D49269500

mbasmanova added a commit to mbasmanova/velox-1 that referenced this pull request Sep 15, 2023
Summary:

array_constructor is very slow: facebookincubator#5958 (comment)

array_constructor uses BaseVector::copyRanges, which is somewhat fast for arrays and maps, but very slow for primitive types:

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

FlatVector<T>::copy(source, rows, toSourceRow) is faster.

Switching from copyRanges to copy speeds up array_constructor for primitive types and structs significantly. Yet, this change makes arrays and maps slower.

The slowness is due to ArrayVector and MapVector not having implementation for copy(source, rows, toSourceRow). They rely on BaseVector::copy to translate rows + toSourceRow to ranges. This extra processing causes perf regression.

Hence, we use copy for primitive types and structs of these and copyRanges for everything else.

We also optimize FlatVector::copyRanges (which is used by Array/MapVector::copyRanges).

```
Before:

array_constructor_ARRAY_nullfree#facebookincubator#1                        16.80ms     59.53
array_constructor_ARRAY_nullfree#facebookincubator#2                        27.02ms     37.01
array_constructor_ARRAY_nullfree#facebookincubator#3                        38.03ms     26.30
array_constructor_ARRAY_nullfree##2_null                   52.86ms     18.92
array_constructor_ARRAY_nullfree##2_const                  54.97ms     18.19
array_constructor_ARRAY_nulls#facebookincubator#1                           30.61ms     32.66
array_constructor_ARRAY_nulls#facebookincubator#2                           55.01ms     18.18
array_constructor_ARRAY_nulls#facebookincubator#3                           80.69ms     12.39
array_constructor_ARRAY_nulls##2_null                      69.10ms     14.47
array_constructor_ARRAY_nulls##2_const                    103.85ms      9.63


After:

array_constructor_ARRAY_nullfree#facebookincubator#1                        15.43ms     64.80
array_constructor_ARRAY_nullfree#facebookincubator#2                        24.50ms     40.81
array_constructor_ARRAY_nullfree#facebookincubator#3                        35.12ms     28.47
array_constructor_ARRAY_nullfree##2_null                   54.52ms     18.34
array_constructor_ARRAY_nullfree##2_const                  43.28ms     23.10
array_constructor_ARRAY_nulls#facebookincubator#1                           28.60ms     34.96
array_constructor_ARRAY_nulls#facebookincubator#2                           50.82ms     19.68
array_constructor_ARRAY_nulls#facebookincubator#3                           70.31ms     14.22
array_constructor_ARRAY_nulls##2_null                      64.43ms     15.52
array_constructor_ARRAY_nulls##2_const                     80.71ms     12.39


Before:

array_constructor_INTEGER_nullfree#facebookincubator#1                      19.72ms     50.71
array_constructor_INTEGER_nullfree#facebookincubator#2                      34.51ms     28.97
array_constructor_INTEGER_nullfree#facebookincubator#3                      47.95ms     20.86
array_constructor_INTEGER_nullfree##2_null                 58.68ms     17.04
array_constructor_INTEGER_nullfree##2_const                45.15ms     22.15
array_constructor_INTEGER_nulls#facebookincubator#1                         29.99ms     33.34
array_constructor_INTEGER_nulls#facebookincubator#2                         55.32ms     18.08
array_constructor_INTEGER_nulls#facebookincubator#3                         78.53ms     12.73
array_constructor_INTEGER_nulls##2_null                    72.24ms     13.84
array_constructor_INTEGER_nulls##2_const                   71.13ms     14.06


After:

array_constructor_INTEGER_nullfree#facebookincubator#1                       3.49ms    286.59
array_constructor_INTEGER_nullfree#facebookincubator#2                       7.91ms    126.46
array_constructor_INTEGER_nullfree#facebookincubator#3                      11.99ms     83.41
array_constructor_INTEGER_nullfree##2_null                 12.57ms     79.55
array_constructor_INTEGER_nullfree##2_const                11.03ms     90.67
array_constructor_INTEGER_nulls#facebookincubator#1                          4.37ms    228.97
array_constructor_INTEGER_nulls#facebookincubator#2                          9.99ms    100.14
array_constructor_INTEGER_nulls#facebookincubator#3                         14.79ms     67.60
array_constructor_INTEGER_nulls##2_null                    12.21ms     81.92
array_constructor_INTEGER_nulls##2_const                   12.64ms     79.12


Before:

array_constructor_MAP_nullfree#facebookincubator#1                          17.34ms     57.65
array_constructor_MAP_nullfree#facebookincubator#2                          29.84ms     33.51
array_constructor_MAP_nullfree#facebookincubator#3                          41.51ms     24.09
array_constructor_MAP_nullfree##2_null                     56.57ms     17.68
array_constructor_MAP_nullfree##2_const                    71.68ms     13.95
array_constructor_MAP_nulls#facebookincubator#1                             36.22ms     27.61
array_constructor_MAP_nulls#facebookincubator#2                             68.18ms     14.67
array_constructor_MAP_nulls#facebookincubator#3                             95.12ms     10.51
array_constructor_MAP_nulls##2_null                        86.42ms     11.57
array_constructor_MAP_nulls##2_const                      120.10ms      8.33


After:

array_constructor_MAP_nullfree#facebookincubator#1                          17.38ms     57.53
array_constructor_MAP_nullfree#facebookincubator#2                          29.41ms     34.00
array_constructor_MAP_nullfree#facebookincubator#3                          38.30ms     26.11
array_constructor_MAP_nullfree##2_null                     58.52ms     17.09
array_constructor_MAP_nullfree##2_const                    48.62ms     20.57
array_constructor_MAP_nulls#facebookincubator#1                             30.60ms     32.68
array_constructor_MAP_nulls#facebookincubator#2                             53.94ms     18.54
array_constructor_MAP_nulls#facebookincubator#3                             86.48ms     11.56
array_constructor_MAP_nulls##2_null                        69.53ms     14.38
array_constructor_MAP_nulls##2_const                       87.56ms     11.42


Before:

array_constructor_ROW_nullfree#facebookincubator#1                          33.88ms     29.52
array_constructor_ROW_nullfree#facebookincubator#2                          62.00ms     16.13
array_constructor_ROW_nullfree#facebookincubator#3                          89.54ms     11.17
array_constructor_ROW_nullfree##2_null                     78.46ms     12.75
array_constructor_ROW_nullfree##2_const                    95.53ms     10.47
array_constructor_ROW_nulls#facebookincubator#1                             44.11ms     22.67
array_constructor_ROW_nulls#facebookincubator#2                            115.43ms      8.66
array_constructor_ROW_nulls#facebookincubator#3                            173.61ms      5.76
array_constructor_ROW_nulls##2_null                       130.40ms      7.67
array_constructor_ROW_nulls##2_const                      169.97ms      5.88

After:

array_constructor_ROW_nullfree#facebookincubator#1                           5.64ms    177.44
array_constructor_ROW_nullfree#facebookincubator#2                          14.40ms     69.44
array_constructor_ROW_nullfree#facebookincubator#3                          21.46ms     46.59
array_constructor_ROW_nullfree##2_null                     19.14ms     52.26
array_constructor_ROW_nullfree##2_const                    18.60ms     53.77
array_constructor_ROW_nulls#facebookincubator#1                             10.97ms     91.18
array_constructor_ROW_nulls#facebookincubator#2                             18.29ms     54.67
array_constructor_ROW_nulls#facebookincubator#3                             28.57ms     35.01
array_constructor_ROW_nulls##2_null                        25.10ms     39.84
array_constructor_ROW_nulls##2_const                       24.55ms     40.74
```

Differential Revision: D49269500
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D49269500

1 similar comment
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D49269500

mbasmanova added a commit to mbasmanova/velox-1 that referenced this pull request Sep 15, 2023
Summary:
Pull Request resolved: facebookincubator#6566

array_constructor is very slow: facebookincubator#5958 (comment)

array_constructor uses BaseVector::copyRanges, which is somewhat fast for arrays and maps, but very slow for primitive types:

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

FlatVector<T>::copy(source, rows, toSourceRow) is faster.

Switching from copyRanges to copy speeds up array_constructor for primitive types and structs significantly. Yet, this change makes arrays and maps slower.

The slowness is due to ArrayVector and MapVector not having implementation for copy(source, rows, toSourceRow). They rely on BaseVector::copy to translate rows + toSourceRow to ranges. This extra processing causes perf regression.

Hence, we use copy for primitive types and structs of these and copyRanges for everything else.

We also optimize FlatVector::copyRanges (which is used by Array/MapVector::copyRanges).

```
Before:

array_constructor_ARRAY_nullfree#facebookincubator#1                        16.80ms     59.53
array_constructor_ARRAY_nullfree#facebookincubator#2                        27.02ms     37.01
array_constructor_ARRAY_nullfree#facebookincubator#3                        38.03ms     26.30
array_constructor_ARRAY_nullfree##2_null                   52.86ms     18.92
array_constructor_ARRAY_nullfree##2_const                  54.97ms     18.19
array_constructor_ARRAY_nulls#facebookincubator#1                           30.61ms     32.66
array_constructor_ARRAY_nulls#facebookincubator#2                           55.01ms     18.18
array_constructor_ARRAY_nulls#facebookincubator#3                           80.69ms     12.39
array_constructor_ARRAY_nulls##2_null                      69.10ms     14.47
array_constructor_ARRAY_nulls##2_const                    103.85ms      9.63

After:

array_constructor_ARRAY_nullfree#facebookincubator#1                        15.43ms     64.80
array_constructor_ARRAY_nullfree#facebookincubator#2                        24.50ms     40.81
array_constructor_ARRAY_nullfree#facebookincubator#3                        35.12ms     28.47
array_constructor_ARRAY_nullfree##2_null                   54.52ms     18.34
array_constructor_ARRAY_nullfree##2_const                  43.28ms     23.10
array_constructor_ARRAY_nulls#facebookincubator#1                           28.60ms     34.96
array_constructor_ARRAY_nulls#facebookincubator#2                           50.82ms     19.68
array_constructor_ARRAY_nulls#facebookincubator#3                           70.31ms     14.22
array_constructor_ARRAY_nulls##2_null                      64.43ms     15.52
array_constructor_ARRAY_nulls##2_const                     80.71ms     12.39

Before:

array_constructor_INTEGER_nullfree#facebookincubator#1                      19.72ms     50.71
array_constructor_INTEGER_nullfree#facebookincubator#2                      34.51ms     28.97
array_constructor_INTEGER_nullfree#facebookincubator#3                      47.95ms     20.86
array_constructor_INTEGER_nullfree##2_null                 58.68ms     17.04
array_constructor_INTEGER_nullfree##2_const                45.15ms     22.15
array_constructor_INTEGER_nulls#facebookincubator#1                         29.99ms     33.34
array_constructor_INTEGER_nulls#facebookincubator#2                         55.32ms     18.08
array_constructor_INTEGER_nulls#facebookincubator#3                         78.53ms     12.73
array_constructor_INTEGER_nulls##2_null                    72.24ms     13.84
array_constructor_INTEGER_nulls##2_const                   71.13ms     14.06

After:

array_constructor_INTEGER_nullfree#facebookincubator#1                       3.49ms    286.59
array_constructor_INTEGER_nullfree#facebookincubator#2                       7.91ms    126.46
array_constructor_INTEGER_nullfree#facebookincubator#3                      11.99ms     83.41
array_constructor_INTEGER_nullfree##2_null                 12.57ms     79.55
array_constructor_INTEGER_nullfree##2_const                11.03ms     90.67
array_constructor_INTEGER_nulls#facebookincubator#1                          4.37ms    228.97
array_constructor_INTEGER_nulls#facebookincubator#2                          9.99ms    100.14
array_constructor_INTEGER_nulls#facebookincubator#3                         14.79ms     67.60
array_constructor_INTEGER_nulls##2_null                    12.21ms     81.92
array_constructor_INTEGER_nulls##2_const                   12.64ms     79.12

Before:

array_constructor_MAP_nullfree#facebookincubator#1                          17.34ms     57.65
array_constructor_MAP_nullfree#facebookincubator#2                          29.84ms     33.51
array_constructor_MAP_nullfree#facebookincubator#3                          41.51ms     24.09
array_constructor_MAP_nullfree##2_null                     56.57ms     17.68
array_constructor_MAP_nullfree##2_const                    71.68ms     13.95
array_constructor_MAP_nulls#facebookincubator#1                             36.22ms     27.61
array_constructor_MAP_nulls#facebookincubator#2                             68.18ms     14.67
array_constructor_MAP_nulls#facebookincubator#3                             95.12ms     10.51
array_constructor_MAP_nulls##2_null                        86.42ms     11.57
array_constructor_MAP_nulls##2_const                      120.10ms      8.33

After:

array_constructor_MAP_nullfree#facebookincubator#1                          17.38ms     57.53
array_constructor_MAP_nullfree#facebookincubator#2                          29.41ms     34.00
array_constructor_MAP_nullfree#facebookincubator#3                          38.30ms     26.11
array_constructor_MAP_nullfree##2_null                     58.52ms     17.09
array_constructor_MAP_nullfree##2_const                    48.62ms     20.57
array_constructor_MAP_nulls#facebookincubator#1                             30.60ms     32.68
array_constructor_MAP_nulls#facebookincubator#2                             53.94ms     18.54
array_constructor_MAP_nulls#facebookincubator#3                             86.48ms     11.56
array_constructor_MAP_nulls##2_null                        69.53ms     14.38
array_constructor_MAP_nulls##2_const                       87.56ms     11.42

Before:

array_constructor_ROW_nullfree#facebookincubator#1                          33.88ms     29.52
array_constructor_ROW_nullfree#facebookincubator#2                          62.00ms     16.13
array_constructor_ROW_nullfree#facebookincubator#3                          89.54ms     11.17
array_constructor_ROW_nullfree##2_null                     78.46ms     12.75
array_constructor_ROW_nullfree##2_const                    95.53ms     10.47
array_constructor_ROW_nulls#facebookincubator#1                             44.11ms     22.67
array_constructor_ROW_nulls#facebookincubator#2                            115.43ms      8.66
array_constructor_ROW_nulls#facebookincubator#3                            173.61ms      5.76
array_constructor_ROW_nulls##2_null                       130.40ms      7.67
array_constructor_ROW_nulls##2_const                      169.97ms      5.88

After:

array_constructor_ROW_nullfree#facebookincubator#1                           5.64ms    177.44
array_constructor_ROW_nullfree#facebookincubator#2                          14.40ms     69.44
array_constructor_ROW_nullfree#facebookincubator#3                          21.46ms     46.59
array_constructor_ROW_nullfree##2_null                     19.14ms     52.26
array_constructor_ROW_nullfree##2_const                    18.60ms     53.77
array_constructor_ROW_nulls#facebookincubator#1                             10.97ms     91.18
array_constructor_ROW_nulls#facebookincubator#2                             18.29ms     54.67
array_constructor_ROW_nulls#facebookincubator#3                             28.57ms     35.01
array_constructor_ROW_nulls##2_null                        25.10ms     39.84
array_constructor_ROW_nulls##2_const                       24.55ms     40.74
```

Differential Revision: D49269500

fbshipit-source-id: 1f5ff279be27a55f72930457991cefec0d39fcf4
@mbasmanova mbasmanova changed the title Optimize array_constructor - v1 Optimize FlatVector<T>::copyRanges Sep 15, 2023
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D49269500

mbasmanova added a commit to mbasmanova/velox-1 that referenced this pull request Sep 15, 2023
Summary:
Pull Request resolved: facebookincubator#6566

FlatVector::copyRanges is slow when there are many small ranges. I noticed this while investigating slowness of array_constructor which used created 1-row ranges: facebookincubator#5958 (comment)

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

This change optimizes FlatVector<T>::copyRanges using code from copy(source, targetIndex, sourceIndex, count) which copies one range. This change also replaces the latter call with a call to copyRanges, hence, the overall amount of code is about the same.

An earlier change optimized array_constructor to not use copyRanges for primitive types, but it is still used by Array/MapVector::copyRanges, although, these do not create 1-row ranges. Still the array_constructor benchmark shows some wins for arrays and maps.

```
Before:

array_constructor_ARRAY_nullfree#facebookincubator#1                        15.80ms     63.30
array_constructor_ARRAY_nullfree#facebookincubator#2                        25.59ms     39.08
array_constructor_ARRAY_nullfree#facebookincubator#3                        34.49ms     28.99
array_constructor_ARRAY_nullfree##2_null                   58.96ms     16.96
array_constructor_ARRAY_nullfree##2_const                  54.31ms     18.41
array_constructor_ARRAY_nulls#facebookincubator#1                           33.50ms     29.85
array_constructor_ARRAY_nulls#facebookincubator#2                           59.05ms     16.93
array_constructor_ARRAY_nulls#facebookincubator#3                           88.36ms     11.32
array_constructor_ARRAY_nulls##2_null                      74.53ms     13.42
array_constructor_ARRAY_nulls##2_const                    102.54ms      9.75

After:

array_constructor_ARRAY_nullfree#facebookincubator#1                        15.53ms     64.39
array_constructor_ARRAY_nullfree#facebookincubator#2                        25.75ms     38.84
array_constructor_ARRAY_nullfree#facebookincubator#3                        34.37ms     29.10
array_constructor_ARRAY_nullfree##2_null                   59.63ms     16.77
array_constructor_ARRAY_nullfree##2_const                  41.36ms     24.18
array_constructor_ARRAY_nulls#facebookincubator#1                           30.51ms     32.78
array_constructor_ARRAY_nulls#facebookincubator#2                           55.13ms     18.14
array_constructor_ARRAY_nulls#facebookincubator#3                           77.93ms     12.83
array_constructor_ARRAY_nulls##2_null                      68.84ms     14.53
array_constructor_ARRAY_nulls##2_const                     83.91ms     11.92

Before:

array_constructor_MAP_nullfree#facebookincubator#1                          16.89ms     59.20
array_constructor_MAP_nullfree#facebookincubator#2                          29.24ms     34.20
array_constructor_MAP_nullfree#facebookincubator#3                          41.11ms     24.33
array_constructor_MAP_nullfree##2_null                     61.98ms     16.14
array_constructor_MAP_nullfree##2_const                    67.44ms     14.83
array_constructor_MAP_nulls#facebookincubator#1                             37.00ms     27.03
array_constructor_MAP_nulls#facebookincubator#2                             67.76ms     14.76
array_constructor_MAP_nulls#facebookincubator#3                            100.88ms      9.91
array_constructor_MAP_nulls##2_null                        84.22ms     11.87
array_constructor_MAP_nulls##2_const                      122.55ms      8.16

After:

array_constructor_MAP_nullfree#facebookincubator#1                          17.07ms     58.57
array_constructor_MAP_nullfree#facebookincubator#2                          28.39ms     35.23
array_constructor_MAP_nullfree#facebookincubator#3                          42.34ms     23.62
array_constructor_MAP_nullfree##2_null                     65.24ms     15.33
array_constructor_MAP_nullfree##2_const                    49.94ms     20.03
array_constructor_MAP_nulls#facebookincubator#1                             34.34ms     29.12
array_constructor_MAP_nulls#facebookincubator#2                             55.23ms     18.11
array_constructor_MAP_nulls#facebookincubator#3                             82.64ms     12.10
array_constructor_MAP_nulls##2_null                        70.74ms     14.14
array_constructor_MAP_nulls##2_const                       88.13ms     11.35
```

Differential Revision: D49269500

fbshipit-source-id: 5b58c3942224962691e9b7f114f5134513328b5d
mbasmanova added a commit to mbasmanova/velox-1 that referenced this pull request Sep 16, 2023
Summary:

FlatVector::copyRanges is slow when there are many small ranges. I noticed this while investigating slowness of array_constructor which used created 1-row ranges: facebookincubator#5958 (comment)

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

This change optimizes FlatVector<T>::copyRanges using code from copy(source, targetIndex, sourceIndex, count) which copies one range. This change also replaces the latter call with a call to copyRanges, hence, the overall amount of code is about the same.

An earlier change optimized array_constructor to not use copyRanges for primitive types, but it is still used by Array/MapVector::copyRanges, although, these do not create 1-row ranges. Still the array_constructor benchmark shows some wins for arrays and maps.

```
Before:

array_constructor_ARRAY_nullfree##2_const                  54.31ms     18.41
array_constructor_ARRAY_nulls#facebookincubator#1                           33.50ms     29.85
array_constructor_ARRAY_nulls#facebookincubator#2                           59.05ms     16.93
array_constructor_ARRAY_nulls#facebookincubator#3                           88.36ms     11.32
array_constructor_ARRAY_nulls##2_null                      74.53ms     13.42
array_constructor_ARRAY_nulls##2_const                    102.54ms      9.75

After:

array_constructor_ARRAY_nullfree##2_const                  41.36ms     24.18
array_constructor_ARRAY_nulls#facebookincubator#1                           30.51ms     32.78
array_constructor_ARRAY_nulls#facebookincubator#2                           55.13ms     18.14
array_constructor_ARRAY_nulls#facebookincubator#3                           77.93ms     12.83
array_constructor_ARRAY_nulls##2_null                      68.84ms     14.53
array_constructor_ARRAY_nulls##2_const                     83.91ms     11.92

Before:

array_constructor_MAP_nullfree##2_const                    67.44ms     14.83
array_constructor_MAP_nulls#facebookincubator#1                             37.00ms     27.03
array_constructor_MAP_nulls#facebookincubator#2                             67.76ms     14.76
array_constructor_MAP_nulls#facebookincubator#3                            100.88ms      9.91
array_constructor_MAP_nulls##2_null                        84.22ms     11.87
array_constructor_MAP_nulls##2_const                      122.55ms      8.16

After:

array_constructor_MAP_nullfree##2_const                    49.94ms     20.03
array_constructor_MAP_nulls#facebookincubator#1                             34.34ms     29.12
array_constructor_MAP_nulls#facebookincubator#2                             55.23ms     18.11
array_constructor_MAP_nulls#facebookincubator#3                             82.64ms     12.10
array_constructor_MAP_nulls##2_null                        70.74ms     14.14
array_constructor_MAP_nulls##2_const                       88.13ms     11.35
```

Differential Revision: D49269500
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D49269500

mbasmanova added a commit to mbasmanova/velox-1 that referenced this pull request Sep 18, 2023
Summary:

FlatVector::copyRanges is slow when there are many small ranges. I noticed this while investigating slowness of array_constructor which used created 1-row ranges: facebookincubator#5958 (comment)

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

This change optimizes FlatVector<T>::copyRanges using code from copy(source, targetIndex, sourceIndex, count) which copies one range. This change also replaces the latter call with a call to copyRanges, hence, the overall amount of code is about the same.

An earlier change optimized array_constructor to not use copyRanges for primitive types, but it is still used by Array/MapVector::copyRanges, although, these do not create 1-row ranges. Still the array_constructor benchmark shows some wins for arrays and maps.

```
Before:

array_constructor_ARRAY_nullfree##2_const                  54.31ms     18.41
array_constructor_ARRAY_nulls#facebookincubator#1                           33.50ms     29.85
array_constructor_ARRAY_nulls#facebookincubator#2                           59.05ms     16.93
array_constructor_ARRAY_nulls#facebookincubator#3                           88.36ms     11.32
array_constructor_ARRAY_nulls##2_null                      74.53ms     13.42
array_constructor_ARRAY_nulls##2_const                    102.54ms      9.75

After:

array_constructor_ARRAY_nullfree##2_const                  41.36ms     24.18
array_constructor_ARRAY_nulls#facebookincubator#1                           30.51ms     32.78
array_constructor_ARRAY_nulls#facebookincubator#2                           55.13ms     18.14
array_constructor_ARRAY_nulls#facebookincubator#3                           77.93ms     12.83
array_constructor_ARRAY_nulls##2_null                      68.84ms     14.53
array_constructor_ARRAY_nulls##2_const                     83.91ms     11.92

Before:

array_constructor_MAP_nullfree##2_const                    67.44ms     14.83
array_constructor_MAP_nulls#facebookincubator#1                             37.00ms     27.03
array_constructor_MAP_nulls#facebookincubator#2                             67.76ms     14.76
array_constructor_MAP_nulls#facebookincubator#3                            100.88ms      9.91
array_constructor_MAP_nulls##2_null                        84.22ms     11.87
array_constructor_MAP_nulls##2_const                      122.55ms      8.16

After:

array_constructor_MAP_nullfree##2_const                    49.94ms     20.03
array_constructor_MAP_nulls#facebookincubator#1                             34.34ms     29.12
array_constructor_MAP_nulls#facebookincubator#2                             55.23ms     18.11
array_constructor_MAP_nulls#facebookincubator#3                             82.64ms     12.10
array_constructor_MAP_nulls##2_null                        70.74ms     14.14
array_constructor_MAP_nulls##2_const                       88.13ms     11.35
```

Differential Revision: D49269500
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D49269500

mbasmanova added a commit to mbasmanova/velox-1 that referenced this pull request Sep 18, 2023
Summary:

FlatVector::copyRanges is slow when there are many small ranges. I noticed this while investigating slowness of array_constructor which used created 1-row ranges: facebookincubator#5958 (comment)

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

This change optimizes FlatVector<T>::copyRanges using code from copy(source, targetIndex, sourceIndex, count) which copies one range. This change also replaces the latter call with a call to copyRanges, hence, the overall amount of code is about the same.

An earlier change optimized array_constructor to not use copyRanges for primitive types, but it is still used by Array/MapVector::copyRanges, although, these do not create 1-row ranges. Still the array_constructor benchmark shows some wins for arrays and maps.

```
Before:

array_constructor_ARRAY_nullfree##2_const                  54.31ms     18.41
array_constructor_ARRAY_nulls#facebookincubator#1                           33.50ms     29.85
array_constructor_ARRAY_nulls#facebookincubator#2                           59.05ms     16.93
array_constructor_ARRAY_nulls#facebookincubator#3                           88.36ms     11.32
array_constructor_ARRAY_nulls##2_null                      74.53ms     13.42
array_constructor_ARRAY_nulls##2_const                    102.54ms      9.75

After:

array_constructor_ARRAY_nullfree##2_const                  41.36ms     24.18
array_constructor_ARRAY_nulls#facebookincubator#1                           30.51ms     32.78
array_constructor_ARRAY_nulls#facebookincubator#2                           55.13ms     18.14
array_constructor_ARRAY_nulls#facebookincubator#3                           77.93ms     12.83
array_constructor_ARRAY_nulls##2_null                      68.84ms     14.53
array_constructor_ARRAY_nulls##2_const                     83.91ms     11.92

Before:

array_constructor_MAP_nullfree##2_const                    67.44ms     14.83
array_constructor_MAP_nulls#facebookincubator#1                             37.00ms     27.03
array_constructor_MAP_nulls#facebookincubator#2                             67.76ms     14.76
array_constructor_MAP_nulls#facebookincubator#3                            100.88ms      9.91
array_constructor_MAP_nulls##2_null                        84.22ms     11.87
array_constructor_MAP_nulls##2_const                      122.55ms      8.16

After:

array_constructor_MAP_nullfree##2_const                    49.94ms     20.03
array_constructor_MAP_nulls#facebookincubator#1                             34.34ms     29.12
array_constructor_MAP_nulls#facebookincubator#2                             55.23ms     18.11
array_constructor_MAP_nulls#facebookincubator#3                             82.64ms     12.10
array_constructor_MAP_nulls##2_null                        70.74ms     14.14
array_constructor_MAP_nulls##2_const                       88.13ms     11.35
```

Differential Revision: D49269500
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D49269500

1 similar comment
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D49269500

mbasmanova added a commit to mbasmanova/velox-1 that referenced this pull request Sep 18, 2023
Summary:
Pull Request resolved: facebookincubator#6566

FlatVector::copyRanges is slow when there are many small ranges. I noticed this while investigating slowness of array_constructor which used created 1-row ranges: facebookincubator#5958 (comment)

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

This change optimizes FlatVector<T>::copyRanges using code from copy(source, targetIndex, sourceIndex, count) which copies one range. This change also replaces the latter call with a call to copyRanges, hence, the overall amount of code is about the same.

An earlier change optimized array_constructor to not use copyRanges for primitive types, but it is still used by Array/MapVector::copyRanges, although, these do not create 1-row ranges. Still the array_constructor benchmark shows some wins for arrays and maps.

```
Before:

array_constructor_ARRAY_nullfree##2_const                  54.31ms     18.41
array_constructor_ARRAY_nulls#facebookincubator#1                           33.50ms     29.85
array_constructor_ARRAY_nulls#facebookincubator#2                           59.05ms     16.93
array_constructor_ARRAY_nulls#facebookincubator#3                           88.36ms     11.32
array_constructor_ARRAY_nulls##2_null                      74.53ms     13.42
array_constructor_ARRAY_nulls##2_const                    102.54ms      9.75

After:

array_constructor_ARRAY_nullfree##2_const                  41.36ms     24.18
array_constructor_ARRAY_nulls#facebookincubator#1                           30.51ms     32.78
array_constructor_ARRAY_nulls#facebookincubator#2                           55.13ms     18.14
array_constructor_ARRAY_nulls#facebookincubator#3                           77.93ms     12.83
array_constructor_ARRAY_nulls##2_null                      68.84ms     14.53
array_constructor_ARRAY_nulls##2_const                     83.91ms     11.92

Before:

array_constructor_MAP_nullfree##2_const                    67.44ms     14.83
array_constructor_MAP_nulls#facebookincubator#1                             37.00ms     27.03
array_constructor_MAP_nulls#facebookincubator#2                             67.76ms     14.76
array_constructor_MAP_nulls#facebookincubator#3                            100.88ms      9.91
array_constructor_MAP_nulls##2_null                        84.22ms     11.87
array_constructor_MAP_nulls##2_const                      122.55ms      8.16

After:

array_constructor_MAP_nullfree##2_const                    49.94ms     20.03
array_constructor_MAP_nulls#facebookincubator#1                             34.34ms     29.12
array_constructor_MAP_nulls#facebookincubator#2                             55.23ms     18.11
array_constructor_MAP_nulls#facebookincubator#3                             82.64ms     12.10
array_constructor_MAP_nulls##2_null                        70.74ms     14.14
array_constructor_MAP_nulls##2_const                       88.13ms     11.35
```

Differential Revision: D49269500

fbshipit-source-id: 8fc304cce31f782aef818e6d8ce052360af8e832
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D49269500

mbasmanova added a commit to mbasmanova/velox-1 that referenced this pull request Sep 18, 2023
Summary:
Pull Request resolved: facebookincubator#6566

FlatVector::copyRanges is slow when there are many small ranges. I noticed this while investigating slowness of array_constructor which used created 1-row ranges: facebookincubator#5958 (comment)

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

This change optimizes FlatVector<T>::copyRanges using code from copy(source, targetIndex, sourceIndex, count) which copies one range. This change also replaces the latter call with a call to copyRanges, hence, the overall amount of code is about the same.

An earlier change optimized array_constructor to not use copyRanges for primitive types, but it is still used by Array/MapVector::copyRanges, although, these do not create 1-row ranges. Still the array_constructor benchmark shows some wins for arrays and maps.

```
Before:

array_constructor_ARRAY_nullfree##2_const                  54.31ms     18.41
array_constructor_ARRAY_nulls#facebookincubator#1                           33.50ms     29.85
array_constructor_ARRAY_nulls#facebookincubator#2                           59.05ms     16.93
array_constructor_ARRAY_nulls#facebookincubator#3                           88.36ms     11.32
array_constructor_ARRAY_nulls##2_null                      74.53ms     13.42
array_constructor_ARRAY_nulls##2_const                    102.54ms      9.75

After:

array_constructor_ARRAY_nullfree##2_const                  41.36ms     24.18
array_constructor_ARRAY_nulls#facebookincubator#1                           30.51ms     32.78
array_constructor_ARRAY_nulls#facebookincubator#2                           55.13ms     18.14
array_constructor_ARRAY_nulls#facebookincubator#3                           77.93ms     12.83
array_constructor_ARRAY_nulls##2_null                      68.84ms     14.53
array_constructor_ARRAY_nulls##2_const                     83.91ms     11.92

Before:

array_constructor_MAP_nullfree##2_const                    67.44ms     14.83
array_constructor_MAP_nulls#facebookincubator#1                             37.00ms     27.03
array_constructor_MAP_nulls#facebookincubator#2                             67.76ms     14.76
array_constructor_MAP_nulls#facebookincubator#3                            100.88ms      9.91
array_constructor_MAP_nulls##2_null                        84.22ms     11.87
array_constructor_MAP_nulls##2_const                      122.55ms      8.16

After:

array_constructor_MAP_nullfree##2_const                    49.94ms     20.03
array_constructor_MAP_nulls#facebookincubator#1                             34.34ms     29.12
array_constructor_MAP_nulls#facebookincubator#2                             55.23ms     18.11
array_constructor_MAP_nulls#facebookincubator#3                             82.64ms     12.10
array_constructor_MAP_nulls##2_null                        70.74ms     14.14
array_constructor_MAP_nulls##2_const                       88.13ms     11.35
```

Differential Revision: D49269500

fbshipit-source-id: 054dcec94cc3768443440ffb57d58c96f9b1f006
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D49269500

mbasmanova added a commit to mbasmanova/velox-1 that referenced this pull request Sep 18, 2023
Summary:
Pull Request resolved: facebookincubator#6566

FlatVector::copyRanges is slow when there are many small ranges. I noticed this while investigating slowness of array_constructor which used created 1-row ranges: facebookincubator#5958 (comment)

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

This change optimizes FlatVector<T>::copyRanges using code from copy(source, targetIndex, sourceIndex, count) which copies one range. This change also replaces the latter call with a call to copyRanges, hence, the overall amount of code is about the same.

An earlier change optimized array_constructor to not use copyRanges for primitive types, but it is still used by Array/MapVector::copyRanges, although, these do not create 1-row ranges. Still the array_constructor benchmark shows some wins for arrays and maps.

```
Before:

array_constructor_ARRAY_nullfree##2_const                  54.31ms     18.41
array_constructor_ARRAY_nulls#facebookincubator#1                           33.50ms     29.85
array_constructor_ARRAY_nulls#facebookincubator#2                           59.05ms     16.93
array_constructor_ARRAY_nulls#facebookincubator#3                           88.36ms     11.32
array_constructor_ARRAY_nulls##2_null                      74.53ms     13.42
array_constructor_ARRAY_nulls##2_const                    102.54ms      9.75

After:

array_constructor_ARRAY_nullfree##2_const                  41.36ms     24.18
array_constructor_ARRAY_nulls#facebookincubator#1                           30.51ms     32.78
array_constructor_ARRAY_nulls#facebookincubator#2                           55.13ms     18.14
array_constructor_ARRAY_nulls#facebookincubator#3                           77.93ms     12.83
array_constructor_ARRAY_nulls##2_null                      68.84ms     14.53
array_constructor_ARRAY_nulls##2_const                     83.91ms     11.92

Before:

array_constructor_MAP_nullfree##2_const                    67.44ms     14.83
array_constructor_MAP_nulls#facebookincubator#1                             37.00ms     27.03
array_constructor_MAP_nulls#facebookincubator#2                             67.76ms     14.76
array_constructor_MAP_nulls#facebookincubator#3                            100.88ms      9.91
array_constructor_MAP_nulls##2_null                        84.22ms     11.87
array_constructor_MAP_nulls##2_const                      122.55ms      8.16

After:

array_constructor_MAP_nullfree##2_const                    49.94ms     20.03
array_constructor_MAP_nulls#facebookincubator#1                             34.34ms     29.12
array_constructor_MAP_nulls#facebookincubator#2                             55.23ms     18.11
array_constructor_MAP_nulls#facebookincubator#3                             82.64ms     12.10
array_constructor_MAP_nulls##2_null                        70.74ms     14.14
array_constructor_MAP_nulls##2_const                       88.13ms     11.35
```

Reviewed By: laithsakka

Differential Revision: D49269500

fbshipit-source-id: 2fdf6f2a717a3bc4bf54fb3d9bd970157a8c415b
mbasmanova and others added 2 commits September 18, 2023 10:55
Summary:
Fixes facebookincubator#6611

Pull Request resolved: facebookincubator#6613

Differential Revision: D49362420

Pulled By: mbasmanova

fbshipit-source-id: 5a78e1982b2607c400794a61cf368ab4f644d749
Summary:
Pull Request resolved: facebookincubator#6566

FlatVector::copyRanges is slow when there are many small ranges. I noticed this while investigating slowness of array_constructor which used created 1-row ranges: facebookincubator#5958 (comment)

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

This change optimizes FlatVector<T>::copyRanges using code from copy(source, targetIndex, sourceIndex, count) which copies one range. This change also replaces the latter call with a call to copyRanges, hence, the overall amount of code is about the same.

An earlier change optimized array_constructor to not use copyRanges for primitive types, but it is still used by Array/MapVector::copyRanges, although, these do not create 1-row ranges. Still the array_constructor benchmark shows some wins for arrays and maps.

```
Before:

array_constructor_ARRAY_nullfree##2_const                  54.31ms     18.41
array_constructor_ARRAY_nulls#facebookincubator#1                           33.50ms     29.85
array_constructor_ARRAY_nulls#facebookincubator#2                           59.05ms     16.93
array_constructor_ARRAY_nulls#facebookincubator#3                           88.36ms     11.32
array_constructor_ARRAY_nulls##2_null                      74.53ms     13.42
array_constructor_ARRAY_nulls##2_const                    102.54ms      9.75

After:

array_constructor_ARRAY_nullfree##2_const                  41.36ms     24.18
array_constructor_ARRAY_nulls#facebookincubator#1                           30.51ms     32.78
array_constructor_ARRAY_nulls#facebookincubator#2                           55.13ms     18.14
array_constructor_ARRAY_nulls#facebookincubator#3                           77.93ms     12.83
array_constructor_ARRAY_nulls##2_null                      68.84ms     14.53
array_constructor_ARRAY_nulls##2_const                     83.91ms     11.92

Before:

array_constructor_MAP_nullfree##2_const                    67.44ms     14.83
array_constructor_MAP_nulls#facebookincubator#1                             37.00ms     27.03
array_constructor_MAP_nulls#facebookincubator#2                             67.76ms     14.76
array_constructor_MAP_nulls#facebookincubator#3                            100.88ms      9.91
array_constructor_MAP_nulls##2_null                        84.22ms     11.87
array_constructor_MAP_nulls##2_const                      122.55ms      8.16

After:

array_constructor_MAP_nullfree##2_const                    49.94ms     20.03
array_constructor_MAP_nulls#facebookincubator#1                             34.34ms     29.12
array_constructor_MAP_nulls#facebookincubator#2                             55.23ms     18.11
array_constructor_MAP_nulls#facebookincubator#3                             82.64ms     12.10
array_constructor_MAP_nulls##2_null                        70.74ms     14.14
array_constructor_MAP_nulls##2_const                       88.13ms     11.35
```

Reviewed By: laithsakka

Differential Revision: D49269500

fbshipit-source-id: b7271a981d33885b2d4bc3219bc15f069bcd06d3
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D49269500

@facebook-github-bot
Copy link
Contributor

This pull request has been merged in bd8d20b.

codyschierbeck pushed a commit to codyschierbeck/velox that referenced this pull request Sep 27, 2023
Summary:
Pull Request resolved: facebookincubator#6566

FlatVector::copyRanges is slow when there are many small ranges. I noticed this while investigating slowness of array_constructor which used created 1-row ranges: facebookincubator#5958 (comment)

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

This change optimizes FlatVector<T>::copyRanges using code from copy(source, targetIndex, sourceIndex, count) which copies one range. This change also replaces the latter call with a call to copyRanges, hence, the overall amount of code is about the same.

An earlier change optimized array_constructor to not use copyRanges for primitive types, but it is still used by Array/MapVector::copyRanges, although, these do not create 1-row ranges. Still the array_constructor benchmark shows some wins for arrays and maps.

```
Before:

array_constructor_ARRAY_nullfree##2_const                  54.31ms     18.41
array_constructor_ARRAY_nulls#facebookincubator#1                           33.50ms     29.85
array_constructor_ARRAY_nulls#facebookincubator#2                           59.05ms     16.93
array_constructor_ARRAY_nulls#facebookincubator#3                           88.36ms     11.32
array_constructor_ARRAY_nulls##2_null                      74.53ms     13.42
array_constructor_ARRAY_nulls##2_const                    102.54ms      9.75

After:

array_constructor_ARRAY_nullfree##2_const                  41.36ms     24.18
array_constructor_ARRAY_nulls#facebookincubator#1                           30.51ms     32.78
array_constructor_ARRAY_nulls#facebookincubator#2                           55.13ms     18.14
array_constructor_ARRAY_nulls#facebookincubator#3                           77.93ms     12.83
array_constructor_ARRAY_nulls##2_null                      68.84ms     14.53
array_constructor_ARRAY_nulls##2_const                     83.91ms     11.92

Before:

array_constructor_MAP_nullfree##2_const                    67.44ms     14.83
array_constructor_MAP_nulls#facebookincubator#1                             37.00ms     27.03
array_constructor_MAP_nulls#facebookincubator#2                             67.76ms     14.76
array_constructor_MAP_nulls#facebookincubator#3                            100.88ms      9.91
array_constructor_MAP_nulls##2_null                        84.22ms     11.87
array_constructor_MAP_nulls##2_const                      122.55ms      8.16

After:

array_constructor_MAP_nullfree##2_const                    49.94ms     20.03
array_constructor_MAP_nulls#facebookincubator#1                             34.34ms     29.12
array_constructor_MAP_nulls#facebookincubator#2                             55.23ms     18.11
array_constructor_MAP_nulls#facebookincubator#3                             82.64ms     12.10
array_constructor_MAP_nulls##2_null                        70.74ms     14.14
array_constructor_MAP_nulls##2_const                       88.13ms     11.35
```

Reviewed By: laithsakka

Differential Revision: D49269500

fbshipit-source-id: 7b702921202f4bb8d10a252eb0ab20f0e5792ae6
codyschierbeck pushed a commit to codyschierbeck/velox that referenced this pull request Sep 27, 2023
Summary:
Pull Request resolved: facebookincubator#6566

FlatVector::copyRanges is slow when there are many small ranges. I noticed this while investigating slowness of array_constructor which used created 1-row ranges: facebookincubator#5958 (comment)

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

This change optimizes FlatVector<T>::copyRanges using code from copy(source, targetIndex, sourceIndex, count) which copies one range. This change also replaces the latter call with a call to copyRanges, hence, the overall amount of code is about the same.

An earlier change optimized array_constructor to not use copyRanges for primitive types, but it is still used by Array/MapVector::copyRanges, although, these do not create 1-row ranges. Still the array_constructor benchmark shows some wins for arrays and maps.

```
Before:

array_constructor_ARRAY_nullfree##2_const                  54.31ms     18.41
array_constructor_ARRAY_nulls#facebookincubator#1                           33.50ms     29.85
array_constructor_ARRAY_nulls#facebookincubator#2                           59.05ms     16.93
array_constructor_ARRAY_nulls#facebookincubator#3                           88.36ms     11.32
array_constructor_ARRAY_nulls##2_null                      74.53ms     13.42
array_constructor_ARRAY_nulls##2_const                    102.54ms      9.75

After:

array_constructor_ARRAY_nullfree##2_const                  41.36ms     24.18
array_constructor_ARRAY_nulls#facebookincubator#1                           30.51ms     32.78
array_constructor_ARRAY_nulls#facebookincubator#2                           55.13ms     18.14
array_constructor_ARRAY_nulls#facebookincubator#3                           77.93ms     12.83
array_constructor_ARRAY_nulls##2_null                      68.84ms     14.53
array_constructor_ARRAY_nulls##2_const                     83.91ms     11.92

Before:

array_constructor_MAP_nullfree##2_const                    67.44ms     14.83
array_constructor_MAP_nulls#facebookincubator#1                             37.00ms     27.03
array_constructor_MAP_nulls#facebookincubator#2                             67.76ms     14.76
array_constructor_MAP_nulls#facebookincubator#3                            100.88ms      9.91
array_constructor_MAP_nulls##2_null                        84.22ms     11.87
array_constructor_MAP_nulls##2_const                      122.55ms      8.16

After:

array_constructor_MAP_nullfree##2_const                    49.94ms     20.03
array_constructor_MAP_nulls#facebookincubator#1                             34.34ms     29.12
array_constructor_MAP_nulls#facebookincubator#2                             55.23ms     18.11
array_constructor_MAP_nulls#facebookincubator#3                             82.64ms     12.10
array_constructor_MAP_nulls##2_null                        70.74ms     14.14
array_constructor_MAP_nulls##2_const                       88.13ms     11.35
```

Reviewed By: laithsakka

Differential Revision: D49269500

fbshipit-source-id: 7b702921202f4bb8d10a252eb0ab20f0e5792ae6
codyschierbeck pushed a commit to codyschierbeck/velox that referenced this pull request Sep 27, 2023
Summary:
Pull Request resolved: facebookincubator#6566

FlatVector::copyRanges is slow when there are many small ranges. I noticed this while investigating slowness of array_constructor which used created 1-row ranges: facebookincubator#5958 (comment)

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

This change optimizes FlatVector<T>::copyRanges using code from copy(source, targetIndex, sourceIndex, count) which copies one range. This change also replaces the latter call with a call to copyRanges, hence, the overall amount of code is about the same.

An earlier change optimized array_constructor to not use copyRanges for primitive types, but it is still used by Array/MapVector::copyRanges, although, these do not create 1-row ranges. Still the array_constructor benchmark shows some wins for arrays and maps.

```
Before:

array_constructor_ARRAY_nullfree##2_const                  54.31ms     18.41
array_constructor_ARRAY_nulls#facebookincubator#1                           33.50ms     29.85
array_constructor_ARRAY_nulls#facebookincubator#2                           59.05ms     16.93
array_constructor_ARRAY_nulls#facebookincubator#3                           88.36ms     11.32
array_constructor_ARRAY_nulls##2_null                      74.53ms     13.42
array_constructor_ARRAY_nulls##2_const                    102.54ms      9.75

After:

array_constructor_ARRAY_nullfree##2_const                  41.36ms     24.18
array_constructor_ARRAY_nulls#facebookincubator#1                           30.51ms     32.78
array_constructor_ARRAY_nulls#facebookincubator#2                           55.13ms     18.14
array_constructor_ARRAY_nulls#facebookincubator#3                           77.93ms     12.83
array_constructor_ARRAY_nulls##2_null                      68.84ms     14.53
array_constructor_ARRAY_nulls##2_const                     83.91ms     11.92

Before:

array_constructor_MAP_nullfree##2_const                    67.44ms     14.83
array_constructor_MAP_nulls#facebookincubator#1                             37.00ms     27.03
array_constructor_MAP_nulls#facebookincubator#2                             67.76ms     14.76
array_constructor_MAP_nulls#facebookincubator#3                            100.88ms      9.91
array_constructor_MAP_nulls##2_null                        84.22ms     11.87
array_constructor_MAP_nulls##2_const                      122.55ms      8.16

After:

array_constructor_MAP_nullfree##2_const                    49.94ms     20.03
array_constructor_MAP_nulls#facebookincubator#1                             34.34ms     29.12
array_constructor_MAP_nulls#facebookincubator#2                             55.23ms     18.11
array_constructor_MAP_nulls#facebookincubator#3                             82.64ms     12.10
array_constructor_MAP_nulls##2_null                        70.74ms     14.14
array_constructor_MAP_nulls##2_const                       88.13ms     11.35
```

Reviewed By: laithsakka

Differential Revision: D49269500

fbshipit-source-id: 7b702921202f4bb8d10a252eb0ab20f0e5792ae6
ericyuliu pushed a commit to ericyuliu/velox that referenced this pull request Oct 12, 2023
Summary:
Pull Request resolved: facebookincubator#6566

FlatVector::copyRanges is slow when there are many small ranges. I noticed this while investigating slowness of array_constructor which used created 1-row ranges: facebookincubator#5958 (comment)

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

This change optimizes FlatVector<T>::copyRanges using code from copy(source, targetIndex, sourceIndex, count) which copies one range. This change also replaces the latter call with a call to copyRanges, hence, the overall amount of code is about the same.

An earlier change optimized array_constructor to not use copyRanges for primitive types, but it is still used by Array/MapVector::copyRanges, although, these do not create 1-row ranges. Still the array_constructor benchmark shows some wins for arrays and maps.

```
Before:

array_constructor_ARRAY_nullfree##2_const                  54.31ms     18.41
array_constructor_ARRAY_nulls#facebookincubator#1                           33.50ms     29.85
array_constructor_ARRAY_nulls#facebookincubator#2                           59.05ms     16.93
array_constructor_ARRAY_nulls#facebookincubator#3                           88.36ms     11.32
array_constructor_ARRAY_nulls##2_null                      74.53ms     13.42
array_constructor_ARRAY_nulls##2_const                    102.54ms      9.75

After:

array_constructor_ARRAY_nullfree##2_const                  41.36ms     24.18
array_constructor_ARRAY_nulls#facebookincubator#1                           30.51ms     32.78
array_constructor_ARRAY_nulls#facebookincubator#2                           55.13ms     18.14
array_constructor_ARRAY_nulls#facebookincubator#3                           77.93ms     12.83
array_constructor_ARRAY_nulls##2_null                      68.84ms     14.53
array_constructor_ARRAY_nulls##2_const                     83.91ms     11.92

Before:

array_constructor_MAP_nullfree##2_const                    67.44ms     14.83
array_constructor_MAP_nulls#facebookincubator#1                             37.00ms     27.03
array_constructor_MAP_nulls#facebookincubator#2                             67.76ms     14.76
array_constructor_MAP_nulls#facebookincubator#3                            100.88ms      9.91
array_constructor_MAP_nulls##2_null                        84.22ms     11.87
array_constructor_MAP_nulls##2_const                      122.55ms      8.16

After:

array_constructor_MAP_nullfree##2_const                    49.94ms     20.03
array_constructor_MAP_nulls#facebookincubator#1                             34.34ms     29.12
array_constructor_MAP_nulls#facebookincubator#2                             55.23ms     18.11
array_constructor_MAP_nulls#facebookincubator#3                             82.64ms     12.10
array_constructor_MAP_nulls##2_null                        70.74ms     14.14
array_constructor_MAP_nulls##2_const                       88.13ms     11.35
```

Reviewed By: laithsakka

Differential Revision: D49269500

fbshipit-source-id: 7b702921202f4bb8d10a252eb0ab20f0e5792ae6
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants