Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove forward declaration of koski::core::TraitType enum from Velox #3

Closed

Conversation

mbasmanova
Copy link
Contributor

Differential Revision: D30187963

@facebook-github-bot facebook-github-bot added CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported labels Aug 9, 2021
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D30187963

…(#3)

Summary: Pull Request resolved: facebookincubator/velox#3

Reviewed By: mshang816

Differential Revision: D30187963

fbshipit-source-id: f6ec4d280855ad2bfa7d0f516444b79def3f4d86
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D30187963

@facebook-github-bot
Copy link
Contributor

This pull request has been merged in ac0f043.

winningsix pushed a commit to winningsix/velox-1 that referenced this pull request Sep 27, 2021
winningsix pushed a commit to winningsix/velox-1 that referenced this pull request Sep 27, 2021
@pedroerp pedroerp mentioned this pull request Nov 8, 2021
facebook-github-bot pushed a commit that referenced this pull request Apr 28, 2022
Summary:
Enhance printExprWithStats to identify common-sub expressions.

For example, `c0 + c1` is a common sub-expression in
`"(c0 + c1) % 5", " (c0 + c1) % 3"` expression set. It is evaluated only once and
there is a single Expr object that represents it. That object appears in the
expression tree twice. printExprWithStats does not show the runtime stats for
second instance of that expression and instead annotates it with `[CSE https://github.com/facebookincubator/velox/issues/2]`,
where CSE stands for common sub-expression and 2 refers to the first instance
of the expression.

```
mod [cpu time: 50.49us, rows: 1024] -> BIGINT [#1]
   cast(plus as BIGINT) [cpu time: 68.15us, rows: 1024] -> BIGINT [#2]
      plus [cpu time: 51.84us, rows: 1024] -> INTEGER [#3]
         c0 [cpu time: 0ns, rows: 0] -> INTEGER [#4]
         c1 [cpu time: 0ns, rows: 0] -> INTEGER [#5]
   5:BIGINT [cpu time: 0ns, rows: 0] -> BIGINT [#6]

mod [cpu time: 49.29us, rows: 1024] -> BIGINT [#7]
   cast((plus(c0, c1)) as BIGINT) -> BIGINT [CSE #2]
   3:BIGINT [cpu time: 0ns, rows: 0] -> BIGINT [#8]
```

Pull Request resolved: #1500

Reviewed By: Yuhta

Differential Revision: D35994836

Pulled By: mbasmanova

fbshipit-source-id: 6bacbbe61b68dad97ce2fd5f99610c4ad55897be
zhouyuan pushed a commit to zhouyuan/velox that referenced this pull request May 11, 2022
Arty-Maly pushed a commit to Arty-Maly/velox that referenced this pull request May 13, 2022
…tor#1500)

Summary:
Enhance printExprWithStats to identify common-sub expressions.

For example, `c0 + c1` is a common sub-expression in
`"(c0 + c1) % 5", " (c0 + c1) % 3"` expression set. It is evaluated only once and
there is a single Expr object that represents it. That object appears in the
expression tree twice. printExprWithStats does not show the runtime stats for
second instance of that expression and instead annotates it with `[CSE https://github.com/facebookincubator/velox/issues/2]`,
where CSE stands for common sub-expression and 2 refers to the first instance
of the expression.

```
mod [cpu time: 50.49us, rows: 1024] -> BIGINT [#1]
   cast(plus as BIGINT) [cpu time: 68.15us, rows: 1024] -> BIGINT [facebookincubator#2]
      plus [cpu time: 51.84us, rows: 1024] -> INTEGER [facebookincubator#3]
         c0 [cpu time: 0ns, rows: 0] -> INTEGER [facebookincubator#4]
         c1 [cpu time: 0ns, rows: 0] -> INTEGER [facebookincubator#5]
   5:BIGINT [cpu time: 0ns, rows: 0] -> BIGINT [facebookincubator#6]

mod [cpu time: 49.29us, rows: 1024] -> BIGINT [facebookincubator#7]
   cast((plus(c0, c1)) as BIGINT) -> BIGINT [CSE facebookincubator#2]
   3:BIGINT [cpu time: 0ns, rows: 0] -> BIGINT [facebookincubator#8]
```

Pull Request resolved: facebookincubator#1500

Reviewed By: Yuhta

Differential Revision: D35994836

Pulled By: mbasmanova

fbshipit-source-id: 6bacbbe61b68dad97ce2fd5f99610c4ad55897be
shiyu-bytedance pushed a commit to shiyu-bytedance/velox-1 that referenced this pull request Aug 18, 2022
…tor#1500)

Summary:
Enhance printExprWithStats to identify common-sub expressions.

For example, `c0 + c1` is a common sub-expression in
`"(c0 + c1) % 5", " (c0 + c1) % 3"` expression set. It is evaluated only once and
there is a single Expr object that represents it. That object appears in the
expression tree twice. printExprWithStats does not show the runtime stats for
second instance of that expression and instead annotates it with `[CSE https://github.com/facebookincubator/velox/issues/2]`,
where CSE stands for common sub-expression and 2 refers to the first instance
of the expression.

```
mod [cpu time: 50.49us, rows: 1024] -> BIGINT [facebookincubator#1]
   cast(plus as BIGINT) [cpu time: 68.15us, rows: 1024] -> BIGINT [facebookincubator#2]
      plus [cpu time: 51.84us, rows: 1024] -> INTEGER [facebookincubator#3]
         c0 [cpu time: 0ns, rows: 0] -> INTEGER [facebookincubator#4]
         c1 [cpu time: 0ns, rows: 0] -> INTEGER [facebookincubator#5]
   5:BIGINT [cpu time: 0ns, rows: 0] -> BIGINT [facebookincubator#6]

mod [cpu time: 49.29us, rows: 1024] -> BIGINT [facebookincubator#7]
   cast((plus(c0, c1)) as BIGINT) -> BIGINT [CSE facebookincubator#2]
   3:BIGINT [cpu time: 0ns, rows: 0] -> BIGINT [facebookincubator#8]
```

Pull Request resolved: facebookincubator#1500

Reviewed By: Yuhta

Differential Revision: D35994836

Pulled By: mbasmanova

fbshipit-source-id: 6bacbbe61b68dad97ce2fd5f99610c4ad55897be
zhouyuan pushed a commit to zhouyuan/velox that referenced this pull request Oct 27, 2022
PHILO-HE pushed a commit to PHILO-HE/velox that referenced this pull request Nov 16, 2022
PHILO-HE pushed a commit to PHILO-HE/velox that referenced this pull request Nov 22, 2022
PHILO-HE pushed a commit to PHILO-HE/velox that referenced this pull request Dec 18, 2022
glipner pushed a commit to glipner/velox that referenced this pull request Jan 13, 2023
PHILO-HE pushed a commit to PHILO-HE/velox that referenced this pull request Feb 3, 2023
mbasmanova added a commit to mbasmanova/velox-1 that referenced this pull request Sep 15, 2023
Summary:

array_constructor is very slow: facebookincubator#5958 (comment)

array_constructor uses BaseVector::copyRanges, which is somewhat fast for arrays and maps, but very slow for primitive types:

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

FlatVector<T>::copy(source, rows, toSourceRow) is faster.

Switching from copyRanges to copy speeds up array_constructor for primitive types and structs significantly. Yet, this change makes arrays and maps slower.

The slowness is due to ArrayVector and MapVector not having implementation for copy(source, rows, toSourceRow). They rely on BaseVector::copy to translate rows + toSourceRow to ranges. This extra processing causes perf regression.

Hence, we use copy for primitive types and structs of these and copyRanges for everything else.

```
Before:

array_constructor_ARRAY_nullfree#facebookincubator#1                        16.80ms     59.53
array_constructor_ARRAY_nullfree#facebookincubator#2                        27.02ms     37.01
array_constructor_ARRAY_nullfree#facebookincubator#3                        38.03ms     26.30
array_constructor_ARRAY_nullfree##2_null                   52.86ms     18.92
array_constructor_ARRAY_nullfree##2_const                  54.97ms     18.19
array_constructor_ARRAY_nulls#facebookincubator#1                           30.61ms     32.66
array_constructor_ARRAY_nulls#facebookincubator#2                           55.01ms     18.18
array_constructor_ARRAY_nulls#facebookincubator#3                           80.69ms     12.39
array_constructor_ARRAY_nulls##2_null                      69.10ms     14.47
array_constructor_ARRAY_nulls##2_const                    103.85ms      9.63


After:

array_constructor_ARRAY_nullfree#facebookincubator#1                        15.25ms     65.58
array_constructor_ARRAY_nullfree#facebookincubator#2                        25.11ms     39.82
array_constructor_ARRAY_nullfree#facebookincubator#3                        34.59ms     28.91
array_constructor_ARRAY_nullfree##2_null                   53.61ms     18.65
array_constructor_ARRAY_nullfree##2_const                  51.48ms     19.42
array_constructor_ARRAY_nulls#facebookincubator#1                           29.99ms     33.34
array_constructor_ARRAY_nulls#facebookincubator#2                           55.91ms     17.89
array_constructor_ARRAY_nulls#facebookincubator#3                           81.73ms     12.24
array_constructor_ARRAY_nulls##2_null                      66.97ms     14.93
array_constructor_ARRAY_nulls##2_const                     92.96ms     10.76


Before:

array_constructor_INTEGER_nullfree#facebookincubator#1                      19.72ms     50.71
array_constructor_INTEGER_nullfree#facebookincubator#2                      34.51ms     28.97
array_constructor_INTEGER_nullfree#facebookincubator#3                      47.95ms     20.86
array_constructor_INTEGER_nullfree##2_null                 58.68ms     17.04
array_constructor_INTEGER_nullfree##2_const                45.15ms     22.15
array_constructor_INTEGER_nulls#facebookincubator#1                         29.99ms     33.34
array_constructor_INTEGER_nulls#facebookincubator#2                         55.32ms     18.08
array_constructor_INTEGER_nulls#facebookincubator#3                         78.53ms     12.73
array_constructor_INTEGER_nulls##2_null                    72.24ms     13.84
array_constructor_INTEGER_nulls##2_const                   71.13ms     14.06


After:

array_constructor_INTEGER_nullfree#facebookincubator#1                       3.39ms    294.89
array_constructor_INTEGER_nullfree#facebookincubator#2                       7.35ms    136.10
array_constructor_INTEGER_nullfree#facebookincubator#3                      10.78ms     92.74
array_constructor_INTEGER_nullfree##2_null                 11.29ms     88.57
array_constructor_INTEGER_nullfree##2_const                10.14ms     98.65
array_constructor_INTEGER_nulls#facebookincubator#1                          4.49ms    222.53
array_constructor_INTEGER_nulls#facebookincubator#2                          9.78ms    102.29
array_constructor_INTEGER_nulls#facebookincubator#3                         14.69ms     68.08
array_constructor_INTEGER_nulls##2_null                    12.14ms     82.36
array_constructor_INTEGER_nulls##2_const                   12.27ms     81.53

Before:

array_constructor_MAP_nullfree#facebookincubator#1                          17.34ms     57.65
array_constructor_MAP_nullfree#facebookincubator#2                          29.84ms     33.51
array_constructor_MAP_nullfree#facebookincubator#3                          41.51ms     24.09
array_constructor_MAP_nullfree##2_null                     56.57ms     17.68
array_constructor_MAP_nullfree##2_const                    71.68ms     13.95
array_constructor_MAP_nulls#facebookincubator#1                             36.22ms     27.61
array_constructor_MAP_nulls#facebookincubator#2                             68.18ms     14.67
array_constructor_MAP_nulls#facebookincubator#3                             95.12ms     10.51
array_constructor_MAP_nulls##2_null                        86.42ms     11.57
array_constructor_MAP_nulls##2_const                      120.10ms      8.33


After:

array_constructor_MAP_nullfree#facebookincubator#1                          17.05ms     58.66
array_constructor_MAP_nullfree#facebookincubator#2                          28.42ms     35.18
array_constructor_MAP_nullfree#facebookincubator#3                          36.96ms     27.06
array_constructor_MAP_nullfree##2_null                     55.64ms     17.97
array_constructor_MAP_nullfree##2_const                    67.53ms     14.81
array_constructor_MAP_nulls#facebookincubator#1                             32.91ms     30.39
array_constructor_MAP_nulls#facebookincubator#2                             64.50ms     15.50
array_constructor_MAP_nulls#facebookincubator#3                             95.71ms     10.45
array_constructor_MAP_nulls##2_null                        77.22ms     12.95
array_constructor_MAP_nulls##2_const                      114.91ms      8.70

Before:

array_constructor_ROW_nullfree#facebookincubator#1                          33.88ms     29.52
array_constructor_ROW_nullfree#facebookincubator#2                          62.00ms     16.13
array_constructor_ROW_nullfree#facebookincubator#3                          89.54ms     11.17
array_constructor_ROW_nullfree##2_null                     78.46ms     12.75
array_constructor_ROW_nullfree##2_const                    95.53ms     10.47
array_constructor_ROW_nulls#facebookincubator#1                             44.11ms     22.67
array_constructor_ROW_nulls#facebookincubator#2                            115.43ms      8.66
array_constructor_ROW_nulls#facebookincubator#3                            173.61ms      5.76
array_constructor_ROW_nulls##2_null                       130.40ms      7.67
array_constructor_ROW_nulls##2_const                      169.97ms      5.88

After:

array_constructor_ROW_nullfree#facebookincubator#1                           5.55ms    180.15
array_constructor_ROW_nullfree#facebookincubator#2                          12.83ms     77.94
array_constructor_ROW_nullfree#facebookincubator#3                          18.89ms     52.95
array_constructor_ROW_nullfree##2_null                     18.74ms     53.36
array_constructor_ROW_nullfree##2_const                    18.16ms     55.07
array_constructor_ROW_nulls#facebookincubator#1                             11.29ms     88.61
array_constructor_ROW_nulls#facebookincubator#2                             18.57ms     53.86
array_constructor_ROW_nulls#facebookincubator#3                             34.20ms     29.24
array_constructor_ROW_nulls##2_null                        25.05ms     39.92
array_constructor_ROW_nulls##2_const                       25.15ms     39.77
```

Reviewed By: laithsakka

Differential Revision: D49272797
mbasmanova added a commit to mbasmanova/velox-1 that referenced this pull request Sep 15, 2023
Summary:
Pull Request resolved: facebookincubator#6566

array_constructor is very slow: facebookincubator#5958 (comment)

array_constructor uses BaseVector::copyRanges, which is somewhat fast for arrays and maps, but very slow for primitive types:

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

FlatVector<T>::copy(source, rows, toSourceRow) is faster.

Switching from copyRanges to copy speeds up array_constructor for primitive types and structs significantly. Yet, this change makes arrays and maps slower.

The slowness is due to ArrayVector and MapVector not having implementation for copy(source, rows, toSourceRow). They rely on BaseVector::copy to translate rows + toSourceRow to ranges. This extra processing causes perf regression.

Hence, we use copy for primitive types and structs of these and copyRanges for everything else.

We also optimize FlatVector::copyRanges (which is used by Array/MapVector::copyRanges).

```
Before:

array_constructor_ARRAY_nullfree#facebookincubator#1                        16.80ms     59.53
array_constructor_ARRAY_nullfree#facebookincubator#2                        27.02ms     37.01
array_constructor_ARRAY_nullfree#facebookincubator#3                        38.03ms     26.30
array_constructor_ARRAY_nullfree##2_null                   52.86ms     18.92
array_constructor_ARRAY_nullfree##2_const                  54.97ms     18.19
array_constructor_ARRAY_nulls#facebookincubator#1                           30.61ms     32.66
array_constructor_ARRAY_nulls#facebookincubator#2                           55.01ms     18.18
array_constructor_ARRAY_nulls#facebookincubator#3                           80.69ms     12.39
array_constructor_ARRAY_nulls##2_null                      69.10ms     14.47
array_constructor_ARRAY_nulls##2_const                    103.85ms      9.63

After:

array_constructor_ARRAY_nullfree#facebookincubator#1                        15.43ms     64.80
array_constructor_ARRAY_nullfree#facebookincubator#2                        24.50ms     40.81
array_constructor_ARRAY_nullfree#facebookincubator#3                        35.12ms     28.47
array_constructor_ARRAY_nullfree##2_null                   54.52ms     18.34
array_constructor_ARRAY_nullfree##2_const                  43.28ms     23.10
array_constructor_ARRAY_nulls#facebookincubator#1                           28.60ms     34.96
array_constructor_ARRAY_nulls#facebookincubator#2                           50.82ms     19.68
array_constructor_ARRAY_nulls#facebookincubator#3                           70.31ms     14.22
array_constructor_ARRAY_nulls##2_null                      64.43ms     15.52
array_constructor_ARRAY_nulls##2_const                     80.71ms     12.39

Before:

array_constructor_INTEGER_nullfree#facebookincubator#1                      19.72ms     50.71
array_constructor_INTEGER_nullfree#facebookincubator#2                      34.51ms     28.97
array_constructor_INTEGER_nullfree#facebookincubator#3                      47.95ms     20.86
array_constructor_INTEGER_nullfree##2_null                 58.68ms     17.04
array_constructor_INTEGER_nullfree##2_const                45.15ms     22.15
array_constructor_INTEGER_nulls#facebookincubator#1                         29.99ms     33.34
array_constructor_INTEGER_nulls#facebookincubator#2                         55.32ms     18.08
array_constructor_INTEGER_nulls#facebookincubator#3                         78.53ms     12.73
array_constructor_INTEGER_nulls##2_null                    72.24ms     13.84
array_constructor_INTEGER_nulls##2_const                   71.13ms     14.06

After:

array_constructor_INTEGER_nullfree#facebookincubator#1                       3.49ms    286.59
array_constructor_INTEGER_nullfree#facebookincubator#2                       7.91ms    126.46
array_constructor_INTEGER_nullfree#facebookincubator#3                      11.99ms     83.41
array_constructor_INTEGER_nullfree##2_null                 12.57ms     79.55
array_constructor_INTEGER_nullfree##2_const                11.03ms     90.67
array_constructor_INTEGER_nulls#facebookincubator#1                          4.37ms    228.97
array_constructor_INTEGER_nulls#facebookincubator#2                          9.99ms    100.14
array_constructor_INTEGER_nulls#facebookincubator#3                         14.79ms     67.60
array_constructor_INTEGER_nulls##2_null                    12.21ms     81.92
array_constructor_INTEGER_nulls##2_const                   12.64ms     79.12

Before:

array_constructor_MAP_nullfree#facebookincubator#1                          17.34ms     57.65
array_constructor_MAP_nullfree#facebookincubator#2                          29.84ms     33.51
array_constructor_MAP_nullfree#facebookincubator#3                          41.51ms     24.09
array_constructor_MAP_nullfree##2_null                     56.57ms     17.68
array_constructor_MAP_nullfree##2_const                    71.68ms     13.95
array_constructor_MAP_nulls#facebookincubator#1                             36.22ms     27.61
array_constructor_MAP_nulls#facebookincubator#2                             68.18ms     14.67
array_constructor_MAP_nulls#facebookincubator#3                             95.12ms     10.51
array_constructor_MAP_nulls##2_null                        86.42ms     11.57
array_constructor_MAP_nulls##2_const                      120.10ms      8.33

After:

array_constructor_MAP_nullfree#facebookincubator#1                          17.38ms     57.53
array_constructor_MAP_nullfree#facebookincubator#2                          29.41ms     34.00
array_constructor_MAP_nullfree#facebookincubator#3                          38.30ms     26.11
array_constructor_MAP_nullfree##2_null                     58.52ms     17.09
array_constructor_MAP_nullfree##2_const                    48.62ms     20.57
array_constructor_MAP_nulls#facebookincubator#1                             30.60ms     32.68
array_constructor_MAP_nulls#facebookincubator#2                             53.94ms     18.54
array_constructor_MAP_nulls#facebookincubator#3                             86.48ms     11.56
array_constructor_MAP_nulls##2_null                        69.53ms     14.38
array_constructor_MAP_nulls##2_const                       87.56ms     11.42

Before:

array_constructor_ROW_nullfree#facebookincubator#1                          33.88ms     29.52
array_constructor_ROW_nullfree#facebookincubator#2                          62.00ms     16.13
array_constructor_ROW_nullfree#facebookincubator#3                          89.54ms     11.17
array_constructor_ROW_nullfree##2_null                     78.46ms     12.75
array_constructor_ROW_nullfree##2_const                    95.53ms     10.47
array_constructor_ROW_nulls#facebookincubator#1                             44.11ms     22.67
array_constructor_ROW_nulls#facebookincubator#2                            115.43ms      8.66
array_constructor_ROW_nulls#facebookincubator#3                            173.61ms      5.76
array_constructor_ROW_nulls##2_null                       130.40ms      7.67
array_constructor_ROW_nulls##2_const                      169.97ms      5.88

After:

array_constructor_ROW_nullfree#facebookincubator#1                           5.64ms    177.44
array_constructor_ROW_nullfree#facebookincubator#2                          14.40ms     69.44
array_constructor_ROW_nullfree#facebookincubator#3                          21.46ms     46.59
array_constructor_ROW_nullfree##2_null                     19.14ms     52.26
array_constructor_ROW_nullfree##2_const                    18.60ms     53.77
array_constructor_ROW_nulls#facebookincubator#1                             10.97ms     91.18
array_constructor_ROW_nulls#facebookincubator#2                             18.29ms     54.67
array_constructor_ROW_nulls#facebookincubator#3                             28.57ms     35.01
array_constructor_ROW_nulls##2_null                        25.10ms     39.84
array_constructor_ROW_nulls##2_const                       24.55ms     40.74
```

Differential Revision: D49269500

fbshipit-source-id: 1f5ff279be27a55f72930457991cefec0d39fcf4
mbasmanova added a commit to mbasmanova/velox-1 that referenced this pull request Sep 15, 2023
Summary:
Pull Request resolved: facebookincubator#6566

FlatVector::copyRanges is slow when there are many small ranges. I noticed this while investigating slowness of array_constructor which used created 1-row ranges: facebookincubator#5958 (comment)

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

This change optimizes FlatVector<T>::copyRanges using code from copy(source, targetIndex, sourceIndex, count) which copies one range. This change also replaces the latter call with a call to copyRanges, hence, the overall amount of code is about the same.

An earlier change optimized array_constructor to not use copyRanges for primitive types, but it is still used by Array/MapVector::copyRanges, although, these do not create 1-row ranges. Still the array_constructor benchmark shows some wins for arrays and maps.

```
Before:

array_constructor_ARRAY_nullfree#facebookincubator#1                        15.80ms     63.30
array_constructor_ARRAY_nullfree#facebookincubator#2                        25.59ms     39.08
array_constructor_ARRAY_nullfree#facebookincubator#3                        34.49ms     28.99
array_constructor_ARRAY_nullfree##2_null                   58.96ms     16.96
array_constructor_ARRAY_nullfree##2_const                  54.31ms     18.41
array_constructor_ARRAY_nulls#facebookincubator#1                           33.50ms     29.85
array_constructor_ARRAY_nulls#facebookincubator#2                           59.05ms     16.93
array_constructor_ARRAY_nulls#facebookincubator#3                           88.36ms     11.32
array_constructor_ARRAY_nulls##2_null                      74.53ms     13.42
array_constructor_ARRAY_nulls##2_const                    102.54ms      9.75

After:

array_constructor_ARRAY_nullfree#facebookincubator#1                        15.53ms     64.39
array_constructor_ARRAY_nullfree#facebookincubator#2                        25.75ms     38.84
array_constructor_ARRAY_nullfree#facebookincubator#3                        34.37ms     29.10
array_constructor_ARRAY_nullfree##2_null                   59.63ms     16.77
array_constructor_ARRAY_nullfree##2_const                  41.36ms     24.18
array_constructor_ARRAY_nulls#facebookincubator#1                           30.51ms     32.78
array_constructor_ARRAY_nulls#facebookincubator#2                           55.13ms     18.14
array_constructor_ARRAY_nulls#facebookincubator#3                           77.93ms     12.83
array_constructor_ARRAY_nulls##2_null                      68.84ms     14.53
array_constructor_ARRAY_nulls##2_const                     83.91ms     11.92

Before:

array_constructor_MAP_nullfree#facebookincubator#1                          16.89ms     59.20
array_constructor_MAP_nullfree#facebookincubator#2                          29.24ms     34.20
array_constructor_MAP_nullfree#facebookincubator#3                          41.11ms     24.33
array_constructor_MAP_nullfree##2_null                     61.98ms     16.14
array_constructor_MAP_nullfree##2_const                    67.44ms     14.83
array_constructor_MAP_nulls#facebookincubator#1                             37.00ms     27.03
array_constructor_MAP_nulls#facebookincubator#2                             67.76ms     14.76
array_constructor_MAP_nulls#facebookincubator#3                            100.88ms      9.91
array_constructor_MAP_nulls##2_null                        84.22ms     11.87
array_constructor_MAP_nulls##2_const                      122.55ms      8.16

After:

array_constructor_MAP_nullfree#facebookincubator#1                          17.07ms     58.57
array_constructor_MAP_nullfree#facebookincubator#2                          28.39ms     35.23
array_constructor_MAP_nullfree#facebookincubator#3                          42.34ms     23.62
array_constructor_MAP_nullfree##2_null                     65.24ms     15.33
array_constructor_MAP_nullfree##2_const                    49.94ms     20.03
array_constructor_MAP_nulls#facebookincubator#1                             34.34ms     29.12
array_constructor_MAP_nulls#facebookincubator#2                             55.23ms     18.11
array_constructor_MAP_nulls#facebookincubator#3                             82.64ms     12.10
array_constructor_MAP_nulls##2_null                        70.74ms     14.14
array_constructor_MAP_nulls##2_const                       88.13ms     11.35
```

Differential Revision: D49269500

fbshipit-source-id: 5b58c3942224962691e9b7f114f5134513328b5d
facebook-github-bot pushed a commit that referenced this pull request Sep 15, 2023
Summary:
Pull Request resolved: #6568

array_constructor is very slow: #5958 (comment)

array_constructor uses BaseVector::copyRanges, which is somewhat fast for arrays and maps, but very slow for primitive types:

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

FlatVector<T>::copy(source, rows, toSourceRow) is faster.

Switching from copyRanges to copy speeds up array_constructor for primitive types and structs significantly. Yet, this change makes arrays and maps slower.

The slowness is due to ArrayVector and MapVector not having implementation for copy(source, rows, toSourceRow). They rely on BaseVector::copy to translate rows + toSourceRow to ranges. This extra processing causes perf regression.

Hence, we use copy for primitive types and structs of these and copyRanges for everything else.

```
Before:

array_constructor_ARRAY_nullfree##1                        16.80ms     59.53
array_constructor_ARRAY_nullfree##2                        27.02ms     37.01
array_constructor_ARRAY_nullfree##3                        38.03ms     26.30
array_constructor_ARRAY_nullfree##2_null                   52.86ms     18.92
array_constructor_ARRAY_nullfree##2_const                  54.97ms     18.19
array_constructor_ARRAY_nulls##1                           30.61ms     32.66
array_constructor_ARRAY_nulls##2                           55.01ms     18.18
array_constructor_ARRAY_nulls##3                           80.69ms     12.39
array_constructor_ARRAY_nulls##2_null                      69.10ms     14.47
array_constructor_ARRAY_nulls##2_const                    103.85ms      9.63

After:

array_constructor_ARRAY_nullfree##1                        15.25ms     65.58
array_constructor_ARRAY_nullfree##2                        25.11ms     39.82
array_constructor_ARRAY_nullfree##3                        34.59ms     28.91
array_constructor_ARRAY_nullfree##2_null                   53.61ms     18.65
array_constructor_ARRAY_nullfree##2_const                  51.48ms     19.42
array_constructor_ARRAY_nulls##1                           29.99ms     33.34
array_constructor_ARRAY_nulls##2                           55.91ms     17.89
array_constructor_ARRAY_nulls##3                           81.73ms     12.24
array_constructor_ARRAY_nulls##2_null                      66.97ms     14.93
array_constructor_ARRAY_nulls##2_const                     92.96ms     10.76

Before:

array_constructor_INTEGER_nullfree##1                      19.72ms     50.71
array_constructor_INTEGER_nullfree##2                      34.51ms     28.97
array_constructor_INTEGER_nullfree##3                      47.95ms     20.86
array_constructor_INTEGER_nullfree##2_null                 58.68ms     17.04
array_constructor_INTEGER_nullfree##2_const                45.15ms     22.15
array_constructor_INTEGER_nulls##1                         29.99ms     33.34
array_constructor_INTEGER_nulls##2                         55.32ms     18.08
array_constructor_INTEGER_nulls##3                         78.53ms     12.73
array_constructor_INTEGER_nulls##2_null                    72.24ms     13.84
array_constructor_INTEGER_nulls##2_const                   71.13ms     14.06

After:

array_constructor_INTEGER_nullfree##1                       3.39ms    294.89
array_constructor_INTEGER_nullfree##2                       7.35ms    136.10
array_constructor_INTEGER_nullfree##3                      10.78ms     92.74
array_constructor_INTEGER_nullfree##2_null                 11.29ms     88.57
array_constructor_INTEGER_nullfree##2_const                10.14ms     98.65
array_constructor_INTEGER_nulls##1                          4.49ms    222.53
array_constructor_INTEGER_nulls##2                          9.78ms    102.29
array_constructor_INTEGER_nulls##3                         14.69ms     68.08
array_constructor_INTEGER_nulls##2_null                    12.14ms     82.36
array_constructor_INTEGER_nulls##2_const                   12.27ms     81.53

Before:

array_constructor_MAP_nullfree##1                          17.34ms     57.65
array_constructor_MAP_nullfree##2                          29.84ms     33.51
array_constructor_MAP_nullfree##3                          41.51ms     24.09
array_constructor_MAP_nullfree##2_null                     56.57ms     17.68
array_constructor_MAP_nullfree##2_const                    71.68ms     13.95
array_constructor_MAP_nulls##1                             36.22ms     27.61
array_constructor_MAP_nulls##2                             68.18ms     14.67
array_constructor_MAP_nulls##3                             95.12ms     10.51
array_constructor_MAP_nulls##2_null                        86.42ms     11.57
array_constructor_MAP_nulls##2_const                      120.10ms      8.33

After:

array_constructor_MAP_nullfree##1                          17.05ms     58.66
array_constructor_MAP_nullfree##2                          28.42ms     35.18
array_constructor_MAP_nullfree##3                          36.96ms     27.06
array_constructor_MAP_nullfree##2_null                     55.64ms     17.97
array_constructor_MAP_nullfree##2_const                    67.53ms     14.81
array_constructor_MAP_nulls##1                             32.91ms     30.39
array_constructor_MAP_nulls##2                             64.50ms     15.50
array_constructor_MAP_nulls##3                             95.71ms     10.45
array_constructor_MAP_nulls##2_null                        77.22ms     12.95
array_constructor_MAP_nulls##2_const                      114.91ms      8.70

Before:

array_constructor_ROW_nullfree##1                          33.88ms     29.52
array_constructor_ROW_nullfree##2                          62.00ms     16.13
array_constructor_ROW_nullfree##3                          89.54ms     11.17
array_constructor_ROW_nullfree##2_null                     78.46ms     12.75
array_constructor_ROW_nullfree##2_const                    95.53ms     10.47
array_constructor_ROW_nulls##1                             44.11ms     22.67
array_constructor_ROW_nulls##2                            115.43ms      8.66
array_constructor_ROW_nulls##3                            173.61ms      5.76
array_constructor_ROW_nulls##2_null                       130.40ms      7.67
array_constructor_ROW_nulls##2_const                      169.97ms      5.88

After:

array_constructor_ROW_nullfree##1                           5.55ms    180.15
array_constructor_ROW_nullfree##2                          12.83ms     77.94
array_constructor_ROW_nullfree##3                          18.89ms     52.95
array_constructor_ROW_nullfree##2_null                     18.74ms     53.36
array_constructor_ROW_nullfree##2_const                    18.16ms     55.07
array_constructor_ROW_nulls##1                             11.29ms     88.61
array_constructor_ROW_nulls##2                             18.57ms     53.86
array_constructor_ROW_nulls##3                             34.20ms     29.24
array_constructor_ROW_nulls##2_null                        25.05ms     39.92
array_constructor_ROW_nulls##2_const                       25.15ms     39.77
```

Reviewed By: laithsakka

Differential Revision: D49272797

fbshipit-source-id: 55d83de7b69c7ae4b72b5a5ae62a7868f36b0e19
mbasmanova added a commit to mbasmanova/velox-1 that referenced this pull request Sep 16, 2023
Summary:

FlatVector::copyRanges is slow when there are many small ranges. I noticed this while investigating slowness of array_constructor which used created 1-row ranges: facebookincubator#5958 (comment)

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

This change optimizes FlatVector<T>::copyRanges using code from copy(source, targetIndex, sourceIndex, count) which copies one range. This change also replaces the latter call with a call to copyRanges, hence, the overall amount of code is about the same.

An earlier change optimized array_constructor to not use copyRanges for primitive types, but it is still used by Array/MapVector::copyRanges, although, these do not create 1-row ranges. Still the array_constructor benchmark shows some wins for arrays and maps.

```
Before:

array_constructor_ARRAY_nullfree##2_const                  54.31ms     18.41
array_constructor_ARRAY_nulls#facebookincubator#1                           33.50ms     29.85
array_constructor_ARRAY_nulls#facebookincubator#2                           59.05ms     16.93
array_constructor_ARRAY_nulls#facebookincubator#3                           88.36ms     11.32
array_constructor_ARRAY_nulls##2_null                      74.53ms     13.42
array_constructor_ARRAY_nulls##2_const                    102.54ms      9.75

After:

array_constructor_ARRAY_nullfree##2_const                  41.36ms     24.18
array_constructor_ARRAY_nulls#facebookincubator#1                           30.51ms     32.78
array_constructor_ARRAY_nulls#facebookincubator#2                           55.13ms     18.14
array_constructor_ARRAY_nulls#facebookincubator#3                           77.93ms     12.83
array_constructor_ARRAY_nulls##2_null                      68.84ms     14.53
array_constructor_ARRAY_nulls##2_const                     83.91ms     11.92

Before:

array_constructor_MAP_nullfree##2_const                    67.44ms     14.83
array_constructor_MAP_nulls#facebookincubator#1                             37.00ms     27.03
array_constructor_MAP_nulls#facebookincubator#2                             67.76ms     14.76
array_constructor_MAP_nulls#facebookincubator#3                            100.88ms      9.91
array_constructor_MAP_nulls##2_null                        84.22ms     11.87
array_constructor_MAP_nulls##2_const                      122.55ms      8.16

After:

array_constructor_MAP_nullfree##2_const                    49.94ms     20.03
array_constructor_MAP_nulls#facebookincubator#1                             34.34ms     29.12
array_constructor_MAP_nulls#facebookincubator#2                             55.23ms     18.11
array_constructor_MAP_nulls#facebookincubator#3                             82.64ms     12.10
array_constructor_MAP_nulls##2_null                        70.74ms     14.14
array_constructor_MAP_nulls##2_const                       88.13ms     11.35
```

Differential Revision: D49269500
mbasmanova added a commit to mbasmanova/velox-1 that referenced this pull request Sep 16, 2023
Summary:

FlatVector::copyRanges is slow when there are many small ranges. I noticed this while investigating slowness of array_constructor which used created 1-row ranges: facebookincubator#5958 (comment)

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

This change optimizes FlatVector<T>::copyRanges using code from copy(source, targetIndex, sourceIndex, count) which copies one range. This change also replaces the latter call with a call to copyRanges, hence, the overall amount of code is about the same.

An earlier change optimized array_constructor to not use copyRanges for primitive types, but it is still used by Array/MapVector::copyRanges, although, these do not create 1-row ranges. Still the array_constructor benchmark shows some wins for arrays and maps.

```
Before:

array_constructor_ARRAY_nullfree##2_const                  54.31ms     18.41
array_constructor_ARRAY_nulls#facebookincubator#1                           33.50ms     29.85
array_constructor_ARRAY_nulls#facebookincubator#2                           59.05ms     16.93
array_constructor_ARRAY_nulls#facebookincubator#3                           88.36ms     11.32
array_constructor_ARRAY_nulls##2_null                      74.53ms     13.42
array_constructor_ARRAY_nulls##2_const                    102.54ms      9.75

After:

array_constructor_ARRAY_nullfree##2_const                  41.36ms     24.18
array_constructor_ARRAY_nulls#facebookincubator#1                           30.51ms     32.78
array_constructor_ARRAY_nulls#facebookincubator#2                           55.13ms     18.14
array_constructor_ARRAY_nulls#facebookincubator#3                           77.93ms     12.83
array_constructor_ARRAY_nulls##2_null                      68.84ms     14.53
array_constructor_ARRAY_nulls##2_const                     83.91ms     11.92

Before:

array_constructor_MAP_nullfree##2_const                    67.44ms     14.83
array_constructor_MAP_nulls#facebookincubator#1                             37.00ms     27.03
array_constructor_MAP_nulls#facebookincubator#2                             67.76ms     14.76
array_constructor_MAP_nulls#facebookincubator#3                            100.88ms      9.91
array_constructor_MAP_nulls##2_null                        84.22ms     11.87
array_constructor_MAP_nulls##2_const                      122.55ms      8.16

After:

array_constructor_MAP_nullfree##2_const                    49.94ms     20.03
array_constructor_MAP_nulls#facebookincubator#1                             34.34ms     29.12
array_constructor_MAP_nulls#facebookincubator#2                             55.23ms     18.11
array_constructor_MAP_nulls#facebookincubator#3                             82.64ms     12.10
array_constructor_MAP_nulls##2_null                        70.74ms     14.14
array_constructor_MAP_nulls##2_const                       88.13ms     11.35
```

Differential Revision: D49269500
mbasmanova added a commit to mbasmanova/velox-1 that referenced this pull request Sep 16, 2023
Summary:

FlatVector::copyRanges is slow when there are many small ranges. I noticed this while investigating slowness of array_constructor which used created 1-row ranges: facebookincubator#5958 (comment)

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

This change optimizes FlatVector<T>::copyRanges using code from copy(source, targetIndex, sourceIndex, count) which copies one range. This change also replaces the latter call with a call to copyRanges, hence, the overall amount of code is about the same.

An earlier change optimized array_constructor to not use copyRanges for primitive types, but it is still used by Array/MapVector::copyRanges, although, these do not create 1-row ranges. Still the array_constructor benchmark shows some wins for arrays and maps.

```
Before:

array_constructor_ARRAY_nullfree##2_const                  54.31ms     18.41
array_constructor_ARRAY_nulls#facebookincubator#1                           33.50ms     29.85
array_constructor_ARRAY_nulls#facebookincubator#2                           59.05ms     16.93
array_constructor_ARRAY_nulls#facebookincubator#3                           88.36ms     11.32
array_constructor_ARRAY_nulls##2_null                      74.53ms     13.42
array_constructor_ARRAY_nulls##2_const                    102.54ms      9.75

After:

array_constructor_ARRAY_nullfree##2_const                  41.36ms     24.18
array_constructor_ARRAY_nulls#facebookincubator#1                           30.51ms     32.78
array_constructor_ARRAY_nulls#facebookincubator#2                           55.13ms     18.14
array_constructor_ARRAY_nulls#facebookincubator#3                           77.93ms     12.83
array_constructor_ARRAY_nulls##2_null                      68.84ms     14.53
array_constructor_ARRAY_nulls##2_const                     83.91ms     11.92

Before:

array_constructor_MAP_nullfree##2_const                    67.44ms     14.83
array_constructor_MAP_nulls#facebookincubator#1                             37.00ms     27.03
array_constructor_MAP_nulls#facebookincubator#2                             67.76ms     14.76
array_constructor_MAP_nulls#facebookincubator#3                            100.88ms      9.91
array_constructor_MAP_nulls##2_null                        84.22ms     11.87
array_constructor_MAP_nulls##2_const                      122.55ms      8.16

After:

array_constructor_MAP_nullfree##2_const                    49.94ms     20.03
array_constructor_MAP_nulls#facebookincubator#1                             34.34ms     29.12
array_constructor_MAP_nulls#facebookincubator#2                             55.23ms     18.11
array_constructor_MAP_nulls#facebookincubator#3                             82.64ms     12.10
array_constructor_MAP_nulls##2_null                        70.74ms     14.14
array_constructor_MAP_nulls##2_const                       88.13ms     11.35
```

Differential Revision: D49269500
mbasmanova added a commit to mbasmanova/velox-1 that referenced this pull request Sep 16, 2023
Summary:

FlatVector::copyRanges is slow when there are many small ranges. I noticed this while investigating slowness of array_constructor which used created 1-row ranges: facebookincubator#5958 (comment)

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

This change optimizes FlatVector<T>::copyRanges using code from copy(source, targetIndex, sourceIndex, count) which copies one range. This change also replaces the latter call with a call to copyRanges, hence, the overall amount of code is about the same.

An earlier change optimized array_constructor to not use copyRanges for primitive types, but it is still used by Array/MapVector::copyRanges, although, these do not create 1-row ranges. Still the array_constructor benchmark shows some wins for arrays and maps.

```
Before:

array_constructor_ARRAY_nullfree##2_const                  54.31ms     18.41
array_constructor_ARRAY_nulls#facebookincubator#1                           33.50ms     29.85
array_constructor_ARRAY_nulls#facebookincubator#2                           59.05ms     16.93
array_constructor_ARRAY_nulls#facebookincubator#3                           88.36ms     11.32
array_constructor_ARRAY_nulls##2_null                      74.53ms     13.42
array_constructor_ARRAY_nulls##2_const                    102.54ms      9.75

After:

array_constructor_ARRAY_nullfree##2_const                  41.36ms     24.18
array_constructor_ARRAY_nulls#facebookincubator#1                           30.51ms     32.78
array_constructor_ARRAY_nulls#facebookincubator#2                           55.13ms     18.14
array_constructor_ARRAY_nulls#facebookincubator#3                           77.93ms     12.83
array_constructor_ARRAY_nulls##2_null                      68.84ms     14.53
array_constructor_ARRAY_nulls##2_const                     83.91ms     11.92

Before:

array_constructor_MAP_nullfree##2_const                    67.44ms     14.83
array_constructor_MAP_nulls#facebookincubator#1                             37.00ms     27.03
array_constructor_MAP_nulls#facebookincubator#2                             67.76ms     14.76
array_constructor_MAP_nulls#facebookincubator#3                            100.88ms      9.91
array_constructor_MAP_nulls##2_null                        84.22ms     11.87
array_constructor_MAP_nulls##2_const                      122.55ms      8.16

After:

array_constructor_MAP_nullfree##2_const                    49.94ms     20.03
array_constructor_MAP_nulls#facebookincubator#1                             34.34ms     29.12
array_constructor_MAP_nulls#facebookincubator#2                             55.23ms     18.11
array_constructor_MAP_nulls#facebookincubator#3                             82.64ms     12.10
array_constructor_MAP_nulls##2_null                        70.74ms     14.14
array_constructor_MAP_nulls##2_const                       88.13ms     11.35
```

Differential Revision: D49269500
mbasmanova added a commit to mbasmanova/velox-1 that referenced this pull request Sep 18, 2023
Summary:

FlatVector::copyRanges is slow when there are many small ranges. I noticed this while investigating slowness of array_constructor which used created 1-row ranges: facebookincubator#5958 (comment)

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

This change optimizes FlatVector<T>::copyRanges using code from copy(source, targetIndex, sourceIndex, count) which copies one range. This change also replaces the latter call with a call to copyRanges, hence, the overall amount of code is about the same.

An earlier change optimized array_constructor to not use copyRanges for primitive types, but it is still used by Array/MapVector::copyRanges, although, these do not create 1-row ranges. Still the array_constructor benchmark shows some wins for arrays and maps.

```
Before:

array_constructor_ARRAY_nullfree##2_const                  54.31ms     18.41
array_constructor_ARRAY_nulls#facebookincubator#1                           33.50ms     29.85
array_constructor_ARRAY_nulls#facebookincubator#2                           59.05ms     16.93
array_constructor_ARRAY_nulls#facebookincubator#3                           88.36ms     11.32
array_constructor_ARRAY_nulls##2_null                      74.53ms     13.42
array_constructor_ARRAY_nulls##2_const                    102.54ms      9.75

After:

array_constructor_ARRAY_nullfree##2_const                  41.36ms     24.18
array_constructor_ARRAY_nulls#facebookincubator#1                           30.51ms     32.78
array_constructor_ARRAY_nulls#facebookincubator#2                           55.13ms     18.14
array_constructor_ARRAY_nulls#facebookincubator#3                           77.93ms     12.83
array_constructor_ARRAY_nulls##2_null                      68.84ms     14.53
array_constructor_ARRAY_nulls##2_const                     83.91ms     11.92

Before:

array_constructor_MAP_nullfree##2_const                    67.44ms     14.83
array_constructor_MAP_nulls#facebookincubator#1                             37.00ms     27.03
array_constructor_MAP_nulls#facebookincubator#2                             67.76ms     14.76
array_constructor_MAP_nulls#facebookincubator#3                            100.88ms      9.91
array_constructor_MAP_nulls##2_null                        84.22ms     11.87
array_constructor_MAP_nulls##2_const                      122.55ms      8.16

After:

array_constructor_MAP_nullfree##2_const                    49.94ms     20.03
array_constructor_MAP_nulls#facebookincubator#1                             34.34ms     29.12
array_constructor_MAP_nulls#facebookincubator#2                             55.23ms     18.11
array_constructor_MAP_nulls#facebookincubator#3                             82.64ms     12.10
array_constructor_MAP_nulls##2_null                        70.74ms     14.14
array_constructor_MAP_nulls##2_const                       88.13ms     11.35
```

Differential Revision: D49269500
mbasmanova added a commit to mbasmanova/velox-1 that referenced this pull request Sep 18, 2023
Summary:

FlatVector::copyRanges is slow when there are many small ranges. I noticed this while investigating slowness of array_constructor which used created 1-row ranges: facebookincubator#5958 (comment)

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

This change optimizes FlatVector<T>::copyRanges using code from copy(source, targetIndex, sourceIndex, count) which copies one range. This change also replaces the latter call with a call to copyRanges, hence, the overall amount of code is about the same.

An earlier change optimized array_constructor to not use copyRanges for primitive types, but it is still used by Array/MapVector::copyRanges, although, these do not create 1-row ranges. Still the array_constructor benchmark shows some wins for arrays and maps.

```
Before:

array_constructor_ARRAY_nullfree##2_const                  54.31ms     18.41
array_constructor_ARRAY_nulls#facebookincubator#1                           33.50ms     29.85
array_constructor_ARRAY_nulls#facebookincubator#2                           59.05ms     16.93
array_constructor_ARRAY_nulls#facebookincubator#3                           88.36ms     11.32
array_constructor_ARRAY_nulls##2_null                      74.53ms     13.42
array_constructor_ARRAY_nulls##2_const                    102.54ms      9.75

After:

array_constructor_ARRAY_nullfree##2_const                  41.36ms     24.18
array_constructor_ARRAY_nulls#facebookincubator#1                           30.51ms     32.78
array_constructor_ARRAY_nulls#facebookincubator#2                           55.13ms     18.14
array_constructor_ARRAY_nulls#facebookincubator#3                           77.93ms     12.83
array_constructor_ARRAY_nulls##2_null                      68.84ms     14.53
array_constructor_ARRAY_nulls##2_const                     83.91ms     11.92

Before:

array_constructor_MAP_nullfree##2_const                    67.44ms     14.83
array_constructor_MAP_nulls#facebookincubator#1                             37.00ms     27.03
array_constructor_MAP_nulls#facebookincubator#2                             67.76ms     14.76
array_constructor_MAP_nulls#facebookincubator#3                            100.88ms      9.91
array_constructor_MAP_nulls##2_null                        84.22ms     11.87
array_constructor_MAP_nulls##2_const                      122.55ms      8.16

After:

array_constructor_MAP_nullfree##2_const                    49.94ms     20.03
array_constructor_MAP_nulls#facebookincubator#1                             34.34ms     29.12
array_constructor_MAP_nulls#facebookincubator#2                             55.23ms     18.11
array_constructor_MAP_nulls#facebookincubator#3                             82.64ms     12.10
array_constructor_MAP_nulls##2_null                        70.74ms     14.14
array_constructor_MAP_nulls##2_const                       88.13ms     11.35
```

Differential Revision: D49269500
mbasmanova added a commit to mbasmanova/velox-1 that referenced this pull request Sep 18, 2023
Summary:
Pull Request resolved: facebookincubator#6566

FlatVector::copyRanges is slow when there are many small ranges. I noticed this while investigating slowness of array_constructor which used created 1-row ranges: facebookincubator#5958 (comment)

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

This change optimizes FlatVector<T>::copyRanges using code from copy(source, targetIndex, sourceIndex, count) which copies one range. This change also replaces the latter call with a call to copyRanges, hence, the overall amount of code is about the same.

An earlier change optimized array_constructor to not use copyRanges for primitive types, but it is still used by Array/MapVector::copyRanges, although, these do not create 1-row ranges. Still the array_constructor benchmark shows some wins for arrays and maps.

```
Before:

array_constructor_ARRAY_nullfree##2_const                  54.31ms     18.41
array_constructor_ARRAY_nulls#facebookincubator#1                           33.50ms     29.85
array_constructor_ARRAY_nulls#facebookincubator#2                           59.05ms     16.93
array_constructor_ARRAY_nulls#facebookincubator#3                           88.36ms     11.32
array_constructor_ARRAY_nulls##2_null                      74.53ms     13.42
array_constructor_ARRAY_nulls##2_const                    102.54ms      9.75

After:

array_constructor_ARRAY_nullfree##2_const                  41.36ms     24.18
array_constructor_ARRAY_nulls#facebookincubator#1                           30.51ms     32.78
array_constructor_ARRAY_nulls#facebookincubator#2                           55.13ms     18.14
array_constructor_ARRAY_nulls#facebookincubator#3                           77.93ms     12.83
array_constructor_ARRAY_nulls##2_null                      68.84ms     14.53
array_constructor_ARRAY_nulls##2_const                     83.91ms     11.92

Before:

array_constructor_MAP_nullfree##2_const                    67.44ms     14.83
array_constructor_MAP_nulls#facebookincubator#1                             37.00ms     27.03
array_constructor_MAP_nulls#facebookincubator#2                             67.76ms     14.76
array_constructor_MAP_nulls#facebookincubator#3                            100.88ms      9.91
array_constructor_MAP_nulls##2_null                        84.22ms     11.87
array_constructor_MAP_nulls##2_const                      122.55ms      8.16

After:

array_constructor_MAP_nullfree##2_const                    49.94ms     20.03
array_constructor_MAP_nulls#facebookincubator#1                             34.34ms     29.12
array_constructor_MAP_nulls#facebookincubator#2                             55.23ms     18.11
array_constructor_MAP_nulls#facebookincubator#3                             82.64ms     12.10
array_constructor_MAP_nulls##2_null                        70.74ms     14.14
array_constructor_MAP_nulls##2_const                       88.13ms     11.35
```

Differential Revision: D49269500

fbshipit-source-id: 8fc304cce31f782aef818e6d8ce052360af8e832
mbasmanova added a commit to mbasmanova/velox-1 that referenced this pull request Sep 18, 2023
Summary:
Pull Request resolved: facebookincubator#6566

FlatVector::copyRanges is slow when there are many small ranges. I noticed this while investigating slowness of array_constructor which used created 1-row ranges: facebookincubator#5958 (comment)

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

This change optimizes FlatVector<T>::copyRanges using code from copy(source, targetIndex, sourceIndex, count) which copies one range. This change also replaces the latter call with a call to copyRanges, hence, the overall amount of code is about the same.

An earlier change optimized array_constructor to not use copyRanges for primitive types, but it is still used by Array/MapVector::copyRanges, although, these do not create 1-row ranges. Still the array_constructor benchmark shows some wins for arrays and maps.

```
Before:

array_constructor_ARRAY_nullfree##2_const                  54.31ms     18.41
array_constructor_ARRAY_nulls#facebookincubator#1                           33.50ms     29.85
array_constructor_ARRAY_nulls#facebookincubator#2                           59.05ms     16.93
array_constructor_ARRAY_nulls#facebookincubator#3                           88.36ms     11.32
array_constructor_ARRAY_nulls##2_null                      74.53ms     13.42
array_constructor_ARRAY_nulls##2_const                    102.54ms      9.75

After:

array_constructor_ARRAY_nullfree##2_const                  41.36ms     24.18
array_constructor_ARRAY_nulls#facebookincubator#1                           30.51ms     32.78
array_constructor_ARRAY_nulls#facebookincubator#2                           55.13ms     18.14
array_constructor_ARRAY_nulls#facebookincubator#3                           77.93ms     12.83
array_constructor_ARRAY_nulls##2_null                      68.84ms     14.53
array_constructor_ARRAY_nulls##2_const                     83.91ms     11.92

Before:

array_constructor_MAP_nullfree##2_const                    67.44ms     14.83
array_constructor_MAP_nulls#facebookincubator#1                             37.00ms     27.03
array_constructor_MAP_nulls#facebookincubator#2                             67.76ms     14.76
array_constructor_MAP_nulls#facebookincubator#3                            100.88ms      9.91
array_constructor_MAP_nulls##2_null                        84.22ms     11.87
array_constructor_MAP_nulls##2_const                      122.55ms      8.16

After:

array_constructor_MAP_nullfree##2_const                    49.94ms     20.03
array_constructor_MAP_nulls#facebookincubator#1                             34.34ms     29.12
array_constructor_MAP_nulls#facebookincubator#2                             55.23ms     18.11
array_constructor_MAP_nulls#facebookincubator#3                             82.64ms     12.10
array_constructor_MAP_nulls##2_null                        70.74ms     14.14
array_constructor_MAP_nulls##2_const                       88.13ms     11.35
```

Differential Revision: D49269500

fbshipit-source-id: 054dcec94cc3768443440ffb57d58c96f9b1f006
mbasmanova added a commit to mbasmanova/velox-1 that referenced this pull request Sep 18, 2023
Summary:
Pull Request resolved: facebookincubator#6566

FlatVector::copyRanges is slow when there are many small ranges. I noticed this while investigating slowness of array_constructor which used created 1-row ranges: facebookincubator#5958 (comment)

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

This change optimizes FlatVector<T>::copyRanges using code from copy(source, targetIndex, sourceIndex, count) which copies one range. This change also replaces the latter call with a call to copyRanges, hence, the overall amount of code is about the same.

An earlier change optimized array_constructor to not use copyRanges for primitive types, but it is still used by Array/MapVector::copyRanges, although, these do not create 1-row ranges. Still the array_constructor benchmark shows some wins for arrays and maps.

```
Before:

array_constructor_ARRAY_nullfree##2_const                  54.31ms     18.41
array_constructor_ARRAY_nulls#facebookincubator#1                           33.50ms     29.85
array_constructor_ARRAY_nulls#facebookincubator#2                           59.05ms     16.93
array_constructor_ARRAY_nulls#facebookincubator#3                           88.36ms     11.32
array_constructor_ARRAY_nulls##2_null                      74.53ms     13.42
array_constructor_ARRAY_nulls##2_const                    102.54ms      9.75

After:

array_constructor_ARRAY_nullfree##2_const                  41.36ms     24.18
array_constructor_ARRAY_nulls#facebookincubator#1                           30.51ms     32.78
array_constructor_ARRAY_nulls#facebookincubator#2                           55.13ms     18.14
array_constructor_ARRAY_nulls#facebookincubator#3                           77.93ms     12.83
array_constructor_ARRAY_nulls##2_null                      68.84ms     14.53
array_constructor_ARRAY_nulls##2_const                     83.91ms     11.92

Before:

array_constructor_MAP_nullfree##2_const                    67.44ms     14.83
array_constructor_MAP_nulls#facebookincubator#1                             37.00ms     27.03
array_constructor_MAP_nulls#facebookincubator#2                             67.76ms     14.76
array_constructor_MAP_nulls#facebookincubator#3                            100.88ms      9.91
array_constructor_MAP_nulls##2_null                        84.22ms     11.87
array_constructor_MAP_nulls##2_const                      122.55ms      8.16

After:

array_constructor_MAP_nullfree##2_const                    49.94ms     20.03
array_constructor_MAP_nulls#facebookincubator#1                             34.34ms     29.12
array_constructor_MAP_nulls#facebookincubator#2                             55.23ms     18.11
array_constructor_MAP_nulls#facebookincubator#3                             82.64ms     12.10
array_constructor_MAP_nulls##2_null                        70.74ms     14.14
array_constructor_MAP_nulls##2_const                       88.13ms     11.35
```

Reviewed By: laithsakka

Differential Revision: D49269500

fbshipit-source-id: 2fdf6f2a717a3bc4bf54fb3d9bd970157a8c415b
mbasmanova added a commit to mbasmanova/velox-1 that referenced this pull request Sep 18, 2023
Summary:
Pull Request resolved: facebookincubator#6566

FlatVector::copyRanges is slow when there are many small ranges. I noticed this while investigating slowness of array_constructor which used created 1-row ranges: facebookincubator#5958 (comment)

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

This change optimizes FlatVector<T>::copyRanges using code from copy(source, targetIndex, sourceIndex, count) which copies one range. This change also replaces the latter call with a call to copyRanges, hence, the overall amount of code is about the same.

An earlier change optimized array_constructor to not use copyRanges for primitive types, but it is still used by Array/MapVector::copyRanges, although, these do not create 1-row ranges. Still the array_constructor benchmark shows some wins for arrays and maps.

```
Before:

array_constructor_ARRAY_nullfree##2_const                  54.31ms     18.41
array_constructor_ARRAY_nulls#facebookincubator#1                           33.50ms     29.85
array_constructor_ARRAY_nulls#facebookincubator#2                           59.05ms     16.93
array_constructor_ARRAY_nulls#facebookincubator#3                           88.36ms     11.32
array_constructor_ARRAY_nulls##2_null                      74.53ms     13.42
array_constructor_ARRAY_nulls##2_const                    102.54ms      9.75

After:

array_constructor_ARRAY_nullfree##2_const                  41.36ms     24.18
array_constructor_ARRAY_nulls#facebookincubator#1                           30.51ms     32.78
array_constructor_ARRAY_nulls#facebookincubator#2                           55.13ms     18.14
array_constructor_ARRAY_nulls#facebookincubator#3                           77.93ms     12.83
array_constructor_ARRAY_nulls##2_null                      68.84ms     14.53
array_constructor_ARRAY_nulls##2_const                     83.91ms     11.92

Before:

array_constructor_MAP_nullfree##2_const                    67.44ms     14.83
array_constructor_MAP_nulls#facebookincubator#1                             37.00ms     27.03
array_constructor_MAP_nulls#facebookincubator#2                             67.76ms     14.76
array_constructor_MAP_nulls#facebookincubator#3                            100.88ms      9.91
array_constructor_MAP_nulls##2_null                        84.22ms     11.87
array_constructor_MAP_nulls##2_const                      122.55ms      8.16

After:

array_constructor_MAP_nullfree##2_const                    49.94ms     20.03
array_constructor_MAP_nulls#facebookincubator#1                             34.34ms     29.12
array_constructor_MAP_nulls#facebookincubator#2                             55.23ms     18.11
array_constructor_MAP_nulls#facebookincubator#3                             82.64ms     12.10
array_constructor_MAP_nulls##2_null                        70.74ms     14.14
array_constructor_MAP_nulls##2_const                       88.13ms     11.35
```

Reviewed By: laithsakka

Differential Revision: D49269500

fbshipit-source-id: b7271a981d33885b2d4bc3219bc15f069bcd06d3
facebook-github-bot pushed a commit that referenced this pull request Sep 19, 2023
Summary:
Pull Request resolved: #6566

FlatVector::copyRanges is slow when there are many small ranges. I noticed this while investigating slowness of array_constructor which used created 1-row ranges: #5958 (comment)

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

This change optimizes FlatVector<T>::copyRanges using code from copy(source, targetIndex, sourceIndex, count) which copies one range. This change also replaces the latter call with a call to copyRanges, hence, the overall amount of code is about the same.

An earlier change optimized array_constructor to not use copyRanges for primitive types, but it is still used by Array/MapVector::copyRanges, although, these do not create 1-row ranges. Still the array_constructor benchmark shows some wins for arrays and maps.

```
Before:

array_constructor_ARRAY_nullfree##2_const                  54.31ms     18.41
array_constructor_ARRAY_nulls##1                           33.50ms     29.85
array_constructor_ARRAY_nulls##2                           59.05ms     16.93
array_constructor_ARRAY_nulls##3                           88.36ms     11.32
array_constructor_ARRAY_nulls##2_null                      74.53ms     13.42
array_constructor_ARRAY_nulls##2_const                    102.54ms      9.75

After:

array_constructor_ARRAY_nullfree##2_const                  41.36ms     24.18
array_constructor_ARRAY_nulls##1                           30.51ms     32.78
array_constructor_ARRAY_nulls##2                           55.13ms     18.14
array_constructor_ARRAY_nulls##3                           77.93ms     12.83
array_constructor_ARRAY_nulls##2_null                      68.84ms     14.53
array_constructor_ARRAY_nulls##2_const                     83.91ms     11.92

Before:

array_constructor_MAP_nullfree##2_const                    67.44ms     14.83
array_constructor_MAP_nulls##1                             37.00ms     27.03
array_constructor_MAP_nulls##2                             67.76ms     14.76
array_constructor_MAP_nulls##3                            100.88ms      9.91
array_constructor_MAP_nulls##2_null                        84.22ms     11.87
array_constructor_MAP_nulls##2_const                      122.55ms      8.16

After:

array_constructor_MAP_nullfree##2_const                    49.94ms     20.03
array_constructor_MAP_nulls##1                             34.34ms     29.12
array_constructor_MAP_nulls##2                             55.23ms     18.11
array_constructor_MAP_nulls##3                             82.64ms     12.10
array_constructor_MAP_nulls##2_null                        70.74ms     14.14
array_constructor_MAP_nulls##2_const                       88.13ms     11.35
```

Reviewed By: laithsakka

Differential Revision: D49269500

fbshipit-source-id: 7b702921202f4bb8d10a252eb0ab20f0e5792ae6
codyschierbeck pushed a commit to codyschierbeck/velox that referenced this pull request Sep 27, 2023
Summary:
Pull Request resolved: facebookincubator#6568

array_constructor is very slow: facebookincubator#5958 (comment)

array_constructor uses BaseVector::copyRanges, which is somewhat fast for arrays and maps, but very slow for primitive types:

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

FlatVector<T>::copy(source, rows, toSourceRow) is faster.

Switching from copyRanges to copy speeds up array_constructor for primitive types and structs significantly. Yet, this change makes arrays and maps slower.

The slowness is due to ArrayVector and MapVector not having implementation for copy(source, rows, toSourceRow). They rely on BaseVector::copy to translate rows + toSourceRow to ranges. This extra processing causes perf regression.

Hence, we use copy for primitive types and structs of these and copyRanges for everything else.

```
Before:

array_constructor_ARRAY_nullfree#facebookincubator#1                        16.80ms     59.53
array_constructor_ARRAY_nullfree#facebookincubator#2                        27.02ms     37.01
array_constructor_ARRAY_nullfree#facebookincubator#3                        38.03ms     26.30
array_constructor_ARRAY_nullfree##2_null                   52.86ms     18.92
array_constructor_ARRAY_nullfree##2_const                  54.97ms     18.19
array_constructor_ARRAY_nulls#facebookincubator#1                           30.61ms     32.66
array_constructor_ARRAY_nulls#facebookincubator#2                           55.01ms     18.18
array_constructor_ARRAY_nulls#facebookincubator#3                           80.69ms     12.39
array_constructor_ARRAY_nulls##2_null                      69.10ms     14.47
array_constructor_ARRAY_nulls##2_const                    103.85ms      9.63

After:

array_constructor_ARRAY_nullfree#facebookincubator#1                        15.25ms     65.58
array_constructor_ARRAY_nullfree#facebookincubator#2                        25.11ms     39.82
array_constructor_ARRAY_nullfree#facebookincubator#3                        34.59ms     28.91
array_constructor_ARRAY_nullfree##2_null                   53.61ms     18.65
array_constructor_ARRAY_nullfree##2_const                  51.48ms     19.42
array_constructor_ARRAY_nulls#facebookincubator#1                           29.99ms     33.34
array_constructor_ARRAY_nulls#facebookincubator#2                           55.91ms     17.89
array_constructor_ARRAY_nulls#facebookincubator#3                           81.73ms     12.24
array_constructor_ARRAY_nulls##2_null                      66.97ms     14.93
array_constructor_ARRAY_nulls##2_const                     92.96ms     10.76

Before:

array_constructor_INTEGER_nullfree#facebookincubator#1                      19.72ms     50.71
array_constructor_INTEGER_nullfree#facebookincubator#2                      34.51ms     28.97
array_constructor_INTEGER_nullfree#facebookincubator#3                      47.95ms     20.86
array_constructor_INTEGER_nullfree##2_null                 58.68ms     17.04
array_constructor_INTEGER_nullfree##2_const                45.15ms     22.15
array_constructor_INTEGER_nulls#facebookincubator#1                         29.99ms     33.34
array_constructor_INTEGER_nulls#facebookincubator#2                         55.32ms     18.08
array_constructor_INTEGER_nulls#facebookincubator#3                         78.53ms     12.73
array_constructor_INTEGER_nulls##2_null                    72.24ms     13.84
array_constructor_INTEGER_nulls##2_const                   71.13ms     14.06

After:

array_constructor_INTEGER_nullfree#facebookincubator#1                       3.39ms    294.89
array_constructor_INTEGER_nullfree#facebookincubator#2                       7.35ms    136.10
array_constructor_INTEGER_nullfree#facebookincubator#3                      10.78ms     92.74
array_constructor_INTEGER_nullfree##2_null                 11.29ms     88.57
array_constructor_INTEGER_nullfree##2_const                10.14ms     98.65
array_constructor_INTEGER_nulls#facebookincubator#1                          4.49ms    222.53
array_constructor_INTEGER_nulls#facebookincubator#2                          9.78ms    102.29
array_constructor_INTEGER_nulls#facebookincubator#3                         14.69ms     68.08
array_constructor_INTEGER_nulls##2_null                    12.14ms     82.36
array_constructor_INTEGER_nulls##2_const                   12.27ms     81.53

Before:

array_constructor_MAP_nullfree#facebookincubator#1                          17.34ms     57.65
array_constructor_MAP_nullfree#facebookincubator#2                          29.84ms     33.51
array_constructor_MAP_nullfree#facebookincubator#3                          41.51ms     24.09
array_constructor_MAP_nullfree##2_null                     56.57ms     17.68
array_constructor_MAP_nullfree##2_const                    71.68ms     13.95
array_constructor_MAP_nulls#facebookincubator#1                             36.22ms     27.61
array_constructor_MAP_nulls#facebookincubator#2                             68.18ms     14.67
array_constructor_MAP_nulls#facebookincubator#3                             95.12ms     10.51
array_constructor_MAP_nulls##2_null                        86.42ms     11.57
array_constructor_MAP_nulls##2_const                      120.10ms      8.33

After:

array_constructor_MAP_nullfree#facebookincubator#1                          17.05ms     58.66
array_constructor_MAP_nullfree#facebookincubator#2                          28.42ms     35.18
array_constructor_MAP_nullfree#facebookincubator#3                          36.96ms     27.06
array_constructor_MAP_nullfree##2_null                     55.64ms     17.97
array_constructor_MAP_nullfree##2_const                    67.53ms     14.81
array_constructor_MAP_nulls#facebookincubator#1                             32.91ms     30.39
array_constructor_MAP_nulls#facebookincubator#2                             64.50ms     15.50
array_constructor_MAP_nulls#facebookincubator#3                             95.71ms     10.45
array_constructor_MAP_nulls##2_null                        77.22ms     12.95
array_constructor_MAP_nulls##2_const                      114.91ms      8.70

Before:

array_constructor_ROW_nullfree#facebookincubator#1                          33.88ms     29.52
array_constructor_ROW_nullfree#facebookincubator#2                          62.00ms     16.13
array_constructor_ROW_nullfree#facebookincubator#3                          89.54ms     11.17
array_constructor_ROW_nullfree##2_null                     78.46ms     12.75
array_constructor_ROW_nullfree##2_const                    95.53ms     10.47
array_constructor_ROW_nulls#facebookincubator#1                             44.11ms     22.67
array_constructor_ROW_nulls#facebookincubator#2                            115.43ms      8.66
array_constructor_ROW_nulls#facebookincubator#3                            173.61ms      5.76
array_constructor_ROW_nulls##2_null                       130.40ms      7.67
array_constructor_ROW_nulls##2_const                      169.97ms      5.88

After:

array_constructor_ROW_nullfree#facebookincubator#1                           5.55ms    180.15
array_constructor_ROW_nullfree#facebookincubator#2                          12.83ms     77.94
array_constructor_ROW_nullfree#facebookincubator#3                          18.89ms     52.95
array_constructor_ROW_nullfree##2_null                     18.74ms     53.36
array_constructor_ROW_nullfree##2_const                    18.16ms     55.07
array_constructor_ROW_nulls#facebookincubator#1                             11.29ms     88.61
array_constructor_ROW_nulls#facebookincubator#2                             18.57ms     53.86
array_constructor_ROW_nulls#facebookincubator#3                             34.20ms     29.24
array_constructor_ROW_nulls##2_null                        25.05ms     39.92
array_constructor_ROW_nulls##2_const                       25.15ms     39.77
```

Reviewed By: laithsakka

Differential Revision: D49272797

fbshipit-source-id: 55d83de7b69c7ae4b72b5a5ae62a7868f36b0e19
codyschierbeck pushed a commit to codyschierbeck/velox that referenced this pull request Sep 27, 2023
Summary:
Pull Request resolved: facebookincubator#6566

FlatVector::copyRanges is slow when there are many small ranges. I noticed this while investigating slowness of array_constructor which used created 1-row ranges: facebookincubator#5958 (comment)

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

This change optimizes FlatVector<T>::copyRanges using code from copy(source, targetIndex, sourceIndex, count) which copies one range. This change also replaces the latter call with a call to copyRanges, hence, the overall amount of code is about the same.

An earlier change optimized array_constructor to not use copyRanges for primitive types, but it is still used by Array/MapVector::copyRanges, although, these do not create 1-row ranges. Still the array_constructor benchmark shows some wins for arrays and maps.

```
Before:

array_constructor_ARRAY_nullfree##2_const                  54.31ms     18.41
array_constructor_ARRAY_nulls#facebookincubator#1                           33.50ms     29.85
array_constructor_ARRAY_nulls#facebookincubator#2                           59.05ms     16.93
array_constructor_ARRAY_nulls#facebookincubator#3                           88.36ms     11.32
array_constructor_ARRAY_nulls##2_null                      74.53ms     13.42
array_constructor_ARRAY_nulls##2_const                    102.54ms      9.75

After:

array_constructor_ARRAY_nullfree##2_const                  41.36ms     24.18
array_constructor_ARRAY_nulls#facebookincubator#1                           30.51ms     32.78
array_constructor_ARRAY_nulls#facebookincubator#2                           55.13ms     18.14
array_constructor_ARRAY_nulls#facebookincubator#3                           77.93ms     12.83
array_constructor_ARRAY_nulls##2_null                      68.84ms     14.53
array_constructor_ARRAY_nulls##2_const                     83.91ms     11.92

Before:

array_constructor_MAP_nullfree##2_const                    67.44ms     14.83
array_constructor_MAP_nulls#facebookincubator#1                             37.00ms     27.03
array_constructor_MAP_nulls#facebookincubator#2                             67.76ms     14.76
array_constructor_MAP_nulls#facebookincubator#3                            100.88ms      9.91
array_constructor_MAP_nulls##2_null                        84.22ms     11.87
array_constructor_MAP_nulls##2_const                      122.55ms      8.16

After:

array_constructor_MAP_nullfree##2_const                    49.94ms     20.03
array_constructor_MAP_nulls#facebookincubator#1                             34.34ms     29.12
array_constructor_MAP_nulls#facebookincubator#2                             55.23ms     18.11
array_constructor_MAP_nulls#facebookincubator#3                             82.64ms     12.10
array_constructor_MAP_nulls##2_null                        70.74ms     14.14
array_constructor_MAP_nulls##2_const                       88.13ms     11.35
```

Reviewed By: laithsakka

Differential Revision: D49269500

fbshipit-source-id: 7b702921202f4bb8d10a252eb0ab20f0e5792ae6
codyschierbeck pushed a commit to codyschierbeck/velox that referenced this pull request Sep 27, 2023
Summary:
Pull Request resolved: facebookincubator#6568

array_constructor is very slow: facebookincubator#5958 (comment)

array_constructor uses BaseVector::copyRanges, which is somewhat fast for arrays and maps, but very slow for primitive types:

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

FlatVector<T>::copy(source, rows, toSourceRow) is faster.

Switching from copyRanges to copy speeds up array_constructor for primitive types and structs significantly. Yet, this change makes arrays and maps slower.

The slowness is due to ArrayVector and MapVector not having implementation for copy(source, rows, toSourceRow). They rely on BaseVector::copy to translate rows + toSourceRow to ranges. This extra processing causes perf regression.

Hence, we use copy for primitive types and structs of these and copyRanges for everything else.

```
Before:

array_constructor_ARRAY_nullfree#facebookincubator#1                        16.80ms     59.53
array_constructor_ARRAY_nullfree#facebookincubator#2                        27.02ms     37.01
array_constructor_ARRAY_nullfree#facebookincubator#3                        38.03ms     26.30
array_constructor_ARRAY_nullfree##2_null                   52.86ms     18.92
array_constructor_ARRAY_nullfree##2_const                  54.97ms     18.19
array_constructor_ARRAY_nulls#facebookincubator#1                           30.61ms     32.66
array_constructor_ARRAY_nulls#facebookincubator#2                           55.01ms     18.18
array_constructor_ARRAY_nulls#facebookincubator#3                           80.69ms     12.39
array_constructor_ARRAY_nulls##2_null                      69.10ms     14.47
array_constructor_ARRAY_nulls##2_const                    103.85ms      9.63

After:

array_constructor_ARRAY_nullfree#facebookincubator#1                        15.25ms     65.58
array_constructor_ARRAY_nullfree#facebookincubator#2                        25.11ms     39.82
array_constructor_ARRAY_nullfree#facebookincubator#3                        34.59ms     28.91
array_constructor_ARRAY_nullfree##2_null                   53.61ms     18.65
array_constructor_ARRAY_nullfree##2_const                  51.48ms     19.42
array_constructor_ARRAY_nulls#facebookincubator#1                           29.99ms     33.34
array_constructor_ARRAY_nulls#facebookincubator#2                           55.91ms     17.89
array_constructor_ARRAY_nulls#facebookincubator#3                           81.73ms     12.24
array_constructor_ARRAY_nulls##2_null                      66.97ms     14.93
array_constructor_ARRAY_nulls##2_const                     92.96ms     10.76

Before:

array_constructor_INTEGER_nullfree#facebookincubator#1                      19.72ms     50.71
array_constructor_INTEGER_nullfree#facebookincubator#2                      34.51ms     28.97
array_constructor_INTEGER_nullfree#facebookincubator#3                      47.95ms     20.86
array_constructor_INTEGER_nullfree##2_null                 58.68ms     17.04
array_constructor_INTEGER_nullfree##2_const                45.15ms     22.15
array_constructor_INTEGER_nulls#facebookincubator#1                         29.99ms     33.34
array_constructor_INTEGER_nulls#facebookincubator#2                         55.32ms     18.08
array_constructor_INTEGER_nulls#facebookincubator#3                         78.53ms     12.73
array_constructor_INTEGER_nulls##2_null                    72.24ms     13.84
array_constructor_INTEGER_nulls##2_const                   71.13ms     14.06

After:

array_constructor_INTEGER_nullfree#facebookincubator#1                       3.39ms    294.89
array_constructor_INTEGER_nullfree#facebookincubator#2                       7.35ms    136.10
array_constructor_INTEGER_nullfree#facebookincubator#3                      10.78ms     92.74
array_constructor_INTEGER_nullfree##2_null                 11.29ms     88.57
array_constructor_INTEGER_nullfree##2_const                10.14ms     98.65
array_constructor_INTEGER_nulls#facebookincubator#1                          4.49ms    222.53
array_constructor_INTEGER_nulls#facebookincubator#2                          9.78ms    102.29
array_constructor_INTEGER_nulls#facebookincubator#3                         14.69ms     68.08
array_constructor_INTEGER_nulls##2_null                    12.14ms     82.36
array_constructor_INTEGER_nulls##2_const                   12.27ms     81.53

Before:

array_constructor_MAP_nullfree#facebookincubator#1                          17.34ms     57.65
array_constructor_MAP_nullfree#facebookincubator#2                          29.84ms     33.51
array_constructor_MAP_nullfree#facebookincubator#3                          41.51ms     24.09
array_constructor_MAP_nullfree##2_null                     56.57ms     17.68
array_constructor_MAP_nullfree##2_const                    71.68ms     13.95
array_constructor_MAP_nulls#facebookincubator#1                             36.22ms     27.61
array_constructor_MAP_nulls#facebookincubator#2                             68.18ms     14.67
array_constructor_MAP_nulls#facebookincubator#3                             95.12ms     10.51
array_constructor_MAP_nulls##2_null                        86.42ms     11.57
array_constructor_MAP_nulls##2_const                      120.10ms      8.33

After:

array_constructor_MAP_nullfree#facebookincubator#1                          17.05ms     58.66
array_constructor_MAP_nullfree#facebookincubator#2                          28.42ms     35.18
array_constructor_MAP_nullfree#facebookincubator#3                          36.96ms     27.06
array_constructor_MAP_nullfree##2_null                     55.64ms     17.97
array_constructor_MAP_nullfree##2_const                    67.53ms     14.81
array_constructor_MAP_nulls#facebookincubator#1                             32.91ms     30.39
array_constructor_MAP_nulls#facebookincubator#2                             64.50ms     15.50
array_constructor_MAP_nulls#facebookincubator#3                             95.71ms     10.45
array_constructor_MAP_nulls##2_null                        77.22ms     12.95
array_constructor_MAP_nulls##2_const                      114.91ms      8.70

Before:

array_constructor_ROW_nullfree#facebookincubator#1                          33.88ms     29.52
array_constructor_ROW_nullfree#facebookincubator#2                          62.00ms     16.13
array_constructor_ROW_nullfree#facebookincubator#3                          89.54ms     11.17
array_constructor_ROW_nullfree##2_null                     78.46ms     12.75
array_constructor_ROW_nullfree##2_const                    95.53ms     10.47
array_constructor_ROW_nulls#facebookincubator#1                             44.11ms     22.67
array_constructor_ROW_nulls#facebookincubator#2                            115.43ms      8.66
array_constructor_ROW_nulls#facebookincubator#3                            173.61ms      5.76
array_constructor_ROW_nulls##2_null                       130.40ms      7.67
array_constructor_ROW_nulls##2_const                      169.97ms      5.88

After:

array_constructor_ROW_nullfree#facebookincubator#1                           5.55ms    180.15
array_constructor_ROW_nullfree#facebookincubator#2                          12.83ms     77.94
array_constructor_ROW_nullfree#facebookincubator#3                          18.89ms     52.95
array_constructor_ROW_nullfree##2_null                     18.74ms     53.36
array_constructor_ROW_nullfree##2_const                    18.16ms     55.07
array_constructor_ROW_nulls#facebookincubator#1                             11.29ms     88.61
array_constructor_ROW_nulls#facebookincubator#2                             18.57ms     53.86
array_constructor_ROW_nulls#facebookincubator#3                             34.20ms     29.24
array_constructor_ROW_nulls##2_null                        25.05ms     39.92
array_constructor_ROW_nulls##2_const                       25.15ms     39.77
```

Reviewed By: laithsakka

Differential Revision: D49272797

fbshipit-source-id: 55d83de7b69c7ae4b72b5a5ae62a7868f36b0e19
codyschierbeck pushed a commit to codyschierbeck/velox that referenced this pull request Sep 27, 2023
Summary:
Pull Request resolved: facebookincubator#6566

FlatVector::copyRanges is slow when there are many small ranges. I noticed this while investigating slowness of array_constructor which used created 1-row ranges: facebookincubator#5958 (comment)

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

This change optimizes FlatVector<T>::copyRanges using code from copy(source, targetIndex, sourceIndex, count) which copies one range. This change also replaces the latter call with a call to copyRanges, hence, the overall amount of code is about the same.

An earlier change optimized array_constructor to not use copyRanges for primitive types, but it is still used by Array/MapVector::copyRanges, although, these do not create 1-row ranges. Still the array_constructor benchmark shows some wins for arrays and maps.

```
Before:

array_constructor_ARRAY_nullfree##2_const                  54.31ms     18.41
array_constructor_ARRAY_nulls#facebookincubator#1                           33.50ms     29.85
array_constructor_ARRAY_nulls#facebookincubator#2                           59.05ms     16.93
array_constructor_ARRAY_nulls#facebookincubator#3                           88.36ms     11.32
array_constructor_ARRAY_nulls##2_null                      74.53ms     13.42
array_constructor_ARRAY_nulls##2_const                    102.54ms      9.75

After:

array_constructor_ARRAY_nullfree##2_const                  41.36ms     24.18
array_constructor_ARRAY_nulls#facebookincubator#1                           30.51ms     32.78
array_constructor_ARRAY_nulls#facebookincubator#2                           55.13ms     18.14
array_constructor_ARRAY_nulls#facebookincubator#3                           77.93ms     12.83
array_constructor_ARRAY_nulls##2_null                      68.84ms     14.53
array_constructor_ARRAY_nulls##2_const                     83.91ms     11.92

Before:

array_constructor_MAP_nullfree##2_const                    67.44ms     14.83
array_constructor_MAP_nulls#facebookincubator#1                             37.00ms     27.03
array_constructor_MAP_nulls#facebookincubator#2                             67.76ms     14.76
array_constructor_MAP_nulls#facebookincubator#3                            100.88ms      9.91
array_constructor_MAP_nulls##2_null                        84.22ms     11.87
array_constructor_MAP_nulls##2_const                      122.55ms      8.16

After:

array_constructor_MAP_nullfree##2_const                    49.94ms     20.03
array_constructor_MAP_nulls#facebookincubator#1                             34.34ms     29.12
array_constructor_MAP_nulls#facebookincubator#2                             55.23ms     18.11
array_constructor_MAP_nulls#facebookincubator#3                             82.64ms     12.10
array_constructor_MAP_nulls##2_null                        70.74ms     14.14
array_constructor_MAP_nulls##2_const                       88.13ms     11.35
```

Reviewed By: laithsakka

Differential Revision: D49269500

fbshipit-source-id: 7b702921202f4bb8d10a252eb0ab20f0e5792ae6
codyschierbeck pushed a commit to codyschierbeck/velox that referenced this pull request Sep 27, 2023
Summary:
Pull Request resolved: facebookincubator#6568

array_constructor is very slow: facebookincubator#5958 (comment)

array_constructor uses BaseVector::copyRanges, which is somewhat fast for arrays and maps, but very slow for primitive types:

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

FlatVector<T>::copy(source, rows, toSourceRow) is faster.

Switching from copyRanges to copy speeds up array_constructor for primitive types and structs significantly. Yet, this change makes arrays and maps slower.

The slowness is due to ArrayVector and MapVector not having implementation for copy(source, rows, toSourceRow). They rely on BaseVector::copy to translate rows + toSourceRow to ranges. This extra processing causes perf regression.

Hence, we use copy for primitive types and structs of these and copyRanges for everything else.

```
Before:

array_constructor_ARRAY_nullfree#facebookincubator#1                        16.80ms     59.53
array_constructor_ARRAY_nullfree#facebookincubator#2                        27.02ms     37.01
array_constructor_ARRAY_nullfree#facebookincubator#3                        38.03ms     26.30
array_constructor_ARRAY_nullfree##2_null                   52.86ms     18.92
array_constructor_ARRAY_nullfree##2_const                  54.97ms     18.19
array_constructor_ARRAY_nulls#facebookincubator#1                           30.61ms     32.66
array_constructor_ARRAY_nulls#facebookincubator#2                           55.01ms     18.18
array_constructor_ARRAY_nulls#facebookincubator#3                           80.69ms     12.39
array_constructor_ARRAY_nulls##2_null                      69.10ms     14.47
array_constructor_ARRAY_nulls##2_const                    103.85ms      9.63

After:

array_constructor_ARRAY_nullfree#facebookincubator#1                        15.25ms     65.58
array_constructor_ARRAY_nullfree#facebookincubator#2                        25.11ms     39.82
array_constructor_ARRAY_nullfree#facebookincubator#3                        34.59ms     28.91
array_constructor_ARRAY_nullfree##2_null                   53.61ms     18.65
array_constructor_ARRAY_nullfree##2_const                  51.48ms     19.42
array_constructor_ARRAY_nulls#facebookincubator#1                           29.99ms     33.34
array_constructor_ARRAY_nulls#facebookincubator#2                           55.91ms     17.89
array_constructor_ARRAY_nulls#facebookincubator#3                           81.73ms     12.24
array_constructor_ARRAY_nulls##2_null                      66.97ms     14.93
array_constructor_ARRAY_nulls##2_const                     92.96ms     10.76

Before:

array_constructor_INTEGER_nullfree#facebookincubator#1                      19.72ms     50.71
array_constructor_INTEGER_nullfree#facebookincubator#2                      34.51ms     28.97
array_constructor_INTEGER_nullfree#facebookincubator#3                      47.95ms     20.86
array_constructor_INTEGER_nullfree##2_null                 58.68ms     17.04
array_constructor_INTEGER_nullfree##2_const                45.15ms     22.15
array_constructor_INTEGER_nulls#facebookincubator#1                         29.99ms     33.34
array_constructor_INTEGER_nulls#facebookincubator#2                         55.32ms     18.08
array_constructor_INTEGER_nulls#facebookincubator#3                         78.53ms     12.73
array_constructor_INTEGER_nulls##2_null                    72.24ms     13.84
array_constructor_INTEGER_nulls##2_const                   71.13ms     14.06

After:

array_constructor_INTEGER_nullfree#facebookincubator#1                       3.39ms    294.89
array_constructor_INTEGER_nullfree#facebookincubator#2                       7.35ms    136.10
array_constructor_INTEGER_nullfree#facebookincubator#3                      10.78ms     92.74
array_constructor_INTEGER_nullfree##2_null                 11.29ms     88.57
array_constructor_INTEGER_nullfree##2_const                10.14ms     98.65
array_constructor_INTEGER_nulls#facebookincubator#1                          4.49ms    222.53
array_constructor_INTEGER_nulls#facebookincubator#2                          9.78ms    102.29
array_constructor_INTEGER_nulls#facebookincubator#3                         14.69ms     68.08
array_constructor_INTEGER_nulls##2_null                    12.14ms     82.36
array_constructor_INTEGER_nulls##2_const                   12.27ms     81.53

Before:

array_constructor_MAP_nullfree#facebookincubator#1                          17.34ms     57.65
array_constructor_MAP_nullfree#facebookincubator#2                          29.84ms     33.51
array_constructor_MAP_nullfree#facebookincubator#3                          41.51ms     24.09
array_constructor_MAP_nullfree##2_null                     56.57ms     17.68
array_constructor_MAP_nullfree##2_const                    71.68ms     13.95
array_constructor_MAP_nulls#facebookincubator#1                             36.22ms     27.61
array_constructor_MAP_nulls#facebookincubator#2                             68.18ms     14.67
array_constructor_MAP_nulls#facebookincubator#3                             95.12ms     10.51
array_constructor_MAP_nulls##2_null                        86.42ms     11.57
array_constructor_MAP_nulls##2_const                      120.10ms      8.33

After:

array_constructor_MAP_nullfree#facebookincubator#1                          17.05ms     58.66
array_constructor_MAP_nullfree#facebookincubator#2                          28.42ms     35.18
array_constructor_MAP_nullfree#facebookincubator#3                          36.96ms     27.06
array_constructor_MAP_nullfree##2_null                     55.64ms     17.97
array_constructor_MAP_nullfree##2_const                    67.53ms     14.81
array_constructor_MAP_nulls#facebookincubator#1                             32.91ms     30.39
array_constructor_MAP_nulls#facebookincubator#2                             64.50ms     15.50
array_constructor_MAP_nulls#facebookincubator#3                             95.71ms     10.45
array_constructor_MAP_nulls##2_null                        77.22ms     12.95
array_constructor_MAP_nulls##2_const                      114.91ms      8.70

Before:

array_constructor_ROW_nullfree#facebookincubator#1                          33.88ms     29.52
array_constructor_ROW_nullfree#facebookincubator#2                          62.00ms     16.13
array_constructor_ROW_nullfree#facebookincubator#3                          89.54ms     11.17
array_constructor_ROW_nullfree##2_null                     78.46ms     12.75
array_constructor_ROW_nullfree##2_const                    95.53ms     10.47
array_constructor_ROW_nulls#facebookincubator#1                             44.11ms     22.67
array_constructor_ROW_nulls#facebookincubator#2                            115.43ms      8.66
array_constructor_ROW_nulls#facebookincubator#3                            173.61ms      5.76
array_constructor_ROW_nulls##2_null                       130.40ms      7.67
array_constructor_ROW_nulls##2_const                      169.97ms      5.88

After:

array_constructor_ROW_nullfree#facebookincubator#1                           5.55ms    180.15
array_constructor_ROW_nullfree#facebookincubator#2                          12.83ms     77.94
array_constructor_ROW_nullfree#facebookincubator#3                          18.89ms     52.95
array_constructor_ROW_nullfree##2_null                     18.74ms     53.36
array_constructor_ROW_nullfree##2_const                    18.16ms     55.07
array_constructor_ROW_nulls#facebookincubator#1                             11.29ms     88.61
array_constructor_ROW_nulls#facebookincubator#2                             18.57ms     53.86
array_constructor_ROW_nulls#facebookincubator#3                             34.20ms     29.24
array_constructor_ROW_nulls##2_null                        25.05ms     39.92
array_constructor_ROW_nulls##2_const                       25.15ms     39.77
```

Reviewed By: laithsakka

Differential Revision: D49272797

fbshipit-source-id: 55d83de7b69c7ae4b72b5a5ae62a7868f36b0e19
codyschierbeck pushed a commit to codyschierbeck/velox that referenced this pull request Sep 27, 2023
Summary:
Pull Request resolved: facebookincubator#6566

FlatVector::copyRanges is slow when there are many small ranges. I noticed this while investigating slowness of array_constructor which used created 1-row ranges: facebookincubator#5958 (comment)

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

This change optimizes FlatVector<T>::copyRanges using code from copy(source, targetIndex, sourceIndex, count) which copies one range. This change also replaces the latter call with a call to copyRanges, hence, the overall amount of code is about the same.

An earlier change optimized array_constructor to not use copyRanges for primitive types, but it is still used by Array/MapVector::copyRanges, although, these do not create 1-row ranges. Still the array_constructor benchmark shows some wins for arrays and maps.

```
Before:

array_constructor_ARRAY_nullfree##2_const                  54.31ms     18.41
array_constructor_ARRAY_nulls#facebookincubator#1                           33.50ms     29.85
array_constructor_ARRAY_nulls#facebookincubator#2                           59.05ms     16.93
array_constructor_ARRAY_nulls#facebookincubator#3                           88.36ms     11.32
array_constructor_ARRAY_nulls##2_null                      74.53ms     13.42
array_constructor_ARRAY_nulls##2_const                    102.54ms      9.75

After:

array_constructor_ARRAY_nullfree##2_const                  41.36ms     24.18
array_constructor_ARRAY_nulls#facebookincubator#1                           30.51ms     32.78
array_constructor_ARRAY_nulls#facebookincubator#2                           55.13ms     18.14
array_constructor_ARRAY_nulls#facebookincubator#3                           77.93ms     12.83
array_constructor_ARRAY_nulls##2_null                      68.84ms     14.53
array_constructor_ARRAY_nulls##2_const                     83.91ms     11.92

Before:

array_constructor_MAP_nullfree##2_const                    67.44ms     14.83
array_constructor_MAP_nulls#facebookincubator#1                             37.00ms     27.03
array_constructor_MAP_nulls#facebookincubator#2                             67.76ms     14.76
array_constructor_MAP_nulls#facebookincubator#3                            100.88ms      9.91
array_constructor_MAP_nulls##2_null                        84.22ms     11.87
array_constructor_MAP_nulls##2_const                      122.55ms      8.16

After:

array_constructor_MAP_nullfree##2_const                    49.94ms     20.03
array_constructor_MAP_nulls#facebookincubator#1                             34.34ms     29.12
array_constructor_MAP_nulls#facebookincubator#2                             55.23ms     18.11
array_constructor_MAP_nulls#facebookincubator#3                             82.64ms     12.10
array_constructor_MAP_nulls##2_null                        70.74ms     14.14
array_constructor_MAP_nulls##2_const                       88.13ms     11.35
```

Reviewed By: laithsakka

Differential Revision: D49269500

fbshipit-source-id: 7b702921202f4bb8d10a252eb0ab20f0e5792ae6
ericyuliu pushed a commit to ericyuliu/velox that referenced this pull request Oct 12, 2023
Summary:
Pull Request resolved: facebookincubator#6568

array_constructor is very slow: facebookincubator#5958 (comment)

array_constructor uses BaseVector::copyRanges, which is somewhat fast for arrays and maps, but very slow for primitive types:

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

FlatVector<T>::copy(source, rows, toSourceRow) is faster.

Switching from copyRanges to copy speeds up array_constructor for primitive types and structs significantly. Yet, this change makes arrays and maps slower.

The slowness is due to ArrayVector and MapVector not having implementation for copy(source, rows, toSourceRow). They rely on BaseVector::copy to translate rows + toSourceRow to ranges. This extra processing causes perf regression.

Hence, we use copy for primitive types and structs of these and copyRanges for everything else.

```
Before:

array_constructor_ARRAY_nullfree#facebookincubator#1                        16.80ms     59.53
array_constructor_ARRAY_nullfree#facebookincubator#2                        27.02ms     37.01
array_constructor_ARRAY_nullfree#facebookincubator#3                        38.03ms     26.30
array_constructor_ARRAY_nullfree##2_null                   52.86ms     18.92
array_constructor_ARRAY_nullfree##2_const                  54.97ms     18.19
array_constructor_ARRAY_nulls#facebookincubator#1                           30.61ms     32.66
array_constructor_ARRAY_nulls#facebookincubator#2                           55.01ms     18.18
array_constructor_ARRAY_nulls#facebookincubator#3                           80.69ms     12.39
array_constructor_ARRAY_nulls##2_null                      69.10ms     14.47
array_constructor_ARRAY_nulls##2_const                    103.85ms      9.63

After:

array_constructor_ARRAY_nullfree#facebookincubator#1                        15.25ms     65.58
array_constructor_ARRAY_nullfree#facebookincubator#2                        25.11ms     39.82
array_constructor_ARRAY_nullfree#facebookincubator#3                        34.59ms     28.91
array_constructor_ARRAY_nullfree##2_null                   53.61ms     18.65
array_constructor_ARRAY_nullfree##2_const                  51.48ms     19.42
array_constructor_ARRAY_nulls#facebookincubator#1                           29.99ms     33.34
array_constructor_ARRAY_nulls#facebookincubator#2                           55.91ms     17.89
array_constructor_ARRAY_nulls#facebookincubator#3                           81.73ms     12.24
array_constructor_ARRAY_nulls##2_null                      66.97ms     14.93
array_constructor_ARRAY_nulls##2_const                     92.96ms     10.76

Before:

array_constructor_INTEGER_nullfree#facebookincubator#1                      19.72ms     50.71
array_constructor_INTEGER_nullfree#facebookincubator#2                      34.51ms     28.97
array_constructor_INTEGER_nullfree#facebookincubator#3                      47.95ms     20.86
array_constructor_INTEGER_nullfree##2_null                 58.68ms     17.04
array_constructor_INTEGER_nullfree##2_const                45.15ms     22.15
array_constructor_INTEGER_nulls#facebookincubator#1                         29.99ms     33.34
array_constructor_INTEGER_nulls#facebookincubator#2                         55.32ms     18.08
array_constructor_INTEGER_nulls#facebookincubator#3                         78.53ms     12.73
array_constructor_INTEGER_nulls##2_null                    72.24ms     13.84
array_constructor_INTEGER_nulls##2_const                   71.13ms     14.06

After:

array_constructor_INTEGER_nullfree#facebookincubator#1                       3.39ms    294.89
array_constructor_INTEGER_nullfree#facebookincubator#2                       7.35ms    136.10
array_constructor_INTEGER_nullfree#facebookincubator#3                      10.78ms     92.74
array_constructor_INTEGER_nullfree##2_null                 11.29ms     88.57
array_constructor_INTEGER_nullfree##2_const                10.14ms     98.65
array_constructor_INTEGER_nulls#facebookincubator#1                          4.49ms    222.53
array_constructor_INTEGER_nulls#facebookincubator#2                          9.78ms    102.29
array_constructor_INTEGER_nulls#facebookincubator#3                         14.69ms     68.08
array_constructor_INTEGER_nulls##2_null                    12.14ms     82.36
array_constructor_INTEGER_nulls##2_const                   12.27ms     81.53

Before:

array_constructor_MAP_nullfree#facebookincubator#1                          17.34ms     57.65
array_constructor_MAP_nullfree#facebookincubator#2                          29.84ms     33.51
array_constructor_MAP_nullfree#facebookincubator#3                          41.51ms     24.09
array_constructor_MAP_nullfree##2_null                     56.57ms     17.68
array_constructor_MAP_nullfree##2_const                    71.68ms     13.95
array_constructor_MAP_nulls#facebookincubator#1                             36.22ms     27.61
array_constructor_MAP_nulls#facebookincubator#2                             68.18ms     14.67
array_constructor_MAP_nulls#facebookincubator#3                             95.12ms     10.51
array_constructor_MAP_nulls##2_null                        86.42ms     11.57
array_constructor_MAP_nulls##2_const                      120.10ms      8.33

After:

array_constructor_MAP_nullfree#facebookincubator#1                          17.05ms     58.66
array_constructor_MAP_nullfree#facebookincubator#2                          28.42ms     35.18
array_constructor_MAP_nullfree#facebookincubator#3                          36.96ms     27.06
array_constructor_MAP_nullfree##2_null                     55.64ms     17.97
array_constructor_MAP_nullfree##2_const                    67.53ms     14.81
array_constructor_MAP_nulls#facebookincubator#1                             32.91ms     30.39
array_constructor_MAP_nulls#facebookincubator#2                             64.50ms     15.50
array_constructor_MAP_nulls#facebookincubator#3                             95.71ms     10.45
array_constructor_MAP_nulls##2_null                        77.22ms     12.95
array_constructor_MAP_nulls##2_const                      114.91ms      8.70

Before:

array_constructor_ROW_nullfree#facebookincubator#1                          33.88ms     29.52
array_constructor_ROW_nullfree#facebookincubator#2                          62.00ms     16.13
array_constructor_ROW_nullfree#facebookincubator#3                          89.54ms     11.17
array_constructor_ROW_nullfree##2_null                     78.46ms     12.75
array_constructor_ROW_nullfree##2_const                    95.53ms     10.47
array_constructor_ROW_nulls#facebookincubator#1                             44.11ms     22.67
array_constructor_ROW_nulls#facebookincubator#2                            115.43ms      8.66
array_constructor_ROW_nulls#facebookincubator#3                            173.61ms      5.76
array_constructor_ROW_nulls##2_null                       130.40ms      7.67
array_constructor_ROW_nulls##2_const                      169.97ms      5.88

After:

array_constructor_ROW_nullfree#facebookincubator#1                           5.55ms    180.15
array_constructor_ROW_nullfree#facebookincubator#2                          12.83ms     77.94
array_constructor_ROW_nullfree#facebookincubator#3                          18.89ms     52.95
array_constructor_ROW_nullfree##2_null                     18.74ms     53.36
array_constructor_ROW_nullfree##2_const                    18.16ms     55.07
array_constructor_ROW_nulls#facebookincubator#1                             11.29ms     88.61
array_constructor_ROW_nulls#facebookincubator#2                             18.57ms     53.86
array_constructor_ROW_nulls#facebookincubator#3                             34.20ms     29.24
array_constructor_ROW_nulls##2_null                        25.05ms     39.92
array_constructor_ROW_nulls##2_const                       25.15ms     39.77
```

Reviewed By: laithsakka

Differential Revision: D49272797

fbshipit-source-id: 55d83de7b69c7ae4b72b5a5ae62a7868f36b0e19
ericyuliu pushed a commit to ericyuliu/velox that referenced this pull request Oct 12, 2023
Summary:
Pull Request resolved: facebookincubator#6566

FlatVector::copyRanges is slow when there are many small ranges. I noticed this while investigating slowness of array_constructor which used created 1-row ranges: facebookincubator#5958 (comment)

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

This change optimizes FlatVector<T>::copyRanges using code from copy(source, targetIndex, sourceIndex, count) which copies one range. This change also replaces the latter call with a call to copyRanges, hence, the overall amount of code is about the same.

An earlier change optimized array_constructor to not use copyRanges for primitive types, but it is still used by Array/MapVector::copyRanges, although, these do not create 1-row ranges. Still the array_constructor benchmark shows some wins for arrays and maps.

```
Before:

array_constructor_ARRAY_nullfree##2_const                  54.31ms     18.41
array_constructor_ARRAY_nulls#facebookincubator#1                           33.50ms     29.85
array_constructor_ARRAY_nulls#facebookincubator#2                           59.05ms     16.93
array_constructor_ARRAY_nulls#facebookincubator#3                           88.36ms     11.32
array_constructor_ARRAY_nulls##2_null                      74.53ms     13.42
array_constructor_ARRAY_nulls##2_const                    102.54ms      9.75

After:

array_constructor_ARRAY_nullfree##2_const                  41.36ms     24.18
array_constructor_ARRAY_nulls#facebookincubator#1                           30.51ms     32.78
array_constructor_ARRAY_nulls#facebookincubator#2                           55.13ms     18.14
array_constructor_ARRAY_nulls#facebookincubator#3                           77.93ms     12.83
array_constructor_ARRAY_nulls##2_null                      68.84ms     14.53
array_constructor_ARRAY_nulls##2_const                     83.91ms     11.92

Before:

array_constructor_MAP_nullfree##2_const                    67.44ms     14.83
array_constructor_MAP_nulls#facebookincubator#1                             37.00ms     27.03
array_constructor_MAP_nulls#facebookincubator#2                             67.76ms     14.76
array_constructor_MAP_nulls#facebookincubator#3                            100.88ms      9.91
array_constructor_MAP_nulls##2_null                        84.22ms     11.87
array_constructor_MAP_nulls##2_const                      122.55ms      8.16

After:

array_constructor_MAP_nullfree##2_const                    49.94ms     20.03
array_constructor_MAP_nulls#facebookincubator#1                             34.34ms     29.12
array_constructor_MAP_nulls#facebookincubator#2                             55.23ms     18.11
array_constructor_MAP_nulls#facebookincubator#3                             82.64ms     12.10
array_constructor_MAP_nulls##2_null                        70.74ms     14.14
array_constructor_MAP_nulls##2_const                       88.13ms     11.35
```

Reviewed By: laithsakka

Differential Revision: D49269500

fbshipit-source-id: 7b702921202f4bb8d10a252eb0ab20f0e5792ae6
laithsakka added a commit to laithsakka/velox that referenced this pull request Oct 12, 2023
Summary:
```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
map_subscript_MAP<ARRAY<VARCHAR>,INTEGER>#facebookincubator#1               71.48ms     13.99
map_subscript_MAP<ARRAY<VARCHAR>,INTEGER>#facebookincubator#2               76.58ms     13.06
map_subscript_MAP<ARRAY<VARCHAR>,INTEGER>#facebookincubator#3               85.31ms     11.72
map_subscript_MAP<ARRAY<VARCHAR>,INTEGER>#facebookincubator#4              121.56ms      8.23
map_subscript_MAP<INTEGER,INTEGER>#facebookincubator#1                      27.19ms     36.78
map_subscript_MAP<INTEGER,INTEGER>#facebookincubator#2                      33.10ms     30.21
map_subscript_MAP<INTEGER,INTEGER>#facebookincubator#3                      33.47ms     29.88
map_subscript_MAP<INTEGER,INTEGER>#facebookincubator#4                      31.70ms     31.55
map_subscript_MAP<VARCHAR,INTEGER>#facebookincubator#1                      26.92ms     37.14
map_subscript_MAP<VARCHAR,INTEGER>#facebookincubator#2                      36.62ms     27.31
map_subscript_MAP<VARCHAR,INTEGER>#facebookincubator#3                      34.19ms     29.24
map_subscript_MAP<VARCHAR,INTEGER>#facebookincubator#4                      33.76ms     29.62
```

Differential Revision: D50237919
laithsakka added a commit to laithsakka/velox that referenced this pull request Oct 12, 2023
Summary:

```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
map_subscript_MAP<ARRAY<VARCHAR>,INTEGER>#facebookincubator#1               71.48ms     13.99
map_subscript_MAP<ARRAY<VARCHAR>,INTEGER>#facebookincubator#2               76.58ms     13.06
map_subscript_MAP<ARRAY<VARCHAR>,INTEGER>#facebookincubator#3               85.31ms     11.72
map_subscript_MAP<ARRAY<VARCHAR>,INTEGER>#facebookincubator#4              121.56ms      8.23
map_subscript_MAP<INTEGER,INTEGER>#facebookincubator#1                      27.19ms     36.78
map_subscript_MAP<INTEGER,INTEGER>#facebookincubator#2                      33.10ms     30.21
map_subscript_MAP<INTEGER,INTEGER>#facebookincubator#3                      33.47ms     29.88
map_subscript_MAP<INTEGER,INTEGER>#facebookincubator#4                      31.70ms     31.55
map_subscript_MAP<VARCHAR,INTEGER>#facebookincubator#1                      26.92ms     37.14
map_subscript_MAP<VARCHAR,INTEGER>#facebookincubator#2                      36.62ms     27.31
map_subscript_MAP<VARCHAR,INTEGER>#facebookincubator#3                      34.19ms     29.24
map_subscript_MAP<VARCHAR,INTEGER>#facebookincubator#4                      33.76ms     29.62
```

Differential Revision: D50237919
laithsakka added a commit to laithsakka/velox that referenced this pull request Oct 12, 2023
Summary:

```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
map_subscript_MAP<ARRAY<VARCHAR>,INTEGER>#facebookincubator#1               71.48ms     13.99
map_subscript_MAP<ARRAY<VARCHAR>,INTEGER>#facebookincubator#2               76.58ms     13.06
map_subscript_MAP<ARRAY<VARCHAR>,INTEGER>#facebookincubator#3               85.31ms     11.72
map_subscript_MAP<ARRAY<VARCHAR>,INTEGER>#facebookincubator#4              121.56ms      8.23
map_subscript_MAP<INTEGER,INTEGER>#facebookincubator#1                      27.19ms     36.78
map_subscript_MAP<INTEGER,INTEGER>#facebookincubator#2                      33.10ms     30.21
map_subscript_MAP<INTEGER,INTEGER>#facebookincubator#3                      33.47ms     29.88
map_subscript_MAP<INTEGER,INTEGER>#facebookincubator#4                      31.70ms     31.55
map_subscript_MAP<VARCHAR,INTEGER>#facebookincubator#1                      26.92ms     37.14
map_subscript_MAP<VARCHAR,INTEGER>#facebookincubator#2                      36.62ms     27.31
map_subscript_MAP<VARCHAR,INTEGER>#facebookincubator#3                      34.19ms     29.24
map_subscript_MAP<VARCHAR,INTEGER>#facebookincubator#4                      33.76ms     29.62
```

Differential Revision: D50237919
laithsakka added a commit to laithsakka/velox that referenced this pull request Oct 12, 2023
Summary:

```
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
map_subscript_MAP<ARRAY<VARCHAR>,INTEGER>#facebookincubator#1               71.48ms     13.99
map_subscript_MAP<ARRAY<VARCHAR>,INTEGER>#facebookincubator#2               76.58ms     13.06
map_subscript_MAP<ARRAY<VARCHAR>,INTEGER>#facebookincubator#3               85.31ms     11.72
map_subscript_MAP<ARRAY<VARCHAR>,INTEGER>#facebookincubator#4              121.56ms      8.23
map_subscript_MAP<INTEGER,INTEGER>#facebookincubator#1                      27.19ms     36.78
map_subscript_MAP<INTEGER,INTEGER>#facebookincubator#2                      33.10ms     30.21
map_subscript_MAP<INTEGER,INTEGER>#facebookincubator#3                      33.47ms     29.88
map_subscript_MAP<INTEGER,INTEGER>#facebookincubator#4                      31.70ms     31.55
map_subscript_MAP<VARCHAR,INTEGER>#facebookincubator#1                      26.92ms     37.14
map_subscript_MAP<VARCHAR,INTEGER>#facebookincubator#2                      36.62ms     27.31
map_subscript_MAP<VARCHAR,INTEGER>#facebookincubator#3                      34.19ms     29.24
map_subscript_MAP<VARCHAR,INTEGER>#facebookincubator#4                      33.76ms     29.62
```

Differential Revision: D50237919
facebook-github-bot pushed a commit that referenced this pull request Aug 23, 2024
Summary:
The default algorithm used is MD5. However, MD5 is not supported with
fips and can cause a SIGSEGV. Set CRC32 instead which is a standard for
checksum computation and is not restricted by fips.
crc32 is also faster than md5.

Internally at IBM, we hit the following SIGSEGV

```
0x0000000000000000 in ?? ()
Missing separate debuginfos, use: dnf debuginfo-install openssl-fips-provider-3.0.7-2.el9.x86_64 xz-libs-5.2.5-8.el9_0.x86_64
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x0000000004e5f89b in Aws::Utils::Crypto::MD5OpenSSLImpl::Calculate(std::istream&) ()
#2  0x0000000004efd298 in Aws::Utils::Crypto::MD5::Calculate(std::istream&) ()
#3  0x0000000004ef71b9 in Aws::Utils::HashingUtils::CalculateMD5(std::iostream&) ()
#4  0x0000000004e8ebe8 in Aws::Client::AWSClient::AddChecksumToRequest(std::shared_ptr<Aws::Http::HttpRequest> const&, Aws::AmazonWebServiceRequest const&) const ()
#5  0x0000000004e8ed15 in Aws::Client::AWSClient::BuildHttpRequest(Aws::AmazonWebServiceRequest const&, std::shared_ptr<Aws::Http::HttpRequest> const&) const ()
#6  0x0000000004e977f9 in Aws::Client::AWSClient::AttemptOneRequest(std::shared_ptr<Aws::Http::HttpRequest> const&, Aws::AmazonWebServiceRequest const&, char const*, char const*, char const*) const ()
#7  0x0000000004e9e1c0 in Aws::Client::AWSClient::AttemptExhaustively(Aws::Http::URI const&, Aws::AmazonWebServiceRequest const&, Aws::Http::HttpMethod, char const*, char const*, char const*) const ()
#8  0x0000000004ea15e8 in Aws::Client::AWSXMLClient::MakeRequest(Aws::Http::URI const&, Aws::AmazonWebServiceRequest const&, Aws::Http::HttpMethod, char const*, char const*, char const*) const ()
#9  0x0000000004ea1f70 in Aws::Client::AWSXMLClient::MakeRequest(Aws::AmazonWebServiceRequest const&, Aws::Endpoint::AWSEndpoint const&, Aws::Http::HttpMethod, char const*, char const*, char const*) const ()
#10 0x0000000004de0933 in Aws::S3::S3Client::UploadPart(Aws::S3::Model::UploadPartRequest const&) const::{lambda()https://github.com/facebookincubator/velox/issues/1}::operator()() const ()
#11 0x0000000004de0b8c in std::_Function_handler<Aws::Utils::Outcome<Aws::S3::Model::UploadPartResult, Aws::S3::S3Error> (), Aws::S3::S3Client::UploadPart(Aws::S3::Model::UploadPartRequest const&) const::{lambda()https://github.com/facebookincubator/velox/issues/1}>::_M_invoke(std::_Any_data const&) ()
#12 0x0000000004e19317 in Aws::Utils::Outcome<Aws::S3::Model::UploadPartResult, Aws::S3::S3Error> smithy::components::tracing::TracingUtils::MakeCallWithTiming<Aws::Utils::Outcome<Aws::S3::Model::UploadPartResult, Aws::S3::S3Error> >(std::function<Aws::Utils::Outcome<Aws::S3::Model::UploadPartResult, Aws::S3::S3Error> ()>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, smithy::components::tracing::Meter const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#13 0x0000000004d7cdcf in Aws::S3::S3Client::UploadPart(Aws::S3::Model::UploadPartRequest const&) const ()
#14 0x0000000004ca4aa6 in facebook::velox::filesystems::S3WriteFile::Impl::uploadPart (this=0x7fffec2f09a0, part=..., isLast=true)
    at /root/velox/velox/connectors/hive/storage_adapters/s3fs/S3FileSystem.cpp:380
```

Pull Request resolved: #10801

Reviewed By: amitkdutta

Differential Revision: D61671574

Pulled By: kgpai

fbshipit-source-id: 34c7b777b3fde0659ef74c4fbfd93740fdfa3f7c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants