Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize cast(string as double/float) by switching to fast_float library #9119

Open
wypb opened this issue Mar 18, 2024 · 3 comments
Open

Optimize cast(string as double/float) by switching to fast_float library #9119

wypb opened this issue Mar 18, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@wypb
Copy link
Contributor

wypb commented Mar 18, 2024

Description

I implemented TextFile Reader based on velox internally. In the implementation, I need to convert the read string data into the corresponding type of data. For simplicity, I use a lot of the cast methods implemented in Velox's velox/type/Conversions.h, The implementation is roughly as follows:

  template <TypeKind Kind>
  inline void decodeColumn(
      velox::VectorPtr& result,
      vector_size_t row,
      const TypePtr& type,
      const std::string& columnData,
      int32_t /*depth*/) {
    using T = typename TypeTraits<Kind>::NativeType;
    auto flatVector = result->as<FlatVector<T>>();
    auto mutableRawValues = flatVector->mutableRawValues();

    mutableRawValues[row] = util::Converter<
        Kind,
        void,
        /*Truncate*/ false,
        /*LegacyCast*/ false>::cast(columnData);
  }
};

When benchmarking TextFile Reader, I found a performance problem. I made a flame graph and found that it took a lot of time(28.65%) to convert the read string into double/float.

image

I rewrote this part of the code using the fast_float library and found that the performance improved.

At the same time, I tested the cast(string as double/float) before and after the modification in velox/velox/benchmarks/basic/CastBenchmark.cpp, and it was also verified.

BEFORE:

/Users/wypb/data/code/apache/velox/_build/release/velox/benchmarks/basic/velox_cast_benchmark
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
cast##cast_string_as_double                                54.15ms     18.47
cast##cast_string_as_float                                 52.68ms     18.98

AFTER:

/Users/wypb/data/code/apache/velox/_build/release/velox/benchmarks/basic/velox_cast_benchmark
============================================================================
[...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
============================================================================
cast##cast_string_as_double                                 4.95ms    201.98
cast##cast_string_as_float                                  5.00ms    200.15

The fast_float library is used by apache/arrow#8494 where it multiplied the number parsing speed by two or three times. It is also used by ClickHouse, DuckDB, starrocks, and Google Jsonnet. It is part of GCC (as of GCC 12). It is part of WebKit (Safari). I want to contribute this part of the code. I wonder what the community thinks about this?

CC: @mbasmanova

@wypb wypb added the enhancement New feature or request label Mar 18, 2024
@wypb wypb changed the title performance issues when executing cast(string as double/float) Performance issues when executing cast(string as double/float) Mar 18, 2024
@mbasmanova
Copy link
Contributor

@wypb This is a good catch. It would be nice to switch to fast_float library.

CC: @kagamiori @kgpai @gggrace14 @amitkdutta

@mbasmanova mbasmanova changed the title Performance issues when executing cast(string as double/float) Optimize cast(string as double/float) by switching to fast_float library Mar 18, 2024
@mbasmanova
Copy link
Contributor

CC: @rui-mo @FelixYBW

@wypb
Copy link
Contributor Author

wypb commented Mar 18, 2024

@mbasmanova Thank you for your reply, I will contribute some internal implementation to the community today or tomorrow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants