I implemented a TextFile Reader on top of Velox internally. The reader has to convert the string data it reads into values of the corresponding type. For simplicity, I rely heavily on the cast methods implemented in Velox's velox/type/Conversions.h. The implementation is roughly as follows:
When benchmarking the TextFile Reader, I found a performance problem. A flame graph showed that a large share of the time (28.65%) was spent converting the read strings to double/float.
I rewrote this part of the code using the fast_float library and found that the performance improved.
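The kind of string-to-floating-point conversion involved can be sketched as follows. This is only an illustration under stated assumptions, not the actual Velox change: parseFloating is a hypothetical helper name, and the sketch uses the standard std::from_chars so that it has no external dependency. fast_float::from_chars takes the same (first, last, value) arguments and returns a result with the same ptr and ec fields, so a call to it could be swapped in if the library is available.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper (not part of Velox): parse the whole string as a
// float or double. Returns std::nullopt if the input is empty, malformed,
// out of range, or has trailing characters after the number.
template <typename T>
std::optional<T> parseFloating(std::string_view s) {
  T value{};
  const char* first = s.data();
  const char* last = s.data() + s.size();

  // Locale-independent parse. With fast_float available, this call could
  // be replaced by fast_float::from_chars(first, last, value).
  auto [ptr, ec] = std::from_chars(first, last, value);

  // Reject parse errors and partially consumed input such as "12abc".
  if (ec != std::errc() || ptr != last) {
    return std::nullopt;
  }
  return value;
}
```

For example, parseFloating<double>("3.25") yields 3.25, while parseFloating<double>("12abc") yields std::nullopt. Note that, unlike strtod, this style of parser does not skip leading whitespace or accept a leading '+', so a cast that must tolerate such input would need to handle it before calling the parser.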
I also ran the cast(string as double/float) benchmark in velox/velox/benchmarks/basic/CastBenchmark.cpp before and after the modification, and it confirmed the improvement.
The fast_float library was adopted in apache/arrow#8494, where it made number parsing two to three times faster. It is also used by ClickHouse, DuckDB, StarRocks, and Google Jsonnet, and it is part of GCC (as of GCC 12) and WebKit (Safari). I would like to contribute this change. What does the community think?
wypb changed the title from "performance issues when executing cast(string as double/float)" to "Performance issues when executing cast(string as double/float)" on Mar 18, 2024
mbasmanova changed the title from "Performance issues when executing cast(string as double/float)" to "Optimize cast(string as double/float) by switching to fast_float library" on Mar 18, 2024
CC: @mbasmanova