Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add binary string operations (length and concatenation) #3646

Open
wants to merge 35 commits into
base: main
Choose a base branch
from

Conversation

f4t4nt
Copy link
Contributor

@f4t4nt f4t4nt commented Jan 7, 2025

Added two new binary string operations to Daft:\n\n- binary.length(): Get length of binary strings in bytes\n- binary.concat(): Concatenate two binary strings\n\nImplemented in Rust with full test coverage including ASCII, UTF-8, and special binary sequences.

@f4t4nt f4t4nt changed the title Add Binary String Operations: Length and Concatenation feat: add binary string operations (length and concatenation) Jan 7, 2025
@github-actions github-actions bot added the feat label Jan 7, 2025
@kevinzwang kevinzwang self-requested a review January 7, 2025 18:58
@f4t4nt f4t4nt force-pushed the nishant-binary-string branch from 18b8ab2 to 3e0dae6 Compare January 7, 2025 19:01
Copy link

codspeed-hq bot commented Jan 7, 2025

CodSpeed Performance Report

Merging #3646 will improve performances by 49.11%

Comparing nishant-binary-string (4e4b23f) with main (c650794)

Summary

⚡ 1 improvements
✅ 26 untouched benchmarks

Benchmarks breakdown

Benchmark main nishant-binary-string Change
test_show[100 Small Files] 23.6 ms 15.8 ms +49.11%

Copy link
Member

@kevinzwang kevinzwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great so far!

src/daft-core/src/array/ops/binary.rs Outdated Show resolved Hide resolved
src/daft-core/src/array/ops/binary.rs Outdated Show resolved Hide resolved
src/daft-core/src/array/ops/binary.rs Show resolved Hide resolved
Copy link

codecov bot commented Jan 7, 2025

Codecov Report

Attention: Patch coverage is 86.82171% with 51 lines in your changes missing coverage. Please review.

Project coverage is 77.88%. Comparing base (39bb62c) to head (4e4b23f).
Report is 27 commits behind head on main.

Files with missing lines Patch % Lines
src/daft-functions/src/binary/slice.rs 67.34% 16 Missing ⚠️
src/daft-functions/src/binary/concat.rs 84.21% 12 Missing ⚠️
src/daft-core/src/array/ops/binary.rs 95.47% 9 Missing ⚠️
src/daft-core/src/series/ops/binary.rs 46.66% 8 Missing ⚠️
src/daft-functions/src/binary/length.rs 80.64% 6 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3646      +/-   ##
==========================================
- Coverage   77.99%   77.88%   -0.11%     
==========================================
  Files         720      737      +17     
  Lines       88794    90984    +2190     
==========================================
+ Hits        69252    70866    +1614     
- Misses      19542    20118     +576     
Files with missing lines Coverage Δ
daft/expressions/expressions.py 93.64% <100.00%> (+0.13%) ⬆️
src/daft-core/src/series/ops/mod.rs 100.00% <ø> (ø)
src/daft-functions/src/lib.rs 0.00% <ø> (ø)
src/daft-functions/src/python/mod.rs 100.00% <100.00%> (ø)
src/daft-functions/src/binary/length.rs 80.64% <80.64%> (ø)
src/daft-core/src/series/ops/binary.rs 46.66% <46.66%> (ø)
src/daft-core/src/array/ops/binary.rs 95.47% <95.47%> (ø)
src/daft-functions/src/binary/concat.rs 84.21% <84.21%> (ø)
src/daft-functions/src/binary/slice.rs 67.34% <67.34%> (ø)

... and 300 files with indirect coverage changes

@f4t4nt f4t4nt force-pushed the nishant-binary-string branch from 6338308 to cf73ba2 Compare January 7, 2025 19:52
@f4t4nt f4t4nt force-pushed the nishant-binary-string branch from 2deddac to 1c26a63 Compare January 8, 2025 21:54
@f4t4nt f4t4nt requested a review from kevinzwang January 9, 2025 01:01
Copy link
Member

@kevinzwang kevinzwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far! In addition to the other comments there's two things you should also do:

  • add these functions to our docs by adding a section in docs/source/api_docs/expressions.rst
  • implement this for the FixedSizeBinaryArray type as well

src/daft-core/src/series/ops/binary.rs Outdated Show resolved Hide resolved
src/daft-functions/src/python/binary.rs Outdated Show resolved Hide resolved
tests/table/utf8/test_concat.py Outdated Show resolved Hide resolved
@f4t4nt f4t4nt requested a review from kevinzwang January 14, 2025 18:28
Copy link
Member

@kevinzwang kevinzwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far! I had some comments on things to clean up.

daft/expressions/expressions.py Outdated Show resolved Hide resolved
pyproject.toml Show resolved Hide resolved
src/daft-core/src/array/ops/binary.rs Outdated Show resolved Hide resolved
src/daft-core/src/array/ops/binary.rs Outdated Show resolved Hide resolved
src/daft-core/src/array/ops/binary.rs Outdated Show resolved Hide resolved
src/daft-functions/src/binary/concat.rs Show resolved Hide resolved
src/daft-functions/src/binary/length.rs Show resolved Hide resolved
src/daft-functions/src/binary/slice.rs Show resolved Hide resolved
tests/table/binary/test_fixed_size_binary_concat.py Outdated Show resolved Hide resolved
tests/table/binary/test_concat.py Outdated Show resolved Hide resolved
@f4t4nt f4t4nt linked an issue Jan 15, 2025 that may be closed by this pull request
3 tasks
@f4t4nt f4t4nt requested a review from kevinzwang January 16, 2025 01:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Binary string operations
2 participants