Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for BitwiseAnd &, BitOr | binary operators #1619

Closed
FauxFaux opened this issue Jan 20, 2022 · 10 comments
Closed

Support for BitwiseAnd &, BitOr | binary operators #1619

FauxFaux opened this issue Jan 20, 2022 · 10 comments
Labels
datafusion Changes in the datafusion crate enhancement New feature or request

Comments

@FauxFaux
Copy link

FauxFaux commented Jan 20, 2022

My parquet file contains a bitpacked flags (i32, but pretend it's u32). The 2nd bit in flags indicates whether the record is a potato or not. I would like to be able to access this from SQL.

Describe the solution you'd like
I think, in SQL, this would look like:

select sum(value) from pq where ((flags & 0b10) = 0b10);

I'd also be happy with it having a boolean output:

select sum(value) from pq where (flags & 0b10);

These operators (but not the binary literals) parse, but cannot be planned:

NotImplemented("Unsupported SQL binary operator BitwiseAnd")

This makes & a regular binary operator, like % or even +, although I wouldn't implement it for Decimal, or even signed integers, to start?

In my application I am going to be checking multiple flags, so any syntax which could be optimised would be great for me.

Postgres produces the integer output, instead of boolean.

Describe alternatives you've considered
You can fake this, to some extent, with (integer) / and %2.

You can (outside of the CLI, my favourite place to run SQL) register custom functions, like is_potato(flags).

I have also considered a custom function like bits_set(flags, "1001") -> bool? This is valuable to readability as there's no support for the 0b1001 binary literal syntax in the SQL at the moment, but its performance is bad, 'cos you end up with a long list of strings passed to your function.

@FauxFaux FauxFaux added the enhancement New feature or request label Jan 20, 2022
@liukun4515
Copy link
Contributor

& Binary AND Operator copies a bit to the result if it exists in both operands. (A & B) will give 12 which is 0000 1100
| Binary OR Operator copies a bit if it exists in either operand. (A | B) will give 61 which is 0011 1101
~ Binary Ones Complement Operator is unary and has the effect of 'flipping' bits. (~A ) will give -61 which is 1100 0011 in 2's complement form due to a signed binary number.
<< Binary Left Shift Operator. The left operands value is moved left by the number of bits specified by the right operand. A << 2 will give 240 which is 1111 0000
>> Binary Right Shift Operator. The left operands value is moved right by the number of bits specified by the right operand. A >> 2 will give 15 which is 0000 1111
# bitwise XOR. A # B will give 49 which is 00110001

I have investigated the bitwise operations in the PG database.
You can follow the PG document to implement this.
@FauxFaux

@liukun4515
Copy link
Contributor

If you want to take this task, it's better to give a plan for �implementation.
@FauxFaux

@alamb
Copy link
Contributor

alamb commented Jan 21, 2022

I think this would be a fairly straightforward task of adding Operator::BitWiseAnd, and then plumbing it through the planner, starting here

https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/sql/planner.rs#L1252

@alamb alamb added the datafusion Changes in the datafusion crate label Jan 21, 2022
@alamb
Copy link
Contributor

alamb commented Jan 21, 2022

This is a good project for anyone who wants to learn how operators are implemented in DataFusion

@liukun4515
Copy link
Contributor

I can take some tasks of this issue if you have a plan to implement it.

@alamb
Copy link
Contributor

alamb commented Sep 9, 2022

The only bitwise operator left to support is # -- I filed #3420 to track that and will close this one to make the current state of DataFusion clearer.

Thanks all!

@RTEnzyme
Copy link
Contributor

Binary Ones Complement Operator is unary and has the effect of 'flipping' bits.

I am curious if the unary operator ~ has been implemented . I was unable to find any relevant implementation.

@alamb
Copy link
Contributor

alamb commented Mar 22, 2023

I am curious if the unary operator ~ has been implemented . I was unable to find any relevant implementation.

I would say no

DataFusion CLI v20.0.0
❯ select ~0; 🤔 Invalid statement: sql parser error: Expected an expression:, found: ~

@RTEnzyme
Copy link
Contributor

I believe adding the bitwise ~ operator would be beneficial for the completeness of bitwise operations. Therefore, I would like to request the implementation of this operator.

Could you please provide some guidance on how I could proceed with implementing this operator? Any help or suggestions would be greatly appreciated.

@alamb
Copy link
Contributor

alamb commented Mar 23, 2023

I believe adding the bitwise ~ operator would be beneficial for the completeness of bitwise operations. Therefore, I would like to request the implementation of this operator.

Thank you -- that sounds great. I will comment on #5700

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants