Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for null values #67

Merged

Conversation

osopardo1
Copy link
Member

@osopardo1 osopardo1 commented Jan 25, 2022

This PR shows the current status on the solution for issue #10

The theory is to substitute/map null values in the indexed columns transformation with a Random number in the [min, max] range.

  • New API for LinearTransformation and HashTransformation case classes to include nullValue as part of the contained information
  • New API for the Transformer which includes an optionalNullValue. If none is given, then a random is calculated internally with the min/max values
  • New API for the user (optional):
df.write.format("qbeast").option("columnsToIndex", "column:type(nullValue),column2:type(nullValue2)...")
  • Adaptation of the tests to this new functionality

…exed-columns-support

# Conflicts:
#	core/src/main/scala/io/qbeast/core/transform/HashTransformer.scala
#	core/src/main/scala/io/qbeast/core/transform/LinearTransformer.scala
#	core/src/main/scala/io/qbeast/core/transform/Transformer.scala
#	core/src/test/scala/io/qbeast/core/transform/TransformerTest.scala
# Conflicts:
#	src/test/scala/io/qbeast/spark/index/SparkPointWeightIndexerTest.scala
@osopardo1 osopardo1 marked this pull request as ready for review February 14, 2022 09:29
@osopardo1 osopardo1 requested review from cugni and eavilaes and removed request for cugni February 14, 2022 09:30
Copy link
Member

@cugni cugni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are still missing the filtering push down for the null values, meaning that a query with x=null, should be converted to x>=randomNullValue and x<randomNullValue and executed using the index.

…ith all null values

Pending to change the problem of having all values at null
Copy link
Contributor

@eavilaes eavilaes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments, but generally looks good! :)

@osopardo1 osopardo1 requested review from cugni and eavilaes February 18, 2022 15:22
@eavilaes
Copy link
Contributor

eavilaes commented Mar 3, 2022

I've seen some typos on the new AdvancedConfiguration.md file, you could maybe check it with Grammarly! :)

@osopardo1
Copy link
Member Author

I've seen some typos on the new AdvancedConfiguration.md file, you could maybe check it with Grammarly! :)

Thank you, fixed!

@osopardo1
Copy link
Member Author

Hey! I think is ready to merge, changes are addressed and it's compatible with the current version in the main 👍

Copy link
Contributor

@eavilaes eavilaes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Great job :)

@osopardo1 osopardo1 requested a review from cugni March 16, 2022 09:16
Copy link
Member

@cugni cugni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect 👌🏻

@osopardo1 osopardo1 merged commit c3b23ea into Qbeast-io:main Mar 16, 2022
@osopardo1 osopardo1 deleted the 10-null-values-on-indexed-columns-support branch April 22, 2022 12:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants