feat: Add ArrayType and explode function #25

Open
wants to merge 1 commit into
base: main

nightscape

Not sure if I understood everything correctly, I was mostly applying monkey see, monkey do 😉

@@ -86,6 +86,14 @@ object Encoder:
type ColumnType = DoubleOptType
def catalystType = sql.types.DoubleType

inline given arrayFromMirror[A](using encoder: Encoder[A]): (Encoder[Seq[A]] { type ColumnType = ArrayOptType[encoder.ColumnType] }) =
Collaborator

Basic support for arrays probably deserves a separate PR, as it's slightly more complex: we should reuse the encoders of element types and support both nullable and non-nullable arrays. I have some drafts of the implementation mixed with other changes locally, but I'll try to extract them and get them merged to main.
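
To illustrate what reusing element encoders could look like, here is a minimal Scala 3 sketch. It is not iskra's actual Encoder: `SchemaType`, `Enc`, `schemaOf` and the given names are all hypothetical stand-ins, and only `Int` is covered. The point is that the `Seq` and `Option` instances delegate to the element encoder, so nullability at each level falls out of composition.

```scala
// Toy schema model; these are NOT iskra's or Spark's real classes.
enum SchemaType:
  case IntT(nullable: Boolean)
  case ArrayT(element: SchemaType, nullable: Boolean)

  // Same type with the top-level nullability flag switched on.
  def asNullable: SchemaType = this match
    case IntT(_)         => IntT(nullable = true)
    case ArrayT(elem, _) => ArrayT(elem, nullable = true)

// Hypothetical encoder type class: maps a Scala type to a schema type.
trait Enc[A]:
  def schema: SchemaType

object Enc:
  given intEnc: Enc[Int] with
    def schema: SchemaType = SchemaType.IntT(nullable = false)

  // Option[A] reuses A's encoder and only flips the nullability flag.
  given optionEnc[A](using e: Enc[A]): Enc[Option[A]] with
    def schema: SchemaType = e.schema.asNullable

  // Seq[A] reuses the element encoder, so element nullability is inherited.
  given seqEnc[A](using e: Enc[A]): Enc[Seq[A]] with
    def schema: SchemaType = SchemaType.ArrayT(e.schema, nullable = false)

// Summons the encoder for A and returns its schema.
def schemaOf[A](using e: Enc[A]): SchemaType = e.schema
```

With these definitions, `schemaOf[Seq[Option[Int]]]` describes a non-nullable array of nullable ints, while `schemaOf[Option[Seq[Int]]]` describes a nullable array of non-nullable ints.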

import org.virtuslab.iskra.api.*
import functions.explode

case class Foo(ints: Seq[Int])
Collaborator

We should consider more cases here: not only Seq[Int] but also Seq[Option[Int]], Option[Seq[Int]] and Option[Seq[Option[Int]]], and check how these should behave at compile time and at runtime.
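
As a rough reference for the runtime side, the sketch below models in plain Scala how Spark's `explode` treats the most general shape, Option[Seq[Option[A]]]: a null or empty array contributes no rows, and a null element becomes a null row. `explodeModel` is a hypothetical name used only for illustration; it does not touch Spark or iskra.

```scala
// Plain-Scala model of explode over a single array column.
// Outer None  = the array itself is null.
// Inner None  = a null element inside the array.
def explodeModel[A](rows: Seq[Option[Seq[Option[A]]]]): Seq[Option[A]] =
  // A null array behaves like an empty one: it yields no output rows.
  rows.flatMap(_.getOrElse(Seq.empty))

val exploded = explodeModel(
  Seq(
    Some(Seq(Some(1), None)), // two rows: 1 and null
    Some(Seq.empty),          // empty array: no rows
    None                      // null array: no rows
  )
)
```

The other three shapes are special cases of this one, so a test suite could check that each produces the same rows as its embedding into the general shape.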

Foo(Seq(1)),
Foo(Seq(2)),
Foo(Seq()),
Foo(null),
Collaborator

For maximal type safety, optional values should be represented by Option[...]. TBH I haven't yet thought about how to prevent users from using nulls explicitly. Maybe -Yexplicit-nulls could come to the rescue. Alternatively, we could perform some runtime assertions when toTypedDF is called. However, both of these would probably have to be opt-in.
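
One possible shape for such an opt-in runtime assertion is sketched below. `nullFields` and `requireNoNulls` are hypothetical helpers, not part of iskra, and the check only inspects top-level fields of each case class; `Foo` mirrors the test class from this PR.

```scala
// Mirrors the case class used in the PR's test.
case class Foo(ints: Seq[Int])

// Hypothetical helper: names of the top-level fields that hold null.
def nullFields(row: Product): Seq[String] =
  row.productElementNames
    .zip(row.productIterator)
    .collect { case (name, null) => name }
    .toSeq

// Hypothetical opt-in check that could run before building a typed DataFrame.
// Throws IllegalArgumentException on the first row that contains a null field.
def requireNoNulls[A <: Product](rows: Seq[A]): Seq[A] =
  rows.zipWithIndex.foreach { (row, i) =>
    val bad = nullFields(row)
    require(bad.isEmpty, s"row $i has null in field(s): ${bad.mkString(", ")}")
  }
  rows
```

Applied to the test data in this PR, `requireNoNulls` would reject `Foo(null)` and accept the other three rows. Nested values (for example nulls inside the Seq) would need a recursive variant.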

import org.virtuslab.iskra.Column
import org.virtuslab.iskra.types.{ ArrayOptType, DataType }

def explode[T <: DataType](c: Column[ArrayOptType[T]]): Column[T] = Column(sql.functions.explode(c.untyped))
Collaborator

We should have a way to prevent users from using explode more than once in the same select clause, as that would result in a runtime error. This constraint doesn't seem easy to express in iskra's current model. However, I'm in the middle of a major redesign of the library's model, so I'll try to take this use case into account.
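
Until the constraint can be expressed in types, a library-side check could at least fail early with a clear message. The sketch below is only a runtime stand-in for that idea, not the compile-time solution discussed above: `Expr`, `explodeCount` and `checkSelect` are hypothetical and model columns as a toy expression tree rather than iskra's Column.

```scala
// Toy column model; NOT iskra's Column type.
enum Expr:
  case Ref(name: String)
  case Explode(child: Expr)
  case Alias(child: Expr, name: String)

// Counts explode calls anywhere inside one column expression.
def explodeCount(e: Expr): Int = e match
  case Expr.Ref(_)          => 0
  case Expr.Explode(child)  => 1 + explodeCount(child)
  case Expr.Alias(child, _) => explodeCount(child)

// Hypothetical check run when a select clause is assembled:
// accepts at most one explode across all selected columns.
def checkSelect(columns: Seq[Expr]): Either[String, Seq[Expr]] =
  val n = columns.map(explodeCount).sum
  if n <= 1 then Right(columns)
  else Left(s"select contains $n explode calls; at most one is allowed")
```

For example, selecting `Ref("id")` together with `Explode(Ref("ints"))` passes, while two `Explode` columns in the same clause yield a `Left` before anything reaches Spark.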

@prolativ
Collaborator

@nightscape thanks for your contribution! I'm afraid your changes can't be incorporated into the main branch at the moment, for the reasons mentioned in the comments, but I'll try to find some time to address them and unblock you.

@nightscape
Author

@prolativ I hadn't seen your iskra-next branch. I'll have a look at that.
Basically, array exploding and StructType expansion using .* are what I'm currently missing to write a PoC using Iskra.
If I can assist with anything, let me know!
I'll try to familiarize myself with the iskra-next branch in the meantime.
