Cannot use colMany to access anything wrapped in Option #168

palmerlao · 2017-08-31T01:29:53Z

My understanding is that Option should be used to represent columns that one might mark nullable in vanilla Spark. I tried something along the lines of the following:

case class B(i: Int)
case class A(ob: Option[B], b: B)

implicit val sqlc = spark.sqlContext
val as = TypedDataset.create(Seq(
  A(Some(B(1)), B(2)),
  A(None, B(3))))

as.colMany('b, 'i) // works
as.colMany('ob, 'i) // runs into the option and can't find i as a field in it?

The last line resulted in

<console>:27: error: No columns shapeless.::[shapeless.tag.@@[Symbol,String("ob")],shapeless.::[shapeless.tag.@@[Symbol,String("i")],shapeless.HNil]] of type Out in A

What I think is reasonable is to return something of type TypedColumn[A, Option[Int]]. For comparison, in regular Spark:

scala> as.dataset.select("ob.i").show
+----+
|   _1|
+----+
|   1|
|null|
+----+

which I would expect as.select(as.colMany('ob, 'i).show().run to be roughly equivalent to up to some decisions on whether to display null or None.
Perhaps a reasonable way to approach this problem is to integrate the column selection mechanism with some kind of optics.

The text was updated successfully, but these errors were encountered:

imarios · 2017-09-01T02:41:42Z

I have an alternative proposal here.

What if we have a way to flatten schema's with Optional fields.

Example:

case class O(x: Option[Int], y: Option[Int])

val df: TypedDataset[O] = ...
val dfFlat = df.flatten : TypedDataset[(Int,Int)]

What happens here is that df may have any of its fields null (hence the Option type). When we do the flatten, it essentially filter outs any null values and goes from Option[A] to A. It will do that for any optional fields.

@palmerlao Do you think this would solve your use case?

palmerlao · 2017-09-01T04:21:59Z

Let me know if this is what you mean. In your example, say that df was made by
TypedDataset.create(Seq((Some(1), Some(2)), (None, Some(4)), (None, None))).
Then your proposed flatten would give me dfFlat.collect.run something along the lines of
Seq((1, 2))?
It would be tough to make that work for my use case. Unfortunately, we work with highly denormalized data and nulls are very common.

However, I think there is some interest at my company for somehow building over frameless with Monocle. Do you think that would be something that other people find useful?

imarios · 2017-09-01T04:25:05Z

I see what you mean

mfelsche · 2018-02-12T20:48:09Z

This is also an issue the we got bitten by lately. Unfortunately we cannot adapt our model and thus, for now, need to use some hacky non-typechecked workarounds, which makes me sad.

I am way to new to shapeless and frameless to make a valuable contribution here, but I really hope that this is in general solvable.

imarios · 2018-02-12T21:34:42Z

I was hopping to get this working in #204, but I hit a wall with UDFs. The idea is to be able to do a map on an optional column TypedColumn[T, Option[X]] and then get back an unwrapped TypedColumn[T,X] so you can do anything you would if the Option was not there. I will probably try to do some work around this for the 0.6.0 release (0.5.0 is already out the door).

In the meanwhile you can probably work around this using a UDF, but you will have to serialize the entire column. If that is not an issue for you, then a UDF is a fairly ok typesafe work around.

t.makeUDF( (x: Option[Foo]) => x.bar + 1)

pgrandjean · 2018-08-08T17:42:33Z

Got the same issue.

mossprescott · 2020-02-18T19:38:15Z

Ran into this pretty quick as soon as we tried to do joinLeft and wondering if there's anything new to report. I asked in gitter as well; sorry for the spam!

ayoub-benali · 2021-01-03T21:55:58Z

@palmerlao #479 helps with this issues.

palmerlao changed the title ~~Cannot use colMany to access anything wrapped in Option~~ Cannot use colMany to access anything wrapped in Option Aug 31, 2017

imarios added the feature label Sep 1, 2017

kanterov mentioned this issue Sep 17, 2017

Parameterize TypedEncoder over nullability #182

Closed

vc019 mentioned this issue Mar 4, 2019

Add documentation examples for left join and right join #367

Open

OlivierBlanvillain mentioned this issue Jul 30, 2019

Add support for left and right joins #384

Closed

ayoub-benali mentioned this issue Oct 16, 2020

Dotted syntax (via lambda) for TypedColumns #449

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cannot use colMany to access anything wrapped in Option #168

Cannot use colMany to access anything wrapped in Option #168

palmerlao commented Aug 31, 2017 •

edited

Loading

imarios commented Sep 1, 2017 •

edited

Loading

palmerlao commented Sep 1, 2017 •

edited

Loading

imarios commented Sep 1, 2017

mfelsche commented Feb 12, 2018

imarios commented Feb 12, 2018 •

edited

Loading

pgrandjean commented Aug 8, 2018

mossprescott commented Feb 18, 2020

ayoub-benali commented Jan 3, 2021

Cannot use colMany to access anything wrapped in Option #168

Cannot use colMany to access anything wrapped in Option #168

Comments

palmerlao commented Aug 31, 2017 • edited Loading

imarios commented Sep 1, 2017 • edited Loading

palmerlao commented Sep 1, 2017 • edited Loading

imarios commented Sep 1, 2017

mfelsche commented Feb 12, 2018

imarios commented Feb 12, 2018 • edited Loading

pgrandjean commented Aug 8, 2018

mossprescott commented Feb 18, 2020

ayoub-benali commented Jan 3, 2021

palmerlao commented Aug 31, 2017 •

edited

Loading

imarios commented Sep 1, 2017 •

edited

Loading

palmerlao commented Sep 1, 2017 •

edited

Loading

imarios commented Feb 12, 2018 •

edited

Loading