Alternate column referencing syntax for TypedDataset #39

OlivierBlanvillain · 2016-06-06T12:40:24Z

It would be nice to add an alternate column referencing syntax to the TypedDataset API that is closer to the vanilla Spark syntax, similar to the way it's done in TypedDataFrame.

Currently it looks like this (source):

val dataset = TypedDataset.create(data)
val A = dataset.col[A]('a)
val B = dataset.col[B]('b)

val dataset2 = dataset.select(A, B).collect().run().toVector

I think it should be possible to change it to something like:

val dataset = TypedDataset.create(data)
val dataset2 = dataset.select('a, 'b).collect().run().toVector
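A minimal, dependency-free sketch of how `select('a, 'b)` could be checked at compile time, assuming Scala 2.13+ where string literal types stand in for the symbols. Everything here is hypothetical: `Exists` (modelled loosely on the `TypedColumn.Exists` evidence that appears later in the thread), `MiniDataset` (a stand-in for `TypedDataset` backed by a `Vector`, so there is no `.run()`), and the hand-written instances that a real implementation would derive. The `Singleton` bound keeps `"a"` at its literal type, and implicit search then fixes each column's element type.

```scala
object NameSelectSketch {
  // Hypothetical evidence that record type T has a field named K of type A,
  // carrying a getter so the sketch can actually evaluate columns.
  trait Exists[T, K <: String, A] { def get(t: T): A }

  final case class Foo(a: Int, b: String)

  object Foo {
    // Hand-written instances; a real library would derive these.
    implicit val aField: Exists[Foo, "a", Int] =
      new Exists[Foo, "a", Int] { def get(t: Foo): Int = t.a }
    implicit val bField: Exists[Foo, "b", String] =
      new Exists[Foo, "b", String] { def get(t: Foo): String = t.b }
  }

  // Hypothetical stand-in for TypedDataset, backed by a plain Vector.
  final case class MiniDataset[T](rows: Vector[T]) {
    // K1/K2 are bounded by Singleton, so "a" is inferred as the literal type "a",
    // and the implicit evidence then determines A and B.
    def select[K1 <: String with Singleton, K2 <: String with Singleton, A, B](k1: K1, k2: K2)(
        implicit e1: Exists[T, K1, A], e2: Exists[T, K2, B]): MiniDataset[(A, B)] =
      MiniDataset(rows.map(t => (e1.get(t), e2.get(t))))

    def collect(): Vector[T] = rows
  }

  val data = Vector(Foo(1, "x"), Foo(2, "y"))
  val dataset2: Vector[(Int, String)] = MiniDataset(data).select("a", "b").collect()
  // MiniDataset(data).select("a", "c") would not compile: there is no Exists[Foo, "c", _].
}
```

A misspelled column name is rejected by the compiler rather than at runtime, which is the main payoff over untyped `Column` references.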

It would also be interesting to investigate an alternate syntax for td.colMany('b, 'b) (equivalent to accessing _.b.b).
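One way to read `td.colMany('b, 'b)` is as a chain of single-field lookups. The sketch below assumes Scala 2.13+ and uses hypothetical names throughout (`Exists`, `Typed`, `Outer`, `Inner`) with string literals in place of symbols. The first implicit fixes the intermediate type, which is then used to look up the second field, so the result is a function equivalent to `_.b.b`.

```scala
object NestedPathSketch {
  // Hypothetical evidence: type T has a field named K of type A, plus a getter for it.
  trait Exists[T, K <: String, A] { def get(t: T): A }

  final case class Inner(b: Long)
  final case class Outer(b: Inner)

  // Hand-written instances; a real library would derive them.
  object Inner {
    implicit val innerB: Exists[Inner, "b", Long] =
      new Exists[Inner, "b", Long] { def get(t: Inner): Long = t.b }
  }
  object Outer {
    implicit val outerB: Exists[Outer, "b", Inner] =
      new Exists[Outer, "b", Inner] { def get(t: Outer): Inner = t.b }
  }

  final class Typed[T] {
    // e1 fixes A (the intermediate type); e2 is then searched with that A.
    def colMany[K1 <: String with Singleton, K2 <: String with Singleton, A, B](k1: K1, k2: K2)(
        implicit e1: Exists[T, K1, A], e2: Exists[A, K2, B]): T => B =
      t => e2.get(e1.get(t))
  }

  val typed = new Typed[Outer]
  val bb: Outer => Long = typed.colMany("b", "b") // behaves like _.b.b
}
```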


kanterov · 2016-06-07T04:13:42Z

I was playing with this by trying an implicit conversion from Symbol to TypedColumn, but I wasn't able to capture the symbol's value at the type level this way. It should be possible with an implicit macro, but we should minimize the amount of macro code we write ourselves and rely on tools from shapeless instead.
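For reference, on Scala 2.13+ (where SIP-23 literal types are built in) a name can be captured at the type level without a macro, provided the type parameter is bounded by `Singleton`. `Col`, `col` and `nameOf` below are hypothetical; `ValueOf` is the standard library type class that materializes the value of a singleton type.

```scala
object LiteralCaptureSketch {
  // Hypothetical column handle whose type parameter records the column name.
  final case class Col[K <: String](name: K)

  // The Singleton bound stops "a" from widening to String, so K = "a".
  def col[K <: String with Singleton](name: K): Col[K] = Col(name)

  // Recovers the name from the type alone via the standard ValueOf type class;
  // the runtime argument itself is not inspected.
  def nameOf[K <: String with Singleton](c: Col[K])(implicit v: ValueOf[K]): String = v.value

  val a = col("a")
  // Compiles only because the inferred type of `a` is Col["a"], not Col[String].
  val sameType: Col["a"] = a

  // Without the bound, e.g. def colWide[K <: String](n: K): Col[K], a call like
  // colWide("a") would infer K = String and the literal would be lost.
}
```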

imarios · 2016-10-10T05:00:48Z

Hey guys, I think this is a great feature and it will make writing expressions much cleaner. I was able to get this working:

def select[A](column: Witness.Lt[Symbol])(
    implicit
    exists: TypedColumn.Exists[T, column.T, A],
    encoder: TypedEncoder[A]): TypedDataset[A] = select(col(column))

It combines what the col method does to get the typed column, then feeds that column to the existing select.

With this you can do select('foo) and it works.

Unfortunately, this causes a strange issue when passing a TypedAggregateAndColumn to select. For example, test("count") in AggregateFunctionsTests.scala stopped compiling.

Obviously, the solution is not perfect (since it's causing an issue), but the direction might be promising? What do you guys think? Any ideas?
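A dependency-free sketch with the same shape as the `select(column: Witness.Lt[Symbol])` method above, assuming Scala 2.13+ literal types in place of shapeless' `Witness`. `Exists`, `TypedColumn`, `MiniDataset` and `Foo` are hypothetical simplifications, not frameless' actual types. The name-based method is called `selectName` rather than overloading `select`, so it sidesteps overload resolution and says nothing about the `TypedAggregateAndColumn` problem.

```scala
object SelectByNameSketch {
  // Hypothetical stand-in for TypedColumn.Exists: T has a field named K of type A.
  trait Exists[T, K <: String, A] { def get(t: T): A }

  // Hypothetical typed column: a name plus an extractor.
  final case class TypedColumn[T, A](name: String, get: T => A)

  final case class Foo(foo: Int, bar: String)
  object Foo {
    implicit val fooField: Exists[Foo, "foo", Int] =
      new Exists[Foo, "foo", Int] { def get(t: Foo): Int = t.foo }
  }

  final case class MiniDataset[T](rows: Vector[T]) {
    // Resolves a column from its literal name, playing the role of col.
    def col[K <: String with Singleton, A](name: K)(implicit e: Exists[T, K, A]): TypedColumn[T, A] =
      TypedColumn[T, A](name, t => e.get(t))

    // The existing column-based select.
    def select[A](c: TypedColumn[T, A]): MiniDataset[A] = MiniDataset(rows.map(c.get))

    // Name-based select: look the column up with col, then reuse select.
    def selectName[K <: String with Singleton, A](name: K)(implicit e: Exists[T, K, A]): MiniDataset[A] =
      select(col(name))
  }

  val foos: Vector[Int] = MiniDataset(Vector(Foo(1, "x"), Foo(2, "y"))).selectName("foo").rows
}
```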

OlivierBlanvillain · 2016-10-10T06:13:59Z

The biggest challenge with this syntax (besides IDE support) is supporting the full scope of Spark Column expressions, that is, being able to write things like select('foo + 1).

Early work on the lib used shapeless' SingletonProductArgs macro to solve the non-expression part of this problem; if you are interested, here are the select implementation and test from git's history.
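To illustrate how expressions could compose once a name resolves to a typed column, here is a sketch in which `+` is only available when the column's type has a `Numeric` instance. `TypedColumn`, `Exists`, `Typed` and `Row` are hypothetical. The call is written `col("foo") + 1` because, with string names, a bare `"foo" + 1` already means string concatenation in Scala; symbols have no `+`, which is what would leave room for an implicit conversion in the `'foo + 1` form.

```scala
object ColumnExprSketch {
  // Hypothetical typed column expression: an extractor from a row T to a value A.
  final case class TypedColumn[T, A](eval: T => A) {
    // Arithmetic is offered only when A has a Numeric instance, so an
    // Int column accepts + 1 while a String column does not compile.
    def +(n: A)(implicit num: Numeric[A]): TypedColumn[T, A] =
      TypedColumn[T, A](t => num.plus(eval(t), n))
  }

  // Hypothetical evidence that T has a field named K of type A.
  trait Exists[T, K <: String, A] { def get(t: T): A }

  final case class Row(foo: Int, name: String)
  object Row {
    implicit val fooField: Exists[Row, "foo", Int] =
      new Exists[Row, "foo", Int] { def get(t: Row): Int = t.foo }
  }

  final class Typed[T](rows: Vector[T]) {
    def col[K <: String with Singleton, A](name: K)(implicit e: Exists[T, K, A]): TypedColumn[T, A] =
      TypedColumn[T, A](t => e.get(t))
    def select[A](c: TypedColumn[T, A]): Vector[A] = rows.map(c.eval)
  }

  val ds = new Typed(Vector(Row(1, "x"), Row(41, "y")))
  // Stands in for select('foo + 1).
  val plusOne: Vector[Int] = ds.select(ds.col("foo") + 1)
}
```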

OlivierBlanvillain · 2016-10-10T06:45:46Z

I think we could make @kanterov's idea of "implicit conversion from Symbol to TypedColumn" work in typelevel-scala:

scala> :paste
// Entering paste mode (ctrl-D to finish)

trait TypedColumn[S <: Singleton]

implicit def lift[S <: Singleton](s: S): TypedColumn[S] = new TypedColumn[S] {}

def select[A <: Singleton](p: TypedColumn[A]) = p

implicit class AddIntToTypedColumn(i: Int) {
  def plus[S <: Singleton](s: TypedColumn[S]) = s
}

// Exiting paste mode, now interpreting.

defined trait TypedColumn
lift: [S <: Singleton](s: S)TypedColumn[S]
select: [A <: Singleton](p: TypedColumn[A])TypedColumn[A]
defined class AddIntToTypedColumn

scala> select("hello")
res0: TypedColumn["hello"] = $anon$1@7e40c3aa

scala> select(1 plus "hello")
res1: TypedColumn["hello"] = $anon$1@5d864a5

imarios · 2016-10-10T16:56:20Z

@OlivierBlanvillain yes, supporting expressions with select should definitely be part of any solution.
Btw the above snippet gives this error for me:

// Exiting paste mode, now interpreting.

defined trait TypedColumn
lift: [S <: Singleton](s: S)TypedColumn[S]
select: [A <: Singleton](p: TypedColumn[A])TypedColumn[A]
defined class AddIntToTypedColumn

scala> select("hello")
<console>:19: error: type mismatch;
 found   : String("hello")
 required: TypedColumn[?]
       select("hello")

OlivierBlanvillain · 2016-10-11T07:49:13Z

On my setup it works with the following:

$ cat build.sbt
scalaVersion := "2.11.8"

scalaOrganization := "org.typelevel"

libraryDependencies ++= Seq(
  "org.typelevel" %% "cats"      % "0.7.2",
  "com.chuusai"   %% "shapeless" % "2.3.2")

scalacOptions := Seq(
  "-deprecation",
  "-encoding", "UTF-8",
  "-feature",
  "-language:implicitConversions",
  "-unchecked",
  "-Xfuture",
  "-Xlint",
  "-Yinline-warnings",
  "-Yno-adapted-args",
  "-Ywarn-dead-code",
  "-Ywarn-numeric-widen",
  "-Ypartial-unification",
  "-Yliteral-types",
  "-Ywarn-value-discard")
$ cat project/build.properties 
sbt.version=0.13.13-RC2
$ sbt console
[...]

kanterov · 2016-10-11T08:03:49Z

@OlivierBlanvillain This is awesome. Does it require Typelevel Scala to compile user code? If so, we might still want to investigate whether we can reuse a macro from shapeless somehow; it would be nice to find a cheap solution instead of forcing users to switch Scala compilers :).

OlivierBlanvillain · 2016-10-11T08:07:38Z

Yes, we would need this PR to be merged to have singleton types in Lightbend Scala.
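Literal types (SIP-23) did eventually ship in Lightbend Scala 2.13, so on that line the Typelevel-specific settings from the `build.sbt` above should no longer be needed. A minimal sketch, assuming a 2.13.x compiler; the exact version number is only an example.

```scala
// build.sbt sketch (assumption: Scala 2.13.x).
// Literal types are enabled by default in 2.13, so neither
// scalaOrganization := "org.typelevel" nor -Yliteral-types is required,
// and partial unification is also on by default (no -Ypartial-unification).
scalaVersion := "2.13.12"

scalacOptions ++= Seq(
  "-deprecation",
  "-feature",
  "-language:implicitConversions"
)
```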

joan38 · 2019-07-18T23:22:25Z

Wow, I'm looking forward to having this if Spark one day compiles with 2.13.0.

cchantep · 2021-09-07T13:51:29Z

Hi, closing it for now with #449 merged.

OlivierBlanvillain mentioned this issue Oct 27, 2016

Experiment with new function syntax #60

Closed

imarios added enhancement question discussion and removed question labels May 17, 2017

cchantep closed this as completed Sep 7, 2021
