Support DateType #420

Merged
merged 27 commits into from
Feb 19, 2020

@elvaliuliuliu (Contributor) commented Feb 7, 2020

This PR exposes DateType, one of the data types tracked in #26. It also fixes #389 and #410. DateType support is provided as follows:

  1. Support DataFrame.Collect() for DateType.
  2. Support CreateDataFrame that takes a Date (see the example below).
  3. Support UDFs that take a Date and UDFs that return a Date (see the examples below).
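// Assumes `spark` is an existing SparkSession.
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.Spark.Sql;
using Microsoft.Spark.Sql.Types;
using static Microsoft.Spark.Sql.Functions;

// Create a DataFrame using CreateDataFrame.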
var data = new List<GenericRow>();
data.Add(new GenericRow(new object[] { "Alice", new Date(2020, 1, 1) }));
data.Add(new GenericRow(new object[] { "Bob", new Date(2020, 1, 2) }));

var schema = new StructType(new List<StructField>()
{
    new StructField("name", new StringType()),
    new StructField("date", new DateType())
});
DataFrame df = spark.CreateDataFrame(data, schema);

// PrintSchema() prints:
// root
//  |-- name: string (nullable = true)
//  |-- date: date (nullable = true)
df.PrintSchema();

// Show() prints:
// +-----+----------+
// | name|      date|
// +-----+----------+
// |Alice|2020-01-01|
// |  Bob|2020-01-02|
// +-----+----------+
df.Show();

// UDF that takes in a Date.
Func<Column, Column> udf1 = Udf<Date, string>(s => s.ToString());
DataFrame udfDf1 = df.Select(udf1(df["date"]).As("udf1"));
Row[] rows1 = udfDf1.Collect().ToArray();

// PrintSchema() prints:
// root
//  |-- udf1: string (nullable = true)
udfDf1.PrintSchema();

// Show() prints:
// +----------+
// |      udf1|
// +----------+
// |2020-01-01|
// |2020-01-02|
// +----------+
udfDf1.Show();

// UDF that returns a Date.
Func<Column, Column> udf2 = Udf<string, Date>(s => new Date(2020, 2, 2));
DataFrame udfDf2 = df.Select(udf2(df["name"]).As("udf2"));
Row[] rows2 = udfDf2.Collect().ToArray();

// PrintSchema() prints:
// root
//  |-- udf2: date (nullable = true)
udfDf2.PrintSchema();

// Show() prints:
// +----------+
// |      udf2|
// +----------+
// |2020-02-02|
// |2020-02-02|
// +----------+
udfDf2.Show();
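
Item 1 of the list (Collect() for DateType) has no example in this description. The following is a minimal sketch of reading the values back on the driver, not code from this PR. It assumes the column names from the CreateDataFrame example and that a DateType column surfaces as `Date` through `Row.GetAs<T>`; `DateRows` and `Describe` are hypothetical names introduced only for illustration.

```csharp
using System.Collections.Generic;
using Microsoft.Spark.Sql;
using Microsoft.Spark.Sql.Types;

// Hypothetical helper for illustration; not part of this PR.
public static class DateRows
{
    // Collects the DataFrame to the driver and formats each row as "name: date".
    // Assumes columns "name" (string) and "date" (DateType), as in the
    // CreateDataFrame example, and that DateType values arrive as Date.
    public static List<string> Describe(DataFrame df)
    {
        var lines = new List<string>();
        foreach (Row row in df.Collect())
        {
            string name = row.GetAs<string>("name");
            Date date = row.GetAs<Date>("date");
            lines.Add($"{name}: {date}");
        }
        return lines;
    }
}
```

Calling `DateRows.Describe(df)` on the DataFrame built above should return one formatted line per row.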

@elvaliuliuliu changed the title from "Support DateType" to "[WIP] Support DateType" on Feb 7, 2020
@elvaliuliuliu changed the title from "[WIP] Support DateType" to "Support DateType" on Feb 12, 2020
@imback82 added the "enhancement" (New feature or request) label on Feb 12, 2020
@ghost commented Feb 12, 2020

All,

While this is being worked on, is there any workaround to create a DataFrame with DateType columns populated with values?

Thanks.

@elvaliuliuliu (Contributor, Author) commented

> All,
>
> While this is being worked on, is there any workaround to create a DataFrame with DateType columns populated with values?
>
> Thanks.

Yes, this should fix both #389 and #410.

@imback82 (Contributor) commented

> While this is being worked on, is there any workaround to create a DataFrame with DateType columns populated with values?

@lawrencetvo The workaround is to create a string column that holds the date representation, then use the ToDate function to convert that column to DateType. See https://sparkbyexamples.com/spark/spark-convert-timestamp-to-date/ for an example.
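
The workaround described above could look roughly like the sketch below. It is an illustration under stated assumptions, not code from this PR: dates are passed as `yyyy-MM-dd` strings, the `date_str` column name and the `DateWorkaround` class are hypothetical, and `ToDate(Column, string)` from `Microsoft.Spark.Sql.Functions` does the conversion on the Spark side.

```csharp
using System.Collections.Generic;
using Microsoft.Spark.Sql;
using Microsoft.Spark.Sql.Types;
using static Microsoft.Spark.Sql.Functions;

// Hypothetical helper for illustration; not part of this PR.
public static class DateWorkaround
{
    // Builds a DataFrame whose "date" column is DateType without passing any
    // Date object from .NET: dates travel as strings and are converted by ToDate.
    public static DataFrame Build(SparkSession spark)
    {
        var data = new List<GenericRow>
        {
            new GenericRow(new object[] { "Alice", "2020-01-01" }),
            new GenericRow(new object[] { "Bob", "2020-01-02" })
        };
        var schema = new StructType(new List<StructField>
        {
            new StructField("name", new StringType()),
            new StructField("date_str", new StringType()) // assumed column name
        });
        DataFrame strings = spark.CreateDataFrame(data, schema);

        // Parse the string column into a DateType column, then drop the original.
        return strings
            .WithColumn("date", ToDate(strings["date_str"], "yyyy-MM-dd"))
            .Drop("date_str");
    }
}
```

Calling `PrintSchema()` on the result is a quick way to confirm that the `date` column has the expected type.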

@imback82 (Contributor) previously approved these changes on Feb 19, 2020

LGTM except for one nit comment.
