Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: Timestamps can no longer be cast to long #884

Closed
AndreasVolkmann opened this issue Apr 6, 2021 · 6 comments
Closed

[BUG]: Timestamps can no longer be cast to long #884

AndreasVolkmann opened this issue Apr 6, 2021 · 6 comments
Labels
question Further information is requested

Comments

@AndreasVolkmann
Copy link

Describe the bug
In version 0.9.0 we were able to run the following:

row.GetAs<long>(timestampColumn);

Where the target column is of type Timestamp.

However, in later versions the same code throws the following exception:

System.InvalidCastException: Unable to cast object of type 'Microsoft.Spark.Sql.Types.Timestamp' to type 'System.Int64'.
    at Microsoft.Spark.Utils.TypeConverter.ConvertTo[T](Object obj)
   at Microsoft.Spark.Sql.Row.GetAs[T](String columnName)

This PR may be related: #428.

It starts breaking in version 0.11.0 and I also tried in the latest version.

To Reproduce
I wrote a test to reproduce this in my project:

[TestMethod]
public void Convert_Timestamp_ToLong()
{
	// Arrange
	const string columnName = "c";
	var structType = new StructType(new[] { new StructField(columnName, new TimestampType()) });
	var dateTime = DateTime.Parse("2021-04-06 10:20:30");
	var ticks = (long)dateTime.Subtract(new DateTime(1970, 1, 1)).TotalMicroseconds();
	Console.WriteLine(ticks);
	var values = new object[] { ticks };
	var row = CreateInstance<Row>(values, structType);

	// Act
	var result = row.GetAs<long>(columnName);

	// Assert
	result.Should().Be(ticks);
}


private static T CreateInstance<T>(params object[] args)
{
	var type = typeof(T);
	var instance = type.Assembly.CreateInstance(
		type.FullName!,
		false,
		BindingFlags.Instance | BindingFlags.NonPublic,
		null,
		args,
		null,
		null);
	return (T)instance;
}

Expected behavior
Expected the cast to succeed as before.

@AndreasVolkmann AndreasVolkmann added the bug Something isn't working label Apr 6, 2021
@imback82
Copy link
Contributor

imback82 commented Apr 6, 2021

@suhsteve Can you help when you get a chance?

@suhsteve
Copy link
Member

suhsteve commented Apr 6, 2021

The reason why this worked in 0.9.0 was because we didn't have complete support for TimestampType and we were just passing the raw unpickled/pickled long values to and from spark. The PR #428 you linked added support for the Timestamp type.

Looking at your example, it looks like row.GetAs<long>(timestampColumn) may be equivalent to something like row.GetAs<Timestamp>(timestampColumn).GetIntervalInMicroseconds() .

Since GetIntervalInMicroseconds() is internal, we either make this public or allow Timestamp columns to be converted to long when using Row.GetAs<> by adding support in Microsoft.Spark.Utils.TypeConverter ? I'm partial to the second approach.

What do you think @imback82 ?

@imback82
Copy link
Contributor

imback82 commented Apr 6, 2021

Thanks @suhsteve for looking into this.

@AndreasVolkmann Can you just work with C#'s DateTime object? You could get the Timestamp object first as @suhsteve was suggesting row.GetAs<Timestamp>(timestampColumn) and just call ToDateTime:

public DateTime ToDateTime() => _dateTime;

, and you can convert DateTime object to any value you want to work with.

@imback82
Copy link
Contributor

imback82 commented Apr 6, 2021

I am removing a bug label since casting the timestamp to long is not supported since 0.11; the current version is more type-safe.

@imback82 imback82 added question Further information is requested and removed bug Something isn't working labels Apr 6, 2021
@AndreasVolkmann
Copy link
Author

@imback82 We worked around it by calling the internal functions that Steve mentioned.
It would definitely be great to have either option 1 or 2 from @suhsteve implemented.

@imback82
Copy link
Contributor

Since GetIntervalInMicroseconds is simple to implement, can you introduce it in your app? We already expose DateTime object via ToDateTime. We are trying to maintain API parity for types that are available in Scala, Python, and .NET, and minimize .NET specific APIs if possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants