Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VL] Result mismatch in date format week year #7069

Open
ccat3z opened this issue Aug 30, 2024 · 5 comments
Open

[VL] Result mismatch in date format week year #7069

ccat3z opened this issue Aug 30, 2024 · 5 comments
Labels
bug Something isn't working triage

Comments

@ccat3z
Copy link
Contributor

ccat3z commented Aug 30, 2024

Backend

VL (Velox)

Bug description

'Y' means week-based-year in spark (SimpleDateFormat). But velox parse 'Y' as year.

Seq(
  ( 883584000), // 1998-01-01 00:00:00
  (1608912000), // 2020-12-26 00:00:00
  (1608998400), // 2020-12-27 00:00:00
  (1640361600), // 2021-12-25 00:00:00
  (1640448000), // 2021-12-26 00:00:00
  (1640966400), // 2022-01-01 00:00:00
  (1672416000), // 2022-12-31 00:00:00
  (1672502400), // 2023-01-01 00:00:00
  (1703865600), // 2023-12-30 00:00:00
  (1703952000), // 2023-12-31 00:00:00
).toDF("date")

spark.sql(s"""
  select
    from_unixtime(date, 'Y') as week_year,
    date
  from tmp
""")

/*
vanilla:                  gluten:
+---------+----------+    +---------+----------+
|week_year|      date|    |week_year|      date|
+---------+----------+    +---------+----------+
|     1998| 883584000|    |     1998| 883584000|
|     2020|1608912000|    |     2020|1608912000|
|     2021|1608998400|    |     2020|1608998400|
|     2021|1640361600|    |     2021|1640361600|
|     2022|1640448000|    |     2021|1640448000|
|     2022|1640966400|    |     2022|1640966400|
|     2022|1672416000|    |     2022|1672416000|
|     2023|1672502400|    |     2023|1672502400|
|     2023|1703865600|    |     2023|1703865600|
|     2024|1703952000|    |     2023|1703952000|
+---------+----------+    +---------+----------+
*/

I'm trying tofix this mismatch, and there are two issues that need to be resolved:

  1. JodaDateTimeFormatter interprets Y as the 'year of era', which is diff from SimpleDateFormat in java.

  2. Velox uses ISO standard to calc week date (e.g. Fix Spark WeekFunction on long years facebookincubator/velox#10713). But SimpleDateFormat use GregorianCalendar.

    The main difference is that SimpleDateFormat will define the firstDayOfWeek and the minimalDaysInFirstWeek based on locale. Tt is Sunday and 1 day by default. ISO8601 defines it as Monday and 4 days.

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

No response

@ccat3z ccat3z added bug Something isn't working triage labels Aug 30, 2024
@rui-mo
Copy link
Contributor

rui-mo commented Sep 3, 2024

Velox uses ISO standard to calc week date (e.g. facebookincubator/velox#10713). But SimpleDateFormat use GregorianCalendar.

Hi @ccat3z, does this imply that we need a Spark-specific implementation for the week function instead of reusing the Presto one? cc: @zml1206

@zml1206
Copy link
Contributor

zml1206 commented Sep 3, 2024

@ccat3z Are the time zone of the two tests consistent? You can use spark.sql.session.timeZone. from_unixtime depends on time zone. As far as I know, velox from_unixtimestamp is different from weekOfYear and it does not use the ISO standard. cc @rui-mo

@ccat3z
Copy link
Contributor Author

ccat3z commented Sep 4, 2024

@ccat3z Are the time zone of the two tests consistent? You can use spark.sql.session.timeZone. from_unixtime depends on time zone. As far as I know, velox from_unixtimestamp is different from weekOfYear and it does not use the ISO standard. cc @rui-mo

Two tests use same time zone.

@zml1206
Copy link
Contributor

zml1206 commented Sep 6, 2024

y and Y in SimpleDateFormat(spark) have different meanings. Y is week-based-year, y is year, but velox does not differentiate, and they are all processed by year.

@ccat3z
Copy link
Contributor Author

ccat3z commented Sep 6, 2024

facebookincubator/velox#10930 impl getWeekYear for both ISO 8601 and SimpleDateFormat.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working triage
Projects
None yet
Development

No branches or pull requests

3 participants