[SPARK-44717][PYTHON][PS] Respect TimestampNTZ in resampling #42392

HyukjinKwon · 2023-08-08T11:17:15Z

What changes were proposed in this pull request?

This PR proposes to respect the TimestampNTZ type in resampling in the pandas API on Spark.

Why are the changes needed?

Resampling still operates as if the timestamps were TIMESTAMP_LTZ even when spark.sql.timestampType is set to TIMESTAMP_NTZ, which is unexpected.

Does this PR introduce any user-facing change?

This fixes a bug so that, with TimestampNTZType, end users get exactly the same behaviour as pandas: pandas does not take the local timezone or DST into account. We might need to follow this for TimestampType as well, but this PR does not address that case because it might be controversial.
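To make the NTZ/LTZ distinction concrete, here is a minimal standard-library sketch of how the interpretation of a timestamp changes the edge of a daily resampling bin. It is not the Spark or pandas implementation; the fixed UTC-4 offset `LOCAL` and the helper names `floor_day_ntz` and `floor_day_ltz` are assumptions made only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed offset standing in for a session time zone (UTC-4).
LOCAL = timezone(timedelta(hours=-4))
EPOCH_NAIVE = datetime(1970, 1, 1)
EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)
DAY = timedelta(days=1)


def floor_day_ntz(ts):
    """Floor a naive (NTZ-like) timestamp on its own wall-clock axis."""
    return EPOCH_NAIVE + ((ts - EPOCH_NAIVE) // DAY) * DAY


def floor_day_ltz(ts):
    """Treat ts as local time, floor on the UTC epoch axis, then show the
    resulting bin edge on the local wall clock again (LTZ-like)."""
    instant = ts.replace(tzinfo=LOCAL)
    floored = EPOCH_UTC + ((instant - EPOCH_UTC) // DAY) * DAY
    return floored.astimezone(LOCAL).replace(tzinfo=None)


ts = datetime(2023, 8, 8, 21, 30)
ntz_bin = floor_day_ntz(ts)  # bin edge at local midnight
ltz_bin = floor_day_ltz(ts)  # bin edge at UTC midnight, i.e. 20:00 local
print(ntz_bin, ltz_bin)
```

With the naive interpretation the daily bin starts at wall-clock midnight, which is what a pandas user would expect. With the local-time interpretation the bin edge follows UTC midnight and shows up at 20:00 on the wall clock, so the same row can land in a different bin.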

How was this patch tested?

A unit test was added.

HyukjinKwon · 2023-08-08T11:17:31Z

cc @zhengruifeng and @attilapiros FYI

HyukjinKwon · 2023-08-08T11:24:10Z

cc @gengliangwang too

attilapiros · 2023-08-08T19:10:18Z

python/pyspark/pandas/tests/test_resample.py

+    def test_series_resample(self):
+        self.check_series_resample()


We still depend indirectly on the TZ environment setting, so running this test (and test_dataframe_resample) under a different TZ would simply fail.

We should either guarantee the correct TZ at the beginning of the test or validate the assumption and produce a meaningful error.
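Both options mentioned above can be sketched with the standard library alone. This is only an illustration, not the change made in this PR: the base class `PinnedTimezoneTestCase` and the helper `require_local_utc_offset` are hypothetical names, the sketch is POSIX-only because it relies on `time.tzset()`, and it pins the time zone of the Python process only. A real Spark test would also have to consider the JVM and the Spark session time zone, which is not shown here.

```python
import os
import time
import unittest


def require_local_utc_offset(expected_seconds=0):
    """Fail fast with a readable message if the process time zone is unexpected."""
    actual = time.localtime().tm_gmtoff
    if actual != expected_seconds:
        raise RuntimeError(
            f"Test assumes a local UTC offset of {expected_seconds}s, "
            f"but the process reports {actual}s; set TZ accordingly."
        )


class PinnedTimezoneTestCase(unittest.TestCase):
    """Hypothetical base class that pins TZ for the Python process."""

    pinned_tz = "UTC"

    @classmethod
    def setUpClass(cls):
        # Option 1: guarantee the time zone before any test runs.
        cls._saved_tz = os.environ.get("TZ")
        os.environ["TZ"] = cls.pinned_tz
        time.tzset()  # re-read TZ; not available on Windows
        # Option 2: validate the assumption and raise a meaningful error.
        require_local_utc_offset(0)

    @classmethod
    def tearDownClass(cls):
        # Restore whatever the environment had before the tests ran.
        if cls._saved_tz is None:
            os.environ.pop("TZ", None)
        else:
            os.environ["TZ"] = cls._saved_tz
        time.tzset()

    def test_offset_is_pinned(self):
        self.assertEqual(time.localtime().tm_gmtoff, 0)
```

Pinning in `setUpClass` keeps individual tests free of time zone handling, while the validation helper turns a confusing assertion mismatch into an explicit error about the environment.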

attilapiros · 2023-08-17

@HyukjinKwon this is still an issue. The old tests still fail when the TZ is not UTC, e.g. in America/New_York.

See https://github.com/attilapiros/spark/actions/runs/5884585704/job/15960073184?pr=5

Anyway, for production we have the "spark.sql.timestampType=TIMESTAMP_NTZ" setting.

HyukjinKwon · 2023-08-21

Yeah, this PR enables spark.sql.timestampType=TIMESTAMP_NTZ as a workaround for now.
Fixing this properly needs a change of larger scope, which could be arguable in a way.

HyukjinKwon · 2023-08-09T00:56:06Z

python/pyspark/pandas/tests/connect/test_parity_resample.py

+class ResampleWithTimezoneTests(
+    ResampleWithTimezoneMixin, PandasOnSparkTestUtils, TestUtils, ReusedConnectTestCase
+):
+    @unittest.skip("SPARK-44731: Support 'spark.sql.timestampType' in Python Spark Connect client")


cc @ueshin FYI

HyukjinKwon · 2023-08-09T02:03:29Z

Merged to master and branch-3.5.

### What changes were proposed in this pull request?

This PR proposes to respect `TimestampNTZ` type in resampling at pandas API on Spark.

### Why are the changes needed?

It still operates as if the timestamps are `TIMESTAMP_LTZ` even when `spark.sql.timestampType` is set to `TIMESTAMP_NTZ`, which is unexpected.

### Does this PR introduce _any_ user-facing change?

This fixes a bug so end users can use exactly same behaviour with pandas with `TimestampNTZType` - pandas does not respect the local timezone with DST. While we might need to follow this even for `TimestampType`, this PR does not address the case as it might be controversial.

### How was this patch tested?

Unittest was added.

Closes #42392 from HyukjinKwon/SPARK-44717.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit e05959e)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

…in Python Spark Connect

### What changes were proposed in this pull request?

This PR proposes:
- Share the namespaces for `to_timestamp_ntz`, `to_timestamp_ltz` and `to_unix_timestamp` in Spark Connect. They were missed.
- Adds the support of `TimestampNTZ` for literal handling in Python Spark Connect (by respecting `spark.sql.timestampType`).

### Why are the changes needed?

For feature parity, and respect timestamp ntz in resampling in pandas API on Spark

### Does this PR introduce _any_ user-facing change?

Yes, this virtually fixes the same bug: #42392 in Spark Connect with Python.

### How was this patch tested?

Unittests reenabled.

Closes #42445 from HyukjinKwon/SPARK-44731.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
(cherry picked from commit 73b0376)
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>


github-actions bot added the PYTHON and PANDAS API ON SPARK labels Aug 8, 2023

Respect TimestampNTZ in resampling

758016f

HyukjinKwon force-pushed the SPARK-44717 branch from cb0f65b to 758016f August 8, 2023 11:18

zhengruifeng approved these changes Aug 8, 2023

attilapiros reviewed Aug 8, 2023

Address comments

1d3df69

HyukjinKwon commented Aug 9, 2023

HyukjinKwon closed this in e05959e Aug 9, 2023

HyukjinKwon mentioned this pull request Aug 11, 2023

[SPARK-44731][PYTHON][CONNECT] Make TimestampNTZ works with literals in Python Spark Connect #42445

Closed

HyukjinKwon deleted the SPARK-44717 branch January 15, 2024 00:48
