-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-44717][PYTHON][PS] Respect TimestampNTZ in resampling #42392
Conversation
cc @zhengruifeng and @attilapiros FYI |
cb0f65b
to
758016f
Compare
cc @gengliangwang too |
def test_series_resample(self): | ||
self.check_series_resample() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are still depending on the TZ environment setting indirectly. So just by running the test on a different TZ this test (and also the test_dataframe_resample) would simply fail.
We should either guarantee the correct TZ in the beginning of the test or validate the assumption and produce a meaningful error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HyukjinKwon this is still an issue. The old tests still are failing when the TZ is not UTC i.e in America/New_York.
See https://github.com/attilapiros/spark/actions/runs/5884585704/job/15960073184?pr=5
Anyway for production we have the "spark.sql.timestampType=TIMESTAMP_NTZ" settings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah ... this PR enables spark.sql.timestampType=TIMESTAMP_NTZ
as a workaround for now ..
To fix this, we need a bigger scope of change .. and can be arguable in a way.
class ResampleWithTimezoneTests( | ||
ResampleWithTimezoneMixin, PandasOnSparkTestUtils, TestUtils, ReusedConnectTestCase | ||
): | ||
@unittest.skip("SPARK-44731: Support 'spark.sql.timestampType' in Python Spark Connect client") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @ueshin FYI
Merged to master and branch-3.5. |
### What changes were proposed in this pull request? This PR proposes to respect `TimestampNTZ` type in resampling at pandas API on Spark. ### Why are the changes needed? It still operates as if the timestamps are `TIMESTAMP_LTZ` even when `spark.sql.timestampType` is set to `TIMESTAMP_NTZ`, which is unexpected. ### Does this PR introduce _any_ user-facing change? This fixes a bug so end users can use exactly same behaviour with pandas with `TimestampNTZType` - pandas does not respect the local timezone with DST. While we might need to follow this even for `TimestampType`, this PR does not address the case as it might be controversial. ### How was this patch tested? Unittest was added. Closes #42392 from HyukjinKwon/SPARK-44717. Authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org> (cherry picked from commit e05959e) Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
…in Python Spark Connect ### What changes were proposed in this pull request? This PR proposes: - Share the namespaces for `to_timestamp_ntz`, `to_timestamp_ltz` and `to_unix_timestamp` in Spark Connect. They were missed. - Adds the support of `TimestampNTZ` for literal handling in Python Spark Connect (by respecting `spark.sql.timestampType`). ### Why are the changes needed? For feature parity, and respect timestamp ntz in resampling in pandas API on Spark ### Does this PR introduce _any_ user-facing change? Yes, this virtually fixes the same bug: #42392 in Spark Connect with Python. ### How was this patch tested? Unittests reenabled. Closes #42445 from HyukjinKwon/SPARK-44731. Authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
…in Python Spark Connect ### What changes were proposed in this pull request? This PR proposes: - Share the namespaces for `to_timestamp_ntz`, `to_timestamp_ltz` and `to_unix_timestamp` in Spark Connect. They were missed. - Adds the support of `TimestampNTZ` for literal handling in Python Spark Connect (by respecting `spark.sql.timestampType`). ### Why are the changes needed? For feature parity, and respect timestamp ntz in resampling in pandas API on Spark ### Does this PR introduce _any_ user-facing change? Yes, this virtually fixes the same bug: #42392 in Spark Connect with Python. ### How was this patch tested? Unittests reenabled. Closes #42445 from HyukjinKwon/SPARK-44731. Authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Ruifeng Zheng <ruifengz@apache.org> (cherry picked from commit 73b0376) Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
…in Python Spark Connect ### What changes were proposed in this pull request? This PR proposes: - Share the namespaces for `to_timestamp_ntz`, `to_timestamp_ltz` and `to_unix_timestamp` in Spark Connect. They were missed. - Adds the support of `TimestampNTZ` for literal handling in Python Spark Connect (by respecting `spark.sql.timestampType`). ### Why are the changes needed? For feature parity, and respect timestamp ntz in resampling in pandas API on Spark ### Does this PR introduce _any_ user-facing change? Yes, this virtually fixes the same bug: apache#42392 in Spark Connect with Python. ### How was this patch tested? Unittests reenabled. Closes apache#42445 from HyukjinKwon/SPARK-44731. Authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
### What changes were proposed in this pull request? This PR proposes to respect `TimestampNTZ` type in resampling at pandas API on Spark. ### Why are the changes needed? It still operates as if the timestamps are `TIMESTAMP_LTZ` even when `spark.sql.timestampType` is set to `TIMESTAMP_NTZ`, which is unexpected. ### Does this PR introduce _any_ user-facing change? This fixes a bug so end users can use exactly same behaviour with pandas with `TimestampNTZType` - pandas does not respect the local timezone with DST. While we might need to follow this even for `TimestampType`, this PR does not address the case as it might be controversial. ### How was this patch tested? Unittest was added. Closes apache#42392 from HyukjinKwon/SPARK-44717. Authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
…in Python Spark Connect ### What changes were proposed in this pull request? This PR proposes: - Share the namespaces for `to_timestamp_ntz`, `to_timestamp_ltz` and `to_unix_timestamp` in Spark Connect. They were missed. - Adds the support of `TimestampNTZ` for literal handling in Python Spark Connect (by respecting `spark.sql.timestampType`). ### Why are the changes needed? For feature parity, and respect timestamp ntz in resampling in pandas API on Spark ### Does this PR introduce _any_ user-facing change? Yes, this virtually fixes the same bug: apache#42392 in Spark Connect with Python. ### How was this patch tested? Unittests reenabled. Closes apache#42445 from HyukjinKwon/SPARK-44731. Authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
### What changes were proposed in this pull request? This PR proposes to respect `TimestampNTZ` type in resampling at pandas API on Spark. ### Why are the changes needed? It still operates as if the timestamps are `TIMESTAMP_LTZ` even when `spark.sql.timestampType` is set to `TIMESTAMP_NTZ`, which is unexpected. ### Does this PR introduce _any_ user-facing change? This fixes a bug so end users can use exactly same behaviour with pandas with `TimestampNTZType` - pandas does not respect the local timezone with DST. While we might need to follow this even for `TimestampType`, this PR does not address the case as it might be controversial. ### How was this patch tested? Unittest was added. Closes apache#42392 from HyukjinKwon/SPARK-44717. Authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
…in Python Spark Connect ### What changes were proposed in this pull request? This PR proposes: - Share the namespaces for `to_timestamp_ntz`, `to_timestamp_ltz` and `to_unix_timestamp` in Spark Connect. They were missed. - Adds the support of `TimestampNTZ` for literal handling in Python Spark Connect (by respecting `spark.sql.timestampType`). ### Why are the changes needed? For feature parity, and respect timestamp ntz in resampling in pandas API on Spark ### Does this PR introduce _any_ user-facing change? Yes, this virtually fixes the same bug: apache#42392 in Spark Connect with Python. ### How was this patch tested? Unittests reenabled. Closes apache#42445 from HyukjinKwon/SPARK-44731. Authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
What changes were proposed in this pull request?
This PR proposes to respect
TimestampNTZ
type in resampling at pandas API on Spark.Why are the changes needed?
It still operates as if the timestamps are
TIMESTAMP_LTZ
even whenspark.sql.timestampType
is set toTIMESTAMP_NTZ
, which is unexpected.Does this PR introduce any user-facing change?
This fixes a bug so end users can use exactly same behaviour with pandas with
TimestampNTZType
- pandas does not respect the local timezone with DST. While we might need to follow this even forTimestampType
, this PR does not address the case as it might be controversial.How was this patch tested?
Unittest was added.