enh: add dt.timestamp #1220

raisadz · 2024-10-18T13:32:38Z

What type of PR is this? (check all applicable)

Related issues

Related issue #
Closes enh: add dt.timestamp #1182

Checklist

Code follows style guide (ruff)
Tests added
Documented the changes

If you have comments or can explain your changes, please do so below.

MarcoGorelli · 2024-10-18T14:37:14Z

narwhals/_arrow/series.py

+                if time_unit == "ns":
+                    result = s_cast
+                if time_unit == "us":
+                    result = pc.divide(s_cast, 1_000)
+                if time_unit == "ms":
+                    result = pc.divide(s_cast, 1_000_000)


for all of these, it would be marginally more efficient to use elif, else we're doing similar comparisons

we're already validating time_unit in narwhals/expr.py, so this could just be

if time_unit == "ns":
    result = pc.multiply(s_cast, 1_000)
elif time_unit == "us":
    result = s_cast
else:
    result = pc.divide(s_cast, 1_000)
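For readers following along, the shape of that chain can be sketched in plain Python. This is only an illustration under stated assumptions: `from_microseconds` is a hypothetical helper, not something in narwhals, it works on a single integer rather than an Arrow array, and it assumes `time_unit` has already been validated upstream so the final `else` can stand in for "ms".

```python
def from_microseconds(value: int, time_unit: str) -> int:
    """Convert an integer count of microseconds since the epoch to `time_unit`.

    Hypothetical helper for illustration. `time_unit` is assumed to have been
    validated already as one of "ns", "us" or "ms".
    """
    if time_unit == "ns":
        result = value * 1_000
    elif time_unit == "us":
        result = value
    else:
        # Only "ms" can reach this branch once the input is validated.
        result = value // 1_000
    return result


print(from_microseconds(978_307_200_000_000, "ms"))  # 978307200000
```

Because the branches are mutually exclusive, at most two comparisons run per call, and the validated input means no branch is needed for unknown units.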

MarcoGorelli

thanks @raisadz , serious effort here, looks really nice, and this is a really valuable feature!

I'd suggest adding in a hypothesis test, something like

from datetime import datetime

import hypothesis.strategies as st
import narwhals as nw
from hypothesis import given

@given(
    inputs=st.datetimes(min_value=datetime(1960,1,1), max_value=datetime(1980,1,1)),
    time_unit=st.sampled_from(['ms', 'us', 'ns']),
    # We keep 'ms' out for now due to an upstream bug: https://github.com/pola-rs/polars/issues/19309
    starting_time_unit=st.sampled_from(['us', 'ns'])
)
def test_me(inputs, time_unit, starting_time_unit) -> None:
    import polars as pl
    import pandas as pd
    import pyarrow as pa
    @nw.narwhalify
    def func(s):
        return s.dt.timestamp(time_unit)
    result_pl = func(pl.Series([inputs], dtype=pl.Datetime(starting_time_unit)))
    result_pd = func(pd.Series([inputs], dtype=f'datetime64[{starting_time_unit}]'))
    result_pdpa = func(pd.Series([inputs], dtype=f'timestamp[{starting_time_unit}][pyarrow]'))
    result_pa = func(pa.chunked_array([[inputs]], type=pa.timestamp(starting_time_unit)))
    assert result_pl[0] == result_pd[0]
    assert result_pl[0] == result_pdpa[0]
    assert result_pl[0] == result_pa[0].as_py()

(but with a better name 😅 )

MarcoGorelli · 2024-10-18T14:38:05Z

narwhals/_arrow/series.py

+                    result = s_cast
+                if time_unit == "ms":
+                    result = pc.divide(s_cast, 1_000)
+            if unit == "ms":


same with these ifs

MarcoGorelli · 2024-10-18T14:56:34Z

narwhals/_arrow/series.py

+                if time_unit == "ns":
+                    result = s_cast
+                if time_unit == "us":
+                    result = pc.divide(s_cast, 1_000)


I think here we'll need floordiv_compat instead of divide so it's consistent with how we do it for the other backends
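A small plain-Python sketch of why the choice matters, assuming (as I understand it) that `pc.divide` on integer inputs rounds toward zero while a floor division rounds toward negative infinity. The two only disagree for negative values, i.e. timestamps before 1970. Both function names below are hypothetical and only model the two rounding rules.

```python
def truncating_div(value: int, divisor: int) -> int:
    """Integer division that rounds toward zero (C-style)."""
    quotient = abs(value) // abs(divisor)
    return quotient if (value < 0) == (divisor < 0) else -quotient


def floor_div(value: int, divisor: int) -> int:
    """Integer division that rounds toward negative infinity (Python's //)."""
    return value // divisor


# 1.5 milliseconds before the epoch, expressed in microseconds.
before_epoch_us = -1_500

print(truncating_div(before_epoch_us, 1_000))  # -1
print(floor_div(before_epoch_us, 1_000))       # -2
```

For non-negative inputs both functions agree, so the difference would only show up in tests that include pre-1970 datetimes.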

FBruzzesi · 2024-10-18T20:28:05Z

I am catching up on Marco's livestream from today. As he explains how the integer representation works, wouldn't it make sense to have something like the following pseudocode? (Details such as nulls, time units, time zones, date vs datetime, etc. would need to be taken care of.)

diff = series - datetime(1970, 1, 1, 0, 0, 0, tzinfo=...)
if time_unit == "ns":
    return diff.dt.total_nanoseconds()
elif ...
...

Maybe it ends up being similar effort, but feels less redundant than all the nested if cases. I just wanted to throw the idea out there, feel free to ignore it :)

Edit: pyarrow also has nanoseconds_between, microseconds_between and milliseconds_between for computing the difference between two dates/datetimes.

MarcoGorelli · 2024-10-20T07:52:07Z

tests/expr_and_series/dt/timestamp_test.py

+    @nw.narwhalify
+    def func(s: nw.Series) -> nw.Series:
+        return s.dt.timestamp(time_unit)  # type: ignore[return-value]


the type ignore is due to #1229

MarcoGorelli · 2024-10-20T08:36:28Z

thanks @FBruzzesi for taking a look! total_nanoseconds is for Duration, not Datetime, so we can't use it here unfortunately 😅

FBruzzesi · 2024-10-20T08:42:48Z

> thanks @FBruzzesi for taking a look! total_nanoseconds is for Duration, not Datetime, so we can't use it here unfortunately 😅

I am most likely missing something - what I am suggesting is, instead of using the date(time) representation, compute the difference between the series/expr of interest and the scalar 1970-01-01 00:00:00 (unix_epoch). This would result in a Duration object (wouldn't it?) and then use the Duration methods. Sorry if I misunderstood the internal representation explanation
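The idea described here can be sketched with the standard library alone, where subtracting two datetimes gives a `timedelta` (the stdlib counterpart of a Duration). This is a scalar illustration, not the narwhals implementation: `timestamp_via_duration` and `UNIX_EPOCH` are hypothetical names, the input is assumed to be timezone-aware, and nulls are not handled.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical constant: the Unix epoch as a timezone-aware datetime.
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timestamp_via_duration(value: datetime, time_unit: str) -> int:
    """Return the time since the epoch as an integer in `time_unit`.

    `value` must be timezone-aware; subtracting a naive datetime from the
    aware epoch raises TypeError.
    """
    diff = value - UNIX_EPOCH  # a timedelta, i.e. a duration
    micros = diff // timedelta(microseconds=1)  # whole microseconds as an int
    if time_unit == "ns":
        # timedelta has microsecond resolution, so scale up.
        return micros * 1_000
    elif time_unit == "us":
        return micros
    elif time_unit == "ms":
        return micros // 1_000
    msg = f"unsupported time unit: {time_unit!r}"
    raise ValueError(msg)


print(timestamp_via_duration(datetime(2001, 1, 1, tzinfo=timezone.utc), "ms"))
# 978307200000
```

The branching on `time_unit` remains, but it operates on a single duration rather than on every combination of source and target unit.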

MarcoGorelli · 2024-10-20T08:47:25Z

ah sorry you're right 🤦 maybe that's indeed simpler, will try!

MarcoGorelli · 2024-10-20T09:24:20Z

i think a cast would always be more efficient, from a little timing test:

In [50]: %timeit (s - datetime(1970,1,1)).dt.total_seconds()
465 μs ± 64.1 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [51]: %timeit s.astype('int64')
52.9 μs ± 10.8 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

FBruzzesi

Amazing stuff 🔥 I left a few comments for me to learn 🙈

FBruzzesi · 2024-10-20T20:34:05Z

tests/expr_and_series/dt/timestamp_test.py

+        ("s", "ms", [978307200000, None, 978480000000]),
+    ],
+)
+def test_timestamp_datetimes_tz_aware(


Nice one 👌

FBruzzesi · 2024-10-20T20:35:23Z

narwhals/series.py

@@ -4001,3 +4001,63 @@ def convert_time_zone(self, time_zone: str) -> Series:
        return self._narwhals_series._from_compliant_series(
            self._narwhals_series._compliant_series.dt.convert_time_zone(time_zone)
        )
+
+    def timestamp(self, time_unit: Literal["ns", "us", "ms"] = "us") -> Series:


Heads up if merging with main branch:

Suggested change

-    def timestamp(self, time_unit: Literal["ns", "us", "ms"] = "us") -> Series:
+    def timestamp(self: Self, time_unit: Literal["ns", "us", "ms"] = "us") -> T:

(and same in Expr)

FBruzzesi · 2024-10-20T20:36:44Z

narwhals/_pandas_like/series.py

+        is_pyarrow_dtype = "pyarrow" in str(self._pandas_series._native_series.dtype)
+        mask_na = s.isna()
+        if dtype == self._pandas_series._dtypes.Date:
+            s_cast = s.astype("Int32[pyarrow]")


That's because date can only be pyarrow backed, is that right?

MarcoGorelli · 2024-10-20

yup! will add a comment
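Some background on the Int32 cast: Arrow's date32 type stores a date as a 32-bit count of days since 1970-01-01, so turning a date into a timestamp amounts to scaling that day count. A minimal stdlib sketch of the arithmetic follows; the helper names and the `UNITS_PER_SECOND` table are hypothetical, and this is not the code in the PR.

```python
from datetime import date

SECONDS_PER_DAY = 86_400
# Hypothetical lookup table: how many of each unit fit in one second.
UNITS_PER_SECOND = {"ms": 1_000, "us": 1_000_000, "ns": 1_000_000_000}


def days_since_epoch(value: date) -> int:
    """The integer a date32-style representation would hold for `value`."""
    return (value - date(1970, 1, 1)).days


def date_to_timestamp(value: date, time_unit: str) -> int:
    """Scale the day count to the requested unit (midnight of that day)."""
    return days_since_epoch(value) * SECONDS_PER_DAY * UNITS_PER_SECOND[time_unit]


print(days_since_epoch(date(2001, 1, 1)))         # 11323
print(date_to_timestamp(date(2001, 1, 1), "ms"))  # 978307200000
```

Since the conversion only multiplies, no rounding question arises for dates, unlike the datetime case where the target unit can be coarser than the source.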

FBruzzesi · 2024-10-20T20:38:17Z

narwhals/_pandas_like/series.py

+                s_cast = s.view("Int64[pyarrow]") if is_pyarrow_dtype else s.view("int64")
+            else:
+                s_cast = (
+                    s.astype("Int64[pyarrow]") if is_pyarrow_dtype else s.astype("int64")
+                )


I assume .view is more efficient?

MarcoGorelli · 2024-10-20

it's the most ridiculous thing - pandas 1.5 emits a warning telling you to use view instead of astype, and pandas 2.x emits a warning telling you to use astype instead of view because view is deprecated 🤦 🤦 🤦
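One way to picture the resulting version switch is a tiny helper that picks the method name from the installed pandas version. This is only a sketch: `integer_cast_method` is a hypothetical name, and the cutoff at pandas 2.0 is an assumption read off the comment above rather than taken from the diff.

```python
def integer_cast_method(pandas_version: tuple[int, ...]) -> str:
    """Pick the Series method used to reinterpret datetimes as integers.

    Assumption: pandas older than 2.0 prefers `view`, while 2.0 and later
    prefer `astype` because `view` is deprecated there.
    """
    return "view" if pandas_version < (2,) else "astype"


print(integer_cast_method((1, 5, 3)))  # view
print(integer_cast_method((2, 2, 0)))  # astype
```

A caller could then dispatch with `getattr(series, integer_cast_method(version))("int64")`, keeping the version check in one place.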

MarcoGorelli

thanks @raisadz , and @FBruzzesi for reviewing!

raisadz added 12 commits October 17, 2024 10:02

add timestamp method

487a525

add docstring example for series

b1e568c

add docstring example for expr

5ecbed9

update example to use time unit

7ed770d

add tests

d3f76cb

Merge remote-tracking branch 'upstream/main' into add-dt-timestamp

c9a8ec0

preserve pyarrow types, add tests

839891b

fix dtype comparisons, add test for dates

485c1e1

add parametrization

bf909ad

resolve conflicts after merge

5301aa4

parametrize for other time units

a37e5cd

move common functions to utils, add tests for invalid inputs, add das…

fee2422

…k pyarrow types to to_date test

github-actions bot added the enhancement label Oct 18, 2024

MarcoGorelli reviewed Oct 18, 2024

MarcoGorelli added 3 commits October 20, 2024 08:14

Merge remote-tracking branch 'upstream/main' into add-dt-timestamp

ee760e6

use more elif/else statements

dc86689

add timestamp_test

70d6462

MarcoGorelli reviewed Oct 20, 2024

MarcoGorelli added 3 commits October 20, 2024 08:57

version compat

b1e97c9

pandas versions compat

124d588

coverage

6277094

MarcoGorelli mentioned this pull request Oct 20, 2024

test: test_nested_dtypes fails with Polars==0.20.30 #1230

Closed

MarcoGorelli added 2 commits October 20, 2024 10:46

improve type hints

d158f22

insert a time zone for good measure

49d235c

MarcoGorelli added 7 commits October 20, 2024 10:55

set time zone to utc first

674b912

split time zone into separate test

263f093

more version-dependent xfails

532b07f

xfail strict=False for these

186f926

coverage

a5ed5f6

dask xfail

a022148

modin xfail

3a572e6

MarcoGorelli marked this pull request as ready for review October 20, 2024 11:43

FBruzzesi reviewed Oct 20, 2024

MarcoGorelli added 2 commits October 21, 2024 07:52

Merge remote-tracking branch 'upstream/main' into add-dt-timestamp

a01d9ad

typing

32bb502

MarcoGorelli approved these changes Oct 21, 2024

MarcoGorelli merged commit c692b81 into narwhals-dev:main Oct 21, 2024
25 checks passed
