Missing precision for to_datetime in DataFrame serializer #328

Closed · Fixed by #335
Elfoniok opened this issue on Sep 11, 2021 · 4 comments

Labels: enhancement (New feature or request)
Milestone: 1.22.0

Comments

Elfoniok commented Sep 11, 2021

Steps to reproduce:

  1. Create a pandas DataFrame with non-nanosecond timestamps and set the time column as the index.
  2. Set the write precision to 's' (in my case the timestamps have second precision) and write the frame with the write API.
  3. The timestamps end up converted to dates around 1970 (a minimal sketch follows this list).
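A minimal sketch of the reproduction (the connection details and column names here are placeholders; the write call mirrors the full example further down):

import pandas as pd
from influxdb_client import InfluxDBClient

# Hypothetical connection details, just for the sketch
url, token, org, bucket = "http://localhost:8086", "my-token", "my-org", "my-bucket"

# Epoch timestamps with *second* precision used as the index
df = pd.DataFrame({"Date": [1631318400, 1631318460],
                   "loadavg_1": [0.5, 0.7]}).set_index("Date")

with InfluxDBClient(url=url, token=token, org=org) as client:
    with client.write_api() as write_api:
        write_api.write(bucket=bucket, record=df,
                        write_precision='s',
                        data_frame_measurement_name="repro")
# The written points get timestamps around 1970-01-01 instead of 2021-09-11.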

Maybe I am doing something wrong, but importing pandas DataFrames with this API is poorly documented, especially when it comes to pre-existing date columns, which is a very common scenario if you need to import DataFrames.

I have failed to find any docs on how the date should be specified (column name? data type?) in the DataFrame. After consulting the code I found that it actually has to be the index!

https://github.com/influxdata/influxdb-client-python/blob/master/influxdb_client/client/write/dataframe_serializer.py#L96

Well, OK, so be it. However, I am not very familiar with PeriodIndex; is that common for time-series DataFrames? I always use a plain int column for the timestamp, and I can make it the index, so I fall into the else clause. Despite the TODO saying it might not be what I want, it is exactly what I want, except when I am not using nanosecond timestamps ;(. I noticed the lack of a precision parameter passed to to_datetime, and patching this solves the issue for me. I could push the change, but it bothers me that maybe I am doing something wrong?
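Roughly the kind of change I mean; this is only a sketch of the idea, not the actual serializer code, and the precision parameter name is my assumption:

import pandas as pd

def _index_to_datetime(index, precision='ns'):
    # Sketch only -- not the real dataframe_serializer code.
    # If the index is already datetime-like, leave it alone.
    if isinstance(index, pd.DatetimeIndex):
        return index
    # Otherwise interpret the raw values using the unit that matches the
    # requested write precision ('s', 'ms', 'us' or 'ns') instead of
    # pandas' default of nanoseconds.
    return pd.to_datetime(index, unit=precision)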

Expected behavior:
Data points in InfluxDB should use the timestamps from the time index correctly.

Actual behavior:
All dates are converted to some silly date around 1970

Specifications:

  • Client Version: 1.20
  • InfluxDB Version: 2.0.8
  • Platform: 5.4.0-81-generic 18.04.1-Ubuntu
bednar (Contributor) commented Sep 13, 2021

Hi @Elfoniok,

thanks for using our client.

Well, OK, so be it. However, I am not very familiar with PeriodIndex; is that common for time-series DataFrames? I always use a plain int column for the timestamp, and I can make it the index.

You can change your index via: data_frame = data_frame.set_index('a').

Can you share what your DataFrame looks like?

Regards

bednar added the question (Further information is requested) label on Sep 13, 2021

Elfoniok (Author) commented Sep 25, 2021

Yes, I know I can use set_index. Maybe I was not clear, since you are quoting me but somehow missing the point.

The link I pasted above points to the code branch for PeriodIndex. I don't know how that is used; I am not very familiar with pandas. I am using a plain "int"-like index or a pandas date, and yes, I am using set_index on the chosen column.
Therefore my control flow goes through the else branch, and it seems clear it only works if the timestamps are in nanosecond format, which is the default unit for pandas to_datetime.
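For illustration, here is what the default unit does to a second-precision epoch value (the results are shown as comments):

import pandas as pd

ts = 1631318400  # 2021-09-11 00:00:00 UTC, expressed in seconds

pd.to_datetime(ts)            # Timestamp('1970-01-01 00:00:01.631318400') -- value read as nanoseconds
pd.to_datetime(ts, unit='s')  # Timestamp('2021-09-11 00:00:00')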

Anyway, here is an example and an attached data file:
1_result-data.txt

import pandas
from influxdb_client import InfluxDBClient
from influxdb_client.extras import pd, np
df = pandas.read_csv("1_result-data.txt", sep=' ',
                     names=["secs_since_midnight", "Date", "uptime", "loadavg_1", "loadavg_5",
                            "loadavg_15", "user", "nice", "system", "idle", "io", "IRQ/other",
                            "cpu_avg_mhz", "cpu_degc", "total", "free", "buffers", "cached",
                            "available", "swp-total", "swp-free", "swp-cached", "swp-compressed",
                            "cpu2-degc"])

df.set_index('Date', inplace=True)
pd.to_datetime(df.index, unit='s')

with InfluxDBClient(url=url, token=my_token, org=my_org, debug=True) as client:

    """
    Use batching API
    """
    with client.write_api() as write_api:
        write_api.write(bucket=bucket, record=df,
                        #data_frame_tag_columns=['user', 'system', 'idle'],
                        write_precision='s',
                        data_frame_measurement_name="Test_17")
        print()
        print("Wait to finishing ingesting DataFrame...")
        print()

print()
print(f'Import finished in:')
print()

bednar (Contributor) commented Sep 30, 2021

The link I pasted above points to the code branch for PeriodIndex. I don't know how that is used; I am not very familiar with pandas. I am using a plain "int"-like index or a pandas date, and yes, I am using set_index on the chosen column.

You have to change your index to the result of pd.to_datetime. Something like this will work for you:

df.set_index('Date', inplace=True)
df.index = pandas.to_datetime(df.index, unit='s')

Therefore my control flow goes through the else branch, and it seems clear it only works if the timestamps are in nanosecond format, which is the default unit for pandas to_datetime.

Thanks for the clarification, I will fix it ASAP; meanwhile you can use the workaround above ⬆️.

Elfoniok (Author) commented Oct 1, 2021

Awesome, thanks for adding the docs as well, I really appreciate that!

bednar added the enhancement (New feature or request) label and removed the question (Further information is requested) label on Oct 4, 2021
bednar added this to the 1.22.0 milestone on Oct 12, 2021