Missing precision for to_datetime in DataFrame serializer #328

Closed · Fixed by #335
Elfoniok opened this issue on Sep 11, 2021 · 4 comments

Labels: enhancement (New feature or request)
Milestone: 1.22.0

Comments

Elfoniok commented Sep 11, 2021

Steps to reproduce:

  1. Create a pandas DataFrame with non-nanosecond timestamps and set the time column as the index.
  2. Set the write precision to 's' (in my case the timestamps have second precision) and write the frame with the write API.
  3. The timestamps end up converted to dates around 1970 (a minimal sketch follows this list).
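A minimal sketch of the reproduction (the connection details and column names here are placeholders; the write call mirrors the full example further down):

import pandas as pd
from influxdb_client import InfluxDBClient

# Hypothetical connection details, just for the sketch
url, token, org, bucket = "http://localhost:8086", "my-token", "my-org", "my-bucket"

# Epoch timestamps with *second* precision used as the index
df = pd.DataFrame({"Date": [1631318400, 1631318460],
                   "loadavg_1": [0.5, 0.7]}).set_index("Date")

with InfluxDBClient(url=url, token=token, org=org) as client:
    with client.write_api() as write_api:
        write_api.write(bucket=bucket, record=df,
                        write_precision='s',
                        data_frame_measurement_name="repro")
# The written points get timestamps around 1970-01-01 instead of 2021-09-11.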

Maybe I am doing something wrong, but importing pandas DataFrames with this API is poorly documented, especially when it comes to pre-existing date columns, which is a very common scenario if you need to import DataFrames.

I have failed to find any docs on how the date should be specified (column name? data type?) in the DataFrame. After consulting the code I found that it actually has to be the index!

https://github.com/influxdata/influxdb-client-python/blob/master/influxdb_client/client/write/dataframe_serializer.py#L96

Well, OK, so be it. However, I am not very familiar with PeriodIndex; is that common for time-series DataFrames? I always use a plain int column for the timestamp, and I can make it the index, so I fall into the else clause. Despite the TODO saying it might not be what I want, it is exactly what I want, except when I am not using nanosecond timestamps ;(. I noticed the lack of a precision parameter passed to to_datetime, and patching this solves the issue for me. I could push the change, but it bothers me that maybe I am doing something wrong?
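Roughly the kind of change I mean; this is only a sketch of the idea, not the actual serializer code, and the precision parameter name is my assumption:

import pandas as pd

def _index_to_datetime(index, precision='ns'):
    # Sketch only -- not the real dataframe_serializer code.
    # If the index is already datetime-like, leave it alone.
    if isinstance(index, pd.DatetimeIndex):
        return index
    # Otherwise interpret the raw values using the unit that matches the
    # requested write precision ('s', 'ms', 'us' or 'ns') instead of
    # pandas' default of nanoseconds.
    return pd.to_datetime(index, unit=precision)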

Expected behavior:
Data points in InfluxDB should use the timestamps from the time index correctly.

Actual behavior:
All dates are converted to some silly date around 1970

Specifications:

  • Client Version: 1.20
  • InfluxDB Version: 2.0.8
  • Platform: 5.4.0-81-generic 18.04.1-Ubuntu
bednar (Contributor) commented Sep 13, 2021

Hi @Elfoniok,

thanks for using our client.

Well, OK, so be it. However, I am not very familiar with PeriodIndex; is that common for time-series DataFrames? I always use a plain int column for the timestamp, and I can make it the index.

You can change your index via: data_frame = data_frame.set_index('a').

Can you share what your DataFrame looks like?

Regards

bednar added the question (Further information is requested) label on Sep 13, 2021

Elfoniok (Author) commented Sep 25, 2021

Yes, I know I can use set_index. Maybe I was not clear, since you are quoting me but somehow missing the point.

The link I pasted above points to the code branch for PeriodIndex. I don't know how that is used; I am not very familiar with pandas. I am using a plain "int"-like index or a pandas date, and yes, I am using set_index on the chosen column.
Therefore my control flow goes through the else branch, and it seems clear it only works if the timestamps are in nanosecond format, which is the default unit for pandas to_datetime.
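For illustration, here is what the default unit does to a second-precision epoch value (the results are shown as comments):

import pandas as pd

ts = 1631318400  # 2021-09-11 00:00:00 UTC, expressed in seconds

pd.to_datetime(ts)            # Timestamp('1970-01-01 00:00:01.631318400') -- value read as nanoseconds
pd.to_datetime(ts, unit='s')  # Timestamp('2021-09-11 00:00:00')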

Anyway, here is an example and an attached data file:
1_result-data.txt

import pandas
from influxdb_client import InfluxDBClient
from influxdb_client.extras import pd, np
df = pandas.read_csv("1_result-data.txt", sep=' ',
                     names=["secs_since_midnight", "Date", "uptime", "loadavg_1", "loadavg_5",
                            "loadavg_15", "user", "nice", "system", "idle", "io", "IRQ/other",
                            "cpu_avg_mhz", "cpu_degc", "total", "free", "buffers", "cached",
                            "available", "swp-total", "swp-free", "swp-cached", "swp-compressed",
                            "cpu2-degc"])

df.set_index('Date', inplace=True)
pd.to_datetime(df.index, unit='s')

with InfluxDBClient(url=url, token=my_token, org=my_org, debug=True) as client:

    """
    Use batching API
    """
    with client.write_api() as write_api:
        write_api.write(bucket=bucket, record=df,
                        #data_frame_tag_columns=['user', 'system', 'idle'],
                        write_precision='s',
                        data_frame_measurement_name="Test_17")
        print()
        print("Wait to finishing ingesting DataFrame...")
        print()

print()
print(f'Import finished in:')
print()

bednar (Contributor) commented Sep 30, 2021

The link I pasted above points to the code branch for PeriodIndex. I don't know how that is used; I am not very familiar with pandas. I am using a plain "int"-like index or a pandas date, and yes, I am using set_index on the chosen column.

You have to change your index to the result of pd.to_datetime. Something like this will work for you:

df.set_index('Date', inplace=True)
df.index = pandas.to_datetime(df.index, unit='s')

Therefore my control flow goes through the else branch, and it seems clear it only works if the timestamps are in nanosecond format, which is the default unit for pandas to_datetime.

Thanks for the clarification, I will fix it ASAP; meanwhile you can use the workaround above ⬆️.

Elfoniok (Author) commented Oct 1, 2021

Awesome, thanks for adding the docs as well, I really appreciate that!

bednar added the enhancement (New feature or request) label and removed the question (Further information is requested) label on Oct 4, 2021
bednar added this to the 1.22.0 milestone on Oct 12, 2021