id column in the result of make_forecasting_frame have only (id, ?) as identifier #1077

heib6xinyu · 2024-06-07T16:11:03Z

The problem: the make_forecasting_frame function returns a frame and y whose id column has the form (id, 1), (id, 2), instead of what the documentation describes ("For identifying every subsequence, tsfresh uses the time stamp of the point that will be predicted together with the old identifier as “id”.").

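# Step 1: Create Dummy Data
import numpy as np
import pandas as pd
from tsfresh.utilities.dataframe_functions import make_forecasting_frame

np.random.seed(42)  # For reproducibility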

# Create a date range
date_range = pd.date_range(start='2023-01-01', periods=100, freq='D')

# Create dummy product IDs
product_ids = ['P001', 'P002', 'P003']

# Generate dummy data
data = []
for product_id in product_ids:
    for date in date_range:
        data.append([product_id, date, np.random.uniform(10, 100)])

# Create a DataFrame
df = pd.DataFrame(data, columns=['id', 'timestamp', 'price'])

# Create the forecasting frame
df_forecasting, y = make_forecasting_frame(df['price'], kind="price", max_timeshift=5, rolling_direction=1)

# Display the first few rows of the forecasting frame
print("\nForecasting Frame:")
print(df_forecasting.head())
print("\nTarget Variable (y):")
print(y.head())

You should be able to reproduce my problem using the code above.

Rolling: 100%|██████████| 300/300 [00:02<00:00, 132.49it/s]

Forecasting Frame:
        id  time      value   kind
1  (id, 1)     0  43.708611  price
3  (id, 2)     0  43.708611  price
4  (id, 2)     1  95.564288  price
6  (id, 3)     0  43.708611  price
7  (id, 3)     1  95.564288  price

Target Variable (y):
(id, 1)    95.564288
(id, 2)    75.879455
(id, 3)    63.879264
(id, 4)    24.041678
(id, 5)    24.039507
Name: value, dtype: float64

As you can see, the ids are only (id, time), so I have no way to tell which product the data actually belongs to.

Anything else we need to know?: I would like the result to look like (P001, 1), ..., (P003, 4), but I don't know how to get that.

Environment:

Python version: Python 3.10
Operating System: Windows 10
tsfresh version: 0.20.1
Install method (conda, pip, source): pip install tsfresh

heib6xinyu · 2024-06-08T09:57:36Z

I looked through the source code and found that the id is hard-coded there. I copied the function to my local machine and modified it to accept an id argument that replaces the hard-coded 'id'; it works for me now.

nils-braun · 2024-06-08T09:59:49Z

Hi @heib6xinyu
yes, your observation is correct. The input to the make_forecasting_frame function is a single time series (technically, it is a pandas Series, not a DataFrame), so it is not meant to be used for multiple time series (e.g. the input data cannot even have an id column, because it is not a DataFrame).
The sentence you quote from the docs ("For identifying every subsequence, tsfresh uses the time stamp of the point that will be predicted together with the old identifier as “id”.") is actually referring to the more general (and more powerful) roll_time_series function (sorry if this is not clear, happy for any PR to fix this!).

As you already looked into the code, you have probably seen that the make_forecasting_frame function is just forwarding to the roll_time_series function and I would also recommend using this for anything more "complex". The make_forecasting_frame function is really just a convenience function for one single use-case :)

heib6xinyu · 2024-06-08T10:05:16Z

Yes, roll_time_series is definitely more powerful. I just like make_forecasting_frame because it automatically matches y. For some specific use cases (like my project with multiple time series and features), I just modified make_forecasting_frame locally, so I can call it in a loop grouped by id and feature and then concat the results. Definitely less efficient, but easier for my lazy self lol.

heib6xinyu · 2024-06-08T10:05:58Z

Oh oops, I accidentally hit reopen on my tiny cell phone screen, don't mind me. Sorry for the trouble.

heib6xinyu added the bug label Jun 7, 2024

heib6xinyu changed the title ~~make_forecasting_frame frame and y shape not align.~~ Sorry this is a wrong issue Jun 7, 2024

heib6xinyu changed the title ~~Sorry this is a wrong issue~~ id column in the result of make_forecasting_frame Jun 7, 2024

heib6xinyu changed the title ~~id column in the result of make_forecasting_frame~~ id column in the result of make_forecasting_frame have only (id, ?) as identifier Jun 7, 2024

heib6xinyu closed this as completed Jun 8, 2024

heib6xinyu reopened this Jun 8, 2024

heib6xinyu closed this as completed Jun 8, 2024

heib6xinyu mentioned this issue Jun 14, 2024

Added "id" parameter for make_forecaseting_frame so that it will not … #1080

Open
