Commit

Random trt (#67)
* Random time segments

* random segments doc

* bump version

* doc string

* bump np

* bump np

* np

* bump python

* change workflow

* update packages

* bump

* bump click

* bump click and typer

* bump black

* black bump

* format
lgmoneda authored Jul 7, 2022
1 parent 16e057e commit 8234d63
Showing 7 changed files with 650 additions and 522 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/build.yml
@@ -7,7 +7,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
- python-version: ["3.6", "3.7"]
+ python-version: ["3.9"]

steps:
- uses: actions/checkout@v2
35 changes: 35 additions & 0 deletions README.md
@@ -77,6 +77,41 @@ opt_param = env_wise_hyper_opt(training_data[features + [time_column]],

Don't simply use a raw timestamp column from the dataset. Make it discrete first, and make sure every period contains a reasonable number of data points. Example: use the year if you have 3+ years of data. Note that how you make it discrete becomes a modeling choice you can optimize.
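
As an illustration of the advice above, the following is a minimal sketch that turns a timestamp into yearly periods with pandas and counts the rows in each period. The toy data and the column names (`time_stamp`, `periods`) are assumptions made for this example only.

```python
import pandas as pd

# Hypothetical toy data; values and column names are illustrative only.
training_data = pd.DataFrame(
    {
        "time_stamp": pd.to_datetime(
            ["2019-03-01", "2019-11-15", "2020-06-30", "2021-01-10", "2021-08-22"]
        ),
        "x_1": [0.1, 0.4, 0.3, 0.9, 0.7],
    }
)

# Make the timestamp discrete: one period per calendar year.
training_data["periods"] = training_data["time_stamp"].dt.year

# Count rows per period to check that no period is too sparse.
rows_per_period = training_data["periods"].value_counts().sort_index()
print(rows_per_period)
```

If some period turns out to be nearly empty, a coarser granularity (or merging neighboring periods) may be a better choice.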

### Random segments

#### Selecting randomly from multiple time columns
The user can pass a list instead of a string as the `time_column` argument. The model then randomly selects one column from the list for each of the `n_estimators` estimators it builds.

```python
from time_robust_forest.models import TimeForestClassifier

features = ["x_1", "x_2"]
time_columns = ["periods", "periods_2"]
target = "y"

model = TimeForestClassifier(time_column=time_columns)

model.fit(training_data[features + time_columns], training_data[target])
predictions = model.predict_proba(test_data[features])[:, 1]
```
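
To make the per-estimator selection concrete, here is a conceptual sketch using only the standard library. It is not the library's internal code: the seed, the value of `n_estimators`, and the variable `column_per_estimator` are assumptions for illustration.

```python
import random

time_columns = ["periods", "periods_2"]
n_estimators = 5  # assumed value, for illustration only

# Seeded only so the sketch is reproducible.
rng = random.Random(42)

# One time column is drawn independently for each estimator.
column_per_estimator = [rng.choice(time_columns) for _ in range(n_estimators)]
print(column_per_estimator)
```

Each estimator would then treat its drawn column as the period definition, so different trees may see different segmentations of the same data.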

#### Generating random segments from a timestamp column

The user can define a maximum number of segments (`random_segments`), and the model will split the data using the timestamp information. In the following example, the model segments the data into 1, 2, 3, ..., 10 parts. For every estimator, it randomly picks one of the ten columns representing the `time_column` and uses it. In this case, the `time_column` should hold the timestamp information.

```python
from time_robust_forest.models import TimeForestClassifier

features = ["x_1", "x_2"]
time_column = "time_stamp"
target = "y"

model = TimeForestClassifier(time_column=time_column, random_segments=10)

model.fit(training_data[features + [time_column]], training_data[target])
predictions = model.predict_proba(test_data[features])[:, 1]
```
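
One way to picture the candidate segment columns is an equal-width split of the observed time range. The sketch below works under that assumption and is not the library's implementation; the helper `add_segment_columns` and the generated column names are hypothetical, and the actual splitting rule may differ.

```python
import pandas as pd


def add_segment_columns(df, time_column, max_segments):
    """Add one column per k in 1..max_segments, splitting time into k parts."""
    elapsed = (df[time_column] - df[time_column].min()).dt.total_seconds()
    span = elapsed.max()
    # Position of each row within the observed time range, in [0, 1].
    fraction = elapsed / span if span > 0 else elapsed * 0.0

    out = df.copy()
    names = []
    for k in range(1, max_segments + 1):
        name = f"{time_column}_segments_{k}"
        # Equal-width bins; the latest timestamp is clipped into the last bin.
        out[name] = (fraction * k).astype(int).clip(upper=k - 1)
        names.append(name)
    return out, names


# Hypothetical toy data: twelve consecutive days.
toy_data = pd.DataFrame({"time_stamp": pd.date_range("2022-01-01", periods=12, freq="D")})
segmented, segment_columns = add_segment_columns(toy_data, "time_stamp", max_segments=4)
print(segmented[segment_columns].nunique())
```

With `max_segments=4`, the column for `k` holds segment labels from `0` to `k - 1`, and a per-estimator random draw among these columns would mirror the behavior described above.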

## License

[![License](https://img.shields.io/github/license/lgmoneda/time-robust-forest)](https://github.com/lgmoneda/time-robust-forest/blob/main/LICENSE)