
Add PerpetualBooster #641

Open
deadsoul44 opened this issue Sep 21, 2024 · 10 comments


@deadsoul44

Add PerpetualBooster as an additional algorithm.

https://github.com/perpetual-ml/perpetual

It does not need hyperparameter tuning and supports multi-output and multi-class cases.

I can create a pull request if you are willing to review and accept it.
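
A minimal usage sketch of what the integration would call, to illustrate the no-tuning workflow (roughly following the perpetual README; treat the exact objective name and the budget parameter as a sketch rather than a frozen API):

from perpetual import PerpetualBooster
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Small binary classification task, purely for illustration.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# No hyperparameter search: a single budget-style knob controls the fit.
model = PerpetualBooster(objective="LogLoss")
model.fit(X_train, y_train, budget=1.0)
print(model.predict(X_test)[:5])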

@PGijsbers
Collaborator

I think it's interesting, but I am planning to add a feature soon that allows integration scripts to live in separate, independent repositories. I'll leave another message here when I have something experimental going. Perhaps it would be interesting to try out?

@deadsoul44
Author

That would be really helpful for benchmarking our algorithm. I'll wait for it.

@PGijsbers
Collaborator

You can always do local integration for yourself if you just want to use the benchmark with your framework. There is no need to have it included in this codebase for that.
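
Roughly, as a sketch (double-check the flags against the README): with the integration in your local checkout under frameworks/PerpetualBooster and a definition in your user config directory, a local run looks something like

python runbenchmark.py PerpetualBooster example test -m local -u ~/.config/automlbenchmark

where example is the benchmark definition, test is the constraint, and -u points at the user directory containing your frameworks.yaml.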

@deadsoul44
Author

deadsoul44 commented Oct 16, 2024

I compared PerpetualBooster against AutoGluon (best quality preset, BQ), which is the number one framework in the benchmark, and got some promising results in local tests on small and medium tasks. I have some questions.

  • All the tasks in the small, medium, and large yml files are classification tasks. Where are the regression tasks?
  • I want to run the benchmark with only PerpetualBooster on AWS to compare the results against the rest of the frameworks. What is the default EC2 instance type, and what is the correct command to run on AWS (see the sketch below)? I don't want to make a mistake because of the costs.
  • Are you willing to review and merge a pull request that adds PerpetualBooster to the repo and the website if the results are good enough?
  • The default metrics for classification are AUC and log loss, but I think the F1 score is a better metric, because frameworks can overfit to log loss in particular. Is it possible to include F1 as a default metric or as an additional metric?

P.S. I checked the repo and the website before asking these. Thanks in advance.
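
For reference, my current understanding of the AWS invocation, as a sketch (please correct me if the flags or constraint names are off):

python runbenchmark.py PerpetualBooster medium 1h8c -m aws -p 4

i.e. the same shape as a local run with the mode switched to aws and -p controlling how many EC2 instances run in parallel; the default instance type appears to be defined under the aws section of resources/config.yaml.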

@deadsoul44
Author

Answering my own first two questions after reading the paper :)
https://jmlr.org/papers/volume25/22-0493/22-0493.pdf

Correct me if I am wrong.

@deadsoul44
Author

Hello,

I am trying to run PerpetualBooster on AWS, but I keep getting the following error:

[INFO] [amlb:19:19:50.735] Running benchmark `perpetualbooster` on `example` framework in `local` mode.
[INFO] [amlb.frameworks.definitions:19:19:50.791] Loading frameworks definitions from ['/s3bucket/user/frameworks.yaml'].
[INFO] [amlb.resources:19:19:50.794] Loading benchmark constraint definitions from ['/repo/resources/constraints.yaml'].
[INFO] [amlb.benchmarks.file:19:19:50.800] Loading benchmark definitions from /repo/resources/benchmarks/example.yaml.
[ERROR] [amlb:19:19:50.802] No module named 'frameworks.PerpetualBooster'
Traceback (most recent call last):
  File "/repo/runbenchmark.py", line 196, in <module>
    bench = bench_cls(**bench_kwargs)
  File "/repo/amlb/benchmark.py", line 115, in __init__
    self.framework_module = import_module(self.framework_def.module)
  File "/usr/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 984, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'frameworks.PerpetualBooster'

I guess it was an installation error, and I tried everything to install the package in the EC2 environment.

The frameworks.yaml file in my user_dir:

# put this file in your ~/.config/automlbenchmark directory
# to override default configs
---
PerpetualBooster:
  version: 'stable'
  description: |
    A self-generalizing gradient boosting machine which doesn't need hyperparameter optimization.
  project: https://github.com/perpetual-ml/perpetual
  setup_cmd: 'pip install --no-cache-dir -U https://perpetual-whl.s3.eu-central-1.amazonaws.com/perpetual-0.6.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl'

But it didn't work.

I also tried a requirements.txt in the user_dir.

Any help is appreciated, @PGijsbers.

@PGijsbers
Collaborator

It cannot find the integration in the expected folder (frameworks/perpetualbooster). This is most likely because you are using the original automlbenchmark repo instead of your own fork, which has the integration. You can specify which repository is downloaded to the EC2 instance with the project_repository field in the configuration:

project_repository: https://github.com/openml/automlbenchmark#stable # this is also the url used to clone the repository on ec2 instances
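
So the user config.yaml would point at your fork instead, along these lines (the fork path and branch below are placeholders):

project_repository: https://github.com/<your-fork>/automlbenchmark#<your-branch>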

@deadsoul44
Author

I made some progress, but now I keep getting the following error in the results.csv file:

ModuleNotFoundError: No module named 'perpetual'

My fork is here:
https://github.com/deadsoul44/automlbenchmark

I updated the requirements files. Let me know what I am missing. Thanks in advance.

@PGijsbers
Collaborator

From memory, hopefully it's correct:
It looks like you have set up the installation script to use a virtual environment (that's what the true is for here), but you are not calling it from that environment (this should use run_in_venv, see e.g. autogluon).

The setup_cmd in your configuration is probably also superfluous.
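
Again from memory, the shape of frameworks/PerpetualBooster/__init__.py when it follows the existing integrations (a sketch only; the helper names here are how I recall them from e.g. the AutoGluon integration, so verify against that folder):

# frameworks/PerpetualBooster/__init__.py -- sketch following the existing
# integrations; verify helper names against frameworks/AutoGluon.
from amlb.benchmark import TaskConfig
from amlb.data import Dataset
from amlb.utils import call_script_in_same_dir


def setup(*args, **kwargs):
    # setup.sh should create the virtual environment and install `perpetual` in it.
    call_script_in_same_dir(__file__, "setup.sh", *args, **kwargs)


def run(dataset: Dataset, config: TaskConfig):
    # Run exec.py inside that virtual environment so `import perpetual`
    # resolves there rather than in the host environment.
    from frameworks.shared.caller import run_in_venv

    data = dict(
        train=dict(path=dataset.train.path),
        test=dict(path=dataset.test.path),
        target=dict(name=dataset.target.name),
    )
    return run_in_venv(__file__, "exec.py", input_data=data, dataset=dataset, config=config)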

@deadsoul44
Author

Hello,

I was able to run the regression benchmark with 33 datasets. I get the following error when trying to upload the results to the website.

File "/home/adminuser/venv/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/exec_code.py", line 88, in exec_func_with_error_handling
    result = func()
             ^^^^^^
File "/home/adminuser/venv/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 579, in code_to_exec
    exec(code, module.__dict__)
File "/mount/src/amlb-streamlit/pages/cd_diagram.py", line 61, in <module>
    mean_results = preprocess_data(mean_results)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mount/src/amlb-streamlit/core/data.py", line 55, in preprocess_data
    results = impute_results(
              ^^^^^^^^^^^^^^^
File "/mount/src/amlb-streamlit/core/data.py", line 40, in impute_results
    raise ValueError(f"{with_=} is not in `results`")
