
ML-Plan: Added prediction runtime for test predictions. #238

Merged
merged 4 commits into from
May 3, 2021

Conversation

mwever
Contributor

@mwever mwever commented Jan 5, 2021

  • Added a feature for returning the time taken to make predictions with the finally chosen candidate.

Note: ML-Plan accounts for the time needed to fit the finally chosen solution candidate and tries to meet the timeout including that fitting step. The prediction time on the test data, however, is not taken into account.

Furthermore, we just released a new version (0.2.4) with several bug fixes and a few enhancements. I did not want to mess up the frameworks.yaml files, so I did not touch them. As far as I can see, except for the Q2.2020 file, the version of ML-Plan always refers to "latest". If that is also the version being benchmarked, everything should be fine.

@mwever mwever changed the title Added prediction runtime for test predictions. ML-Plan: Added prediction runtime for test predictions. Jan 6, 2021
Collaborator

@PGijsbers PGijsbers left a comment


I updated #235 to fix ML-Plan to 0.2.4 for 2021Q1.

@@ -85,7 +85,8 @@ def run(dataset, config):

     predictions = stats["predictions"]
     truth = stats["truth"]
-    numEvals = stats["num_evaluations"]
+    num_evals = stats["num_evaluations"]
+    predict_time = stats["final_candidate_predict_time_ms"]
Collaborator

Could you please use a safe way to add this information so it does not break older versions of ML-Plan?
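One safe pattern would be to read the new keys with dict.get so that statistics files written by older ML-Plan versions still parse; a hypothetical sketch (the helper name read_stats is made up, not code from this PR):

```python
import json

# Hypothetical sketch of a backwards-compatible way to read the statistics
# file: required keys are read directly, optional ones via dict.get so that
# older ML-Plan versions (< 0.2.4) that lack them do not raise a KeyError.
def read_stats(raw):
    stats = json.loads(raw)
    return {
        "predictions": stats["predictions"],  # present in all versions
        "truth": stats["truth"],              # present in all versions
        "num_evals": stats.get("num_evaluations"),
        # New key; None signals "not reported" by older versions.
        "predict_time": stats.get("final_candidate_predict_time_ms"),
    }
```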

@@ -102,8 +103,9 @@ def run(dataset, config):
        probabilities=probabilities,
        probabilities_labels=probabilities_labels,
        target_is_encoded=is_classification,
Collaborator

It looks like the target is not (no longer?) encoded. Running python runbenchmark.py MLPlan openml/t/59 -f 0 fails due to predictions being a list with string labels (e.g. Iris-Setosa) while target_is_encoded is set to True.
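A small check along these lines could decide whether predictions are really integer-encoded before setting the flag (hypothetical helper, not code from this PR):

```python
# Hypothetical helper: report the target as encoded only when every
# prediction parses as an integer class index; string labels such as
# "Iris-setosa" then correctly yield False.
def looks_encoded(predictions):
    try:
        for p in predictions:
            int(p)
        return True
    except (TypeError, ValueError):
        return False
```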

@PGijsbers
Collaborator

@mwever do you have time to make the suggested changes? It shouldn't take long. If not then I can probably do it, but it's a bit messy since I can't push to your fork.

@PGijsbers PGijsbers added the framework For issues with frameworks in the current benchmark label Mar 15, 2021
@mwever
Contributor Author

mwever commented Mar 19, 2021

Hey @PGijsbers!
Sorry for the late response. I just fixed the issue myself and pushed it to my fork, which now appears here as an update.
I did my best to ensure backwards compatibility and tested with both versions 0.2.3 and 0.2.4; for me, both worked in test mode as well as in the iris run you mentioned in your comment.

@PGijsbers
Collaborator

Thanks a lot! From a glance it looks good 👍 I'll verify (and hopefully merge) on Monday

@PGijsbers
Collaborator

On my machine ML-Plan fails on the APSFailure dataset, tested with python runbenchmark.py mlplan validation -m docker (but I recommend you add -t APSFailure). The target of the arff file is @ATTRIBUTE class {pos, neg} but ML-Plan seems to fail with Exception in thread "main" ai.libs.mlplan.cli.module.UnsupportedModuleConfigurationException: ML-Plan for classification requires a categorical target attribute..

Is this reproducible for you? Could you see if you can fix this?

Full trace:

-------------------------------------------------------
Starting job local.validation.test.APSFailure.0.MLPlan.
Assigning 4 cores (total=12) for new task APSFailure.
Assigning 20030 MB (total=25625 MB) for new APSFailure task.
Running task APSFailure on framework MLPlan with config:
TaskConfig(framework='MLPlan', framework_params={}, framework_version='stable', type='classification', name='APSFailure', fold=0, metric='auc', metrics=['auc', 'logloss', 'acc', 'balacc'], seed=277658768, job_timeout_seconds=120, max_runtime_seconds=60, cores=4, max_mem_size_mb=20030, min_vol_size_mb=-1, input_dir='/input', output_dir='/output/mlplan.validation.test.docker.20210322T083948', output_predictions_file='/output/mlplan.validation.test.docker.20210322T083948/predictions/APSFailure/0/predictions.csv', ext={}, output_metadata_file='/output/mlplan.validation.test.docker.20210322T083948/predictions/APSFailure/0/metadata.json')
Running cmd `/bench/frameworks/MLPlan/venv/bin/python -W ignore /bench/frameworks/MLPlan/exec.py`
{ 'target': {'index': 0},
  'test': {'path': '/input/org/openml/www/datasets/41138/dataset_test_0.arff'},
  'train': { 'path': '/input/org/openml/www/datasets/41138/dataset_train_0.arff'}}
INFO:__main__:
**** ML-Plan [v0.2.4] ****

INFO:__main__:Running ML-Plan with backend weka in mode weka and a maximum time of 60s on 4 cores with 20030MB for the JVM, optimizing AUC.
INFO:__main__:Environment: environ({'AMLB_PATH': '/bench/amlb', 'PATH': '/bench/frameworks/MLPlan/venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin', 'PWD': '/bench', 'PYTHONPATH': '/bench', 'LC_CTYPE': 'C.UTF-8'})
INFO:amlb.utils.process:Running cmd `java -jar -Xmx19006M /bench/frameworks/MLPlan/lib/mlplan/mlplan-cli-0.2.4.jar -f "/input/org/openml/www/datasets/41138/dataset_train_0.arff" -p "/input/org/openml/www/datasets/41138/dataset_test_0.arff" -t 60 -ncpus 4 -l AUC -m weka -s 277658768 -ooab /output/mlplan.validation.test.docker.20210322T083948/mlplan_out/APSFailure/0/predictions.csv -os /output/mlplan.validation.test.docker.20210322T083948/mlplan_out/APSFailure/0/statistics.json -tmp /tmp/tmp60eyxfbl`
Called ML-Plan CLI with the following params: >[-f, /input/org/openml/www/datasets/41138/dataset_train_0.arff, -p, /input/org/openml/www/datasets/41138/dataset_test_0.arff, -t, 60, -ncpus, 4, -l, AUC, -m, weka, -s, 277658768, -ooab, /output/mlplan.validation.test.docker.20210322T083948/mlplan_out/APSFailure/0/predictions.csv, -os, /output/mlplan.validation.test.docker.20210322T083948/mlplan_out/APSFailure/0/statistics.json, -tmp, /tmp/tmp60eyxfbl]<
Exception in thread "main" ai.libs.mlplan.cli.module.UnsupportedModuleConfigurationException: ML-Plan for classification requires a categorical target attribute.
        at ai.libs.mlplan.cli.module.slc.AMLPlan4ClassificationCLIModule.getLabelAttribute(AMLPlan4ClassificationCLIModule.java:86)
        at ai.libs.mlplan.cli.module.slc.MLPlan4WekaClassificationCLIModule.getMLPlanBuilderForSetting(MLPlan4WekaClassificationCLIModule.java:26)
        at ai.libs.mlplan.cli.module.slc.MLPlan4WekaClassificationCLIModule.getMLPlanBuilderForSetting(MLPlan4WekaClassificationCLIModule.java:14)
        at ai.libs.mlplan.cli.MLPlanCLI.runMLPlan(MLPlanCLI.java:337)
        at ai.libs.mlplan.cli.MLPlanCLI.main(MLPlanCLI.java:447)



ERROR:amlb.utils.process:Exception in thread "main" ai.libs.mlplan.cli.module.UnsupportedModuleConfigurationException: ML-Plan for classification requires a categorical target attribute.
        at ai.libs.mlplan.cli.module.slc.AMLPlan4ClassificationCLIModule.getLabelAttribute(AMLPlan4ClassificationCLIModule.java:86)
        at ai.libs.mlplan.cli.module.slc.MLPlan4WekaClassificationCLIModule.getMLPlanBuilderForSetting(MLPlan4WekaClassificationCLIModule.java:26)
        at ai.libs.mlplan.cli.module.slc.MLPlan4WekaClassificationCLIModule.getMLPlanBuilderForSetting(MLPlan4WekaClassificationCLIModule.java:14)
        at ai.libs.mlplan.cli.MLPlanCLI.runMLPlan(MLPlanCLI.java:337)
        at ai.libs.mlplan.cli.MLPlanCLI.main(MLPlanCLI.java:447)

ERROR:frameworks.shared.callee:Command 'java -jar -Xmx19006M /bench/frameworks/MLPlan/lib/mlplan/mlplan-cli-0.2.4.jar -f "/input/org/openml/www/datasets/41138/dataset_train_0.arff" -p "/input/org/openml/www/datasets/41138/dataset_test_0.arff" -t 60 -ncpus 4 -l AUC -m weka -s 277658768 -ooab /output/mlplan.validation.test.docker.20210322T083948/mlplan_out/APSFailure/0/predictions.csv -os /output/mlplan.validation.test.docker.20210322T083948/mlplan_out/APSFailure/0/statistics.json -tmp /tmp/tmp60eyxfbl' returned non-zero exit status 1.
Traceback (most recent call last):

  File "/bench/frameworks/shared/callee.py", line 95, in call_run

    result = run_fn(ds, config)

  File "/bench/frameworks/MLPlan/exec.py", line 81, in run

    utils.run_cmd(cmd, _live_output_=True)

  File "/bench/amlb/utils/process.py", line 220, in run_cmd

    raise e

  File "/bench/amlb/utils/process.py", line 207, in run_cmd

    preexec_fn=params.preexec_fn)

  File "/bench/amlb/utils/process.py", line 75, in run_subprocess

    raise subprocess.CalledProcessError(retcode, process.args, output=stdout, stderr=stderr)

subprocess.CalledProcessError: Command 'java -jar -Xmx19006M /bench/frameworks/MLPlan/lib/mlplan/mlplan-cli-0.2.4.jar -f "/input/org/openml/www/datasets/41138/dataset_train_0.arff" -p "/input/org/openml/www/datasets/41138/dataset_test_0.arff" -t 60 -ncpus 4 -l AUC -m weka -s 277658768 -ooab /output/mlplan.validation.test.docker.20210322T083948/mlplan_out/APSFailure/0/predictions.csv -os /output/mlplan.validation.test.docker.20210322T083948/mlplan_out/APSFailure/0/statistics.json -tmp /tmp/tmp60eyxfbl' returned non-zero exit status 1.




Command 'java -jar -Xmx19006M /bench/frameworks/MLPlan/lib/mlplan/mlplan-cli-0.2.4.jar -f "/input/org/openml/www/datasets/41138/dataset_train_0.arff" -p "/input/org/openml/www/datasets/41138/dataset_test_0.arff" -t 60 -ncpus 4 -l AUC -m weka -s 277658768 -ooab /output/mlplan.validation.test.docker.20210322T083948/mlplan_out/APSFailure/0/predictions.csv -os /output/mlplan.validation.test.docker.20210322T083948/mlplan_out/APSFailure/0/statistics.json -tmp /tmp/tmp60eyxfbl' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/bench/amlb/benchmark.py", line 454, in run
    meta_result = self.benchmark.framework_module.run(self._dataset, task_config)
  File "/bench/frameworks/MLPlan/__init__.py", line 30, in run
    input_data=data, dataset=dataset, config=config)
  File "/bench/frameworks/shared/caller.py", line 78, in run_in_venv
    raise NoResultError(res.error_message)
amlb.results.NoResultError: Command 'java -jar -Xmx19006M /bench/frameworks/MLPlan/lib/mlplan/mlplan-cli-0.2.4.jar -f "/input/org/openml/www/datasets/41138/dataset_train_0.arff" -p "/input/org/openml/www/datasets/41138/dataset_test_0.arff" -t 60 -ncpus 4 -l AUC -m weka -s 277658768 -ooab /output/mlplan.validation.test.docker.20210322T083948/mlplan_out/APSFailure/0/predictions.csv -os /output/mlplan.validation.test.docker.20210322T083948/mlplan_out/APSFailure/0/statistics.json -tmp /tmp/tmp60eyxfbl' returned non-zero exit status 1.
Loading metadata from `/output/mlplan.validation.test.docker.20210322T083948/predictions/APSFailure/0/metadata.json`.
auc (nan, None)
logloss (nan, None)
acc (nan, None)
balacc (nan, None)
Metric scores: { 'acc': nan,
  'app_version': 'dev [NA, NA, NA]',
  'auc': nan,
  'balacc': nan,
  'constraint': 'test',
  'duration': nan,
  'fold': 0,
  'framework': 'MLPlan',
  'id': 'openml.org/t/168868',
  'info': "NoResultError: Command 'java -jar -Xmx19006M "
          '/bench/frameworks/MLPlan/lib/mlplan/mlplan-cli-0.2.4.jar -f '
          '"/input/org/openml/www/datasets/41138/dataset_train_0.arff" -p '
          '"/input/org/openml/www/datase...',
  'logloss': nan,
  'metric': 'auc',
  'mode': 'docker',
  'models_count': nan,
  'params': '',
  'predict_duration': nan,
  'result': nan,
  'seed': 277658768,
  'task': 'APSFailure',
  'training_duration': nan,
  'utc': '2021-03-22T08:52:09',
  'version': '0.2.4'}
Job local.validation.test.APSFailure.0.MLPlan executed in 5.966 seconds.

-------------------------------------------------------
Starting job local.validation.test.APSFailure.1.MLPlan.
Assigning 4 cores (total=12) for new task APSFailure.
Assigning 22072 MB (total=25625 MB) for new APSFailure task.
Running task APSFailure on framework MLPlan with config:
TaskConfig(framework='MLPlan', framework_params={}, framework_version='stable', type='classification', name='APSFailure', fold=1, metric='auc', metrics=['auc', 'logloss', 'acc', 'balacc'], seed=277658769, job_timeout_seconds=120, max_runtime_seconds=60, cores=4, max_mem_size_mb=22072, min_vol_size_mb=-1, input_dir='/input', output_dir='/output/mlplan.validation.test.docker.20210322T083948', output_predictions_file='/output/mlplan.validation.test.docker.20210322T083948/predictions/APSFailure/1/predictions.csv', ext={}, output_metadata_file='/output/mlplan.validation.test.docker.20210322T083948/predictions/APSFailure/1/metadata.json')
Running cmd `/bench/frameworks/MLPlan/venv/bin/python -W ignore /bench/frameworks/MLPlan/exec.py`
{ 'target': {'index': 0},
  'test': {'path': '/input/org/openml/www/datasets/41138/dataset_test_1.arff'},
  'train': { 'path': '/input/org/openml/www/datasets/41138/dataset_train_1.arff'}}
INFO:__main__:
**** ML-Plan [v0.2.4] ****

INFO:__main__:Running ML-Plan with backend weka in mode weka and a maximum time of 60s on 4 cores with 22072MB for the JVM, optimizing AUC.
INFO:__main__:Environment: environ({'AMLB_PATH': '/bench/amlb', 'PATH': '/bench/frameworks/MLPlan/venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin', 'PWD': '/bench', 'PYTHONPATH': '/bench', 'LC_CTYPE': 'C.UTF-8'})
INFO:amlb.utils.process:Running cmd `java -jar -Xmx21048M /bench/frameworks/MLPlan/lib/mlplan/mlplan-cli-0.2.4.jar -f "/input/org/openml/www/datasets/41138/dataset_train_1.arff" -p "/input/org/openml/www/datasets/41138/dataset_test_1.arff" -t 60 -ncpus 4 -l AUC -m weka -s 277658769 -ooab /output/mlplan.validation.test.docker.20210322T083948/mlplan_out/APSFailure/1/predictions.csv -os /output/mlplan.validation.test.docker.20210322T083948/mlplan_out/APSFailure/1/statistics.json -tmp /tmp/tmpfv528ae8`
Called ML-Plan CLI with the following params: >[-f, /input/org/openml/www/datasets/41138/dataset_train_1.arff, -p, /input/org/openml/www/datasets/41138/dataset_test_1.arff, -t, 60, -ncpus, 4, -l, AUC, -m, weka, -s, 277658769, -ooab, /output/mlplan.validation.test.docker.20210322T083948/mlplan_out/APSFailure/1/predictions.csv, -os, /output/mlplan.validation.test.docker.20210322T083948/mlplan_out/APSFailure/1/statistics.json, -tmp, /tmp/tmpfv528ae8]<
Exception in thread "main" ai.libs.mlplan.cli.module.UnsupportedModuleConfigurationException: ML-Plan for classification requires a categorical target attribute.
        at ai.libs.mlplan.cli.module.slc.AMLPlan4ClassificationCLIModule.getLabelAttribute(AMLPlan4ClassificationCLIModule.java:86)
        at ai.libs.mlplan.cli.module.slc.MLPlan4WekaClassificationCLIModule.getMLPlanBuilderForSetting(MLPlan4WekaClassificationCLIModule.java:26)
        at ai.libs.mlplan.cli.module.slc.MLPlan4WekaClassificationCLIModule.getMLPlanBuilderForSetting(MLPlan4WekaClassificationCLIModule.java:14)
        at ai.libs.mlplan.cli.MLPlanCLI.runMLPlan(MLPlanCLI.java:337)
        at ai.libs.mlplan.cli.MLPlanCLI.main(MLPlanCLI.java:447)



ERROR:amlb.utils.process:Exception in thread "main" ai.libs.mlplan.cli.module.UnsupportedModuleConfigurationException: ML-Plan for classification requires a categorical target attribute.
        at ai.libs.mlplan.cli.module.slc.AMLPlan4ClassificationCLIModule.getLabelAttribute(AMLPlan4ClassificationCLIModule.java:86)
        at ai.libs.mlplan.cli.module.slc.MLPlan4WekaClassificationCLIModule.getMLPlanBuilderForSetting(MLPlan4WekaClassificationCLIModule.java:26)
        at ai.libs.mlplan.cli.module.slc.MLPlan4WekaClassificationCLIModule.getMLPlanBuilderForSetting(MLPlan4WekaClassificationCLIModule.java:14)
        at ai.libs.mlplan.cli.MLPlanCLI.runMLPlan(MLPlanCLI.java:337)
        at ai.libs.mlplan.cli.MLPlanCLI.main(MLPlanCLI.java:447)

ERROR:frameworks.shared.callee:Command 'java -jar -Xmx21048M /bench/frameworks/MLPlan/lib/mlplan/mlplan-cli-0.2.4.jar -f "/input/org/openml/www/datasets/41138/dataset_train_1.arff" -p "/input/org/openml/www/datasets/41138/dataset_test_1.arff" -t 60 -ncpus 4 -l AUC -m weka -s 277658769 -ooab /output/mlplan.validation.test.docker.20210322T083948/mlplan_out/APSFailure/1/predictions.csv -os /output/mlplan.validation.test.docker.20210322T083948/mlplan_out/APSFailure/1/statistics.json -tmp /tmp/tmpfv528ae8' returned non-zero exit status 1.
Traceback (most recent call last):

  File "/bench/frameworks/shared/callee.py", line 95, in call_run

    result = run_fn(ds, config)

  File "/bench/frameworks/MLPlan/exec.py", line 81, in run

    utils.run_cmd(cmd, _live_output_=True)

  File "/bench/amlb/utils/process.py", line 220, in run_cmd

    raise e

  File "/bench/amlb/utils/process.py", line 207, in run_cmd

    preexec_fn=params.preexec_fn)

  File "/bench/amlb/utils/process.py", line 75, in run_subprocess

    raise subprocess.CalledProcessError(retcode, process.args, output=stdout, stderr=stderr)

subprocess.CalledProcessError: Command 'java -jar -Xmx21048M /bench/frameworks/MLPlan/lib/mlplan/mlplan-cli-0.2.4.jar -f "/input/org/openml/www/datasets/41138/dataset_train_1.arff" -p "/input/org/openml/www/datasets/41138/dataset_test_1.arff" -t 60 -ncpus 4 -l AUC -m weka -s 277658769 -ooab /output/mlplan.validation.test.docker.20210322T083948/mlplan_out/APSFailure/1/predictions.csv -os /output/mlplan.validation.test.docker.20210322T083948/mlplan_out/APSFailure/1/statistics.json -tmp /tmp/tmpfv528ae8' returned non-zero exit status 1.




Command 'java -jar -Xmx21048M /bench/frameworks/MLPlan/lib/mlplan/mlplan-cli-0.2.4.jar -f "/input/org/openml/www/datasets/41138/dataset_train_1.arff" -p "/input/org/openml/www/datasets/41138/dataset_test_1.arff" -t 60 -ncpus 4 -l AUC -m weka -s 277658769 -ooab /output/mlplan.validation.test.docker.20210322T083948/mlplan_out/APSFailure/1/predictions.csv -os /output/mlplan.validation.test.docker.20210322T083948/mlplan_out/APSFailure/1/statistics.json -tmp /tmp/tmpfv528ae8' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/bench/amlb/benchmark.py", line 454, in run
    meta_result = self.benchmark.framework_module.run(self._dataset, task_config)
  File "/bench/frameworks/MLPlan/__init__.py", line 30, in run
    input_data=data, dataset=dataset, config=config)
  File "/bench/frameworks/shared/caller.py", line 78, in run_in_venv
    raise NoResultError(res.error_message)
amlb.results.NoResultError: Command 'java -jar -Xmx21048M /bench/frameworks/MLPlan/lib/mlplan/mlplan-cli-0.2.4.jar -f "/input/org/openml/www/datasets/41138/dataset_train_1.arff" -p "/input/org/openml/www/datasets/41138/dataset_test_1.arff" -t 60 -ncpus 4 -l AUC -m weka -s 277658769 -ooab /output/mlplan.validation.test.docker.20210322T083948/mlplan_out/APSFailure/1/predictions.csv -os /output/mlplan.validation.test.docker.20210322T083948/mlplan_out/APSFailure/1/statistics.json -tmp /tmp/tmpfv528ae8' returned non-zero exit status 1.
Loading metadata from `/output/mlplan.validation.test.docker.20210322T083948/predictions/APSFailure/1/metadata.json`.
auc (nan, None)
logloss (nan, None)
acc (nan, None)
balacc (nan, None)
Metric scores: { 'acc': nan,
  'app_version': 'dev [NA, NA, NA]',
  'auc': nan,
  'balacc': nan,
  'constraint': 'test',
  'duration': nan,
  'fold': 1,
  'framework': 'MLPlan',
  'id': 'openml.org/t/168868',
  'info': "NoResultError: Command 'java -jar -Xmx21048M "
          '/bench/frameworks/MLPlan/lib/mlplan/mlplan-cli-0.2.4.jar -f '
          '"/input/org/openml/www/datasets/41138/dataset_train_1.arff" -p '
          '"/input/org/openml/www/datase...',
  'logloss': nan,
  'metric': 'auc',
  'mode': 'docker',
  'models_count': nan,
  'params': '',
  'predict_duration': nan,
  'result': nan,
  'seed': 277658769,
  'task': 'APSFailure',
  'training_duration': nan,
  'utc': '2021-03-22T08:52:15',
  'version': '0.2.4'}
Job local.validation.test.APSFailure.1.MLPlan executed in 5.957 seconds.
All jobs executed in 746.772 seconds.
[1] CPU Utilization: 16.6%
[1] Memory Usage: 5.9%
[1] Disk Usage: 45.4%
Processing results for mlplan.validation.test.docker.20210322T083948
Scores saved to `/output/mlplan.validation.test.docker.20210322T083948/scores/MLPlan.benchmark_validation.csv`.
Scores saved to `/output/mlplan.validation.test.docker.20210322T083948/scores/results.csv`.
Scores saved to `/output/results.csv`.
Summing up scores for current run:
                     id                     task framework constraint fold    result   metric    mode version params       app_version                  utc  duration  training_duration  predict_duration models_count       seed                                                                                                                                                                                                      info       acc       auc    balacc   logloss
0     openml.org/t/9910              bioresponse    MLPlan       test    0  0.866023      auc  docker   0.2.4         dev [NA, NA, NA]  2021-03-22T08:41:22      93.9               82.7              59.0           10  277658768                                                                                                                                                                                                      None  0.787234  0.866023  0.785226  0.490541
1     openml.org/t/9910              bioresponse    MLPlan       test    1  0.857544      auc  docker   0.2.4         dev [NA, NA, NA]  2021-03-22T08:42:10      47.7               36.4              47.0            8  277658769                                                                                                                                                                                                      None  0.784000  0.857544  0.779184  0.484057
2   openml.org/t/125920            dresses-sales    MLPlan       test    0  0.597701      auc  docker   0.2.4         dev [NA, NA, NA]  2021-03-22T08:43:00      50.0               49.3               1.0           33  277658768                                                                                                                                                                                                      None  0.620000  0.597701  0.633005  0.888918
3   openml.org/t/125920            dresses-sales    MLPlan       test    1  0.561576      auc  docker   0.2.4         dev [NA, NA, NA]  2021-03-22T08:43:51      51.2               50.5               1.0           33  277658769                                                                                                                                                                                                      None  0.540000  0.561576  0.531199  0.857919
4     openml.org/t/2079               eucalyptus    MLPlan       test    0  0.842307  logloss  docker   0.2.4         dev [NA, NA, NA]  2021-03-22T08:44:44      52.9               51.9               1.0           44  277658768                                                                                                                                                                                                      None  0.662162       NaN  0.619669  0.842307
5     openml.org/t/2079               eucalyptus    MLPlan       test    1  0.709569  logloss  docker   0.2.4         dev [NA, NA, NA]  2021-03-22T08:45:32      48.2               47.5               1.0           29  277658769                                                                                                                                                                                                      None  0.743243       NaN  0.698402  0.709569
6   openml.org/t/167125  internet-advertisements    MLPlan       test    0  0.965695      auc  docker   0.2.4         dev [NA, NA, NA]  2021-03-22T08:46:39      66.3               45.0              75.0           11  277658768                                                                                                                                                                                                      None  0.969512  0.965695  0.891304  0.397641
7   openml.org/t/167125  internet-advertisements    MLPlan       test    1  0.988282      auc  docker   0.2.4         dev [NA, NA, NA]  2021-03-22T08:48:51     131.9              113.9              93.0           11  277658769                                                                                                                                                                                                      None  0.966463  0.988282  0.889531  0.106571
8     openml.org/t/9950               micro-mass    MLPlan       test    0  0.826521  logloss  docker   0.2.4         dev [NA, NA, NA]  2021-03-22T08:49:41      50.4               48.2              24.0           21  277658768                                                                                                                                                                                                      None  0.879310       NaN  0.870833  0.826521
9     openml.org/t/9950               micro-mass    MLPlan       test    1  0.791845  logloss  docker   0.2.4         dev [NA, NA, NA]  2021-03-22T08:50:27      45.8               43.3              20.0           21  277658769                                                                                                                                                                                                      None  0.929825       NaN  0.931667  0.791845
10    openml.org/t/3917                      kc1    MLPlan       test    0  0.807524      auc  docker   0.2.4         dev [NA, NA, NA]  2021-03-22T08:51:16      49.6               49.0              23.0           54  277658768                                                                                                                                                                                                      None  0.786730  0.807524  0.656163  3.305440
11    openml.org/t/3917                      kc1    MLPlan       test    1  0.813809      auc  docker   0.2.4         dev [NA, NA, NA]  2021-03-22T08:52:03      46.9               46.4              37.0           42  277658769                                                                                                                                                                                                      None  0.872038  0.813809  0.603788  0.337202
12  openml.org/t/168868               APSFailure    MLPlan       test    0       NaN      auc  docker   0.2.4         dev [NA, NA, NA]  2021-03-22T08:52:09       6.0                NaN               NaN               277658768  NoResultError: Command 'java -jar -Xmx19006M /bench/frameworks/MLPlan/lib/mlplan/mlplan-cli-0.2.4.jar -f "/input/org/openml/www/datasets/41138/dataset_train_0.arff" -p "/input/org/openml/www/datase...       NaN       NaN       NaN       NaN
13  openml.org/t/168868               APSFailure    MLPlan       test    1       NaN      auc  docker   0.2.4         dev [NA, NA, NA]  2021-03-22T08:52:15       6.0                NaN               NaN               277658769  NoResultError: Command 'java -jar -Xmx21048M /bench/frameworks/MLPlan/lib/mlplan/mlplan-cli-0.2.4.jar -f "/input/org/openml/www/datasets/41138/dataset_train_1.arff" -p "/input/org/openml/www/datase...       NaN       NaN       NaN       NaN
Job docker.validation.test.all_tasks.all_folds.MLPlan executed in 750.980 seconds.
All jobs executed in 750.981 seconds.

@PGijsbers
Collaborator

I just noticed I had modified my test constraints to provide only 60 seconds (instead of 600) of runtime. The error looks unrelated, but I will re-run with 600 seconds per fold later.

@mwever
Contributor Author

mwever commented Mar 22, 2021

Hey,
indeed, I can reproduce it. Judging by the error message, I would say the problem is that the target column is not the last one. I remember adding two lines in the very first version of the ML-Plan integration that reorder the columns. In this task, the last column of the dataset is a numeric one, which leads to the exception. Moreover, the lines reordering the dataset columns seem to be gone somehow? I think Auto-WEKA imposes the same assumption on the data, doesn't it?

@PGijsbers
Collaborator

This is how it is solved for Auto-WEKA. Adding the same to ML-Plan does seem to cause some issues (it looks like the benchmark dependencies are not installed). I can look into it later.

@mwever
Contributor Author

mwever commented Mar 22, 2021

Thank you very much, @PGijsbers!
Could this be because ML-Plan has already been moved to the new benchmark structure?

@PGijsbers
Collaborator

My bad, still learning 😓 The reason it works for Auto-WEKA is that it doesn't run in its own virtual environment.
For ML-Plan the columns should already be reordered (in __init__.py), but it is skipped because the framework configuration does not explicitly state that the backend is WEKA. I think the best way to resolve this is to provide a default similarly to how it's done in exec: backend = config.framework_params.get('_backend', 'weka').
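The suggested default boils down to a plain dict lookup with a fallback; a minimal sketch (framework_params here stands in for config.framework_params):

```python
# Minimal sketch of the suggested fix: default to the 'weka' backend when a
# framework definition does not set '_backend' in its params section.
framework_params = {}  # stand-in for config.framework_params
backend = framework_params.get('_backend', 'weka')

# A definition that does set the key overrides the default:
sklearn_params = {'_backend': 'sklearn'}
sklearn_backend = sklearn_params.get('_backend', 'weka')
```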

@mwever
Contributor Author

mwever commented Mar 23, 2021

Got it. So the problem is that there is no actual MLPlan framework; you need to use either MLPlanWEKA or MLPlanSKLearn. If you choose MLPlanWEKA, everything works correctly. A default value for the backend param could probably be added to the supertype framework MLPlan, but I am not sure whether that is consistent with your conventions. Personally, I would argue for making MLPlan abstract in some way, so that it cannot be chosen as a framework to run.

@PGijsbers
Collaborator

The default framework definition in resources/frameworks.yaml shouldn't have any params section: this params section is intended for custom definitions, not default ones.

That is from the docs, but obviously MLPlanWEKA, MLPlanSKLearn, and more recently autosklearn2 and the frameworks that use a 'compete' mode don't adhere to it. I would probably be in favor of keeping it this way; at least it keeps all the overrides visible in one place. We just need to be strict about not allowing tuning of the framework.

I am not aware of a way to mark a definition as abstract. To keep the configuration files (and their parsing) from becoming too complicated, I would favor having the mlplan definition set _backend: weka and having mlplansklearn override it. Then mlplanweka can either stay (though redundant) or be removed.

@sebhrusen your thoughts?

@sebhrusen
Collaborator

sebhrusen commented Mar 24, 2021

Oh, I thought that the MLPlan definition was using a default backend. If it doesn't, then adding an abstract param to fail early on those definitions sounds like the most natural approach.

The default framework definition in resources/frameworks.yaml shouldn't have any params section: this params section is intended for custom definitions, not default ones.

@PGijsbers yeah, we probably need to be more explicit now about what is allowed and what is not by default in the params section.
Suggestion:
what is allowed?

  • ML engine type when the framework supports multiple ones (see for example MLPlan, or autosklearn).
  • high-level parameter when the framework can run in several modes/presets (see for example AutoGluon or MLJar).

what is not allowed?

  • hyperparameters.
  • resource parameters.
  • benchmark utility parameters (log level, log path, _save_artifacts...)

For the high-level parameter, did we agree that we allow only one additional definition with a non-default mode/preset?
Also, I think we agreed that, for now, we will run only one definition of each framework for the OpenML benchmark, correct? This means that authors will have to tell us which one they want us to run if they offer more than one definition.

@mwever
Contributor Author

mwever commented Mar 24, 2021

Okay, sorry for the confusion! While writing an answer to @sebhrusen 's comment, I noticed that the check in __init__.py for the backend being weka is nonsense, since we always read in the WEKA format first, even when running with the sklearn backend. So I changed the code to always reorder the columns if the target attribute is not in the last position.

To conclude, ML-Plan does not need any default parameter anymore in the resources file.
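The reordering itself is straightforward; a stand-alone sketch (not the actual ML-Plan code, which operates on the parsed ARFF structures):

```python
def move_target_last(columns, rows, target):
    """Reorder columns so the target attribute ends up in the last position.

    columns: list of attribute names; rows: list of value lists aligned
    with `columns`; target: name of the class/target attribute.
    Returns new (columns, rows); the relative order of the other columns
    is preserved.
    """
    idx = columns.index(target)
    if idx == len(columns) - 1:
        return columns, rows  # target already last, nothing to do
    order = [i for i in range(len(columns)) if i != idx] + [idx]
    new_columns = [columns[i] for i in order]
    new_rows = [[row[i] for i in order] for row in rows]
    return new_columns, new_rows
```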

@PGijsbers
Collaborator

Verified ML-Plan runs successfully on all validation tasks now.

@sebhrusen What's the policy on saving artifacts now? With the default configuration, H2O produces a models folder (with leaderboard files), and MLPlan an mlplan_out folder (with predictions and statistics). I assumed nothing (but logs) would be saved by default. Is this intended, or should those results also only be saved if specified in _save_artifacts?

@sebhrusen
Collaborator

sebhrusen commented Mar 30, 2021

@PGijsbers

@sebhrusen What's the policy on saving artifacts now? With the default configuration, H2O produces a models folder (with leaderboard files), and MLPlan an mlplan_out folder (with predictions and statistics). I assumed nothing (but logs) would be saved by default. Is this intended, or should those results also only be saved if specified in _save_artifacts?

There's no strict policy, mainly a concern regarding what is uploaded to s3 / downloaded to the benchmark runner by default.
Let's say that:

  • frameworks are allowed to generate "small" (a few kB) artifacts by default and store them in a dedicated folder created using the output_subdir function.
  • any other extra artifact should not be created under the output_dir, but under a tmp folder instead: this is to avoid having many artifacts uploaded to s3, especially if the process was killed; anything under output_dir at the end of the process always gets uploaded!
  • it is possible for users to request additional artifacts explicitly through the _save_artifacts = ['list', 'of', 'artifacts', 'types'] param passed in the framework definition.
  • in the latter case, the framework integration code will copy (and ideally archive, if there are many and/or compressible files) the requested artifacts from the tmp (sub)folder to a dedicated folder created using the output_subdir function already mentioned.

To zip a "filtered" directory structure or apply a function to it, amlb exposes the zip_path and walk_apply functions, for example from AutoGluon:

import os
import shutil

from frameworks.shared.callee import output_subdir, utils

if 'models' in artifacts:
    # drop AutoGluon's internal utils folder before archiving the models
    shutil.rmtree(os.path.join(predictor.path, "utils"), ignore_errors=True)
    models_dir = output_subdir("models", config)
    utils.zip_path(predictor.path, os.path.join(models_dir, "models.zip"))
…
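The same pattern can be sketched with plain stdlib calls, for integrations not using amlb's helpers (a sketch under assumptions: the `artifacts` list comes from `_save_artifacts`, and each artifact type maps to a subfolder of a tmp dir):

```python
import os
import zipfile


def save_requested_artifacts(artifacts, tmp_dir, output_dir):
    """Archive requested artifact subfolders from a tmp dir into the run's output dir.

    Only artifact types explicitly listed (e.g. via `_save_artifacts` in the
    framework definition) are archived, so nothing extra lands under
    output_dir and gets uploaded to s3 by default.
    """
    for artifact in artifacts:
        src = os.path.join(tmp_dir, artifact)
        if not os.path.isdir(src):
            continue  # nothing was produced for this artifact type
        dest_dir = os.path.join(output_dir, artifact)
        os.makedirs(dest_dir, exist_ok=True)
        archive = os.path.join(dest_dir, f"{artifact}.zip")
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            for root, _, files in os.walk(src):
                for name in files:
                    path = os.path.join(root, name)
                    # store paths relative to the artifact folder
                    zf.write(path, os.path.relpath(path, src))
```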

@sebhrusen
Collaborator

sebhrusen commented Mar 30, 2021

I did not want to mess up the frameworks.yaml files so I did not touch them. As far as I can see, except for the Q2.2020 file, the version of ML-Plan always refers to the "latest" version. If this is also the version which is benchmarked everything should be fine.

@mwever the definition files with a versioned name/time should not be modified, so you did the right thing.
We will create a new one containing fixed versions before the OpenML runs, and will use it to run the benchmarks.

To conclude, ML-Plan does not need any default parameter anymore in the resources file.

Is that only since v0.2.4? By "default parameter", do you mean the default/abstract definition?

@sebhrusen
Collaborator

about the abstract definition property: #276

@sebhrusen
Collaborator

sebhrusen commented Apr 14, 2021

@mwever I'm willing to merge this PR, so, to try it locally, I rebased it on top of master and pushed it back to your branch, after fixing setup.sh to be able to upgrade the version locally.

However, while the benchmark was working fine with your version 0.2.3, it systematically breaks with 0.2.4.
The first is just a warning (FYI), but the second is a fatal error.

WARN  [main] [ai.libs.jaicore.ml.core.dataset.serialization.ArffDatasetAdapter.readDataset (ai.libs.jaicore.ml.core.dataset.serialization.ArffDatasetAdapter:343)] - Invalid class index in the dataset's meta data (null): Assuming last column to be the target attribute!
Exception in thread "main" java.lang.ClassCastException: ai.libs.jaicore.ml.core.dataset.DenseInstance cannot be cast to org.api4.java.common.attributedobjects.IElementDecorator
	at org.api4.java.common.attributedobjects.IListDecorator.add(IListDecorator.java:30)
	at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
	at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
	at ai.libs.jaicore.ml.scikitwrapper.ScikitLearnWrapper.predict(ScikitLearnWrapper.java:286)
	at ai.libs.jaicore.ml.core.learner.ASupervisedLearner.predict(ASupervisedLearner.java:62)
	at ai.libs.jaicore.ml.core.learner.ASupervisedLearner.predict(ASupervisedLearner.java:18)
	at ai.libs.jaicore.ml.core.evaluation.evaluator.SupervisedLearnerExecutor.getReportForTrainedLearner(SupervisedLearnerExecutor.java:88)
	at ai.libs.jaicore.ml.core.evaluation.evaluator.SupervisedLearnerExecutor.execute(SupervisedLearnerExecutor.java:66)
	at ai.libs.mlplan.cli.MLPlanCLI.runMLPlan(MLPlanCLI.java:396)
	at ai.libs.mlplan.cli.MLPlanCLI.main(MLPlanCLI.java:447)

ERROR:amlb.utils.process:Exception in thread "main" java.lang.ClassCastException: ai.libs.jaicore.ml.core.dataset.DenseInstance cannot be cast to org.api4.java.common.attributedobjects.IElementDecorator
	at org.api4.java.common.attributedobjects.IListDecorator.add(IListDecorator.java:30)
	at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
	at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
	at ai.libs.jaicore.ml.scikitwrapper.ScikitLearnWrapper.predict(ScikitLearnWrapper.java:286)
	at ai.libs.jaicore.ml.core.learner.ASupervisedLearner.predict(ASupervisedLearner.java:62)
	at ai.libs.jaicore.ml.core.learner.ASupervisedLearner.predict(ASupervisedLearner.java:18)
	at ai.libs.jaicore.ml.core.evaluation.evaluator.SupervisedLearnerExecutor.getReportForTrainedLearner(SupervisedLearnerExecutor.java:88)
	at ai.libs.jaicore.ml.core.evaluation.evaluator.SupervisedLearnerExecutor.execute(SupervisedLearnerExecutor.java:66)
	at ai.libs.mlplan.cli.MLPlanCLI.runMLPlan(MLPlanCLI.java:396)
	at ai.libs.mlplan.cli.MLPlanCLI.main(MLPlanCLI.java:447)

I can still merge this PR and let you fix this on a separate PR if you want.

@sebhrusen sebhrusen self-requested a review April 14, 2021 15:44
Collaborator

@sebhrusen sebhrusen left a comment


Changes look good to me; I'll merge if you want me to, knowing that the integration with mlplan 0.2.4 looks broken anyway, though for reasons unrelated to these changes.

@PGijsbers
Collaborator

I am a bit confused: mlplan 0.2.4 worked for me on this PR before, and even now running the evaluation works OK. What command did you use to produce the error?

@mwever
Contributor Author

mwever commented Apr 20, 2021

Looks to me as if there is an issue with the scikit-learn backend wrapper.
@PGijsbers: you probably only checked with the WEKA backend.

I will fix that issue ASAP.

@PGijsbers
Collaborator

@mwever do you have an estimate of when you'll be able to add the fix?
@sebhrusen if the fix is too far out, I would be in favor of merging this PR and fixing the scikit-learn backend in a separate PR. Wdyt?

@sebhrusen
Collaborator

@PGijsbers, @mwever I already agreed to the merge; I'll leave the decision to the PR author.

@mwever
Contributor Author

mwever commented May 3, 2021

Meanwhile, I agree that it might be better to merge the PR already. I may be able to fix the issue this week or next, but I currently cannot give any guarantees on the timing. However, since the WEKA backend is working right now, and we previously agreed to benchmark only the WEKA backend for now, I hope it is fine if we handle the sklearn backend in another PR.

So, please go ahead and merge this one :-)

@sebhrusen sebhrusen merged commit 9b3ef45 into openml:master May 3, 2021
Labels
framework For issues with frameworks in the current benchmark