Add support for XGBoost Operator with LightGBM example (#1603)
* Add support for XGBoost Operator

* Specify Tag for LightGBM image
andreyvelich authored Aug 2, 2021
1 parent 1b71a7c commit 287e868
Showing 5 changed files with 140 additions and 0 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -125,6 +125,8 @@ Katib has these CRD examples in upstream:

- [Kubeflow `MPIJob`](https://www.kubeflow.org/docs/components/training/mpi/)

- [Kubeflow `XGBoostJob`](https://github.com/kubeflow/xgboost-operator)

- [Tekton `Pipeline`](https://github.com/tektoncd/pipeline)

Thus, Katib supports multiple frameworks with the help of different job kinds.
6 changes: 6 additions & 0 deletions examples/v1beta1/README.md
@@ -391,3 +391,9 @@ docker.io/inaccel/jupyter:lab
```
docker.io/kubeflow/mpi-horovod-mnist
```

- XGBoost operator LightGBM dist example, [source](https://github.com/kubeflow/xgboost-operator/tree/master/config/samples/lightgbm-dist).

```
docker.io/kubeflowkatib/xgboost-lightgbm
```
123 changes: 123 additions & 0 deletions examples/v1beta1/xgboost-lightgbm.yaml
@@ -0,0 +1,123 @@
apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
  namespace: kubeflow
  name: xgboost-lightgbm
spec:
  objective:
    type: maximize
    goal: 0.99
    objectiveMetricName: valid_1 auc
    additionalMetricNames:
      - valid_1 binary_logloss
      - training auc
      - training binary_logloss
  metricsCollectorSpec:
    source:
      filter:
        metricsFormat:
          - "(\\w+\\s\\w+)\\s:\\s((-?\\d+)(\\.\\d+)?)"
  algorithm:
    algorithmName: random
  parallelTrialCount: 2
  maxTrialCount: 6
  maxFailedTrialCount: 3
  parameters:
    - name: lr
      parameterType: double
      feasibleSpace:
        min: "0.01"
        max: "0.1"
    - name: num-leaves
      parameterType: int
      feasibleSpace:
        min: "50"
        max: "60"
        step: "1"
  trialTemplate:
    primaryPodLabels:
      job-role: master
    primaryContainerName: xgboostjob
    successCondition: status.conditions.#(type=="Succeeded")#|#(status=="True")#
    failureCondition: status.conditions.#(type=="Failed")#|#(status=="True")#
    trialParameters:
      - name: learningRate
        description: Learning rate for the training model
        reference: lr
      - name: numberLeaves
        description: Number of leaves for one tree
        reference: num-leaves
    trialSpec:
      # TODO (andreyvelich): Change to kubeflow.org/v1 once all-in-one operator is finished.
      apiVersion: xgboostjob.kubeflow.org/v1
      kind: XGBoostJob
      spec:
        xgbReplicaSpecs:
          Master:
            replicas: 1
            restartPolicy: Never
            template:
              spec:
                containers:
                  - name: xgboostjob
                    image: docker.io/kubeflowkatib/xgboost-lightgbm:1.0
                    ports:
                      - containerPort: 9991
                        name: xgboostjob-port
                    imagePullPolicy: Always
                    args:
                      - --job_type=Train
                      - --metric=binary_logloss,auc
                      - --learning_rate=${trialParameters.learningRate}
                      - --num_leaves=${trialParameters.numberLeaves}
                      - --num_trees=100
                      - --boosting_type=gbdt
                      - --objective=binary
                      - --metric_freq=1
                      - --is_training_metric=true
                      - --max_bin=255
                      - --data=data/binary.train
                      - --valid_data=data/binary.test
                      - --tree_learner=feature
                      - --feature_fraction=0.8
                      - --bagging_freq=5
                      - --bagging_fraction=0.8
                      - --min_data_in_leaf=50
                      - --min_sum_hessian_in_leaf=50
                      - --is_enable_sparse=true
                      - --use_two_round_loading=false
                      - --is_save_binary_file=false
          Worker:
            replicas: 2
            restartPolicy: ExitCode
            template:
              spec:
                containers:
                  - name: xgboostjob
                    image: docker.io/kubeflowkatib/xgboost-lightgbm:1.0
                    ports:
                      - containerPort: 9991
                        name: xgboostjob-port
                    imagePullPolicy: Always
                    args:
                      - --job_type=Train
                      - --metric=binary_logloss,auc
                      - --learning_rate=${trialParameters.learningRate}
                      - --num_leaves=${trialParameters.numberLeaves}
                      - --num_trees=100
                      - --boosting_type=gbdt
                      - --objective=binary
                      - --metric_freq=1
                      - --is_training_metric=true
                      - --max_bin=255
                      - --data=data/binary.train
                      - --valid_data=data/binary.test
                      - --tree_learner=feature
                      - --feature_fraction=0.8
                      - --bagging_freq=5
                      - --bagging_fraction=0.8
                      - --min_data_in_leaf=50
                      - --min_sum_hessian_in_leaf=50
                      - --is_enable_sparse=true
                      - --use_two_round_loading=false
                      - --is_save_binary_file=false
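
Note on `metricsFormat` in the experiment above: the objective and additional metric names (`valid_1 auc`, `training binary_logloss`, and so on) contain a space, so the filter regex has to capture two words as the name. The Python sketch below applies the same pattern, after YAML unescaping, to a few log lines. The sample lines are hypothetical and only approximate what the training container prints, and the sketch assumes the first capture group is treated as the metric name and the second as its value.

```python
import re

# Pattern from metricsFormat once the YAML double-quoted string is unescaped.
METRICS_FORMAT = re.compile(r"(\w+\s\w+)\s:\s((-?\d+)(\.\d+)?)")

# Hypothetical log lines (assumption): the real container output may differ.
SAMPLE_LOG = """\
[LightGBM] [Info] Iteration:1, training auc : 0.812
[LightGBM] [Info] Iteration:1, training binary_logloss : 0.655
[LightGBM] [Info] Iteration:1, valid_1 auc : 0.774
[LightGBM] [Info] Finished loading data
"""


def parse_metrics(log_text):
    """Return (name, value) pairs for every pattern match in the log text."""
    metrics = []
    for line in log_text.splitlines():
        for match in METRICS_FORMAT.finditer(line):
            # Group 1 is the two-word metric name, group 2 the numeric value.
            metrics.append((match.group(1), float(match.group(2))))
    return metrics


if __name__ == "__main__":
    for name, value in parse_metrics(SAMPLE_LOG):
        print(f"{name} = {value}")
```

Lines without a `name : value` pair, such as the last sample line, produce no match and are skipped.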
2 changes: 2 additions & 0 deletions manifests/v1beta1/components/controller/controller.yaml
@@ -30,6 +30,8 @@ spec:
            - "--trial-resources=TFJob.v1.kubeflow.org"
            - "--trial-resources=PyTorchJob.v1.kubeflow.org"
            - "--trial-resources=MPIJob.v1.kubeflow.org"
            # TODO (andreyvelich): Change to v1.kubeflow.org once all-in-one operator is finished.
            - "--trial-resources=XGBoostJob.v1.xgboostjob.kubeflow.org"
            - "--trial-resources=PipelineRun.v1beta1.tekton.dev"
          ports:
            - containerPort: 8443
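
Note on the `--trial-resources` values in this hunk: they appear to follow a `Kind.version.group` layout, which is why the new entry reads `XGBoostJob.v1.xgboostjob.kubeflow.org` while the operator still uses its own API group. The Python sketch below splits such a value into its three parts. `parse_trial_resource` and `GroupVersionKind` are hypothetical names used for illustration and are not taken from the Katib controller.

```python
from typing import NamedTuple


class GroupVersionKind(NamedTuple):
    """Illustrative container for the three parts of a trial resource value."""
    kind: str
    version: str
    group: str


def parse_trial_resource(value: str) -> GroupVersionKind:
    """Split 'Kind.version.group' on the first two dots.

    The group itself may contain dots (for example 'xgboostjob.kubeflow.org'),
    so only the first two separators are used.
    """
    parts = value.split(".", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected Kind.version.group, got {value!r}")
    kind, version, group = parts
    return GroupVersionKind(kind, version, group)


if __name__ == "__main__":
    print(parse_trial_resource("XGBoostJob.v1.xgboostjob.kubeflow.org"))
    print(parse_trial_resource("PipelineRun.v1beta1.tekton.dev"))
```

The group recovered here (`xgboostjob.kubeflow.org`) is the same API group that the RBAC rule in `rbac.yaml` grants access to.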
7 changes: 7 additions & 0 deletions manifests/v1beta1/components/controller/rbac.yaml
@@ -55,6 +55,13 @@ rules:
      - mpijobs
    verbs:
      - "*"
  # TODO (andreyvelich): Move to "apiGroup: kubeflow.org" once all-in-one operator is finished.
  - apiGroups:
      - xgboostjob.kubeflow.org
    resources:
      - xgboostjobs
    verbs:
      - "*"
  - apiGroups:
      - tekton.dev
    resources: