implement registration and listening of crd trainingjob #565

Closed
wants to merge 6 commits into from
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -5,3 +5,4 @@ vendor
*~
*.pyc
*.idea
*.vscode
6 changes: 1 addition & 5 deletions .travis.yml
@@ -11,11 +11,7 @@ matrix:
- |
bash .tools/check_style.sh
RESULT=$?; if [ $RESULT -eq 0 ]; then true; else false; fi;
- ln -s $GOPATH/src/github.com/PaddlePaddle $GOPATH/src/github.com/paddlepaddle
- cd go && glide install && go get k8s.io/kubernetes || echo 1
- bash ./vendor/k8s.io/code-generator/generate-groups.sh "deepcopy,client,informer,lister" github.com/PaddlePaddle/cloud/go/pkg/client github.com/PaddlePaddle/cloud/go/pkg/apis paddlepaddle:v1alpha1
- grep "github.com/paddlepaddle/cloud" -nR pkg/client | awk -F ':' '{print $1}' | xargs sed -i 's|github.com/paddlepaddle/cloud|github.com/PaddlePaddle/cloud|g'
- bash .tools/gen_config.sh && glide install --strip-vendor && go test $(glide novendor)
- cd go && bash .tools/gen_config.sh && glide install --strip-vendor && go test $(glide novendor)
- language: python
python: 2.7
sudo: required
4 changes: 2 additions & 2 deletions doc/usage_cn.md
@@ -129,12 +129,12 @@ def recordio_reader(filepath, parallelism, trainer_id):
The paddlecloud command includes a data upload feature, which currently works only in environments whose storage system is CephFS. To upload, run:

```bash
paddlecloud file src dest
paddlecloud file put src dest
```
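- `src` must be a subdirectory of the current directory; `../` is not allowed.
- If `src` ends with `/`, the files under the `src` directory are uploaded and no new directory is created under `dest`.
- If `src` does not end with `/`, the `src` directory itself is uploaded and a new directory is created under `dest`.
- `dest` must contain the `/pfs/{datacenter}/user/{username}` directory.
- `dest` must contain the `/pfs/{datacenter}/home/{username}` directory.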
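The upload rules listed above can be sketched in Go. This is a minimal illustration, not code from the paddlecloud CLI: `planUpload` and `uploadPlan` are hypothetical names, the `home` form of the destination path is used, and "must contain" is interpreted here as a path prefix, which is an assumption.

```go
package main

import (
	"errors"
	"path"
	"strings"
)

// uploadPlan describes how a hypothetical uploader would treat src.
type uploadPlan struct {
	// CreatesDir is true when a new directory named after src is created under dest.
	CreatesDir bool
	// Target is the directory that ends up holding the uploaded files.
	Target string
}

// planUpload applies the documented rules for src and dest.
// It is an illustrative helper, not part of the paddlecloud CLI.
func planUpload(src, dest, datacenter, username string) (uploadPlan, error) {
	// src must stay inside the current directory: no absolute paths, no "..".
	if src == "" || path.IsAbs(src) {
		return uploadPlan{}, errors.New("src must be a subdirectory of the current directory")
	}
	for _, part := range strings.Split(src, "/") {
		if part == ".." {
			return uploadPlan{}, errors.New("src must not contain ../")
		}
	}

	// Assumption: "contain" is read as "dest lives under the user's home directory".
	home := path.Join("/pfs", datacenter, "home", username)
	cleanDest := path.Clean(dest)
	if cleanDest != home && !strings.HasPrefix(cleanDest, home+"/") {
		return uploadPlan{}, errors.New("dest must be under " + home)
	}

	// A trailing slash uploads the contents of src directly into dest.
	if strings.HasSuffix(src, "/") {
		return uploadPlan{CreatesDir: false, Target: cleanDest}, nil
	}
	// Without a trailing slash, src itself becomes a new directory under dest.
	return uploadPlan{CreatesDir: true, Target: path.Join(cleanDest, path.Base(src))}, nil
}
```

The only difference between `data/` and `data` is whether a `data` directory is created under `dest`; both forms are otherwise validated the same way.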



7 changes: 7 additions & 0 deletions go/cmd/operator/Dockerfile
@@ -0,0 +1,7 @@
FROM golang:1.9
RUN go get github.com/Masterminds/glide
COPY cloud $GOPATH/src/github.com/PaddlePaddle/cloud
WORKDIR $GOPATH/src/github.com/PaddlePaddle/cloud/go
RUN glide install --strip-vendor
RUN go build -o /usr/local/bin/operator github.com/PaddlePaddle/cloud/go/cmd/operator
CMD ["operator"]
59 changes: 59 additions & 0 deletions go/cmd/operator/operator.go
@@ -0,0 +1,59 @@
package main

import (
"flag"
"time"

"github.com/golang/glog"

apiextensionsclient "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
"k8s.io/client-go/kubernetes"
"k8s.io/client-go/tools/clientcmd"

paddleclientset "github.com/PaddlePaddle/cloud/go/pkg/client/clientset/versioned"
paddleinformers "github.com/PaddlePaddle/cloud/go/pkg/client/informers/externalversions"
paddlecontroller "github.com/PaddlePaddle/cloud/go/pkg/controller"
"github.com/PaddlePaddle/cloud/go/pkg/signals"
)

func init() {

}

func main() {
masterURL := flag.String("master", "", "Address of a kube master.")
kubeConfig := flag.String("kubeconfig", "", "Path to a kube config. Only required if out-of-cluster.")
flag.Parse()

stopCh := signals.SetupSignalHandler()

cfg, err := clientcmd.BuildConfigFromFlags(*masterURL, *kubeConfig)
if err != nil {
glog.Fatalf("Error building kubeconfig: %s", err.Error())
}

kubeClient, err := kubernetes.NewForConfig(cfg)
if err != nil {
glog.Fatalf("Error building kubernetes clientset: %s", err.Error())
}

extapiClient, err := apiextensionsclient.NewForConfig(cfg)
if err != nil {
glog.Fatalf("Error building kubernetes extension api clientset: %s", err.Error())
}

paddleClient, err := paddleclientset.NewForConfig(cfg)
if err != nil {
glog.Fatalf("Error building paddle clientset: %s", err.Error())
}

paddleInformer := paddleinformers.NewSharedInformerFactory(paddleClient, time.Second*10)

controller := paddlecontroller.New(kubeClient, extapiClient, paddleClient, paddleInformer)

go paddleInformer.Start(stopCh)

// Capture the error returned by Run; the previous form checked a stale err.
if err := controller.Run(2, stopCh); err != nil {
glog.Fatalf("Error running paddle trainingjob controller: %s", err.Error())
}
}
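The operator relies on `signals.SetupSignalHandler()` to obtain `stopCh`, but that package is not shown in this diff. The following is a minimal sketch of how such a helper is commonly written with the standard library, assuming it closes a channel on SIGINT or SIGTERM; the function names here are hypothetical and the real `signals` package may differ.

```go
package main

import (
	"os"
	"os/signal"
	"syscall"
)

// stopOnSignal returns a channel that is closed when the first value
// arrives on sigCh. Closing (rather than sending) lets any number of
// goroutines observe the shutdown.
func stopOnSignal(sigCh <-chan os.Signal) <-chan struct{} {
	stop := make(chan struct{})
	go func() {
		<-sigCh
		close(stop)
	}()
	return stop
}

// setupSignalHandler wires stopOnSignal to SIGINT and SIGTERM.
// Hypothetical stand-in for signals.SetupSignalHandler in the diff above.
func setupSignalHandler() <-chan struct{} {
	sigCh := make(chan os.Signal, 2)
	signal.Notify(sigCh, os.Interrupt, syscall.SIGTERM)
	return stopOnSignal(sigCh)
}
```

Passing the same stop channel to both the informer factory and the controller, as `main` does, means a single signal shuts down both.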
24 changes: 20 additions & 4 deletions go/glide.lock

Some generated files are not rendered by default.

4 changes: 3 additions & 1 deletion go/glide.yaml
@@ -19,4 +19,6 @@ import:
- package: github.com/go-stack/stack
version: v1.6.0
- package: k8s.io/code-generator
version: kubernetes-1.8.5
version: kubernetes-1.8.6
- package: k8s.io/apiextensions-apiserver
version: kubernetes-1.8.6
15 changes: 15 additions & 0 deletions go/hack/custom-boilerplate.go.txt
@@ -0,0 +1,15 @@
/*
Copyright (c) 2016 PaddlePaddle Authors All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
5 changes: 4 additions & 1 deletion go/hack/update-codegen.sh
@@ -32,5 +32,8 @@ echo ${CODEGEN_PKG}
# instead of the $GOPATH directly. For normal projects this can be dropped.
${CODEGEN_PKG}/generate-groups.sh "deepcopy,client,informer,lister" \
github.com/PaddlePaddle/cloud/go/pkg/client github.com/PaddlePaddle/cloud/go/pkg/apis \
paddlepaddle:v1alpha1
paddlepaddle:v1alpha1 \
--go-header-file ${SCRIPT_ROOT}/hack/custom-boilerplate.go.txt
# --output-base "$(dirname ${BASH_SOURCE})/../../../../.."

grep "github.com/paddlepaddle/cloud" -nR pkg/client | awk -F ':' '{print $1}' | xargs sed -i "" 's|github.com/paddlepaddle/cloud|github.com/PaddlePaddle/cloud|g'
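The final `grep | awk | xargs sed` line rewrites the lower-case import path `github.com/paddlepaddle/cloud` to `github.com/PaddlePaddle/cloud` in the generated files under `pkg/client`. The same substitution can be sketched in Go on a single string; `fixImportPath` is a hypothetical helper used only for illustration and does not touch the file system.

```go
package main

import "strings"

const (
	// lowerPath is the lower-case import path that the script searches for.
	lowerPath = "github.com/paddlepaddle/cloud"
	// canonicalPath is the mixed-case path the script substitutes.
	canonicalPath = "github.com/PaddlePaddle/cloud"
)

// fixImportPath replaces every occurrence of lowerPath with canonicalPath
// and reports whether the input changed.
func fixImportPath(src string) (string, bool) {
	out := strings.Replace(src, lowerPath, canonicalPath, -1)
	return out, out != src
}
```

Because the match is case-sensitive, input that already uses the canonical path is returned unchanged, so the rewrite is safe to run repeatedly.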
15 changes: 9 additions & 6 deletions go/pkg/apis/paddlepaddle/v1alpha1/types.go
@@ -8,8 +8,9 @@ import (
)

const (
CRDKind = "TraingingJob"
CRDKindPlural = "traingingjobs"
CRDKind = "TrainingJob"
CRDKindPlural = "trainingjobs"
CRDShortName = "tj"
CRDGroup = "paddlepaddle.org"
CRDVersion = "v1alpha1"
)
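The corrected constants matter because Kubernetes requires a CustomResourceDefinition to be named `<plural>.<group>`. The body of `CRDName()` is not shown in this diff, so the sketch below only assumes that it joins the plural and the group in that way; `crdName` and `apiVersion` are hypothetical helpers.

```go
package main

const (
	CRDKind       = "TrainingJob"
	CRDKindPlural = "trainingjobs"
	CRDShortName  = "tj"
	CRDGroup      = "paddlepaddle.org"
	CRDVersion    = "v1alpha1"
)

// crdName builds the CRD object name in the "<plural>.<group>" form.
// Assumption: the real CRDName() follows the same convention.
func crdName() string {
	return CRDKindPlural + "." + CRDGroup
}

// apiVersion builds the "<group>/<version>" string used in the
// apiVersion field of TrainingJob manifests.
func apiVersion() string {
	return CRDGroup + "/" + CRDVersion
}
```

With the old misspelled plural, the registered resource name would have carried the typo as well, so fixing the constants changes the name clients must use.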
@@ -21,7 +22,6 @@ func CRDName() string {

// +genclient
// +genclient:noStatus
// +genclient:nonNamespaced
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
// +resource:path=trainingjob

@@ -36,7 +36,10 @@ type TrainingJob struct {
// TrainingJobSpec is the spec for a TrainingJob resource
type TrainingJobSpec struct {
// General job attributes.
Image string `json:"image,omitempty"`
Image string `json:"image,omitempty"`
// HostNetwork runs the job on the host network instead of the container
// network; this requires portmanager.
HostNetwork bool `json:"host_network,omitempty"`
Port int `json:"port,omitempty"`
PortsNum int `json:"ports_num,omitempty"`
PortsNumForSparse int `json:"ports_num_for_sparse,omitempty"`
@@ -125,10 +128,10 @@ type TrainingJobStatus struct {
// Reason is the reason the job phase failed
Reason string `json:"reason"`
// ScaleStatus is the autoscale status of trainer jobs
// TODO(ZhengQi): this will used in autoscale mode in future.
// TODO(ZhengQi): this will be used in autoscale mode in future.
ScaleStatus TrainerJobScaleStatus `json:"scale_status"`
// ReplicaStatuses is the detailed status of resources
// TODO(ZhengQi): should we only considered trainer job now?
// TODO(ZhengQi): should we only consider trainer job now?
ReplicaStatuses []*TrainingResourceStatus `json:"replica_statuses"`
}
