Specific instructions to run with a Kubernetes cluster
There are a few possible ways to provide CVMFS:
- Install and mount CVMFS on every node.
- If there is no access to the nodes, CVMFS can also be provided through a Kubernetes plugin. For example:
- CVMFS-CSI: https://github.com/cvmfs-contrib/cvmfs-csi
- OSG plugin: https://github.com/sfiligoi/prp-osg-cvmfs
- ATLAS customization of the OSG plugin: https://github.com/PanDAWMS/prp-osg-cvmfs
After finishing the installation and configuration, install the Kubernetes Python client:
pip install kubernetes
The kubectl command line tool is not used by Harvester itself, but it is very useful for operators to interact with the cluster:
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF
yum install -y kubectl
Source: https://kubernetes.io/docs/tasks/tools/install-kubectl/
Use the Job kind to prevent Pods from occupying resources after they complete.
The following table shows the parameters that need to be modified:
Name | Description |
---|---|
metadata.name | Job name |
spec.template.spec.containers.name | Container name |
spec.template.spec.containers.image | atlasadc/atlas-grid-centos7 |
The command content will be executed to set up the container environment and then pull the pilot to run. Current job templates can be found here; they depend on site-specific configuration and the CVMFS installation method:
https://github.com/PanDAWMS/harvester_configurations/tree/master/K8S/job_templates
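As a minimal illustration of the fields in the table above (a sketch only, not one of the official templates; the actual command content is site specific and elided here):

```yaml
# Illustrative sketch of a Harvester job manifest (not an official template)
apiVersion: batch/v1
kind: Job
metadata:
  name: grid-job                        # metadata.name: job name
spec:
  template:
    spec:
      containers:
      - name: atlas-grid                # spec.template.spec.containers.name
        image: atlasadc/atlas-grid-centos7
        command: ["/bin/sh", "-c", "..."]   # sets up the environment, then pulls the pilot
      restartPolicy: Never
```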
Before starting the Kubernetes plugin, the module and class names should be set in $PANDA_HOME/etc/panda/panda_queueconfig.json. Some parameters also need to be adjusted:
Name | Description |
---|---|
proxySecretPath | Path of the proxy file inside the container. Works together with the k8s secret that manages the proxy |
x509UserProxy | Proxy file path on the Harvester node to pass to the container. Only used if proxySecretPath is NOT set |
cpuAdjustRatio | Ratio to adjust the CPU resource request before pod creation (default is 100) |
memoryAdjustRatio | Ratio to adjust the memory resource request before pod creation (default is 100) |
k8s_yaml_file | Path of the YAML file that defines a Kubernetes job |
k8s_config_file | Path of the configuration file for Kubernetes client authentication |
k8s_namespace | Namespace to use to distinguish multiple teams or projects on the cluster. See: https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/#when-to-use-multiple-namespaces |
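To make the ratio semantics concrete: a ratio of 100 leaves the resource request unchanged, while e.g. 90 scales it to 90%. A rough sketch of the idea (the function is hypothetical, illustrating the semantics only, not the Harvester implementation):

```python
def adjust_resources(n_cores, memory_mb, cpu_adjust_ratio=100, memory_adjust_ratio=100):
    """Scale the resource request by the configured percentage ratios.

    A ratio of 100 means no change; 90 requests 90% of the nominal value.
    (Hypothetical helper, not the actual Harvester code.)
    """
    cpu_request = n_cores * cpu_adjust_ratio / 100
    memory_request = memory_mb * memory_adjust_ratio / 100
    return cpu_request, memory_request

# With cpuAdjustRatio=90 as in the queue example below, an 8-core, 16000 MB
# job is requested as 7.2 cores and 16000 MB:
print(adjust_resources(8, 16000, cpu_adjust_ratio=90))  # (7.2, 16000.0)
```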
For example,
"ANALY_TAIWAN_TEST": {
    …...
    "submitter": {
        "name": "K8sSubmitter",
        "module": "pandaharvester.harvestersubmitter.k8s_submitter",
        "x509UserProxy": "/root/atlas-production.proxy",
        "cpuAdjustRatio": 90,
        "memoryAdjustRatio": 100
    …...
    "monitor": {
        "name": "K8sMonitor",
        "module": "pandaharvester.harvestermonitor.k8s_monitor"
    },
    "sweeper": {
        "name": "K8sSweeper",
        "module": "pandaharvester.harvestersweeper.k8s_sweeper"
    },
    "common": {
        "k8s_yaml_file": "/home/harvesteruser/atlas_job.yaml",
        "k8s_config_file": "/home/harvesteruser/.kube/config",
        "k8s_namespace": "default"
    }
},
Harvester now has a credential manager plugin, k8s_secret_cred_manager, to create/update a k8s secret object.
One can thus export proxy files into containers via a k8s secret, and configure Harvester to use k8s_secret_cred_manager to update the proxy periodically.
One needs to create a configuration file in JSON format for k8s_secret_credmanager. The important keys are k8s_namespace, k8s_config_file, and proxy_files.
Note that the proxy files listed in proxy_files must be updated periodically by the Harvester no_voms_credmanager or in another way, so that k8s_secret_credmanager can push the newest proxy files to the k8s secret.
Example of the config JSON file of k8s_secret_credmanager (/opt/harvester_k8s/k8s_secret_cred_manager_config.json in the panda_harvester.cfg example below):
{
    "k8s_namespace": "",
    "k8s_config_file": "/opt/harvester_k8s/kubeconf",
    "proxy_files": ["/data/atlpan/atlas.prod.proxy", "/data/atlpan/atlas.pilot.proxy"]
}
In panda_harvester.cfg, one needs to add the lines for k8s_secret_credmanager in the credmanager block.
Here moduleName is pandaharvester.harvestercredmanager.k8s_secret_cred_manager and className is K8sSecretCredManager.
Put the path of the configuration file of k8s_secret_credmanager mentioned above in certFile.
Many of the other attributes are not used by k8s_secret_credmanager.
Example of the credmanager block in panda_harvester.cfg:
[credmanager]
# module name
moduleName =
...
pandaharvester.harvestercredmanager.k8s_secret_cred_manager
# class name
className =
...
K8sSecretCredManager
# original certificate file to generate new short-lived certificate
certFile =
...
/opt/harvester_k8s/k8s_secret_cred_manager_config.json
# the name of short-lived certificate
outCertFile =
...
useless_string
# voms
voms =
...
useless_string
# sleep interval in sec
sleepTime = 1800
In the queue configuration submitter block, one needs to add the line for proxySecretPath. Note that the value of proxySecretPath must be the proxy file path inside the container, basically corresponding to the mountPath set up in the YAML file and the proxy_files defined in the configuration JSON of k8s_secret_cred_manager.
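As a sketch of this correspondence (the secret and volume names here are illustrative, not taken from the actual templates), the job YAML mounts the secret so that each proxy file appears under the directory given by mountPath:

```yaml
# Illustrative fragment of the job YAML: mounting the proxy secret
spec:
  template:
    spec:
      containers:
      - name: atlas-grid
        volumeMounts:
        - name: proxy-secret
          mountPath: /proxy          # proxySecretPath points inside this directory
      volumes:
      - name: proxy-secret
        secret:
          secretName: proxy-secret   # the secret maintained by k8s_secret_cred_manager
```

Each key of the secret surfaces as a file under mountPath, so a proxy_files entry like /data/atlpan/atlas.prod.proxy would typically appear in the container as /proxy/atlas.prod.proxy, matching the proxySecretPath value in the queue configuration example below.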
Example of queue configuration json file:
"CERN-EXTENSION_K8S_HARVESTER": {
    "queueStatus": "online",
    "prodSourceLabel": "managed",
    "nQueueLimitWorker": 100,
    "maxWorkers": 1000,
    "maxNewWorkersPerCycle": 30,
    "runMode": "slave",
    "mapType": "NoJob",
    "truePilot": true,
    "preparator": {
        "name": "DummyPreparator",
        "module": "pandaharvester.harvesterpreparator.dummy_preparator"
    },
    "submitter": {
        "name": "K8sSubmitter",
        "module": "pandaharvester.harvestersubmitter.k8s_submitter",
        "proxySecretPath": "/proxy/atlas.prod.proxy",
        "x509UserProxy": "/data/atlpan/x509up_u25606_production",
        "cpuAdjustRatio": 100,
        "memoryAdjustRatio": 100
    },
    "workerMaker": {
        "name": "SimpleWorkerMaker",
        "module": "pandaharvester.harvesterworkermaker.simple_worker_maker"
    },
    "messenger": {
        "name": "SharedFileMessenger",
        "module": "pandaharvester.harvestermessenger.shared_file_messenger",
        "accessPoint": "/data/atlpan/harvester_wdirs/${harvesterID}/${_workerID_3.2}/${_workerID_1.0}/${workerID}"
    },
    "stager": {
        "name": "DummyStager",
        "module": "pandaharvester.harvesterstager.dummy_stager"
    },
    "monitor": {
        "name": "K8sMonitor",
        "module": "pandaharvester.harvestermonitor.k8s_monitor"
    },
    "sweeper": {
        "name": "K8sSweeper",
        "module": "pandaharvester.harvestersweeper.k8s_sweeper"
    },
    "common": {
        "k8s_yaml_file": "/opt/harvester_k8s/k8s_atlas_job_prod_secret.yaml",
        "k8s_config_file": "/opt/harvester_k8s/kubeconf",
        "k8s_namespace": ""
    }
}
The K8s default scheduler spreads pods across the nodes with a round-robin algorithm. This can cause single-core pods to spread across all nodes and prevent multi-core pods from being scheduled. You can define a custom scheduling policy. Here is an example that worked for us:
- On the master node, define the policy file, including the priority strategy {"name" : "MostRequestedPriority", "weight" : 1}, at /etc/kubernetes/scheduler-policy.json:
{
    "kind" : "Policy",
    "apiVersion" : "v1",
    "predicates" : [
        {"name" : "GeneralPredicates"},
        {"name" : "MatchInterPodAffinity"},
        {"name" : "NoDiskConflict"},
        {"name" : "NoVolumeZoneConflict"},
        {"name" : "PodToleratesNodeTaints"}
    ],
    "priorities" : [
        {"name" : "MostRequestedPriority", "weight" : 1},
        {"name" : "InterPodAffinityPriority", "weight" : 2}
    ]
}
- In /etc/kubernetes/scheduler refer to the policy config file in KUBE_SCHEDULER_ARGS:
KUBE_SCHEDULER_ARGS="--leader-elect=true --policy-config-file /etc/kubernetes/scheduler-policy.json"
- Then restart the scheduler for the changes to take effect:
$ systemctl restart kube-scheduler.service
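The effect of MostRequestedPriority can be sketched as follows: nodes whose resources are already more requested score higher, so new pods pack onto busy nodes instead of spreading out, leaving whole nodes free for multi-core pods. The scoring below follows the general shape of the upstream priority function (score 0-10 per resource, averaged over CPU and memory); the code itself is only an illustration, not the scheduler's implementation:

```python
def most_requested_score(requested, capacity):
    """Score in [0, 10]: higher when more of the resource is already requested."""
    if capacity == 0:
        return 0
    return 10 * min(requested, capacity) / capacity

def node_score(cpu_requested, cpu_capacity, mem_requested, mem_capacity):
    # Average of the CPU and memory scores
    return (most_requested_score(cpu_requested, cpu_capacity)
            + most_requested_score(mem_requested, mem_capacity)) / 2

# A busy node outranks an idle one, so new pods pack rather than spread:
busy = node_score(6, 8, 24e9, 32e9)   # 75% requested on both axes -> 7.5
idle = node_score(1, 8, 4e9, 32e9)    # lightly used -> lower score
assert busy > idle
```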
It seems that more recent k8s clusters, including those built with kubespray, deploy schedulers as pods rather than systemctl services. Here are instructions for deploying a pod to run a custom node-packing scheduler, and using the custom scheduler to schedule production jobs.
You'll need to make the kube-scheduler binary for your k8s version. You can check your k8s version as follows:
kubectl version
I get the following output, indicating that my k8s version is 1.14.3:
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", [...]
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", [...]
In order to build the k8s code, you'll need recent versions of Go and gcc. Download these if needed:
# Install gcc
yum -y install gcc
# Install go
wget https://dl.google.com/go/go1.12.7.linux-amd64.tar.gz
tar -C /usr/local -xzf go1.12.7.linux-amd64.tar.gz
# Add /usr/local/go/bin to the PATH environment variable
export PATH=$PATH:/usr/local/go/bin
To clone and build the k8s code, run:
git clone https://github.com/kubernetes/kubernetes.git
cd kubernetes
# If needed, check out the version of k8s that you're using
git checkout release-[your version (eg. 1.13)]
make
Copy the kube-scheduler binary to the main repo level, then you can remove the (rather large...) kubernetes directory:
cd ..
cp kubernetes/_output/local/bin/linux/amd64/kube-scheduler .
rm -rf kubernetes
First, create a file named scheduler-policy.json (touch scheduler-policy.json), and fill it with the custom node-packing scheduler policy:
# scheduler-policy.json
{
    "kind" : "Policy",
    "apiVersion" : "v1",
    "predicates" : [
        {"name" : "GeneralPredicates"},
        {"name" : "MatchInterPodAffinity"},
        {"name" : "NoDiskConflict"},
        {"name" : "NoVolumeZoneConflict"},
        {"name" : "PodToleratesNodeTaints"}
    ],
    "priorities" : [
        {"name" : "MostRequestedPriority", "weight" : 1},
        {"name" : "InterPodAffinityPriority", "weight" : 2}
    ]
}
Create a Dockerfile (touch Dockerfile) with the following content:
# Dockerfile
FROM busybox
ADD kube-scheduler /usr/local/bin/kube-scheduler
ADD ./scheduler-policy.json /etc/kubernetes/scheduler-policy.json
Lastly, build the custom kube-scheduler image and push it to Docker Hub (assuming you have a Docker Hub account):
docker login -u [your-docker-username]
docker build -t [your-docker-username]/node-packing-scheduler . # Name it whatever you want, just be sure to include your docker username at the beginning
docker push [your-docker-username]/node-packing-scheduler
Create a file named node-packing-scheduler.yaml with the following content:
# node-packing-scheduler.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: node-packing-scheduler
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: node-packing-scheduler-as-kube-scheduler
subjects:
- kind: ServiceAccount
  name: node-packing-scheduler
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: system:kube-scheduler
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    component: scheduler
    tier: control-plane
  name: node-packing-scheduler
  namespace: kube-system
spec:
  selector:
    matchLabels:
      component: scheduler
      tier: control-plane
  replicas: 1
  template:
    metadata:
      labels:
        component: scheduler
        tier: control-plane
        version: second
    spec:
      serviceAccountName: node-packing-scheduler
      tolerations:
      - key: "key"
        operator: "Equal"
        value: "value"
        effect: "NoSchedule"
      containers:
      - command:
        - /usr/local/bin/kube-scheduler
        - --address=0.0.0.0
        - --leader-elect=false
        #- --lock-object-namespace=lock-object-namespace
        #- --lock-object-name=lock-object-name
        - --scheduler-name=node-packing-scheduler
        - --policy-config-file=/etc/kubernetes/scheduler-policy.json
        image: [your-docker-username]/node-packing-scheduler
        livenessProbe:
          httpGet:
            path: /healthz
            port: 10251
          initialDelaySeconds: 15
        name: kube-second-scheduler
        readinessProbe:
          httpGet:
            path: /healthz
            port: 10251
        resources:
          requests:
            cpu: '0.1'
        securityContext:
          privileged: false
        volumeMounts: []
      hostNetwork: false
      hostPID: false
      volumes: []
Now, create the custom kube-scheduler pod:
kubectl create -f node-packing-scheduler.yaml
Check that the scheduler pod is running:
kubectl get pods --namespace=kube-system
You should see something like:
NAME READY STATUS RESTARTS AGE
node-packing-scheduler-7df5697487-55n27 0/1 Running 0 5s
Edit the system:kube-scheduler cluster role:
kubectl edit clusterrole system:kube-scheduler
Add - node-packing-scheduler under kube-scheduler in resourceNames, and copy the following to the bottom of the file:
- apiGroups:
- storage.k8s.io
resources:
- storageclasses
verbs:
- watch
- list
- get
The scheduler can now be tested by creating a pod that is scheduled by the custom scheduler. Create the scheduler_test_pod.yaml file (see line 8 in scheduler_test_pod.yaml for the syntax to specify that jobs should use the custom scheduler) and run:
kubectl create -f scheduler_test_pod.yaml
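The scheduler_test_pod.yaml file itself is not reproduced on this page; a minimal version might look like the following sketch (the image choice is arbitrary), where the schedulerName field is the syntax that selects the custom scheduler:

```yaml
# Sketch of a minimal test pod; only schedulerName is essential here
apiVersion: v1
kind: Pod
metadata:
  name: annotation-second-scheduler
spec:
  schedulerName: node-packing-scheduler   # use the custom scheduler
  containers:
  - name: pod-container
    image: k8s.gcr.io/pause:3.1
```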
After 30s or so, you should see it scheduled and running:
kubectl get pods
should include something like:
NAME READY STATUS RESTARTS AGE
...
annotation-second-scheduler 1/1 Running 0 30s
...
Authored by FaHui Lin, MingJyuan Yang