An environment for core simulation based on Docker Swarm
- Ubuntu 16.04
- Python 3
- Docker >= 17.12 (with experimental features enabled)
- CRIU
- Flask
- ZeroMQ
- Swagger
- MongoDB
- NFS
```shell
# configure SC/FE/JM
./dependences.sh
# configure GM
./dependences.sh GM
# configure Worker
./dependences.sh Worker
# configure DB
# example: ./dependences.sh DB myusr mypwd mydb
./dependences.sh DB $usr $pwd $db $address
```
-
Actor: Main components in the skeleton
- FE: FrontEnd
- JM: JobManager
- GM: GlobalManager
- DC: Discovery
-
Job: A list of containers sharing the same network
- Job name format: 'job' + submit_time + '-' + session_id
  example: job1533585500968-1533585495
-
Task: A container in Docker
- Task name format: jobname + '_task' + index
  example: job1533585500968-1533585495_task1
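The naming scheme above can be sketched in Python; the helper names below are illustrative, not part of the project code:

```python
import time

def make_job_name(session_id, submit_time_ms=None):
    # Job name format: 'job' + submit_time + '-' + session_id
    if submit_time_ms is None:
        submit_time_ms = int(time.time() * 1000)
    return 'job{}-{}'.format(submit_time_ms, session_id)

def make_task_name(job_name, task_index):
    # Task name format: jobname + '_task' + index
    return '{}_task{}'.format(job_name, task_index)

job = make_job_name(1533585495, submit_time_ms=1533585500968)
task = make_task_name(job, 1)
# job  -> 'job1533585500968-1533585495'
# task -> 'job1533585500968-1533585495_task1'
```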
-
Session:
- One launch of the system by the ManagementEngine
-
Job Definition(sample):
```json
{
  "job_name": "job1533585500968-1533585495",
  "job_info": {
    "network": {
      "name": "RESTfulSwarmNetwork",
      "driver": "overlay",
      "subnet": "129.59.0.0/16"
    },
    "tasks": {
      "job1533585500968-1533585495_task1": {
        "container_name": "job1533585500968-1533585495_task1",
        "node": "",
        "image": "image_name",
        "detach": true,
        "command": "",
        "req_cores": 2,
        "cpuset_cpus": "",
        "mem_limit": "10m",
        "ports": {"3000/tcp": 3000},
        "volumes": {},
        "environment": {},
        "status": "Ready"
      }
    }
  },
  "status": "Ready",
  "start_time": 0,
  "end_time": 0
}
```
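A job definition with this shape can be assembled programmatically; the helper functions and their defaults below are assumptions for illustration, not the project's API:

```python
# Sketch of building a job definition matching the sample schema above.
# Helper names and default values are illustrative, not the project's API.
def new_task(job_name, index, image, req_cores=1, mem_limit='10m'):
    name = '{}_task{}'.format(job_name, index)
    return name, {
        'container_name': name, 'node': '', 'image': image, 'detach': True,
        'command': '', 'req_cores': req_cores, 'cpuset_cpus': '',
        'mem_limit': mem_limit, 'ports': {}, 'volumes': {},
        'environment': {}, 'status': 'Ready',
    }

def new_job(job_name, tasks, subnet='129.59.0.0/16'):
    return {
        'job_name': job_name,
        'job_info': {
            'network': {'name': 'RESTfulSwarmNetwork',
                        'driver': 'overlay', 'subnet': subnet},
            'tasks': tasks,
        },
        'status': 'Ready', 'start_time': 0, 'end_time': 0,
    }

name, task = new_task('job1533585500968-1533585495', 1, 'image_name', req_cores=2)
job = new_job('job1533585500968-1533585495', {name: task})
```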
-
Define arguments for actors (required arguments for running the actor scripts):

```json
{
  "FE": { "address": "129.0.0.1" },
  "JM": {
    "address": "129.0.0.2",
    "wait_time": 0.1,
    "scheduling_strategy": {
      "best-fit": 1,
      "first-fit": 0,
      "first-fit-decreasing": 0,
      "best-fit-decreasing": 0,
      "no-scheduler": 0
    }
  },
  "GM": { "address": "129.0.0.3" },
  "DC": { "address": "129.0.0.4" }
}
```
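A minimal sketch of consuming this configuration, assuming it is plain JSON (embedded here as a string so the example is self-contained): parse it and pick the enabled scheduling strategy, where exactly one flag is set to 1:

```python
import json

config_text = '''{
  "FE": {"address": "129.0.0.1"},
  "JM": {"address": "129.0.0.2", "wait_time": 0.1,
         "scheduling_strategy": {"best-fit": 1, "first-fit": 0,
                                 "first-fit-decreasing": 0,
                                 "best-fit-decreasing": 0,
                                 "no-scheduler": 0}},
  "GM": {"address": "129.0.0.3"},
  "DC": {"address": "129.0.0.4"}
}'''

config = json.loads(config_text)
jm = config['JM']
# exactly one strategy flag is set to 1; pick it
strategy = next(name for name, on in jm['scheduling_strategy'].items() if on == 1)
# strategy -> 'best-fit'
```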
-
Start System in Experimental mode (FE, JM, GM and DC are on the same node):
- Start ManagementEngine

```shell
python3 Management.py
```

- Start Worker

```shell
# frequency: time interval of sending container status to DC
python3 Worker.py $frequency
```
-
StressClient:
Role:
- Generate test data and feed it into the system
SC types:
- Steady Stress Client
  - F(t) = a, where a is a constant
- Bursty Stress Client
  - Exponential distribution with a configurable constant lambda
- Incremental Stress Client
  - F(t) = at + b, where a and b are constants
- Random Stress Client
  - F(t) = a, where a is a random constant
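The four load shapes can be sketched as simple rate functions; the function names and default parameters below are assumptions, not the project's implementation:

```python
import random

def steady(t, a=5):
    return a                       # Steady: F(t) = a, a constant rate

def incremental(t, a=1, b=2):
    return a * t + b               # Incremental: F(t) = at + b

def bursty_gap(lam=2.0):
    # Bursty: inter-arrival gaps drawn from an exponential distribution
    # with a configurable constant lambda.
    return random.expovariate(lam)

def random_rate(lo=1, hi=10):
    # Random: F(t) = a, where a is picked at random.
    return random.randint(lo, hi)
```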
-
FrontEnd:
Role:
- Receive data from the Stress Client and initialize the Job collection in MongoDB
- Swagger Interface
-
JobManager:
Role:
- Send RESTful calls to GM
- Apply the scheduling strategy to requested and free resources to make scheduling and waiting decisions
- Maintain a job queue to buffer waiting jobs
- Update MongoDB when job info changes
-
Provided Functions:
- Deploy job (POST)
- Deploy single task (POST)
- Migrate task (POST)
- Migrate job (POST)
- Update task (POST)
- Checkpoint task (POST)
- Remove worker in Swarm mode (GET)
- Describe Worker nodes (GET)
- Describe GM (GET)
-
Scheduling Strategies
- Bin Packing
- Best Fit
- First Fit
- Best Fit Decreasing
- First Fit Decreasing
- Node Scheduling
- Client specifies worker node and cpuset_cpus information
- Priority Scheduling (in progress)
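As a sketch of the Best Fit strategy above: place each task on the node whose free-core count exceeds the request by the least, and let the job wait when no node fits (node names and the core-count model are illustrative):

```python
def best_fit(req_cores, free_cores):
    # Best Fit: choose the node with the smallest free-core count that can
    # still satisfy the request; return None so the JM can queue the job.
    candidates = [(free, node) for node, free in free_cores.items()
                  if free >= req_cores]
    if not candidates:
        return None
    _, node = min(candidates)
    free_cores[node] -= req_cores   # reserve the cores on the chosen node
    return node

nodes = {'worker1': 4, 'worker2': 2, 'worker3': 8}
chosen = best_fit(2, nodes)   # 'worker2' is the tightest fit
```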
-
Job state chart
-
GlobalManager:
Role:
- Notify worker nodes to execute commands passed from JM
- Initialize the Swarm environment and act as the master node
- Update MongoDB
- Periodically prune unused resources in the Swarm cluster
- NFS master
- Swagger Interface
-
Worker:
Role:
- Worker node in Swarm mode
- Deploy tasks
- Periodically collect finished task info and notify Discovery to update task status
- NFS client
-
Discovery:
Role:
- Receive notifications from Workers to update Job and task status
-
ManagementEngine:
Role:
- Launch SC, FE, JM, GM, DIS, ME as separate processes in one VM/PM
- Launch all remote workers using SSH
- Terminate processes running on either the local machine or remote worker machines
Note: StressClient, Discovery, FrontEnd and JobManager have been dockerized.