prepare release v0.10 with cooperative multi agent
Stefan Schneider committed Sep 18, 2020
1 parent ac7514b commit 35cdb8b
Showing 4 changed files with 28 additions and 11 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -2,7 +2,7 @@

Using deep RL for mobility management.

![example](docs/gifs/v09.gif)
![example](docs/gifs/v010.gif)

## Setup

Binary file added docs/gifs/v010.gif
35 changes: 26 additions & 9 deletions docs/mdp.md
@@ -4,29 +4,46 @@

Using the multi-agent environment with the latest common configuration.

Observations: Observation for each agent (controlling a single UE)
*Observations*: Observation for each agent (controlling a single UE)

* Achievable data rate to each BS. Processed/normalized to `[0, 1]` by dividing by 100,
which is the data rate at which the max utility of +20 is reached
* Total current data rate of the UE over all its current connections. Also normalized to `[0,1]`.
* Currently connected BS (binary vector).
* Currently connected BS (binary vector)
* Achievable data rate to each BS. Processed/normalized to `[0, 1]` by dividing by the max. data rate of all possible BS connections
* Total utility of the UE. Also normalized to `[0,1]`.
* Multi-agent only: Binary vector of which BS are currently idle, i.e., without any UEs (a sketch of the resulting observation vector follows after this list)
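
Below is a minimal, hypothetical sketch of how one such observation vector per agent/UE could be assembled from the quantities listed above. All identifiers (`ue`, `bs_list`, `achievable_dr`, `num_connected_ues`) and the exact utility rescaling are assumptions for illustration, not the repository's actual API.

```python
import numpy as np

def build_observation(ue, bs_list):
    # Binary vector: 1 if the UE is currently connected to that BS
    connected = np.array([1.0 if bs in ue.connections else 0.0 for bs in bs_list])
    # Achievable data rate per BS, normalized by the max over all possible BS connections
    dr = np.array([bs.achievable_dr(ue) for bs in bs_list])
    dr_norm = dr / max(dr.max(), 1e-9)
    # Total utility of the UE, rescaled (here assumed from [-20, 20]) to [0, 1]
    utility_norm = (ue.utility + 20) / 40
    # Multi-agent only: binary vector of BS that currently serve no UE at all
    idle = np.array([1.0 if bs.num_connected_ues == 0 else 0.0 for bs in bs_list])
    return np.concatenate([connected, dr_norm, [utility_norm], idle])
```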

Actions:
*Actions*:

* Discrete selection of either noop (0) or one of the BS.
* The latter toggles the connection status and tries to either connect or disconnect the UE to/from the selected BS, depending on whether it is currently connected.
* All UEs take an action simultaneously in every time step (see the sketch after this list)
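
As a rough illustration of the toggle semantics described above (assuming hypothetical `ue.connect`/`ue.disconnect` helpers and that BS indices start at 1):

```python
def apply_action(ue, bs_list, action: int):
    """Apply a discrete action: 0 = noop, i > 0 = toggle the connection to BS i."""
    if action == 0:
        return  # noop
    bs = bs_list[action - 1]
    if bs in ue.connections:
        ue.disconnect(bs)
    else:
        ue.connect(bs)  # the connection attempt may still fail, e.g., if the signal is too weak
```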

Reward: Immediate rewards for each time step
*Reward*: Immediate rewards for each time step

* Utility for each UE based on the current total data rate: `np.clip(10 * np.log10(curr_dr), -20, 20)`
* 0 utility for a data rate of 1, max utility of 20 for a data rate of 100
* Configurable penalty for any concurrent connections (e.g., for the cost/overhead of joint transmission). Currently 0
* Central PPO: Rewards of all UEs are summed up
* Normalized to `[-1, 1]`
* Central PPO: Rewards of all UEs are summed
* Multi-agent PPO: Mix of own utility and utility of other UEs at the same BS to learn fair behavior: `alpha * own_utility + beta * avg_utility_neighbors` (see the sketch after this list)
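
A hedged sketch of the reward terms described above; the `alpha`/`beta` values, the fallback when a UE has no neighbors, and the final scaling are placeholders, not the repository's actual configuration:

```python
import numpy as np

def utility(curr_dr: float) -> float:
    # Log utility of the current total data rate: 0 at data rate 1, capped at +/-20 (max at data rate 100)
    return float(np.clip(10 * np.log10(max(curr_dr, 1e-9)), -20, 20))

def multi_agent_reward(own_utility: float, neighbor_utilities: list,
                       alpha: float = 0.5, beta: float = 0.5) -> float:
    # Mix the UE's own utility with the average utility of UEs connected to the same BS
    avg_neighbors = np.mean(neighbor_utilities) if neighbor_utilities else own_utility
    reward = alpha * own_utility + beta * avg_neighbors
    # Normalize from [-20, 20] to [-1, 1] (assuming alpha + beta = 1)
    return reward / 20
```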


## Release Details and MDP Changes

### [v0.10](https://github.com/CN-UPB/deep-rl-mobility-management/releases/tag/v0.10): Fair, cooperative multi-agent

* A big drawback of the multi-agent RL so far was that each agent/UE only saw its own observations and optimized only its own utility
* This led to greedy behavior, with all UEs connecting to every BS in range
* This in turn led to lower overall total data rate and utility, as UEs were stealing resources from each other
* This release comes with a new observation space, in which data rates are normalized differently and idle BS are indicated; these can be selected without taking resources away from other UEs
* The reward for multi-agent PPO is also adjusted to contain the avg. reward of other UEs that are connected to the same BS
* With this, multi-agent PPO learns cooperative behavior, where each agent connects only to its strongest BS without stealing resources from other UEs (in high-load scenarios)
* In low-load scenarios, both PPO agents still learn to use available resources
* Thanks to the new observation space, central PPO now often finds the optimal solution and performs perfect handovers

Example: Cooperative multi-agent PPO after 500k training steps (converged after 200k)

![v0.10 example](gifs/v010.gif)


### [v0.9](https://github.com/CN-UPB/deep-rl-mobility-management/releases/tag/v0.9): Preparation for Evaluation

* New variants for observation (components, normalization, ...) and reward (utility function and penalties)
2 changes: 1 addition & 1 deletion setup.py
@@ -21,7 +21,7 @@
# TODO: update on final release
setup(
name='deepcomp',
version=0.9,
version=0.10,
description="DeepCoMP: Coordinated Multipoint Using Multi-Agent Deep Reinforcement Learning",
url=None,
packages=find_packages(),
