
Dynamic multi-cell selection for cooperative multipoint (CoMP) using (multi-agent) deep reinforcement learning


deep-rl-mobility-management

Using deep RL for mobility management.

(Example GIF)

The latest version uses the RLlib library for multi-agent RL. An older version using stable_baselines for single-agent RL is kept in the stable_baselines branch (used for v0.1-v0.3); the current version no longer supports stable_baselines.

Setup

To install everything, run

pip install -r requirements.txt

Tested on Ubuntu 20.04 (on WSL) with Python 3.8. RLlib does not (yet) run on Windows, but it does on WSL.

Installing gym[atari] may fail; it needs the following dependencies, which can be installed with apt: cmake, build-essential, zlib1g-dev.

For saving videos and gifs, you also need to install ffmpeg (not on Windows) and ImageMagick. On Ubuntu:

sudo apt install ffmpeg imagemagick

Usage

Adjust and run main.py in drl_mobile:

cd drl_mobile
python main.py

Training logs, results, videos, and trained agents are saved in the training directory.

Tensorboard

To view learning curves (and other metrics) when training an agent, use Tensorboard:

tensorboard --logdir training

Run the command in a WSL terminal, not a PyCharm terminal. Tensorboard is then available at http://localhost:6006

Research

Findings

  • Binary observations [BS available?, BS connected?] work very well
  • Replacing the binary "BS available?" with the achievable data rate per BS does not work at all
  • Probably because the data rate is orders of magnitude larger (up to 150x) than "BS connected?", so the agent becomes blind to the 2nd part of the observation
  • Simply cutting the data rate off at some small value (e.g., 3 Mbit/s) leads to much better results (see the observation sketch after this list)
  • The agent keeps trying to connect to all BSs, even if they are out of range. Subtracting the UE's required data rate and using a higher penalty (both!) solves the issue
  • Normalizing the observation loses information about which BSs have enough data rate and connectivity, so it does not work as well
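
A minimal sketch of an observation built this way; the helper names (ue.connections, bs.data_rate()) and the 3 Mbit/s cutoff are assumptions used for illustration, not the actual environment code:

import numpy as np

def build_observation(ue, basestations, dr_cutoff=3.0):
    """Hypothetical per-UE observation: capped data rates + binary connection flags."""
    # Part 1: achievable data rate per BS, cut off at dr_cutoff (e.g., 3 Mbit/s)
    # so it stays on a similar scale as the binary flags below
    capped_dr = [min(bs.data_rate(ue), dr_cutoff) for bs in basestations]
    # Part 2: binary "BS connected?" flag per BS
    connected = [1.0 if bs in ue.connections else 0.0 for bs in basestations]
    return np.array(capped_dr + connected, dtype=np.float32)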

Todos

  • Multiple UEs
  • Improve radio model: see notes in model.md (fairness, scheduling, frequency reuse, S*c > N)
  • Generic utility function: currently, the reward is a step function (positive if the rate is sufficient, negative if not). It could also be any other function of the rate, e.g., logarithmic (see the first sketch after this list)
  • Efficient caching of connection data rates (see the second sketch after this list):
    • Currently, the data rate per connection is always recalculated per UE, e.g., when calculating the reward or checking whether a connection is possible
    • Safe and easy, but probably slow for many UEs/BSs. Let's see
    • Instead, write the data rate per connection into a dict (conn --> current dr) and derive the total current data rate etc. from it in O(1)
    • The dict needs to be updated whenever a UE moves or any UE (this or another one) changes its connections
    • E.g., 1st move all UEs, 2nd check and update the connections of all UEs, 3rd calculate the reward etc.
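
Two small sketches for the todos above; all function, class, and attribute names are hypothetical placeholders, not part of the current code. First, the current step-function reward next to a possible logarithmic utility of the rate:

import math

def step_utility(curr_dr, req_dr):
    """Current idea: positive reward if the UE's data rate suffices, negative if not."""
    return 1 if curr_dr >= req_dr else -1

def log_utility(curr_dr):
    """Possible generic alternative: diminishing returns via a logarithmic function."""
    return math.log2(1 + curr_dr)

Second, the caching idea: keep the current data rate per connection in a dict, update it only when a UE moves or changes its connections, and read per-UE totals in O(1):

class ConnDataRateCache:
    """Hypothetical cache of the current data rate per (UE, BS) connection."""

    def __init__(self):
        self.conn_dr = {}   # (ue, bs) --> current data rate of that connection
        self.total_dr = {}  # ue --> total data rate over all its connections

    def update(self, ue, bs, dr):
        """Set a connection's data rate and keep the UE's total in sync (O(1))."""
        old = self.conn_dr.get((ue, bs), 0.0)
        self.conn_dr[(ue, bs)] = dr
        self.total_dr[ue] = self.total_dr.get(ue, 0.0) + dr - old

    def remove(self, ue, bs):
        """Remove a connection, e.g., when the UE disconnects from a BS (O(1))."""
        old = self.conn_dr.pop((ue, bs), 0.0)
        self.total_dr[ue] = self.total_dr.get(ue, 0.0) - old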

Development

Multi-Agent RL with RLlib

Notes on RLlib

Training

  • agent.train() runs one training iteration. Calling it in a loop continues training for multiple iterations (see the sketch at the end of these notes).
  • The number of environment steps (not episodes) per iteration is set in config['train_batch_size']
  • config['sgd_minibatch_size'] sets how many steps/experiences are used per SGD minibatch within a training epoch
  • config['train_batch_size'] >= config['sgd_minibatch_size']
  • I still don't quite get the details. Sometimes, config['sgd_minibatch_size'] is ignored and RLlib just trains longer.
  • In the results of each training iteration:
    • results['hist_stats']['episode_reward'] is a list of the last 100 episode rewards across all training iterations so far; useful for plotting
    • results['info']['num_steps_trained'] shows the total number of steps trained so far, which is at most results['info']['num_steps_sampled'] (based on the train_batch_size)
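
A minimal training-loop sketch along these lines, using the older ray.rllib.agents PPO API; the environment name and config values are placeholders, not the project's actual settings:

import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()
config = {
    "env": "CartPole-v0",       # placeholder; the actual env is set up in main.py
    "train_batch_size": 4000,   # env steps (not episodes) sampled per training iteration
    "sgd_minibatch_size": 128,  # steps/experiences per SGD minibatch (<= train_batch_size)
}
agent = PPOTrainer(config=config)

for i in range(5):
    results = agent.train()     # one training iteration
    # last 100 episode rewards across all training iterations so far (useful for plotting)
    episode_rewards = results["hist_stats"]["episode_reward"]
    print(i,
          results["info"]["num_steps_trained"],    # total steps trained so far
          results["info"]["num_steps_sampled"])    # total steps sampled (>= steps trained)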