Probabilistic planning in continuous state-action MDPs using TensorFlow.
tf-mdp is an implementation based on the paper:
Thiago P. Bueno; Leliane N. de Barros; Denis D. Mauá; Scott Sanner
Deep Reactive Policies for Planning in Stochastic Nonlinear Domains
In AAAI, 2019.
tf-mdp is a Python 3.6+ package available on PyPI.
$ pip3 install tf-mdp
Please make sure you have a working TensorFlow installation on your system before pip-installing this package.
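A quick way to check the TensorFlow prerequisite before installing is sketched below. It uses only the standard library and only looks for a `tensorflow` module on the import path; it does not verify that the installation actually works or which version is present. The helper name `tensorflow_available` is made up for this example and is not part of tf-mdp.

```python
import importlib.util


def tensorflow_available() -> bool:
    """Return True if a `tensorflow` module can be located, without importing it."""
    # find_spec returns None when no top-level module with this name is found.
    return importlib.util.find_spec("tensorflow") is not None


if __name__ == "__main__":
    if tensorflow_available():
        print("TensorFlow found; you can proceed with `pip3 install tf-mdp`.")
    else:
        print("TensorFlow not found; install it before installing tf-mdp.")
```

Locating the module is cheaper than importing it, which is why the sketch avoids `import tensorflow`; an actual import is still the more reliable check.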
tf-mdp solves discrete-time continuous state-action MDPs.
The domains/instances are specified using the RDDL language.
It is built on the following packages from the Python3 RDDL toolkit:
- pyrddl: RDDL lexer/parser.
- rddlgym: A toolkit for working with RDDL domains.
- rddl2tf: RDDL2TensorFlow compiler.
- tf-rddlsim: An RDDL simulator running in TensorFlow.
Please refer to each project's documentation for further details.
$ tfmdp --help
usage: tfmdp [-h] [-l LAYERS [LAYERS ...]]
[-a {none,sigmoid,tanh,relu,relu6,crelu,elu,selu,softplus,softsign}]
[-iln] [-b BATCH_SIZE] [-hr HORIZON] [-e EPOCHS]
[-lr LEARNING_RATE]
[-opt {Adadelta,Adagrad,Adam,GradientDescent,ProximalGradientDescent,ProximalAdagrad,RMSProp}]
[-lfn {linear,mse}] [-ld LOGDIR] [-v]
rddl
Probabilistic planning in continuous state-action MDPs using TensorFlow.
positional arguments:
rddl RDDL file or rddlgym domain id
optional arguments:
-h, --help show this help message and exit
-l LAYERS [LAYERS ...], --layers LAYERS [LAYERS ...]
number of units in each hidden layer in policy network
-a {none,sigmoid,tanh,relu,relu6,crelu,elu,selu,softplus,softsign}, --activation {none,sigmoid,tanh,relu,relu6,crelu,elu,selu,softplus,softsign}
activation function for hidden layers in policy
network
-iln, --input-layer-norm
input layer normalization flag
-b BATCH_SIZE, --batch-size BATCH_SIZE
number of trajectories in a batch (default=256)
-hr HORIZON, --horizon HORIZON
number of timesteps (default=40)
-e EPOCHS, --epochs EPOCHS
number of timesteps (default=200)
-lr LEARNING_RATE, --learning-rate LEARNING_RATE
optimizer learning rate (default=0.001)
-opt {Adadelta,Adagrad,Adam,GradientDescent,ProximalGradientDescent,ProximalAdagrad,RMSProp}, --optimizer {Adadelta,Adagrad,Adam,GradientDescent,ProximalGradientDescent,ProximalAdagrad,RMSProp}
loss optimizer (default=RMSProp)
-lfn {linear,mse}, --loss-fn {linear,mse}
loss function (default=linear)
-ld LOGDIR, --logdir LOGDIR
log directory for data summaries (default=/tmp/tfmdp)
-v, --verbose verbosity mode
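To make the option list above easier to experiment with, here is a minimal `argparse` sketch that reproduces the same flags, choices, and documented defaults. It is reconstructed from the help text only and is not tf-mdp's actual source. Two points are assumptions: `--layers` is parsed as integers, and `-e` is described as a number of training epochs (the help text above repeats "number of timesteps" for it). Options whose defaults are not shown in the help text are left without a default here.

```python
import argparse

ACTIVATIONS = ["none", "sigmoid", "tanh", "relu", "relu6", "crelu",
               "elu", "selu", "softplus", "softsign"]
OPTIMIZERS = ["Adadelta", "Adagrad", "Adam", "GradientDescent",
              "ProximalGradientDescent", "ProximalAdagrad", "RMSProp"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the options listed by `tfmdp --help`."""
    parser = argparse.ArgumentParser(
        prog="tfmdp",
        description="Probabilistic planning in continuous state-action MDPs using TensorFlow.",
    )
    parser.add_argument("rddl", help="RDDL file or rddlgym domain id")
    # Assumption: layer sizes are integers; the help text does not state a default.
    parser.add_argument("-l", "--layers", nargs="+", type=int,
                        help="number of units in each hidden layer in policy network")
    parser.add_argument("-a", "--activation", choices=ACTIVATIONS,
                        help="activation function for hidden layers in policy network")
    parser.add_argument("-iln", "--input-layer-norm", action="store_true",
                        help="input layer normalization flag")
    parser.add_argument("-b", "--batch-size", type=int, default=256,
                        help="number of trajectories in a batch (default=256)")
    parser.add_argument("-hr", "--horizon", type=int, default=40,
                        help="number of timesteps (default=40)")
    # Assumption: this option counts training epochs.
    parser.add_argument("-e", "--epochs", type=int, default=200,
                        help="number of training epochs (default=200)")
    parser.add_argument("-lr", "--learning-rate", type=float, default=0.001,
                        help="optimizer learning rate (default=0.001)")
    parser.add_argument("-opt", "--optimizer", choices=OPTIMIZERS, default="RMSProp",
                        help="loss optimizer (default=RMSProp)")
    parser.add_argument("-lfn", "--loss-fn", choices=["linear", "mse"], default="linear",
                        help="loss function (default=linear)")
    parser.add_argument("-ld", "--logdir", default="/tmp/tfmdp",
                        help="log directory for data summaries (default=/tmp/tfmdp)")
    parser.add_argument("-v", "--verbose", action="store_true", help="verbosity mode")
    return parser


if __name__ == "__main__":
    # Parse a sample command line and show the resulting settings.
    args = build_parser().parse_args(["Reservoir-20", "-l", "2048", "-iln", "-a", "elu"])
    print(vars(args))
```

Options that are omitted on the command line fall back to the defaults listed in the help text, which is why a run that does not pass `-opt` reports `optimizer = RMSProp`.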
$ tfmdp Reservoir-20 -l 2048 -iln -a elu -b 256 -hr 40 -e 200 -lr 0.001 -lfn mse -v
Running tf-mdp v0.5.4 ...
>> RDDL: Reservoir-20
>> logdir: /tmp/tfmdp
>> Policy Net:
layers = [2048]
activation = elu
input layer norm = True
>> Hyperparameters:
epochs = 200
learning rate = 0.001
batch size = 256
horizon = 40
>> Optimization:
optimizer = RMSProp
loss function = mse
>> Loading model ...
Done in 0.018952 sec.
>> Optimizing...
2021-06-23 22:56:18.873731: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-06-23 22:56:18.895765: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2021-06-23 22:56:18.896462: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x46628b0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-06-23 22:56:18.896514: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
Epoch 199: loss = 1201677952.000000
Done in 28.525183 sec.
>> Performance:
total reward = -3653.9695, reward per timestep = -91.3492
$ tfmdp HVAC-3 -l 256 128 64 32 -iln -a elu -b 256 -hr 40 -e 200 -lr 0.0001 -lfn mse -v
Running tf-mdp v0.5.4 ...
>> RDDL: HVAC-3
>> logdir: /tmp/tfmdp
>> Policy Net:
layers = [256,128,64,32]
activation = elu
input layer norm = True
>> Hyperparameters:
epochs = 200
learning rate = 0.0001
batch size = 256
horizon = 40
>> Optimization:
optimizer = RMSProp
loss function = mse
>> Loading model ...
Done in 0.017646 sec.
>> Optimizing...
2021-06-23 22:54:05.766434: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-06-23 22:54:05.787832: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2021-06-23 22:54:05.788607: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x49a4d00 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-06-23 22:54:05.788690: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
Epoch 199: loss = 103798661120.0000000
Done in 15.748765 sec.
>> Performance:
total reward = -315724.4688, reward per timestep = -7893.1117
$ tfmdp Navigation-v2 -l 256 128 64 32 -a elu -b 128 -hr 20 -e 200 -lr 0.001 -lfn mse -v
Running tf-mdp v0.5.4 ...
>> RDDL: Navigation-v2
>> logdir: /tmp/tfmdp
>> Policy Net:
layers = [256,128,64,32]
activation = elu
input layer norm = False
>> Hyperparameters:
epochs = 200
learning rate = 0.001
batch size = 128
horizon = 20
>> Optimization:
optimizer = RMSProp
loss function = mse
>> Loading model ...
Done in 0.012209 sec.
>> Optimizing...
2021-06-23 22:50:59.732002: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-06-23 22:50:59.751959: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2021-06-23 22:50:59.752494: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5bc6a20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-06-23 22:50:59.752514: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
Epoch 199: loss = 6452.3613285
Done in 6.466699 sec.
>> Performance:
total reward = -78.3427, reward per timestep = -3.9171
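In all three runs above, the reported reward per timestep matches the total reward divided by the horizon. The sketch below parses the `>> Performance:` lines copied from those runs and checks that relationship. The regular expression and the helper name `parse_performance` are written for this example and are not part of tf-mdp; the division by horizon is inferred from the printed numbers, not taken from the source code.

```python
import re

# Matches lines such as:
#   total reward = -3653.9695, reward per timestep = -91.3492
PERFORMANCE_RE = re.compile(
    r"total reward = (?P<total>-?\d+(?:\.\d+)?), "
    r"reward per timestep = (?P<per_step>-?\d+(?:\.\d+)?)"
)


def parse_performance(line):
    """Return (total_reward, reward_per_timestep) parsed from a log line."""
    match = PERFORMANCE_RE.search(line)
    if match is None:
        raise ValueError("not a performance line: {!r}".format(line))
    return float(match.group("total")), float(match.group("per_step"))


# (domain id, horizon passed via -hr, performance line from the run above)
RUNS = [
    ("Reservoir-20", 40, "total reward = -3653.9695, reward per timestep = -91.3492"),
    ("HVAC-3", 40, "total reward = -315724.4688, reward per timestep = -7893.1117"),
    ("Navigation-v2", 20, "total reward = -78.3427, reward per timestep = -3.9171"),
]

for name, horizon, line in RUNS:
    total, per_step = parse_performance(line)
    # The log rounds to four decimals, so compare with a small tolerance.
    consistent = abs(total / horizon - per_step) < 1e-4
    print("{}: total/horizon = {:.4f}, reported = {:.4f}, consistent = {}".format(
        name, total / horizon, per_step, consistent))
```

Because the per-timestep value depends on the horizon, it is only comparable between runs that use the same `-hr` setting and domain.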
Please refer to https://tf-mdp.readthedocs.io/ for the code documentation.
If you are having issues with tf-mdp, please let me know at: thiago.pbueno@gmail.com.
Copyright (c) 2018-2021 Thiago Pereira Bueno. All Rights Reserved.
tf-mdp is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
tf-mdp is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with tf-mdp. If not, see http://www.gnu.org/licenses/.