This is a collection of research papers on model-based reinforcement learning (MBRL). The repository will be continuously updated to track the frontier of model-based RL.
Welcome to follow and star!
- [2024.10.27] New: We updated the NeurIPS 2024 paper list of model-based RL!
- [2024.05.20] We updated the ICML 2024 paper list of model-based RL.
- [2023.11.29] We updated the ICLR 2024 paper list of model-based RL.
- [2023.09.29] We updated the NeurIPS 2023 paper list of model-based RL.
- [2023.06.15] We updated the ICML 2023 paper list of model-based RL.
- [2023.02.05] We updated the ICLR 2023 paper list of model-based RL.
- [2022.11.03] We updated the NeurIPS 2022 paper list of model-based RL.
- [2022.07.06] We updated the ICML 2022 paper list of model-based RL.
- [2022.02.13] We updated the ICLR 2022 paper list of model-based RL.
- [2021.12.28] We released Awesome Model-Based RL.
We’ll start this section with a disclaimer: it’s really quite hard to draw an accurate, all-encompassing taxonomy of algorithms in the Model-Based RL space, because the modularity of algorithms is not well represented by a tree structure. We will therefore publish a series of related blog posts explaining more Model-Based RL algorithms.
A non-exhaustive but useful taxonomy of algorithms in modern Model-Based RL.
We simply divide Model-Based RL into two categories: Learn the Model and Given the Model.
- Learn the Model mainly focuses on how to build the environment model.
- Given the Model cares about how to utilize the learned model.
We give some examples in the figure above; the references below link to the algorithms in the taxonomy.
[1] World Models: Ha and Schmidhuber, 2018
[2] I2A (Imagination-Augmented Agents): Weber et al., 2017
[3] MBMF (Model-Based RL with Model-Free Fine-Tuning): Nagabandi et al., 2017
[4] MBVE (Model-Based Value Expansion): Feinberg et al., 2018
[5] ExIt (Expert Iteration): Anthony et al., 2017
[6] AlphaZero: Silver et al., 2017
[7] POPLIN (Model-Based Policy Planning): Wang et al., 2019
[8] M2AC (Masked Model-based Actor-Critic): Pan et al., 2020
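To make the Learn the Model / Given the Model split concrete, here is a minimal tabular sketch that is not taken from any of the papers listed: `learn_model` fits a count-based transition and reward model from logged (s, a, r, s') tuples, and `value_iteration` then plans on that learned model, in the spirit of Dyna-style methods. The toy three-state chain, the function names, and the discount factor are illustrative assumptions.

```python
from collections import defaultdict


def learn_model(transitions):
    """'Learn the Model': estimate P(s'|s,a) and mean R(s,a) by counting (s, a, r, s') tuples."""
    next_counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': count}
    reward_sum = defaultdict(float)                       # (s, a) -> summed reward
    visits = defaultdict(int)                             # (s, a) -> number of samples
    for s, a, r, s_next in transitions:
        next_counts[(s, a)][s_next] += 1
        reward_sum[(s, a)] += r
        visits[(s, a)] += 1
    P = {sa: {s2: c / visits[sa] for s2, c in nxt.items()} for sa, nxt in next_counts.items()}
    R = {sa: reward_sum[sa] / visits[sa] for sa in visits}
    return P, R


def value_iteration(P, R, states, actions, gamma=0.9, iters=200):
    """'Given the Model': plan with value iteration; (s, a) pairs never observed are skipped."""
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        new_V = {}
        for s in states:
            q_values = [
                R[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)].items())
                for a in actions
                if (s, a) in P
            ]
            new_V[s] = max(q_values) if q_values else 0.0
        V = new_V
    return V


# Toy 3-state chain: moving "right" from state 1 into state 2 pays reward 1.
transitions = [
    (0, "right", 0.0, 1), (1, "right", 1.0, 2), (2, "right", 0.0, 2),
    (0, "left", 0.0, 0), (1, "left", 0.0, 0), (2, "left", 0.0, 1),
]
P, R = learn_model(transitions)
V = value_iteration(P, R, states=[0, 1, 2], actions=["left", "right"])
print({s: round(v, 3) for s, v in V.items()})  # {0: 4.737, 1: 5.263, 2: 4.737}
```

The optimal plan on the learned model shuttles between states 1 and 2, collecting the reward every other step, so V(1) = 1 / (1 - 0.9²).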
format:
- [title](paper link) [links]
- author1, author2, and author3
- Key: key problems and insights
- OpenReview: optional
- ExpEnv: experiment environments
Classic Model-Based RL Papers
-
Dyna, an integrated architecture for learning, planning, and reacting
- Richard S. Sutton. ACM 1991
- Key: dyna architecture
- ExpEnv: None
-
PILCO: A Model-Based and Data-Efficient Approach to Policy Search
- Marc Peter Deisenroth, Carl Edward Rasmussen. ICML 2011
- Key: probabilistic dynamics model
- ExpEnv: cart-pole system, robotic unicycle
-
Learning Complex Neural Network Policies with Trajectory Optimization
- Sergey Levine, Vladlen Koltun. ICML 2014
- Key: guided policy search
- ExpEnv: mujoco
-
Learning Continuous Control Policies by Stochastic Value Gradients
- Nicolas Heess, Greg Wayne, David Silver, Timothy Lillicrap, Yuval Tassa, Tom Erez. NIPS 2015
- Key: backpropagation through paths, gradient on real trajectory
- ExpEnv: mujoco
-
- Junhyuk Oh, Satinder Singh, Honglak Lee. NIPS 2017
- Key: value-prediction model
- ExpEnv: collect domain, atari
-
Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion
- Jacob Buckman, Danijar Hafner, George Tucker, Eugene Brevdo, Honglak Lee. NIPS 2018
- Key: ensemble model and Qnet, value expansion
- ExpEnv: mujoco, roboschool
-
Recurrent World Models Facilitate Policy Evolution
- David Ha, Jürgen Schmidhuber. NIPS 2018
- Key: vae(representation), rnn(predictive model)
- ExpEnv: car racing, vizdoom
-
Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
-
When to Trust Your Model: Model-Based Policy Optimization
- Michael Janner, Justin Fu, Marvin Zhang, Sergey Levine. NeurIPS 2019
- Key: ensemble model, sac, k-branched rollout
- ExpEnv: mujoco
-
Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees
- Yuping Luo, Huazhe Xu, Yuanzhi Li, Yuandong Tian, Trevor Darrell, Tengyu Ma. ICLR 2019
- Key: Discrepancy Bounds Design, ME-TRPO with multi-step, Entropy regularization
- ExpEnv: mujoco
-
Model-Ensemble Trust-Region Policy Optimization
- Thanard Kurutach, Ignasi Clavera, Yan Duan, Aviv Tamar, Pieter Abbeel. ICLR 2018
- Key: ensemble model, TRPO
- ExpEnv: mujoco
-
Dream to Control: Learning Behaviors by Latent Imagination
- Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi. ICLR 2020
- Key: DreamerV1, latent space imagination
- ExpEnv: deepmind control suite, atari, deepmind lab
-
Exploring Model-based Planning with Policy Networks
- Tingwu Wang, Jimmy Ba. ICLR 2020
- Key: model-based policy planning in action space and parameter space
- ExpEnv: mujoco
-
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
- Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver. Nature 2020
- Key: MCTS, value equivalence
- ExpEnv: chess, shogi, go, atari
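Value expansion appears twice above, as MBVE in the taxonomy and as STEVE in the classic list. Its basic target rolls a learned model forward H steps and bootstraps with a learned value at the end. The sketch below is a simplified version under stated assumptions: `model`, `policy`, and `value_fn` are hypothetical callables standing in for learned networks, every step (including the first) is simulated rather than taken from real data, and STEVE's uncertainty-based weighting across horizons is not shown.

```python
def value_expansion_target(state, model, policy, value_fn, horizon, gamma=0.99):
    """H-step value expansion target for V(state):

        sum_{t=0}^{H-1} gamma^t * r_t + gamma^H * V(s_H),

    with actions from `policy` and (s_{t+1}, r_t) from the learned `model`.
    With horizon=0 it reduces to the ordinary bootstrapped estimate V(state).
    """
    target, discount, s = 0.0, 1.0, state
    for _ in range(horizon):
        a = policy(s)
        s, r = model(s, a)  # hypothetical model interface: returns (next_state, reward)
        target += discount * r
        discount *= gamma
    return target + discount * value_fn(s)


def toy_model(s, a):
    return s + a, 1.0  # every step pays reward 1


def toy_policy(s):
    return 1


def toy_value(s):
    return 10.0  # constant value estimate


print(value_expansion_target(0, toy_model, toy_policy, toy_value, horizon=3, gamma=0.5))  # 3.0
```

Longer horizons lean more on the model and less on the value estimate, which is the trade-off these methods tune.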
NeurIPS 2024
-
The Edge-of-Reach Problem in Offline Model-Based Reinforcement Learning
-
Deterministic Uncertainty Propagation for Improved Model-Based Offline Reinforcement Learning
- Abdullah Akgül, Manuel Haussmann, Melih Kandemir
- Key: The paper argues that uncertainty-based reward penalization introduces excessive conservatism, potentially resulting in suboptimal policies through underestimation.
- ExpEnv: d4rl
-
BECAUSE: Bilinear Causal Representation for Generalizable Offline Model-based Reinforcement Learning
-
Model-Based Transfer Learning for Contextual Reinforcement Learning
- Jung-Hoon Cho, Vindula Jayawardana, Sirui Li, Cathy Wu
- Key: bayesian optimization, contextual rl
- ExpEnv: gaussian process, traffic signal, eco-driving, advisory autonomy, control tasks
-
- Guhao Feng, Han Zhong
- Key: rl representation complexity
- ExpEnv: mujoco
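The Key line for Deterministic Uncertainty Propagation refers to uncertainty-based reward penalization, a common offline MBRL recipe in which the model-predicted reward is reduced by an uncertainty estimate. For reference, here is a minimal sketch of that baseline recipe, not of the paper's proposed method. It assumes that uncertainty is measured as the disagreement (per-dimension standard deviation) of an ensemble's next-state predictions and that `lam` is a hypothetical penalty weight; published methods use various other estimators.

```python
import numpy as np


def penalized_reward(reward, next_state_preds, lam=1.0):
    """Uncertainty-penalized reward r(s, a) - lam * u(s, a).

    next_state_preds has shape (n_models, state_dim): one next-state prediction per
    ensemble member. Here u(s, a) is the L2 norm of the per-dimension standard
    deviation across members, i.e. how much the ensemble disagrees.
    """
    preds = np.asarray(next_state_preds, dtype=float)
    uncertainty = float(np.linalg.norm(preds.std(axis=0)))
    return reward - lam * uncertainty


print(penalized_reward(1.0, [[1.0, 2.0], [1.0, 2.0]]))           # 1.0 (members agree)
print(penalized_reward(1.0, [[0.0, 0.0], [2.0, 0.0]], lam=0.5))  # 0.5 (members disagree)
```

A larger `lam` makes the learned policy more conservative, which is the effect the paper argues can become excessive and lead to underestimation.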
ICML 2024
-
HarmonyDream: Task Harmonization Inside World Models
- Haoyu Ma, Jialong Wu, Ningya Feng, Chenjun Xiao, Dong Li, Jianye Hao, Jianmin Wang, Mingsheng Long
- Key: observation modeling and reward modeling analysis in world models
- ExpEnv: meta-world, rlbench, deepmind control suite, atari 100k
-
CompeteAI: Understanding the Competition Behaviors in Large Language Model-based Agents
- Qinlin Zhao, Jindong Wang, Yixuan Zhang, Yiqiao Jin, Kaijie Zhu, Hao Chen, Xing Xie
- Key: propose a competitive framework for LLM-based agents; build a simulated competitive environment
- ExpEnv: a virtual town with only restaurants and customers
-
Model-based Reinforcement Learning for Parameterized Action Spaces
- Renhao Zhang, Haotian Fu, Yilin Miao, George Konidaris
- Key: discrete-continuous hybrid action space, dynamics model with parameterized actions, MPC with parameterized actions
- ExpEnv: platform, goal, hard goal, catch point, hard move
-
Learning Latent Dynamic Robust Representations for World Models
- Ruixiang Sun, Hongyu Zang, Xin Li, Riashat Islam
- Key: modified Dreamer architecture, hybrid-recurrent state space model
- ExpEnv: deepmind control suite, distracted deepmind control suite, mani-skill2
-
AD3: Implicit Action is the Key for World Models to Distinguish the Diverse Visual Distractors
- Yucen Wang, Shenghua Wan, Le Gan, Shuai Feng, De-Chuan Zhan
- Key: implicit action generator, action-conditioned separated world models
- ExpEnv: deepmind control suite
-
Hieros: Hierarchical Imagination on Structured State Space Sequence World Models
- Paul Mattes, Rainer Schlosser, Ralf Herbrich
- Key: state-space models, multilayered hierarchical imagination, S5 based world model
- ExpEnv: atari 100k
-
Improving Token-Based World Models with Parallel Observation Prediction
- Lior Cohen, Kaixin Wang, Bingyi Kang, Shie Mannor
- Key: pixel-based mbrl, token-based world models, retentive environment model
- ExpEnv: atari 100k
-
Do Transformer World Models Give Better Policy Gradients?
- Michel Ma, Tianwei Ni, Clement Gehring, Pierluca D'Oro, Pierre-Luc Bacon
- Key: actions world model
- ExpEnv: double-pendulum, Myriad
-
Dr. Strategy: Model-Based Generalist Agents with Strategic Dreaming
- Hany Hamed, Subin Kim, Dongyeong Kim, Jaesik Yoon, Sungjin Ahn
- Key: during strategic dreaming, train three policies (a highway policy, an explorer policy, and an achiever policy), then use them to achieve downstream tasks
- ExpEnv: 2D Navigation, 3D-Maze Navigation, RoboKitchen
-
Towards Robust Model-Based Reinforcement Learning Against Adversarial Corruption
- Chenlu Ye, Jiafan He, Quanquan Gu, Tong Zhang
- Key: theoretical analysis of adversarial corruption for model-based rl, encompassing both online and offline settings
- ExpEnv: None
-
Model-based Reinforcement Learning for Confounded POMDPs
- Mao Hong, Zhengling Qi, Yanxun Xu
- Key: model-based RL, POMDP
- ExpEnv: None
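Model-based Reinforcement Learning for Parameterized Action Spaces lists MPC over parameterized actions, where each action is a discrete choice plus a continuous parameter. As a generic illustration only, here is random-shooting MPC over such hybrid actions on a toy one-dimensional model; the dynamics, reward, action encoding `(k, theta)`, and function names are all assumptions, not the paper's planner.

```python
import numpy as np


def random_shooting_mpc(state, dynamics, reward_fn, n_discrete, param_low, param_high,
                        horizon=5, n_samples=256, rng=None):
    """Return the first hybrid action (k, theta) of the best randomly sampled sequence."""
    rng = np.random.default_rng() if rng is None else rng
    # Each candidate sequence: one discrete index and one continuous parameter per step.
    ks = rng.integers(0, n_discrete, size=(n_samples, horizon))
    thetas = rng.uniform(param_low, param_high, size=(n_samples, horizon))
    returns = np.zeros(n_samples)
    for i in range(n_samples):
        s = state
        for t in range(horizon):
            action = (int(ks[i, t]), float(thetas[i, t]))
            returns[i] += reward_fn(s, action)  # reward predicted for taking action in s
            s = dynamics(s, action)             # roll the (learned) model forward
    best = int(np.argmax(returns))
    return int(ks[best, 0]), float(thetas[best, 0])


def toy_dynamics(s, action):
    k, theta = action
    return s + theta if k == 1 else s - theta  # k=1 moves right, k=0 moves left


def toy_reward(s, action):
    return -abs(toy_dynamics(s, action) - 3.0)  # goal position is 3.0


k, theta = random_shooting_mpc(0.0, toy_dynamics, toy_reward, n_discrete=2,
                               param_low=0.0, param_high=1.0,
                               rng=np.random.default_rng(0))
print(k, theta)
```

Random shooting is the simplest sampling-based planner: it re-plans at every step and executes only the first action of the best sampled sequence.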
ICLR 2024
-
Policy Rehearsing: Training Generalizable Policies for Reinforcement Learning
- Chengxing Jia, Chenxiao Gao, Hao Yin, Fuxiang Zhang, Xiong-Hui Chen, Tian Xu, Lei Yuan, Zongzhang Zhang, Zhi-Hua Zhou, Yang Yu
- Key: Reinforcement Learning, Model-based Reinforcement Learning, Offline Reinforcement Learning
- OpenReview: 8, 8, 8, 6
- ExpEnv: d4rl
-
Efficient Dynamics Modeling in Interactive Environments with Koopman Theory
- Arnab Kumar Mondal, Siba Smarak Panigrahi, Sai Rajeswar, Kaleem Siddiqi, Siamak Ravanbakhsh
- Key: Koopman Theory, Reinforcement Learning, Dynamical System, Planning, Long-range dynamics prediction models, Efficient forward dynamics
- OpenReview: 8, 6, 5, 3
- ExpEnv: mujoco
-
Combining Spatial and Temporal Abstraction in Planning for Better Generalization
- Mingde Zhao, Safa Alver, Harm van Seijen, Romain Laroche, Doina Precup, Yoshua Bengio
- Key: Reinforcement Learning, Planning, Neural Networks, Temporal Difference Learning, Generalization, Deep Reinforcement Learning
- OpenReview: 6, 6, 6, 5
- ExpEnv: MiniGrid-BabyAI framework
-
Mastering Memory Tasks with World Models
- Mohammad Reza Samsami, Artem Zholus, Janarthanan Rajendran, Sarath Chandar
- Key: recall to imagine module, based on DreamerV3
- OpenReview: 10, 8, 6
- ExpEnv: bsuite, popgym, atari, deepmind control suite, memory maze
-
Privileged Sensing Scaffolds Reinforcement Learning
- Edward S. Hu, James Springer, Oleh Rybkin, Dinesh Jayaraman
- Key: privileged information, based on DreamerV3
- OpenReview: 10, 8, 8, 8
- ExpEnv: gymnasium robotics
-
TD-MPC2: Scalable, Robust World Models for Continuous Control
- Nicklas Hansen, Hao Su, Xiaolong Wang
- Key: implicit world model, model predictive control, generalist td-mpc2
- OpenReview: 8, 8, 8, 8
- ExpEnv: deepmind control suite, Meta-World, maniskill2, myosuite
-
Robust Model Based Reinforcement Learning Using L1 Adaptive Control
- Minjun Sung, Sambhu Harimanas Karumanchi, Aditya Gahlawat, Naira Hovakimyan
- Key: L1 Adaptive Control
- OpenReview: 8, 6, 6, 6
- ExpEnv: mujoco
-
Learning Hierarchical World Models with Adaptive Temporal Abstractions from Discrete Latent Dynamics
- Christian Gumbsch, Noor Sajid, Georg Martius, Martin V. Butz
- Key: Context-specific Recurrent State Space Model, hierarchical world model
- OpenReview: 8, 6, 6
- ExpEnv: MiniHack, VisualPinPad, MultiWorld
-
Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion
- Lunjun Zhang, Yuwen Xiong, Ze Yang, Sergio Casas, Rui Hu, Raquel Urtasun
- Key: discrete diffusion; world model; autonomous driving
- OpenReview: 10, 8, 6, 6, 6
- ExpEnv: NuScenes, KITTI Odometry, Argoverse2 Lidar
-
COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically for Model-Based RL
- Xiyao Wang, Ruijie Zheng, Yanchao Sun, Ruonan Jia, Wichayaporn Wongkamjan, Huazhe Xu, Furong Huang
- Key: conservative model rollouts, optimistic environment exploration
- OpenReview: 6, 6, 6
- ExpEnv: mujoco, deepmind control suite
-
Efficient Multi-agent Reinforcement Learning by Planning
- Qihan Liu, Jianing Ye, Xiaoteng Ma, Jun Yang, Bin Liang, Chongjie Zhang
- Key: mcts, optimistic search lambda, advantage-weighted policy optimization
- OpenReview: 8, 6, 6, 6
- ExpEnv: smac
-
Differentiable Trajectory Optimization as a Policy Class for Reinforcement and Imitation Learning
- Weikang Wan, Yufei Wang, Zackory Erickson, David Held
- Key: differentiable trajectory optimization
- OpenReview: 10, 8, 8, 5
- ExpEnv: deepmind control suite, robomimic, maniskill
-
- Zhihe Yang, Yunjian Xu
- Key: conditional diffusion, offline RL
- OpenReview: 8, 8, 6, 6
- ExpEnv: d4rl
-
MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning
- Zohar Rimon, Tom Jurgenson, Orr Krupnik, Gilad Adler, Aviv Tamar
- Key: context-based meta-RL, based on dreamer
- OpenReview: 6, 6, 6, 6
- ExpEnv: Point Robot Navigation, Escape Room, Reacher Sparse
-
Reward-Consistent Dynamics Models are Strongly Generalizable for Offline Reinforcement Learning
-
DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing
- Vint Lee, Pieter Abbeel, Youngwoon Lee
- Key: learn to predict a temporally-smoothed reward rather than the exact reward at each timestep
- OpenReview: 6, 6, 6, 5
- ExpEnv: robodesk, hand, earthmoving
-
Informed POMDP: Leveraging Additional Information in Model-Based RL
- Gaspard Lambrechts, Adrien Bolland, Damien Ernst
- Key: informed world model, based on DreamerV3
- OpenReview: 6, 6, 6, 5
- ExpEnv: varying mountain hike, deepmind control suite, pop gym, flickering atari and flickering control
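DreamSmooth's Key line describes predicting a temporally smoothed reward rather than the exact per-step reward. Below is a minimal sketch of one possible smoothing, assuming a truncated Gaussian kernel applied within a single episode and renormalized at the episode boundaries; the kernel, `sigma`, and `radius` are assumptions, and the paper may use other smoothing functions. The smoothed sequence would then replace the raw rewards as the world model's reward-prediction target.

```python
import numpy as np


def gaussian_smooth_rewards(rewards, sigma=1.0, radius=3):
    """Smooth one episode's reward sequence with a truncated Gaussian kernel.

    Near the episode boundaries the kernel is renormalized over the valid
    timesteps, so a constant reward sequence is left unchanged.
    """
    r = np.asarray(rewards, dtype=float)
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    smoothed = np.empty_like(r)
    for t in range(len(r)):
        idx = t + offsets
        valid = (idx >= 0) & (idx < len(r))
        weights = kernel[valid]
        smoothed[t] = np.dot(weights, r[idx[valid]]) / weights.sum()
    return smoothed


# A single sparse reward at t=3 is spread over neighboring timesteps.
print(np.round(gaussian_smooth_rewards([0, 0, 0, 1, 0, 0, 0]), 3))
```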
NeurIPS 2023
-
Large Language Models as Commonsense Knowledge for Large-Scale Task Planning
- Zirui Zhao, Wee Sun Lee, David Hsu
- Key: LLM-MCTS
- ExpEnv: VirtualHome
-
- Zihao Wang, Shaofei Cai, Guanzhou Chen, Anji Liu, Xiaojian (Shawn) Ma, Yitao Liang
- Key: interactive planning approach based on LLM
- ExpEnv: minecraft
-
Facing Off World Model Backbones: RNNs, Transformers, and S4
- Fei Deng, Junyeong Park, Sungjin Ahn
- Key: world model backbones
- ExpEnv: MiniGrid, memory maze
-
Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning
- Jialong Wu, Haoyu Ma, Chaoyi Deng, Mingsheng Long
- Key: Contextualized World Models
- ExpEnv: CARLA, deepmind control suite
-
Conformal Prediction for Uncertainty-Aware Planning with Diffusion Dynamics Model
-
LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios
- Yazhe Niu, Yuan Pu, Zhenjie Yang, Xueyan Li, Tong Zhou, Jiyuan Ren, Shuai Hu, Hongsheng Li, Yu Liu
- Key: MCTS-style benchmark
- ExpEnv: board games, atari, mujoco, gobigger
-
Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning
- Haoran He, Chenjia Bai, Kang Xu, Zhuoran Yang, Weinan Zhang, Dong Wang, Bin Zhao, Xuelong Li
- Key: GPT-based diffusion model for planning and data synthesis
- ExpEnv: Meta-World, Maze2D
-
MoVie: Visual Model-Based Policy Adaptation for View Generalization
- Sizhe Yang, Yanjie Ze, Huazhe Xu
- Key: view generalization, spatial adaptive encoder
- ExpEnv: deepmind control suite, adroit, xArm
-
Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms
- Shenao Zhang, Boyi Liu, Zhaoran Wang, Tuo Zhao
- Key: model-based reparameterization policy gradient method, smoothness regularization
- ExpEnv: mujoco
-
- Lin Guan, Karthik Valmeekam, Sarath Sreedharan, Subbarao Kambhampati
- Key: construct an explicit world (domain) model in planning domain definition language
- ExpEnv: household-robot domain, tyreworld and logistics
-
RePo: Resilient Model-Based Reinforcement Learning by Regularizing Posterior Predictability
- Chuning Zhu, Max Simchowitz, Siri Gadipudi, Abhishek Gupta
- Key: representation resilience for visual RL
- ExpEnv: deepmind control suite, maniskill
-
Model-Based Control with Sparse Neural Dynamics
- Ziang Liu, Jeff He, Genggeng Zhou, Tobia Marcucci, Fei-Fei Li, Jiajun Wu, Yunzhu Li
- Key: network sparsification, mixed-integer formulation of ReLU neural dynamics
- ExpEnv: gym, cartpole, reacher
-
Optimal Exploration for Model-Based RL in Nonlinear Systems
- Andrew Wagenmaker, Guanya Shi, Kevin Jamieson
- Key: optimal sample complexity for nonlinear dynamical systems
- ExpEnv: affine dynamics system
-
State2Explanation: Concept-Based Explanations to Benefit Agent Learning and User Understanding
- Devleena Das, Sonia Chernova, Been Kim
- Key: a joint embedding model between state-action pairs and concept-based explanations
- ExpEnv: connect4, lunar lander
-
Efficient Exploration in Continuous-time Model-based Reinforcement Learning
- Lenart Treven, Jonas Hübotter, Bhavya, Florian Dörfler, Andreas Krause
- Key: nonlinear ordinary differential equations, regret bound, measurement selection strategies
- ExpEnv: system’s tasks
-
Action Inference by Maximising Evidence: Zero-Shot Imitation from Observation with World Models
- Xingyuan Zhang, Philip Becker-Ehmck, Patrick van der Smagt, Maximilian Karl
- Key: pretrained world models, imitation learning from observation only
- ExpEnv: deepmind control suite
-
STORM: Efficient Stochastic Transformer based World Models for Reinforcement Learning
- Weipu Zhang, Gang Wang, Jian Sun, Yetian Yuan, Gao Huang
- Key: categorical-VAE, transformer structure, DreamerV3
- ExpEnv: atari
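Several entries rely on Monte Carlo Tree Search (LightZero benchmarks MCTS-style algorithms, and AlphaZero and MuZero appear elsewhere in this list). The sketch below shows the AlphaZero-style PUCT rule that such searches use to choose which child to descend into; the dictionary-based node layout and the value of `c_puct` are assumptions, and MuZero's variant (an additional log term in the exploration weight and normalized Q-values) is omitted.

```python
import math


def puct_select(children, c_puct=1.25):
    """Pick the action maximizing Q(s,a) + c_puct * P(s,a) * sqrt(N(s)) / (1 + N(s,a)).

    children maps action -> {"prior": P(s,a), "visits": N(s,a), "value_sum": W(s,a)};
    Q(s,a) = W / N, taken as 0 for unvisited children, and N(s) is the sum of child visits.
    """
    total_visits = sum(child["visits"] for child in children.values())

    def score(child):
        q = child["value_sum"] / child["visits"] if child["visits"] > 0 else 0.0
        u = c_puct * child["prior"] * math.sqrt(total_visits) / (1 + child["visits"])
        return q + u

    return max(children, key=lambda action: score(children[action]))


# An unvisited, high-prior action wins over a well-explored mediocre one.
children = {
    "a": {"prior": 0.9, "visits": 0, "value_sum": 0.0},
    "b": {"prior": 0.1, "visits": 10, "value_sum": 5.0},
}
print(puct_select(children))  # a
```

The exploration term shrinks as a child's visit count grows, so with many visits the choice is driven by the mean value Q.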
ICML 2023
-
Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels
- Sai Rajeswar Mudumba, Pietro Mazzaglia, Tim Verbelen, Alexandre Piche, Bart Dhoedt, Aaron Courville, Alexandre Lacoste
- Key: unsupervised pretrain, task-aware finetune, dyna-mpc
- ExpEnv: URLB benchmark, RWRL suite
-
Reparameterized Policy Learning for Multimodal Trajectory Optimization
- Zhiao Huang, Litian Liang, Zhan Ling, Xuanlin Li, Chuang Gan, Hao Su
- Key: multimodal policy learning, reparameterized policy gradient
- ExpEnv: Meta-World, mujoco
-
Live in the Moment: Learning Dynamics Model Adapted to Evolving Policy
- Xiyao Wang, Wichayaporn Wongkamjan, Ruonan Jia, Furong Huang
- Key: policy-adapted model learning, weight design
- ExpEnv: mujoco
-
Predictable MDP Abstraction for Unsupervised Model-Based RL
- Seohong Park, Sergey Levine
- Key: predictable MDP abstraction, tackle model exploitation
- ExpEnv: mujoco
-
Investigating the Role of Model-Based Learning in Exploration and Transfer
- Jacob C Walker, Eszter Vértes, Yazhe Li, Gabriel Dulac-Arnold, Ankesh Anand, Jessica Hamrick, Theophane Weber
- Key: (1) Is there an advantage to an agent being model-based during unsupervised exploration and/or fine-tuning? (2) What are the contributions of each component of a model-based agent for downstream task learning? (3) How well does the model-based agent deal with environmental shift between the unsupervised and downstream phases?
- ExpEnv: Crafter, RoboDesk, Meta-World
-
The Virtues of Laziness in Model-based RL: A Unified Objective and Algorithms
- Anirudh Vemula, Yuda Song, Aarti Singh, J. Bagnell, Sanjiban Choudhury
- Key: objective mismatch, mbrl framework
- ExpEnv: Helicopter, WideTree, Linear Dynamical System, Maze, mujoco
-
The Benefits of Model-Based Generalization in Reinforcement Learning
- Kenny Young, Aditya Ramesh, Louis Kirsch, Jürgen Schmidhuber
- Key: experience replay, when and how generalization of a learned model helps
- ExpEnv: ProcMaze, ButtonGrid, PanFlute
-
STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning
- Souradip Chakraborty, Amrit Bedi, Alec Koppel, Mengdi Wang, Furong Huang, Dinesh Manocha
- Key: information directed sampling, kernelized Stein discrepancy
- ExpEnv: DeepSea
-
Model-based Reinforcement Learning with Scalable Composite Policy Gradient Estimators
- Paavo Parmas, Takuma Seno, Yuma Aoki
- Key: extension of Dreamer, total propagation computation graph
- ExpEnv: deepmind control suite
-
Reinforcement Learning with History Dependent Dynamic Contexts
- Guy Tennenholtz, Nadav Merlis, Lior Shani, Martin Mladenov, Craig Boutilier
- Key: non-Markov context dynamics, logistic DCMDPs, theoretical analysis, extension of MuZero
- ExpEnv: MovieLens dataset
-
Model-Bellman Inconsistency for Model-based Offline Reinforcement Learning
-
Simplified Temporal Consistency Reinforcement Learning
- Yi Zhao, Wenshuai Zhao, Rinu Boney, Juho Kannala, Joni Pajarinen
- Key: representation learning, temporal consistency
- ExpEnv: deepmind control suite
-
Curious Replay for Model-based Adaptation
- Isaac Kauvar, Chris Doyle, Linqi Zhou, Nick Haber
- Key: extension of DreamerV3, curious replay, count-based replay, adversarial replay
- ExpEnv: Crafter, deepmind control suite
-
On Many-Actions Policy Gradient
- Michal Nauman, Marek Cygan
- Key: bias and variance, theoretical analysis
- ExpEnv: deepmind control suite
-
Posterior Sampling for Deep Reinforcement Learning
- Remo Sasso, Michelangelo Conserva, Paulo Rauber
- Key: posterior sampling, continual value network
- ExpEnv: atari
-
Model-based Offline Reinforcement Learning with Count-based Conservatism
- Byeongchan Kim, Min-hwan Oh
- Key: count estimation, theoretical analysis
- ExpEnv: d4rl
ICLR 2023
-
Transformers are Sample-Efficient World Models
- Vincent Micheli, Eloi Alonso, François Fleuret
- Key: discrete autoencoder, transformer based world model
- OpenReview: 8, 8, 8, 8
- ExpEnv: atari
-
Conservative Bayesian Model-Based Value Expansion for Offline Policy Optimization
- Jihwan Jeong, Xiaoyu Wang, Michael Gimelfarb, Hyunwoo Kim, Baher Abdulhai, Scott Sanner
- Key: model-based offline, bayesian posterior value estimate
- OpenReview: 8, 8, 6, 6
- ExpEnv: d4rl
-
User-Interactive Offline Reinforcement Learning
- Phillip Swazinna, Steffen Udluft, Thomas Runkler
- Key: let the user adapt the policy behavior after training is finished
- OpenReview: 10, 8, 6, 3
- ExpEnv: 2d-world, industrial benchmark
-
CLARE: Conservative Model-Based Reward Learning for Offline Inverse Reinforcement Learning
- Sheng Yue, Guanbo Wang, Wei Shao, Zhaofeng Zhang, Sen Lin, Ju Ren, Junshan Zhang
- Key: offline IRL, reward extrapolation error
- OpenReview: 8, 8, 6, 6
- ExpEnv: d4rl
-
Efficient Offline Policy Optimization with a Learned Model
- Zichen Liu, Siyi Li, Wee Sun Lee, Shuicheng Yan, Zhongwen Xu
- Key: offline rl, analysis of MuZero Unplugged, one-step look-ahead policy improvement
- OpenReview: 8, 6, 5
- ExpEnv: atari dataset
-
Efficient Planning in a Compact Latent Action Space
- Zhengyao Jiang, Tianjun Zhang, Michael Janner, Yueying Li, Tim Rocktäschel, Edward Grefenstette, Yuandong Tian
- Key: planning with VQ-VAE
- OpenReview: 6, 6, 6, 6
- ExpEnv: d4rl dataset
-
- Ruijie Zheng, Xiyao Wang, Huazhe Xu, Furong Huang
- Key: lipschitz regularization
- OpenReview: 8, 8, 6, 6
- ExpEnv: mujoco
-
MoDem: Accelerating Visual Model-Based Reinforcement Learning with Demonstrations
- Nicklas Hansen, Yixin Lin, Hao Su, Xiaolong Wang, Vikash Kumar, Aravind Rajeswaran
- Key: three phases -- policy pretraining, targeted exploration, interactive learning
- OpenReview: 8, 6, 6, 6
- ExpEnv: adroit, meta-world, deepmind control suite
-
- Raj Ghugare, Homanga Bharadhwaj, Benjamin Eysenbach, Sergey Levine, Ruslan Salakhutdinov
- Key: Aligned Latent Models
- OpenReview: 8, 6, 6, 6, 6
- ExpEnv: mujoco
-
Diminishing Return of Value Expansion Methods in Model-Based Reinforcement Learning
- Daniel Palenicek, Michael Lutter, Joao Carvalho, Jan Peters
- Key: longer horizons yield diminishing returns in terms of sample efficiency
- OpenReview: 8, 6, 6, 6
- ExpEnv: brax
-
Planning Goals for Exploration
- Edward S. Hu, Richard Chang, Oleh Rybkin, Dinesh Jayaraman
- Key: sampling-based planning, set goals for each training episode to directly optimize an intrinsic exploration reward
- OpenReview: 8, 8, 8, 8, 6
- ExpEnv: point maze, walker, ant maze, 3-block stack
-
Making Better Decision by Directly Planning in Continuous Control
- Jinhua Zhu, Yue Wang, Lijun Wu, Tao Qin, Wengang Zhou, Tie-Yan Liu, Houqiang Li
- Key: deep differentiable dynamic programming planner
- OpenReview: 8, 8, 8, 6
- ExpEnv: mujoco
-
Latent Variable Representation for Reinforcement Learning
- Tongzheng Ren, Chenjun Xiao, Tianjun Zhang, Na Li, Zhaoran Wang, Sujay Sanghavi, Dale Schuurmans, Bo Dai
- Key: variational learning, representation learning
- OpenReview: 8, 6, 6, 3
- ExpEnv: mujoco, deepmind control suite
-
SpeedyZero: Mastering Atari with Limited Data and Time
- Yixuan Mei, Jiaxuan Gao, Weirui Ye, Shaohuai Liu, Yang Gao, Yi Wu
- Key: distributed model-based rl, speed up EfficientZero
- OpenReview: 6, 6, 5
- ExpEnv: atari 100k
-
Transformer-based World Models Are Happy With 100k Interactions
- Jan Robine, Marc Höftmann, Tobias Uelwer, Stefan Harmeling
- Key: autoregressive world model, Transformer-XL, balanced cross-entropy loss, balanced dataset sampling
- OpenReview: 8, 6, 6, 6
- ExpEnv: atari 100k
-
On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement Learning
- Yifan Xu, Nicklas Hansen, Zirui Wang, Yung-Chieh Chan, Hao Su, Zhuowen Tu
- Key: offline multi-task pretraining, online finetuning
- OpenReview: 6, 6, 6, 6
- ExpEnv: atari 100k
-
Become a Proficient Player with Limited Data through Watching Pure Videos
- Weirui Ye, Yunsheng Zhang, Pieter Abbeel, Yang Gao
- Key: unsupervised pre-training, fine-tuning on downstream tasks
- OpenReview: 8, 6, 6, 5
- ExpEnv: atari 100k
-
EUCLID: Towards Efficient Unsupervised Reinforcement Learning with Multi-choice Dynamics Model
- Yifu Yuan, Jianye Hao, Fei Ni, Yao Mu, Yan Zheng, Yujing Hu, Jinyi Liu, Yingfeng Chen, Changjie Fan
- Key: jointly pretrain the multi-headed dynamics model and unsupervised exploration policy, finetune to downstream tasks
- OpenReview: 6, 6, 6, 6
- ExpEnv: URLB benchmark
-
Choreographer: Learning and Adapting Skills in Imagination
- Pietro Mazzaglia, Tim Verbelen, Bart Dhoedt, Alexandre Lacoste, Sai Rajeswar
- Key: world model, skill discovery, skill learning, Skill adaptation
- OpenReview: 8, 8, 6, 6
- ExpEnv: deepmind control suite, Meta-World
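Efficient Planning in a Compact Latent Action Space is keyed as planning with a VQ-VAE. The discretization step of a VQ-VAE maps each continuous latent vector to its nearest codebook entry; a minimal NumPy sketch of just that lookup is below, with a made-up codebook. Codebook training, the straight-through gradient, and the planner that searches over code indices are not shown.

```python
import numpy as np


def vector_quantize(latents, codebook):
    """Map each latent to the index and vector of its nearest codebook entry (squared L2).

    latents: array of shape (batch, dim); codebook: array of shape (num_codes, dim).
    """
    z = np.asarray(latents, dtype=float)
    e = np.asarray(codebook, dtype=float)
    # Pairwise squared distances, shape (batch, num_codes).
    dists = ((z[:, None, :] - e[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    return idx, e[idx]


codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
idx, codes = vector_quantize([[0.1, -0.2], [4.0, 4.5], [0.9, 1.2]], codebook)
print(idx.tolist())  # [0, 2, 1]
```

Because each latent becomes one of a small number of discrete codes, a planner can search over sequences of code indices instead of raw continuous actions.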
NeurIPS 2022
-
Bidirectional Learning for Offline Infinite-width Model-based Optimization
- Can Chen, Yingxue Zhang, Jie Fu, Xue Liu, Mark Coates
- Key: model-based, offline
- OpenReview: 7, 6, 5
- ExpEnv: design-bench
-
A Unified Framework for Alternating Offline Model Training and Policy Learning
- Shentao Yang, Shujian Zhang, Yihao Feng, Mingyuan Zhou
- Key: model-based, offline, marginal importance weight
- OpenReview: 7, 6, 6, 5
- ExpEnv: d4rl dataset
-
Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief
- Kaiyang Guo, Shao Yunfeng, Yanhui Geng
- Key: model-based, offline
- OpenReview: 8, 8, 7, 7
- ExpEnv: d4rl dataset
-
- Jiafei Lyu, Xiu Li, Zongqing Lu
- Key: double check mechanism, bidirectional modeling, offline RL
- OpenReview: 7, 6, 6
- ExpEnv: d4rl dataset
-
- XiaoPeng Yu, Jiechuan Jiang, Wanpeng Zhang, Haobin Jiang, Zongqing Lu
- Key: multi-agent, model-based
- OpenReview: 7, 6, 4, 3
- ExpEnv: mpe, google research football
-
Mingling Foresight with Imagination: Model-Based Cooperative Multi-Agent Reinforcement Learning
- Zhiwei Xu, Dapeng Li, Bin Zhang, Yuan Zhan, Yunpeng Bai, Guoliang Fan
- Key: multi-agent, model-based
- OpenReview: 6, 5
- ExpEnv: StarCraft II, Google Research Football, Multi-Agent Discrete MuJoCo
-
MoCoDA: Model-based Counterfactual Data Augmentation
- Silviu Pitis, Elliot Creager, Ajay Mandlekar, Animesh Garg
- Key: data augmentation framework, offline RL
- OpenReview: 7, 7, 7, 6
- ExpEnv: 2D Navigation, Hook-Sweep
-
When to Update Your Model: Constrained Model-based Reinforcement Learning
- Tianying Ji, Yu Luo, Fuchun Sun, Mingxuan Jing, Fengxiang He, Wenbing Huang
- Key: event-triggered mechanism, constrained model-shift lower-bound optimization
- OpenReview: 6, 6, 5, 5
- ExpEnv: mujoco
-
- Ashish Jayant, Shalabh Bhatnagar
- Key: constrained RL, model-based
- OpenReview: 7, 6, 5, 5
- ExpEnv: safety gym
-
Learning to Attack Federated Learning: A Model-based Reinforcement Learning Attack Framework
- Henger Li, Xiaolin Sun, Zizhan Zheng
- Key: attack & defense, federated learning, model-based
- OpenReview: 6, 6, 6, 5
- ExpEnv: MNIST, FashionMNIST, EMNIST, CIFAR-10 and synthetic dataset
-
Model-Based Imitation Learning for Urban Driving
- Anthony Hu, Gianluca Corrado, Nicolas Griffiths, Zachary Murez, Corina Gurau, Hudson Yeo, Alex Kendall, Roberto Cipolla, Jamie Shotton
- Key: model-based, imitation learning, autonomous driving
- OpenReview: 7, 6, 6
- ExpEnv: CARLA
-
Data-Driven Model-Based Optimization via Invariant Representation Learning
- Han Qi, Yi Su, Aviral Kumar, Sergey Levine
- Key: domain adaptation, invariant objective models, representation learning (not about model-based RL)
- OpenReview: 7, 6, 6, 5, 5
- ExpEnv: design-bench
-
Model-based Lifelong Reinforcement Learning with Bayesian Exploration
- Haotian Fu, Shangqun Yu, Michael Littman, George Konidaris
- Key: lifelong RL, variational bayesian
- OpenReview: 7, 6, 6
- ExpEnv: mujoco, meta-world
-
Plan To Predict: Learning an Uncertainty-Foreseeing Model For Model-Based Reinforcement Learning
-
Joint Model-Policy Optimization of a Lower Bound for Model-Based RL
- Benjamin Eysenbach, Alexander Khazatsky, Sergey Levine, Russ Salakhutdinov
- Key: unified objective for model-based RL
- OpenReview: 8, 8, 7, 6
- ExpEnv: gridworld, mujoco, ROBEL manipulation
-
RAMBO-RL: Robust Adversarial Model-Based Offline Reinforcement Learning
- Marc Rigter, Bruno Lacerda, Nick Hawes
- Key: offline rl, model-based rl, two-player game, adversarial model training
- OpenReview: 6, 6, 6, 4
- ExpEnv: d4rl
-
Conservative Dual Policy Optimization for Efficient Model-Based Reinforcement Learning
- Shenao Zhang
- Key: posterior sampling RL, referential update, constrained conservative update
- OpenReview: 7, 7, 5, 5
- ExpEnv: mujoco, N-Chain MDPs
-
Bayesian Optimistic Optimization: Optimistic Exploration for Model-based Reinforcement Learning
- Chenyang Wu, Tianci Li, Zongzhang Zhang, Yang Yu
- Key: optimism in the face of uncertainty (OFU), BOO regret
- OpenReview: 6, 6, 5
- ExpEnv: RiverSwim, Chain, Random MDPs
-
Model-based RL with Optimistic Posterior Sampling: Structural Conditions and Sample Complexity
- Alekh Agarwal, Tong Zhang
- Key: posterior sampling RL, Bellman error decoupling framework
- OpenReview: 7, 7, 7, 6
- ExpEnv: None
-
Exponential Family Model-Based Reinforcement Learning via Score Matching
- Gene Li, Junbo Li, Nathan Srebro, Zhaoran Wang, Zhuoran Yang
- Key: optimistic model-based, score matching
- OpenReview: 7, 7, 6
- ExpEnv: None
-
Deep Hierarchical Planning from Pixels
- Danijar Hafner, Kuang-Huei Lee, Ian Fischer, Pieter Abbeel
- Key: hierarchical RL, long-horizon and sparse reward tasks
- OpenReview: 6, 6, 5
- ExpEnv: atari, deepmind control suite, deepmind lab, crafter
-
Continuous MDP Homomorphisms and Homomorphic Policy Gradient
- Sahand Rezaei-Shoshtari, Rosie Zhao, Prakash Panangaden, David Meger, Doina Precup
- Key: Homomorphic Policy Gradient, Continuous MDP Homomorphisms, Lax Bisimulation Loss
- OpenReview: 7, 7, 7
- ExpEnv: deepmind control suite
ICML 2022
-
DreamerPro: Reconstruction-Free Model-Based Reinforcement Learning with Prototypical Representations
- Fei Deng, Ingook Jang, Sungjin Ahn
- Key: dreamer, prototypes
- ExpEnv: deepmind control suite
-
Denoised MDPs: Learning World Models Better Than the World Itself
- Tongzhou Wang, Simon Du, Antonio Torralba, Phillip Isola, Amy Zhang, Yuandong Tian
- Key: representation learning, denoised model
- ExpEnv: deepmind control suite, RoboDesk
-
- Qi Wang, Herke van Hoof
- Key: graph structured surrogate model, meta training
- ExpEnv: atari, mujoco
-
Towards Adaptive Model-Based Reinforcement Learning
- Yi Wan, Ali Rahimi-Kalahroudi, Janarthanan Rajendran, Ida Momennejad, Sarath Chandar, Harm van Seijen
- Key: local change adaptation
- ExpEnv: GridWorldLoCA, ReacherLoCA, MountaincarLoCA
-
Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation
- Pier Giuseppe Sessa, Maryam Kamgarpour, Andreas Krause
- Key: model-based multi-agent, confidence bound
- ExpEnv: SMART
-
- Shentao Yang, Yihao Feng, Shujian Zhang, Mingyuan Zhou
- Key: offline rl, model-based rl, stationary distribution regularization
- ExpEnv: d4rl
-
Design-Bench: Benchmarks for Data-Driven Offline Model-Based Optimization
- Brandon Trabucco, Xinyang Geng, Aviral Kumar, Sergey Levine
- Key: benchmark, offline MBO
- ExpEnv: Design-Bench Benchmark Tasks
-
Temporal Difference Learning for Model Predictive Control
- Nicklas Hansen, Hao Su, Xiaolong Wang
- Key: td-learning, MPC
- ExpEnv: deepmind control suite, Meta-World
-
Revisiting Design Choices in Offline Model Based Reinforcement Learning
- Cong Lu, Philip Ball, Jack Parker-Holder, Michael Osborne, Stephen J. Roberts
- Key: model-based offline, uncertainty quantification
- OpenReview: 8, 8, 6, 6, 6
- ExpEnv: d4rl dataset
-
Value Gradient weighted Model-Based Reinforcement Learning
- Claas A Voelcker, Victor Liao, Animesh Garg, Amir-massoud Farahmand
- Key: Value-Gradient weighted Model loss
- OpenReview: 8, 8, 6, 6
- ExpEnv: mujoco
-
Planning in Stochastic Environments with a Learned Model
- Ioannis Antonoglou, Julian Schrittwieser, Sherjil Ozair, Thomas K Hubert, David Silver
- Key: MCTS, stochastic MuZero
- OpenReview: 10, 8, 8, 5
- ExpEnv: 2048 game, Backgammon, Go
-
Policy improvement by planning with Gumbel
- Ivo Danihelka, Arthur Guez, Julian Schrittwieser, David Silver
- Key: Gumbel AlphaZero, Gumbel MuZero
- OpenReview: 8, 8, 8, 6
- ExpEnv: go, chess, atari
-
Model-Based Offline Meta-Reinforcement Learning with Regularization
- Sen Lin, Jialin Wan, Tengyu Xu, Yingbin Liang, Junshan Zhang
- Key: model-based offline Meta-RL
- OpenReview: 8, 6, 6, 6
- ExpEnv: d4rl dataset
-
Information Prioritization through Empowerment in Visual Model-based RL
- Homanga Bharadhwaj, Mohammad Babaeizadeh, Dumitru Erhan, Sergey Levine
- Key: mutual information, visual model-based RL
- OpenReview: 8, 8, 8, 6
- ExpEnv: deepmind control suite, Kinetics dataset
-
Transfer RL across Observation Feature Spaces via Model-Based Regularization
- Yanchao Sun, Ruijie Zheng, Xiyao Wang, Andrew E Cohen, Furong Huang
- Key: latent dynamics model, transfer RL
- OpenReview: 8, 6, 5, 5
- ExpEnv: CartPole, Acrobot and Cheetah-Run, mujoco, 3DBall
-
Learning State Representations via Retracing in Reinforcement Learning
- Changmin Yu, Dong Li, Jianye HAO, Jun Wang, Neil Burgess
- Key: representation learning, learning via retracing
- OpenReview: 8, 6, 5, 3
- ExpEnv: deepmind control suite
-
Model-augmented Prioritized Experience Replay
- Youngmin Oh, Jinwoo Shin, Eunho Yang, Sung Ju Hwang
- Key: prioritized experience replay, mbrl
- OpenReview: 8, 8, 6, 5
- ExpEnv: pybullet
-
Evaluating Model-Based Planning and Planner Amortization for Continuous Control
- Arunkumar Byravan, Leonard Hasenclever, Piotr Trochim, Mehdi Mirza, Alessandro Davide Ialongo, Yuval Tassa, Jost Tobias Springenberg, Abbas Abdolmaleki, Nicolas Heess, Josh Merel, Martin Riedmiller
- Key: model predictive control
- OpenReview: 8, 6, 6, 6
- ExpEnv: mujoco
-
Gradient Information Matters in Policy Optimization by Back-propagating through Model
- Chongchong Li, Yue Wang, Wei Chen, Yuting Liu, Zhi-Ming Ma, Tie-Yan Liu
- Key: two-model-based method, analysis of model error and policy gradient
- OpenReview: 8, 8, 6, 6
- ExpEnv: mujoco
-
Pareto Policy Pool for Model-based Offline Reinforcement Learning
- Yijun Yang, Jing Jiang, Tianyi Zhou, Jie Ma, Yuhui Shi
- Key: model-based offline, model return-uncertainty trade-off
- OpenReview: 8, 8, 6, 5
- ExpEnv: d4rl dataset
-
Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage
- Masatoshi Uehara, Wen Sun
- Key: model-based offline theory, PAC bounds
- OpenReview: 8, 6, 6, 5
- ExpEnv: None
-
Know Thyself: Transferable Visual Control Policies Through Robot-Awareness
- Edward S. Hu, Kun Huang, Oleh Rybkin, Dinesh Jayaraman
- Key: world models that transfer to new robots
- OpenReview: 8, 6, 6, 5
- ExpEnv: mujoco, WidowX and Franka Panda robot
-
On Effective Scheduling of Model-based Reinforcement Learning
-
COMBO: Conservative Offline Model-Based Policy Optimization
- Tianhe Yu, Aviral Kumar, Rafael Rafailov, Aravind Rajeswaran, Sergey Levine, Chelsea Finn
- Key: offline reinforcement learning, model-based reinforcement learning, deep reinforcement learning
- OpenReview: 6, 7, 6, 8
- ExpEnv: d4rl dataset
-
Safe Reinforcement Learning by Imagining the Near Future
- Garrett Thomas, Yuping Luo, Tengyu Ma
- Key: safe rl, reward penalty, theory about model-based rollouts
- OpenReview: 8, 6, 6
- ExpEnv: mujoco
-
Model-Based Reinforcement Learning via Imagination with Derived Memory
- Yao Mu, Yuzheng Zhuang, Bin Wang, Guangxiang Zhu, Wulong Liu, Jianyu Chen, Ping Luo, Shengbo Eben Li, Chongjie Zhang, Jianye HAO
- Key: extension of dreamer, prediction-reliability weight
- OpenReview: 6, 6, 6, 6
- ExpEnv: deepmind control suite
-
MobILE: Model-Based Imitation Learning From Observation Alone
-
Model-Based Episodic Memory Induces Dynamic Hybrid Controls
- Hung Le, Thommen Karimpanal George, Majid Abdolshah, Truyen Tran, Svetha Venkatesh
- Key: model-based, episodic control
- OpenReview: 7, 7, 6, 6
- ExpEnv: 2D maze navigation, cartpole, mountainCar and lunarlander, atari, 3D navigation: gym-miniworld
-
A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning
- Mingde Zhao, Zhen Liu, Sitao Luan, Shuyuan Zhang, Doina Precup, Yoshua Bengio
- Key: mbrl, set representation
- OpenReview: 7, 7, 7, 6
- ExpEnv: MiniGrid-BabyAI framework
-
Mastering Atari Games with Limited Data
- Weirui Ye, Shaohuai Liu, Thanard Kurutach, Pieter Abbeel, Yang Gao
- Key: muzero, self-supervised consistency loss
- OpenReview: 7, 7, 7, 5
- ExpEnv: atari 100k, deepmind control suite
-
Online and Offline Reinforcement Learning by Planning with a Learned Model
- Julian Schrittwieser, Thomas K Hubert, Amol Mandhane, Mohammadamin Barekatain, Ioannis Antonoglou, David Silver
- Key: muzero, reanalyse, offline
- OpenReview: 8, 8, 7, 6
- ExpEnv: atari dataset, deepmind control suite dataset
-
Self-Consistent Models and Values
- Gregory Farquhar, Kate Baumli, Zita Marinho, Angelos Filos, Matteo Hessel, Hado van Hasselt, David Silver
- Key: new way of learning models
- OpenReview: 7, 7, 7, 6
- ExpEnv: tabular MDP, Sokoban, atari
-
- Christopher Grimm, Andre Barreto, Gregory Farquhar, David Silver, Satinder Singh
- Key: value equivalence, value-based planning, muzero
- OpenReview: 8, 7, 7, 6
- ExpEnv: four rooms, atari
-
MOPO: Model-based Offline Policy Optimization
- Tianhe Yu, Garrett Thomas, Lantao Yu, Stefano Ermon, James Zou, Sergey Levine, Chelsea Finn, Tengyu Ma
- Key: model-based, offline
- OpenReview: None
- ExpEnv: d4rl dataset, halfcheetah-jump and ant-angle
-
RoMA: Robust Model Adaptation for Offline Model-based Optimization
- Sihyun Yu, Sungsoo Ahn, Le Song, Jinwoo Shin
- Key: model-based, offline
- OpenReview: 7, 6, 6
- ExpEnv: design-bench
-
Offline Reinforcement Learning with Reverse Model-based Imagination
- Jianhao Wang, Wenzhe Li, Haozhe Jiang, Guangxiang Zhu, Siyuan Li, Chongjie Zhang
- Key: model-based, offline
- OpenReview: 7, 6, 6, 5
- ExpEnv: d4rl dataset
-
Offline Model-based Adaptable Policy Learning
- Xiong-Hui Chen, Yang Yu, Qingyang Li, Fan-Ming Luo, Zhiwei Tony Qin, Shang Wenjie, Jieping Ye
- Key: model-based, offline
- OpenReview: 6, 6, 6, 4
- ExpEnv: d4rl dataset
-
Weighted model estimation for offline model-based reinforcement learning
- Toru Hishinuma, Kei Senda
- Key: model-based, offline, off-policy evaluation
- OpenReview: 7, 6, 6, 6
- ExpEnv: pendulum, d4rl dataset
-
Reward-Free Model-Based Reinforcement Learning with Linear Function Approximation
- Weitong Zhang, Dongruo Zhou, Quanquan Gu
- Key: learning theory, model-based reward-free RL, linear function approximation
- OpenReview: 6, 6, 5, 5
- ExpEnv: None
-
- Kefan Dong, Jiaqi Yang, Tengyu Ma
- Key: learning theory, model-based bandit RL, nonlinear function approximation
- OpenReview: 7, 7, 7, 6
- ExpEnv: None
-
Discovering and Achieving Goals via World Models
- Russell Mendonca, Oleh Rybkin, Kostas Daniilidis, Danijar Hafner, Deepak Pathak
- Key: unsupervised goal reaching, goal-conditioned RL
- OpenReview: 6, 6, 6, 6, 6
- ExpEnv: walker, quadruped, bins, kitchen
-
Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization
- Tatsuya Matsushima, Hiroki Furuta, Yutaka Matsuo, Ofir Nachum, Shixiang Gu
- Key: model-based, behavior cloning (warmup), trpo
- OpenReview: 8, 7, 7, 5
- ExpEnv: d4rl dataset
-
Control-Aware Representations for Model-based Reinforcement Learning
- Brandon Cui, Yinlam Chow, Mohammad Ghavamzadeh
- Key: representation learning, model-based soft actor-critic
- OpenReview: 6, 6, 6
- ExpEnv: planar system, inverted pendulum (swing-up), cartpole, 3-link manipulator (swing-up & balance)
-
Mastering Atari with Discrete World Models
- Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba
- Key: DreamerV2, many tricks (multiple categorical variables, KL balancing, etc.)
- OpenReview: 9, 8, 5, 4
- ExpEnv: atari
-
Model-Based Visual Planning with Self-Supervised Functional Distances
- Stephen Tian, Suraj Nair, Frederik Ebert, Sudeep Dasari, Benjamin Eysenbach, Chelsea Finn, Sergey Levine
- Key: goal-reaching task, dynamics learning, distance learning (goal-conditioned Q-function)
- OpenReview: 7, 7, 7, 7
- ExpEnv: sawyer, door sliding
-
- Arthur Argenson, Gabriel Dulac-Arnold
- Key: model-based, offline
- OpenReview: 8, 7, 5, 5
- ExpEnv: RL Unplugged(RLU), d4rl dataset
-
Offline Model-Based Optimization via Normalized Maximum Likelihood Estimation
- Justin Fu, Sergey Levine
- Key: model-based, offline
- OpenReview: 8, 6, 6
- ExpEnv: design-bench
-
On the role of planning in model-based deep reinforcement learning
- Jessica B. Hamrick, Abram L. Friesen, Feryal Behbahani, Arthur Guez, Fabio Viola, Sims Witherspoon, Thomas Anthony, Lars Buesing, Petar Veličković, Théophane Weber
- Key: discussion about planning in MuZero
- OpenReview: 7, 7, 6, 5
- ExpEnv: atari, go, deepmind control suite
-
Representation Balancing Offline Model-based Reinforcement Learning
- Byung-Jun Lee, Jongmin Lee, Kee-Eung Kim
- Key: Representation Balancing MDP, model-based, offline
- OpenReview: 7, 7, 7, 6
- ExpEnv: d4rl dataset
-
- Balázs Kégl, Gabriel Hurtado, Albert Thomas
- Key: mixture density nets, heteroscedasticity
- OpenReview: 7, 7, 7, 6, 5
- ExpEnv: acrobot system
-
Conservative Objective Models for Effective Offline Model-Based Optimization
- Brandon Trabucco, Aviral Kumar, Xinyang Geng, Sergey Levine
- Key: conservative objective model, offline mbrl
- ExpEnv: design-bench
-
Continuous-Time Model-Based Reinforcement Learning
- Çağatay Yıldız, Markus Heinonen, Harri Lähdesmäki
- Key: continuous-time
- ExpEnv: pendulum, cartPole and acrobot
-
Model-Based Reinforcement Learning via Latent-Space Collocation
- Oleh Rybkin, Chuning Zhu, Anusha Nagabandi, Kostas Daniilidis, Igor Mordatch, Sergey Levine
- Key: latent space collocation
- ExpEnv: sparse metaworld tasks
-
Model-Free and Model-Based Policy Evaluation when Causality is Uncertain
- David A Bruns-Smith
- Key: worst-case bounds
- ExpEnv: ope-tools
-
Muesli: Combining Improvements in Policy Optimization
- Matteo Hessel, Ivo Danihelka, Fabio Viola, Arthur Guez, Simon Schmitt, Laurent Sifre, Theophane Weber, David Silver, Hado van Hasselt
- Key: value equivalence
- ExpEnv: atari
-
Vector Quantized Models for Planning
- Sherjil Ozair, Yazhe Li, Ali Razavi, Ioannis Antonoglou, Aäron van den Oord, Oriol Vinyals
- Key: VQVAE, MCTS
- ExpEnv: chess datasets, DeepMind Lab
-
PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration
- Yuda Song, Wen Sun
- Key: sample complexity, kernelized nonlinear regulators, linear MDPs
- ExpEnv: mountain car, antmaze, mujoco
-
Temporal Predictive Coding For Model-Based Planning In Latent Space
- Tung Nguyen, Rui Shu, Tuan Pham, Hung Bui, Stefano Ermon
- Key: temporal predictive coding with an RSSM, latent space
- ExpEnv: deepmind control suite
-
Model-based Reinforcement Learning for Continuous Control with Posterior Sampling
- Ying Fan, Yifei Ming
- Key: regret bound of psrl, mpc
- ExpEnv: continuous cartpole, pendulum swingup, mujoco
-
A Sharp Analysis of Model-based Reinforcement Learning with Self-Play
- Qinghua Liu, Tiancheng Yu, Yu Bai, Chi Jin
- Key: learning theory, multi-agent, model-based self play, two-player zero-sum Markov games
- ExpEnv: None
-
- Yuqi Wang, Jiawei He, Lue Fan, Hongxin Li, Yuntao Chen, Zhaoxiang Zhang. CVPR 2024
- Key: AutoDrive world modeling
- ExpEnv: nuScenes
-
DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving
-
Masked Trajectory Models for Prediction, Representation, and Control
- Philipp Wu, Arjun Majumdar, Kevin Stone, Yixin Lin, Igor Mordatch, Pieter Abbeel, Aravind Rajeswaran. ICLR 2023 Workshop RRL
- Key: offline RL, learning for control, sequence modeling
- ExpEnv: d4rl
-
World Models via Policy-Guided Trajectory Diffusion
- Marc Rigter, Jun Yamada, Ingmar Posner. arXiv 2023
- Key: Diffusion model, world model
- ExpEnv: deepmind control suite, gridworld
-
Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization
- Carlos E. Luis, Alessandro G. Bottero, Julia Vinogradska, Felix Berkenkamp, Jan Peters. arXiv 2023
- Key: uncertainty estimation of cumulative rewards in MBRL
- ExpEnv: mujoco
-
- Thomas Bi, Raffaello D'Andrea. arXiv 2023
- Key: Data-Augmented, DreamerV3
- ExpEnv: Real-World Labyrinth Game
-
Mastering Diverse Domains through World Models
- Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap. arXiv 2023
- Key: DreamerV3, scaling properties of world models
- ExpEnv: deepmind control suite, atari, DMLab, minecraft
-
Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning
- Chuming Li, Ruonan Jia, Jiawei Yao, Jie Liu, Yinmin Zhang, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang. IJCAI Workshop 2023
- Key: extended policy improvement, model regularization, planning theorem
- ExpEnv: mujoco
- [Video] Csaba Szepesvári - The challenges of model-based reinforcement learning and how to overcome them
- [Blog] Model-Based Reinforcement Learning: Theory and Practice
Our purpose is to make this repo even better. If you are interested in contributing, please refer to HERE for contribution instructions.
Awesome Model-Based RL is released under the Apache 2.0 license.