
How to compute the gradient of reward w.r.t. model parameters? #8

Closed · c4cld opened this issue May 12, 2022 · 2 comments

@c4cld commented May 12, 2022

My research interest is the robustness of RL algorithms to environment parameters. I want to modify current RL algorithms so that they achieve good performance when tested in environments with unfamiliar parameters. (For example, an agent is trained in a CartPole environment with a 1 m pole; I want it to achieve good performance in a CartPole environment with a 3 m pole.)
To achieve this, I want to characterize the relationship between model parameter values and the RL algorithm's performance (reward). In other words, I want the gradient of the reward with respect to the model parameters.
I use the MuJoCo simulator in my experiments, but MuJoCo is not implemented in pure Python, so I cannot compute the gradient of the reward with respect to the model parameters there.
So my question is:
Can pytorch_kinematics compute the gradient of the reward with respect to the model parameters? If so, how should I use it to achieve this goal?

@LemonPi (Member) commented Feb 23, 2023

I'm not sure because:

  • PK is not a simulator because it only does kinematics (so no forces or velocities are considered)
  • I don't know what your reward is

If you're using RL to learn a control policy, the first point (that PK is not a simulator) may or may not matter, depending on whether your environment and problem are quasi-static. If they are quasi-static, then you could maybe use PK as the mapping from what your RL-learned model outputs to what you need in order to compute a reward function.

The second point depends on whether your reward depends on the output of kinematics. For example, the distance of a transformed link to some goal would be a reward/cost function that you could differentiate through with PK.

Specific to your cartpole environment, I think it's doable. If you formulate the pole length as a prismatic joint, then you can set requires_grad=True on the pole-length joint value, use it in forward kinematics, and compute the reward based on the output transform (e.g. the distance of the end of the cartpole link to a goal set).
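
A minimal sketch of that recipe (not from the original thread), assuming a hypothetical cartpole.urdf in which the pole length is exposed as a prismatic joint and the tip link is named pole_tip; the file name, joint ordering, and goal point are placeholders:

```python
import torch
import pytorch_kinematics as pk

# Build a serial chain from a URDF; "cartpole.urdf" and the joint/link names
# are placeholders for illustration, not files shipped with pytorch_kinematics.
chain = pk.build_serial_chain_from_urdf(open("cartpole.urdf").read(), "pole_tip")

# Joint values: [cart position, pole angle, pole length (prismatic)].
# requires_grad=True is what lets autograd reach the pole-length "model parameter".
th = torch.tensor([0.0, 0.1, 1.0], requires_grad=True)

# Forward kinematics returns the transform of the end link (a Transform3d).
tf = chain.forward_kinematics(th)
tip_pos = tf.get_matrix()[:, :3, 3]  # translation of the pole tip, shape (1, 3)

# A differentiable reward: negative distance of the tip to a goal point.
goal = torch.tensor([[0.0, 0.0, 3.0]])
reward = -torch.norm(tip_pos - goal)

# Gradient of the reward w.r.t. all joint values, including the pole length.
reward.backward()
print(th.grad)  # e.g. th.grad[2] is d(reward)/d(pole length)
```

Exposing the pole length as a joint value (rather than baking it into the link geometry) is what makes it visible to autograd, since PK differentiates through the joint values passed to forward_kinematics.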

@PeterMitrano (Contributor)
Closing since there hasn't been any follow-up.
