My research interest is the robustness of RL algorithms to environment parameters. I want to modify current RL algorithms so that they achieve good performance when tested in environments with unfamiliar parameters. (For example, an agent is trained in the Cartpole environment with a 1 m pole; I want it to achieve good performance in the Cartpole environment with a 3 m pole.)
To achieve this goal, I want to characterize the relationship between model parameter values and the RL algorithm's performance (reward). In particular, I want the gradient of the reward with respect to the model parameters.
The MuJoCo simulator is used in my experiments, but MuJoCo is not implemented in pure Python, so I cannot get the gradient of the reward with respect to the model parameters through it.
So my question is:
Can pytorch_kinematics compute the gradient of reward with respect to the model parameters? If so, how should I use it to achieve this goal?
PK is not a simulator, because it only does kinematics (no forces or velocities are considered).
I don't know what your reward is
If you're using RL to learn a control policy, the first case (that PK is not a simulator) may or may not work for you depending on whether your environment and problem are quasi-static. If they are, then you could maybe use PK as the mapping from what your RL-learned policy outputs to what you need in order to compute a reward function.
The second point depends on whether your reward is a function of the kinematics output. For example, the distance of a transformed link to some goal is a reward/cost function that you could differentiate through with PK, along the lines of the sketch below.
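A minimal sketch of that second point: differentiating a distance-to-goal reward through PK's forward kinematics with respect to the joint values. It assumes a serial-chain URDF; the file name, end-link name, and joint configuration below follow the KUKA iiwa example used in the pytorch_kinematics README and may differ in your setup.

```python
import math
import torch
import pytorch_kinematics as pk

# Build a serial chain from a URDF, specifying the end link of interest
chain = pk.build_serial_chain_from_urdf(
    open("kuka_iiwa.urdf").read(), "lbr_iiwa_link_7")

# Joint configuration we want gradients with respect to
th = torch.tensor([0.0, -math.pi / 4.0, 0.0, math.pi / 2.0,
                   0.0, math.pi / 4.0, 0.0], requires_grad=True)

tg = chain.forward_kinematics(th)      # Transform3d of the end link
pos = tg.get_matrix()[:, :3, 3]        # translation part of the transform

# Example reward: negative distance of the end link to an arbitrary goal
goal = torch.tensor([[0.5, 0.0, 0.5]])
reward = -torch.norm(pos - goal, dim=-1).sum()

reward.backward()
print(th.grad)                         # d(reward)/d(joint values)
```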
Specific to your cartpole environment, I think it's doable. If you formulate the pole length as a prismatic joint, then you can set requires_grad=True on the pole-length joint value, use it in forward kinematics, and compute the reward from the output transform (e.g., the distance of the end of the cartpole link to a goal set), as sketched below.
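A minimal sketch of that cartpole idea, not a drop-in replacement for the MuJoCo model. It assumes a hand-written URDF "cartpole_length.urdf" whose chain is cart (prismatic slide) -> pole pivot (revolute) -> pole length (prismatic) -> a "pole_tip" link; the file and all joint/link names are hypothetical.

```python
import torch
import pytorch_kinematics as pk

# Serial chain ending at the (hypothetical) pole tip link
chain = pk.build_serial_chain_from_urdf(
    open("cartpole_length.urdf").read(), "pole_tip")

# Environment state (cart position, pole angle) -- not differentiated here
state = torch.tensor([0.0, 0.1])
# Model parameter we want a gradient for: pole length encoded as a joint value
pole_length = torch.tensor([1.0], requires_grad=True)
# Joint values in chain order: slide, pivot, length
th = torch.cat([state, pole_length])

tg = chain.forward_kinematics(th)      # Transform3d of the pole tip
tip = tg.get_matrix()[:, :3, 3]

# Example reward: negative distance of the pole tip to a goal point
goal = torch.tensor([[0.0, 0.0, 3.0]])
reward = -torch.norm(tip - goal, dim=-1).sum()

reward.backward()
print(pole_length.grad)                # d(reward)/d(pole length)
```

The same pattern should extend to any model parameter you can express as a joint value in the kinematic chain; parameters that only affect dynamics (masses, damping, friction) are outside what PK can differentiate, since it does kinematics only.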