Mixed double precision for PPO algorithm #155
base: develop

Conversation

Commits compared: 9c98abe to 06fbf2e
@@ -388,55 +398,62 @@ def compute_gae(rewards: torch.Tensor,
    # mini-batches loop
    for sampled_states, sampled_actions, sampled_log_prob, sampled_values, sampled_returns, sampled_advantages in sampled_batches:

        sampled_states = self._state_preprocessor(sampled_states, train=not epoch)
        with torch.autocast(device_type=self._device_type, enabled=self._mixed_precision):
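For reference, autocast in a training loop is normally paired with gradient scaling so fp16 gradients do not underflow. A minimal, self-contained sketch of that general pattern (the model, optimizer, and tensor names below are illustrative, not skrl's API):

```python
import torch

# Sketch of the autocast + GradScaler training pattern (illustrative names).
device_type = "cuda" if torch.cuda.is_available() else "cpu"
mixed_precision = device_type == "cuda"  # fp16 autocast only makes sense on GPU here

model = torch.nn.Linear(8, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(enabled=mixed_precision)

states = torch.randn(32, 8)
returns = torch.randn(32, 1)

with torch.autocast(device_type=device_type, enabled=mixed_precision):
    values = model(states)  # forward pass runs in reduced precision when enabled
    loss = torch.nn.functional.mse_loss(values, returns)

scaler.scale(loss).backward()  # scale loss to avoid fp16 gradient underflow
scaler.step(optimizer)         # unscales gradients, then calls optimizer.step()
scaler.update()                # adjusts the scale factor for the next step
```

With `enabled=False` (e.g. on CPU), both `autocast` and `GradScaler` become no-ops, so the same code path works for full-precision training.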
Is it also necessary to apply autocast to the following block?

skrl/skrl/agents/torch/ppo/ppo.py, lines 369 to 373 in c15f3ce:
with torch.no_grad():
    self.value.train(False)
    last_values, _, _ = self.value.act({"states": self._state_preprocessor(self._current_next_states.float())}, role="value")
    self.value.train(True)
last_values = self._value_preprocessor(last_values, inverse=True)
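For context, this step evaluates the value network on the final collected states to obtain the bootstrap values used by GAE. A minimal sketch with a plain `nn.Module` standing in for skrl's model wrapper (all names illustrative):

```python
import torch

# Sketch: evaluate V(s') for the last collected states under no_grad,
# so the bootstrap values used by GAE carry no gradient history.
value = torch.nn.Linear(8, 1)     # stand-in for the value network
next_states = torch.randn(64, 8)  # stand-in for _current_next_states

value.train(False)                # eval mode (affects dropout/batchnorm)
with torch.no_grad():
    last_values = value(next_states)  # V(s_{t+1}) bootstrap for GAE
value.train(True)
```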
@@ -219,8 +227,9 @@ def act(self, states: torch.Tensor, timestep: int, timesteps: int) -> torch.Tens
        return self.policy.random_act({"states": self._state_preprocessor(states)}, role="policy")

    # sample stochastic actions
    actions, log_prob, outputs = self.policy.act({"states": self._state_preprocessor(states)}, role="policy")
    self._current_log_prob = log_prob
    with torch.autocast(device_type=self._device_type, enabled=(self._mixed_precision)):
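A minimal sketch of sampling stochastic actions under autocast during data collection (the `Categorical` policy and all names are assumptions for illustration, not skrl's implementation):

```python
import torch

# Sketch: policy inference under no_grad + autocast during rollout collection.
device_type = "cuda" if torch.cuda.is_available() else "cpu"
mixed_precision = device_type == "cuda"

policy = torch.nn.Linear(4, 2)  # stand-in for the policy network
states = torch.randn(16, 4)

with torch.no_grad(), torch.autocast(device_type=device_type, enabled=mixed_precision):
    logits = policy(states)
    dist = torch.distributions.Categorical(logits=logits)
    actions = dist.sample()           # stochastic actions
    log_prob = dist.log_prob(actions) # stored for the PPO ratio later
```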
Why is self._mixed_precision wrapped in parentheses in all with torch.autocast(device_type=self._device_type, enabled=(self._mixed_precision)): statements?
Mixed precision
Motivation:
Inspired by RLGames, we implemented automatic mixed precision to boost the performance of the PPO algorithm.
Sources:
https://pytorch.org/docs/stable/amp.html
https://pytorch.org/docs/stable/notes/amp_examples.html
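As a quick sanity check of what autocast changes, the result dtype of a matmul can be inspected directly (on a CPU-only run, `enabled` is False here, so the dtype stays float32):

```python
import torch

# Sketch: a matmul inside autocast runs in float16 on CUDA; with autocast
# disabled (e.g. on CPU here) it stays in the default float32.
device_type = "cuda" if torch.cuda.is_available() else "cpu"
enabled = device_type == "cuda"

x = torch.randn(4, 4, device=device_type)
w = torch.randn(4, 4, device=device_type)
with torch.autocast(device_type=device_type, enabled=enabled):
    y = x @ w

print(y.dtype)  # torch.float16 on CUDA, torch.float32 otherwise
```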
Speed eval:
Large neural network (units: [2048, 1024, 1024, 512])
10000 steps
Running on top of an OIGE env simulation (constant for each run)
skrl uses a single forward pass implementation
* in this run, mixed precision was also used for inference during the data collection phase
Quality eval: