Memory error when sampling from collector #184
Comments
Could you please provide a sample of your self-defined env data format? (obs, act, ...)
I will show part of the reset() function:

```python
def reset(self):
    ...
    self.goal_distance = self.getNavGoalDistance()
    realsense_data, rel_dis, roll, pitch, yaw, rel_theta, diff_angle, done, arrive = self.getState()
    realsense_data = [i / 12 for i in realsense_data]
    # Normalize the state
    state = realsense_data + [rel_dis / diagonal_dis, (roll + 180) / 360, (pitch + 180) / 360,
                              yaw / 360, rel_theta / 360, diff_angle / 180]
    return np.asarray(state)
```

For the action, I will take the step() function as an example:

```python
def step(self, action):
    linear_vel = self.action_space_discrete[action][0]
    ang_vel = self.action_space_discrete[action][1]
    # print(linear_vel, ang_vel)
    vel_cmd = Twist()
    vel_cmd.linear.x = linear_vel / 4
    vel_cmd.angular.z = ang_vel
    self.pub_cmd_vel.publish(vel_cmd)
    # Update sensor data
    # self.getSensor()
    # Update state observation
    realsense_data, rel_dis, roll, pitch, yaw, rel_theta, diff_angle, done, arrive = self.getState()
    # Normalize the state:
    #   Realsense:   [0, 12]      => [0, 1]
    #   LiDAR:       [0, 30]      => [0, 1]
    #   roll, pitch: [-180, 180]  => [0, 1]
    # scan_data = [i / 30 for i in scan_data]
    state = realsense_data + [rel_dis / diagonal_dis, (roll + 180) / 360, (pitch + 180) / 360,
                              yaw / 360, rel_theta / 360, diff_angle / 180]
    reward = self.setReward(done, arrive)
    return np.asarray(state), reward, done, {}
```

Besides, I defined the action space with
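(A side note on the snippets above, not part of the original exchange: `np.asarray` on a plain Python list of floats produces a float64 array, so the collector's buffer stores 8 bytes per value; an explicit float32 cast would halve that footprint. A minimal sketch, using a stand-in for the flat state built above:)

```python
import numpy as np

# Stand-in for realsense_data plus the six extra scalars (~54006 values).
state = [0.5] * 54006

print(np.asarray(state).dtype)                     # float64 -> 8 bytes per value
print(np.asarray(state, dtype=np.float32).dtype)   # float32 -> 4 bytes per value
print(np.asarray(state, dtype=np.float32).nbytes)  # 216024 bytes per observation
```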
Okay, so how much RAM does your machine have?
The machine has 16 GiB of RAM and a 16 GiB swap area.
So try opening
Thanks a lot, I will close the issue if there are no further problems.
Solved after upgrading the RAM to 24 GiB.
1. add policy.eval() in all test scripts' "watch performance"
2. remove dict return support for collector preprocess_fn
3. add `__contains__` and `pop` in Batch: `key in batch`, `batch.pop(key, deft)`
4. exact n_episode for a list of n_episode limitation and save fake data in cache_buffer when self.buffer is None (#184)
5. fix tensorboard logging: the horizontal axis stands for env step instead of gradient step; add test results into tensorboard
6. add test_returns (both GAE and nstep)
7. change the type-checking order in batch.py and converter.py in order to meet the most common case first
8. fix shape inconsistency for torch.Tensor in the replay buffer
9. remove `**kwargs` in ReplayBuffer
10. remove the default value in batch.split() and add the merge_last argument (#185)
11. improve nstep efficiency
12. add max_batchsize in on-policy algorithms
13. potential bugfix for subproc.wait
14. fix RecurrentActorProb
15. improve the code coverage (from 90% to 95%) and remove dead code
16. fix some incorrect type annotations

The above improvements also increase the training FPS: on my computer, the previous version reaches only ~1800 FPS, while this one can reach ~2050 (faster than v0.2.4.post1).
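(For illustration, a minimal sketch of the Batch helpers mentioned in items 3 and 10 above; the field names and sizes are made up, and it assumes `tianshou.data.Batch` behaves as described in those items:)

```python
import numpy as np
from tianshou.data import Batch

# A toy batch; the field names (obs, act) are illustrative only.
b = Batch(obs=np.zeros((4, 3)), act=np.arange(4))

print("obs" in b)         # __contains__: True
act = b.pop("act", None)  # pop with a default value, like dict.pop

# split with the new merge_last flag: a trailing mini-batch shorter than
# `size` is merged into the previous one instead of being yielded alone.
for mini in b.split(3, shuffle=False, merge_last=True):
    print(len(mini.obs))  # 4 (3 + 1 merged)
```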
The collector raised a memory error when training in a self-defined environment with the PPO algorithm. The full error message is below:

```
MemoryError: Unable to allocate 8.05 GiB for an array with shape (20000, 54006) and data type float64
```
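(For reference, the reported 8.05 GiB is exactly the observation array the buffer tries to allocate: 20000 transitions of a 54006-dimensional float64 observation.)

```python
# 20000 steps * 54006 values * 8 bytes (float64) ~= 8.05 GiB
print(20000 * 54006 * 8 / 2**30)  # ~8.05
```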
The self-defined environment uses a list state observation and a list of discrete actions. Is it necessary to change the observation type to a dict to fit tianshou, and to use a smaller buffer size?
I tried to change the on-policy sample size, but it raises a dimension-matching problem (of course). Any answer to this stupid problem is welcome.
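(A rough sizing sketch of the "smaller buffer" option; the 5000-step size and the float32 cast are illustrative choices, not taken from this thread, and it assumes tianshou's `ReplayBuffer(size=...)` constructor:)

```python
from tianshou.data import ReplayBuffer

obs_dim, buffer_size = 54006, 5000
# With float32 observations: 5000 * 54006 * 4 bytes ~= 1.01 GiB,
# which fits comfortably in 16 GiB of RAM.
print(f"{buffer_size * obs_dim * 4 / 2**30:.2f} GiB")

buf = ReplayBuffer(size=buffer_size)  # pass this buffer to the Collector
```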