Save and load a module trained with DataParallel or DistributedDataParallel #55

Merged (3 commits) on Oct 21, 2020

shu65 (Member) commented Sep 3, 2020

This PR fixes the agent's save and load so that a model trained with DataParallel or DistributedDataParallel can be loaded.

After training with DistributedDataParallel and then loading the model without DistributedDataParallel, the following error occurred:

Traceback (most recent call last):
  File "./train_dqn_gym.py", line 268, in <module>
    main()
  File "./train_dqn_gym.py", line 194, in main
    agent.load(args.load)
  File "/home/shu/workspace/pfrl/packages/pfrl/pfrl/agent.py", line 113, in load
    self.__load(dirname, [])
  File "/home/shu/workspace/pfrl/packages/pfrl/pfrl/agent.py", line 132, in __load
    os.path.join(dirname, "{}.pt".format(attr)), map_location
  File "/home/shu/.pyenv/versions/pfrl/lib/python3.7/site-packages/torch/nn/modules/module.py", line 847, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for FCStateQFunctionWithDiscreteAction:
	Missing key(s) in state_dict: "model.hidden_layers.0.weight", "model.hidden_layers.0.bias", "model.hidden_layers.1.weight", "model.hidden_layers.1.bias", "model.output.weight", "model.output.bias". 
	Unexpected key(s) in state_dict: "module.model.hidden_layers.0.weight", "module.model.hidden_layers.0.bias", "module.model.hidden_layers.1.weight", "module.model.hidden_layers.1.bias", "module.model.output.weight", "module.model.output.bias". 
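The missing and unexpected keys above differ only by a leading "module." prefix: DataParallel and DistributedDataParallel keep the original network in their `module` attribute, so the wrapper's state_dict names every parameter `module.<original key>`. A minimal load-side workaround is to strip that prefix before calling `load_state_dict`. The sketch below assumes checkpoints are plain state_dict mappings; `strip_module_prefix` is a hypothetical helper for illustration, not part of pfrl or PyTorch, and not necessarily what this PR does.

```python
from collections import OrderedDict

PREFIX = "module."  # prefix added by DataParallel / DistributedDataParallel


def strip_module_prefix(state_dict):
    """Hypothetical helper: return a copy of `state_dict` with one leading
    "module." removed from each key that has it; other keys are unchanged."""
    return OrderedDict(
        (key[len(PREFIX):] if key.startswith(PREFIX) else key, value)
        for key, value in state_dict.items()
    )


if __name__ == "__main__":
    # Keys as saved from the wrapped model in the traceback (tensor values elided).
    saved = OrderedDict.fromkeys(
        ["module.model.output.weight", "module.model.output.bias"]
    )
    print(list(strip_module_prefix(saved)))
    # ['model.output.weight', 'model.output.bias']
```

With PyTorch this would be used as `model.load_state_dict(strip_module_prefix(torch.load(path, map_location=...)))`. Note that it would also rename keys of a model whose own top-level submodule happens to be called `module`. PyTorch 1.9 and later also provide `torch.nn.modules.utils.consume_prefix_in_state_dict_if_present`, which performs a similar rename in place.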

I opened this PR to fix that error.
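On the save side, one way to avoid the "module." prefix seen in the traceback is to serialize the wrapped network instead of the wrapper, and to load into the wrapped network as well, so one checkpoint format works with or without DataParallel/DistributedDataParallel. This is a minimal sketch assuming PyTorch is installed; `unwrap`, `save_model` and `load_model` are hypothetical helpers, not pfrl's `Agent.save`/`Agent.load`, and this is not necessarily how the PR implements the fix.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Wrapper types that keep the real network in `.module`, which is what adds
# the "module." prefix to every state_dict key.
PARALLEL_WRAPPERS = (nn.DataParallel, nn.parallel.DistributedDataParallel)


def unwrap(model: nn.Module) -> nn.Module:
    """Hypothetical helper: return the wrapped network if `model` is a DP/DDP wrapper."""
    return model.module if isinstance(model, PARALLEL_WRAPPERS) else model


def save_model(model: nn.Module, path: str) -> None:
    """Hypothetical helper: save the unwrapped network so keys carry no "module." prefix."""
    torch.save(unwrap(model).state_dict(), path)


def load_model(model: nn.Module, path: str, map_location=None) -> None:
    """Hypothetical helper: load into the unwrapped network, so the same file
    works whether or not `model` is currently wrapped."""
    state_dict = torch.load(path, map_location=map_location)
    unwrap(model).load_state_dict(state_dict)


if __name__ == "__main__":
    # On a CPU-only machine, nn.DataParallel simply stores the network in .module.
    trained = nn.DataParallel(nn.Linear(4, 2))
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.pt")
        save_model(trained, path)
        plain = nn.Linear(4, 2)
        load_model(plain, path, map_location="cpu")  # no missing/unexpected keys
        print(torch.equal(plain.weight, trained.module.weight.cpu()))  # True
```

Unwrapping at save time keeps checkpoints independent of how the model was parallelized during training, so no key renaming is needed when loading.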

ummavi self-assigned this on Sep 9, 2020
ummavi (Member) commented Oct 7, 2020

/test

pfn-ci-bot commented:

  [NOT_FOUND] API failed: /a/github_check_membership: HTTP error: 404 Not Found: https://api.github.com/orgs/pfnet/members/ummavi
  2020-10-07 11:22:31.660098 call.go:280] API failed: /a/github_check_membership
  
  Cause: [NOT_FOUND] HTTP error: 404 Not Found: https://api.github.com/orgs/pfnet/members/ummavi
  2020-10-07 11:22:31.655612 github_create_comment.go:91] HTTP error: 404 Not Found: https://api.github.com/orgs/pfnet/members/ummavi
  
ummavi (Member) commented Oct 7, 2020

/test

pfn-ci-bot commented:

Successfully created a job for commit a3033cf:

ummavi (Member) commented Oct 14, 2020

/test

pfn-ci-bot commented:

Successfully created a job for commit 867c1bb:

ummavi (Member) left a comment:

LGTM!

ummavi merged commit 8ad2cd0 into pfnet:master on Oct 21, 2020
muupan added the enhancement (New feature or request) label on Dec 11, 2020
muupan added this to the v0.2.0 milestone on Dec 14, 2020
Labels: enhancement (New feature or request)

4 participants