[BugFix] Make program thread-local to support multi-threading #338
Conversation
ef28b31 to c378481
@yzh119 This PR should fix the multi-GPU issue. Can you double-check?
Thanks, the PR solved my problem.
Lingfan, could you run treelstm on multi gpu to see the improvement?
Well, I find that enabling multi-GPU currently does not speed up training; the number of active GPUs is always one during training.
@yzh119 Sounds weird. When I ran your transformer on a 4-GPU instance, all GPUs were active but with low utilization (<25%).
LGTM. The performance issue seems to be about multi-threading itself, which is not the purpose of this PR.
Description
DGLGraph uses one global execution plan / program, which leads to data races when multi-threading (e.g. PyTorch DataParallel) is used. This PR fixes the bug by making the schedule a threading.local object. (#302)
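The fix relies on Python's `threading.local`, which gives each thread its own copy of an attribute instead of one shared global. The sketch below illustrates that pattern in isolation; `Scheduler`, `get_program`, and `worker` are hypothetical names for illustration, not DGL's actual classes or API.

```python
import threading

class Scheduler:
    """Sketch of per-thread program storage: concurrent threads
    (e.g. DataParallel replicas) no longer race on one shared program."""

    def __init__(self):
        # threading.local() gives each thread an independent namespace;
        # attributes set on it in one thread are invisible to others.
        self._local = threading.local()

    def get_program(self):
        # Lazily create a fresh program for the calling thread.
        if not hasattr(self._local, "program"):
            self._local.program = []
        return self._local.program

def worker(sched, results, idx):
    prog = sched.get_program()
    prog.append(idx)          # mutation stays private to this thread
    results[idx] = list(prog)

sched = Scheduler()
results = {}
threads = [threading.Thread(target=worker, args=(sched, results, i))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Each thread observed only its own appends, so results[i] == [i]
# rather than one list shared (and corrupted) across threads.
print(results)
```

With a plain global list, the four workers would all append into the same object and each would see the others' entries, which is exactly the data race the PR removes.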
Checklist
- Related examples are either not affected by this change, or have been fixed to be compatible with this change
Changes