Training time of CULane is too long #8

Open
ztjsw opened this issue Apr 16, 2021 · 9 comments

ztjsw commented Apr 16, 2021

Hello there,

The training time of train_culane.py is too long. I have been training for five days and it has only reached epoch 30 (on an Nvidia 1080 Ti).
Can we enlarge the batch size or add multi-GPU training? Will that affect performance?

Thanks.

arangesh (Collaborator) commented

30-40 epochs are usually enough for convergence on CULane. If you are interested in speeding up training, you could definitely try increasing the batch size if you have enough GPU memory. Unfortunately, we do not have multi-GPU support for now; this may be a feature we work on in the future.

Another reason for slow training might be that we generate the affinity fields on the fly. We have to do this to support random transformations during training. A faster CPU and/or a larger num_workers in the dataloader can probably help with this.
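For illustration, here is a minimal sketch of what raising batch_size and num_workers looks like in a PyTorch DataLoader. The dummy TensorDataset and the specific numbers are placeholders, not the repo's actual dataset class or defaults:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy dataset standing in for the repo's CULane dataset class (illustrative only).
train_dataset = TensorDataset(torch.randn(64, 3, 288, 800),
                              torch.zeros(64, dtype=torch.long))

train_loader = DataLoader(
    train_dataset,
    batch_size=16,    # raise until you hit GPU memory limits
    shuffle=True,
    num_workers=8,    # more workers parallelize the on-the-fly affinity field generation
    pin_memory=True,  # speeds up host-to-GPU transfers
)

for images, labels in train_loader:
    pass  # training step goes here
```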

qinjian623 commented

Hi @ztjsw,
If you don't mind waiting one more week, I think I can open a PR for multi-GPU training.

ztjsw (Author) commented Apr 26, 2021

@qinjian623 Thank you

qinjian623 commented

Hi all,
Here is the PR: #12

It may still need a lot of code review, but the script works right now.

So @ztjsw, you can check out that branch and start training directly.
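For context, multi-GPU training in PyTorch usually comes down to wrapping the model so each batch is split across devices. Below is a minimal, generic sketch using nn.DataParallel; it is only an illustration of the idea, not necessarily how PR #12 implements it (DistributedDataParallel is typically faster but needs a launch script):

```python
import torch
import torch.nn as nn

# Stand-in model for illustration; the real one comes from this repo's training script.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1),
                      nn.ReLU(),
                      nn.Conv2d(16, 5, 1))

# DataParallel replicates the model on every visible GPU and splits each
# batch across them, gathering the outputs back on the default device.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
```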

qinjian623 commented

#12 (comment)

@ztjsw
Watch out for this. You may need to fix some lines here to use the official CULane dataset.


andy-96 commented May 27, 2021

Hey @qinjian623,

Thanks for opening up a PR for multi-GPU training! Are there any known issues with your script? If not, I would be very happy to use it.

qinjian623 commented

@andy-96
Hi, as far as I know the script should just work, except for this: #12 (comment)

If you run into any issue, you can send me a message with the error and I will fix it.


zjsun7 commented Jan 5, 2022

> Hello there,
>
> The training time of train_culane.py is too long. I have been training for five days and it has only reached epoch 30 (on an Nvidia 1080 Ti). Can we enlarge the batch size or add multi-GPU training? Will that affect performance?
>
> Thanks.

Hello, could you please tell me your torch and torchvision versions? I am getting some errors when I run the make.sh file.


andy-96 commented Jan 5, 2022

@zjsun7
If it helps, I am using torch==1.7.0+cu101 and torchvision==0.8.1+cu101.
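If you are debugging build errors from make.sh, it can also help to confirm which CUDA toolkit your installed torch wheel was built against, since a CUDA extension generally has to be compiled against the same version. A quick check using standard torch/torchvision attributes:

```python
import torch
import torchvision

# Versions that matter when compiling CUDA extensions (e.g. via make.sh).
print("torch:", torch.__version__)                    # e.g. 1.7.0+cu101
print("torchvision:", torchvision.__version__)        # e.g. 0.8.1+cu101
print("CUDA available:", torch.cuda.is_available())
print("torch built with CUDA:", torch.version.cuda)   # e.g. 10.1
```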
