Training time of CULane is too long #8

Open
ztjsw opened this issue Apr 16, 2021 · 9 comments

ztjsw commented Apr 16, 2021

Hello there,

The training time of train_culane.py is too long. I have been training for five days and it has only reached epoch 30 (on an Nvidia 1080 Ti).
Can we enlarge the batch size or add multi-GPU training? Will that affect performance?

Thanks.

arangesh (Collaborator) commented

30-40 epochs are usually enough for convergence on CULane. If you are interested in speeding up training, you could definitely try increasing the batch size if you have enough GPU memory. Unfortunately, we do not have multi-GPU support for now; this may be a feature we work on in the future.

Another reason for slow training might be that we generate the affinity fields on the fly. We have to do this to support random transformations during training. A faster CPU and/or a larger num_workers in the dataloader can probably help with this.
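For illustration, here is a minimal sketch of what raising batch_size and num_workers looks like in a PyTorch DataLoader. The dummy TensorDataset and the specific numbers are placeholders, not the repo's actual dataset class or defaults:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy dataset standing in for the repo's CULane dataset class (illustrative only).
train_dataset = TensorDataset(torch.randn(64, 3, 288, 800),
                              torch.zeros(64, dtype=torch.long))

train_loader = DataLoader(
    train_dataset,
    batch_size=16,    # raise until you hit GPU memory limits
    shuffle=True,
    num_workers=8,    # more workers parallelize the on-the-fly affinity field generation
    pin_memory=True,  # speeds up host-to-GPU transfers
)

for images, labels in train_loader:
    pass  # training step goes here
```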

qinjian623 commented

Hi @ztjsw,
If you don't mind waiting one more week, I think I can open a PR for multi-GPU training.

ztjsw (Author) commented Apr 26, 2021

@qinjian623 Thank you

qinjian623 commented

Hi all,
Here is the PR: #12

It may still need a lot of code review, but the script works right now.

So @ztjsw, you can check out that branch and start training directly.
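For context, multi-GPU training in PyTorch usually comes down to wrapping the model so each batch is split across devices. Below is a minimal, generic sketch using nn.DataParallel; it is only an illustration of the idea, not necessarily how PR #12 implements it (DistributedDataParallel is typically faster but needs a launch script):

```python
import torch
import torch.nn as nn

# Stand-in model for illustration; the real one comes from this repo's training script.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1),
                      nn.ReLU(),
                      nn.Conv2d(16, 5, 1))

# DataParallel replicates the model on every visible GPU and splits each
# batch across them, gathering the outputs back on the default device.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
```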

qinjian623 commented

#12 (comment)

@ztjsw
Watch out for this. You may need to fix some lines here to use the official CULane dataset.


andy-96 commented May 27, 2021

Hey @qinjian623,

Thanks for opening up a PR for multi-GPU training! Are there any known issues with your script? If not, I would be very happy to use it.

qinjian623 commented

@andy-96
Hi, as far as I know the script should just work, except for this: #12 (comment)

If you run into any issue, you can send me a message with the error and I will fix it.


zjsun7 commented Jan 5, 2022

> Hello there,
>
> The training time of train_culane.py is too long. I have been training for five days and it has only reached epoch 30 (on an Nvidia 1080 Ti). Can we enlarge the batch size or add multi-GPU training? Will that affect performance?
>
> Thanks.

Hello, could you please tell me your torch and torchvision versions? I am getting some errors when I run the make.sh file.


andy-96 commented Jan 5, 2022

@zjsun7
If it helps, I am using torch==1.7.0+cu101 and torchvision==0.8.1+cu101.
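If you are debugging build errors from make.sh, it can also help to confirm which CUDA toolkit your installed torch wheel was built against, since a CUDA extension generally has to be compiled against the same version. A quick check using standard torch/torchvision attributes:

```python
import torch
import torchvision

# Versions that matter when compiling CUDA extensions (e.g. via make.sh).
print("torch:", torch.__version__)                    # e.g. 1.7.0+cu101
print("torchvision:", torchvision.__version__)        # e.g. 0.8.1+cu101
print("CUDA available:", torch.cuda.is_available())
print("torch built with CUDA:", torch.version.cuda)   # e.g. 10.1
```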
