
It seems the results reproduced by this code cannot match the results in the original paper? #1

Open
YihangLou opened this issue Mar 20, 2018 · 106 comments

Comments

@YihangLou

No description provided.

@tengshaofeng
Owner

OK, maybe I will try some image pre-processing and tune the hyperparameters to achieve that.
But this code performs well in my own implementation for medical image recognition.

@YihangLou
Author

Thanks for sharing your code. Maybe there are many tricks in the original implementation, but the gap between the reproduced performance and the results reported in the paper is too large. Hope you can fully reproduce the results in the future!

@tengshaofeng
Owner

OK, I will try to pre-process the images and keep the training process the same as in the paper.
The current code does not do padding, cropping, flipping, and so on; I use Adam (the paper uses SGD),
and I only trained for 100 epochs (about 204 epochs in the paper).
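For reference, a minimal sketch of the standard CIFAR-10 augmentation (pad-and-crop plus horizontal flip) and an SGD setup like the paper describes; the normalization statistics and hyperparameter values below are common defaults I am assuming, not values taken from this repo or the paper:

import torch.optim as optim
from torchvision import transforms

# Standard CIFAR-10 augmentation: pad, random 32x32 crop, random horizontal flip.
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    # Commonly used CIFAR-10 channel means/stds (assumed, not copied from this repo).
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

# SGD with momentum instead of Adam; 'model' is the network instance created in train.py,
# and the momentum/weight-decay values here are typical choices, not the paper's exact settings.
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)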

@tengshaofeng
Owner

tengshaofeng commented Mar 21, 2018

@YihangLou , Hi, today I modified something and got a new result:
accuracy on the CIFAR-10 test set: 92.66%

@tengshaofeng
Owner

@YihangLou , I modified the optimizer, so the newest result is now 0.9354.

@josianerodrigues

Hi @tengshaofeng,
Was the result you got (0.9354) obtained using only the ResidualAttentionModel_92_32input network in train.py? Or do you first pretrain the network with train_pre.py and then train with train.py?

@123moon

123moon commented May 18, 2018

Can you provide a trained model?

@tengshaofeng
Owner

@josianerodrigues , use only train.py; train_pre.py is just a backup of my code.

@tengshaofeng
Owner

@123moon , I have provided the model from the final epoch. Its accuracy is 0.9332.

@123moon

123moon commented May 19, 2018

The model you provided is 92-32. Do you have a model for ImageNet with 224x224 input? I may not be able to say it clearly in English, sorry to bother you: do you have a trained model for 224*224 images? Your code has been very helpful to me, but I have no way to download this dataset, so I am asking for your help.

@tengshaofeng
Owner

@123moon ,
There is a 224*224 model: the ResidualAttentionModel_92 class in residual_attention_network.py.
To download ImageNet, visit http://image-net.org/download; you need to register yourself.

@123moon

123moon commented May 22, 2018

Yes, I saw that. What I wanted to ask is whether there is an already-trained model; it would take me a long time to train it myself, and my computer does not have enough memory.

@tengshaofeng
Owner

tengshaofeng commented May 22, 2018

No, I don't have one; my computer doesn't have enough storage for such a large dataset either.

@josianerodrigues

josianerodrigues commented May 22, 2018

Hi @tengshaofeng
Could you tell me what the effect of resetting the learning rate at a particular epoch is?

# Decaying Learning Rate
if (epoch+1) / float(total_epoch) == 0.3 or (epoch+1) / float(total_epoch) == 0.6 or (epoch+1) / float(total_epoch) == 0.9:
    lr /= 10
    print('reset learning rate to:', lr)
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
        print(param_group['lr'])

@tengshaofeng
Owner

@josianerodrigues , it is a training trick. When I decrease the learning rate, the loss starts decreasing quickly again.
That is, after training about 90 epochs with lr=0.1 the loss tends to converge; when I then decrease lr to 0.01, the loss decreases again.
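As an illustration, the same schedule can also be written with PyTorch's built-in MultiStepLR scheduler instead of the exact float comparison in the snippet above; train_one_epoch() below is only a placeholder for the existing loop body, and total_epoch/optimizer are assumed to be the same objects used in train.py:

from torch.optim.lr_scheduler import MultiStepLR

# Divide the learning rate by 10 at 30%, 60% and 90% of training, mirroring the manual logic above.
milestones = [int(total_epoch * r) for r in (0.3, 0.6, 0.9)]
scheduler = MultiStepLR(optimizer, milestones=milestones, gamma=0.1)

for epoch in range(total_epoch):
    train_one_epoch()   # placeholder for the existing training code
    scheduler.step()    # advance the schedule once per epoch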

@josianerodrigues

thanks for the explanation :)

@zhangrong1722

Hi @tengshaofeng
I also work on medical images. You mentioned that this code worked well in your own medical image recognition project. I am having trouble classifying a medical image dataset. Could you tell me more details about it at your convenience? Or could you add my QQ if possible? My QQ number is 1922525328.

Thanks.

@tengshaofeng
Owner

@estelle1722 , I use the 448-input model, and it works well.

@ondrejbiza

Hi,
I also could not reproduce the results of the paper on CIFAR-10 (with my implementation in TensorFlow), even after exchanging a few emails with the author.

@tengshaofeng
Owner

@ondrejba , what is your best result now?

@ondrejbiza

My best accuracy was 94.32%, which is close to the 95.01% reported in the paper, but it does not beat ResNet-164, which has fewer parameters.

@tengshaofeng
Owner

@ondrejba , OK, your result is really better. Have you looked at the ResidualAttentionModel_92_32input architecture in my code? Are there any differences from yours? Or could you share your code with me?

@ondrejbiza

I'm sorry for the delay. I'll look at your code over the weekend.

@tengshaofeng
Owner

@ondrejba thanks

@ondrejbiza

ondrejbiza commented Aug 20, 2018

I noticed many differences just from looking at residual_attention_network.py:

  • I use filter size 3 in the first convolution, you use 5 (probably not important)
  • I don't use max pooling (downsampling 32x32 images after the first convolution is not a good idea)
  • my filter counts for the three scales are [64, 128, 256] whereas you have [256, 512, 1024] filters

I bet there are more differences but I don't have time to go through the whole attention module.
I hope this helps.
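To make the first two differences concrete, here is a rough sketch of the two stems being compared; the channel counts and exact layer order are my assumptions for illustration, not code copied from either implementation:

import torch.nn as nn

# Stem as in this repo (roughly): a 5x5 first convolution followed by max pooling,
# which halves the 32x32 CIFAR input early.
stem_with_pooling = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=5, stride=1, padding=2, bias=False),
    nn.BatchNorm2d(32),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),   # 32x32 -> 16x16
)

# CIFAR-style stem described above: a single 3x3 convolution and no pooling,
# so the attention stages see the full 32x32 resolution.
stem_without_pooling = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)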

@ondrejbiza

I'm actually surprised that you achieved such a good CIFAR accuracy with max pooling at the start of the network.

@josianerodrigues

josianerodrigues commented Aug 21, 2018

Hi @ondrejba, if possible, could you make your code available?
On which dataset and with which network did you get 94% accuracy? ResNet-164? What do you use after the first convolution? Sorry for taking your time.

@ondrejbiza

Hello,
I got 94.32% accuracy with Attention92 on CIFAR-10. The 95.01% accuracy I mentioned is also for Attention92 evaluated on CIFAR-10; it was reported in the Residual Attention Networks paper but I didn't manage to replicate the results.
I will look into open-sourcing my code.

After the first convolution ... there are all the other convolutions in the network followed by average pooling and a single fully-connected layer. This architecture is described in the Residual Attention Networks paper as well as the Identity Mappings paper that is a follow up to the Deep Residual Learning paper.

Cheers,
Ondrej

@josianerodrigues

Thank you :)

@ondrejbiza

You're welcome!
Let me know if you manage to reproduce Fei Wang's results.

@zhongleilz

Have you ever run into this problem?
TypeError: new() received an invalid combination of arguments - got (float, int, int, int), but expected one of:

  • (torch.device device)
  • (torch.Storage storage)
  • (Tensor other)
  • (tuple of ints size, torch.device device)
  • (object data, torch.device device)

@tengshaofeng
Owner

@zhongleilz Please refer to #3.

@simi2525

simi2525 commented Nov 1, 2018

Can anyone provide or refer me to trained models for CIFAR-10, CIFAR-100 or ImageNet-2017?

@tengshaofeng I saw the Attention-92 trained model without mixup; could you also upload the models for the two results with mixup?

@tengshaofeng
Owner

@simi2525 , you can train it yourself, because trained models are a bit large for GitHub. Also, when I trained the model I did not save the best one. Sorry.

@PistonY

PistonY commented Nov 5, 2018

If you have enough time, please try ImageNet without weight decay. With weight decay of 1e-4 I can't reach the paper's result.

@ondrejbiza

@PistonY Did you use this implementation?

@PistonY

PistonY commented Nov 5, 2018

@PistonY Did you use this implementation?

Yes, and I made some simplifications, but in Gluon, not PyTorch.

@simi2525

simi2525 commented Nov 5, 2018

@tengshaofeng for my current project, all I need are trained models, the one already uploaded is good enough. If I get the time to tinker with it in order to get the initial paper results, I'll be sure to let you know.

@tengshaofeng
Owner

tengshaofeng commented Nov 5, 2018

@simi2525 , um, the uploaded trained model is actually better than the one in the original paper. The paper reports an accuracy of 95.01%, and the uploaded one reaches 95.4%. Both are based on Attention-92.

@sankin1770

sankin1770 commented Dec 11, 2018

@simi2525 , um, the uploaded trained model is actually better than the one in the original paper. The paper reports an accuracy of 95.01%, and the uploaded one reaches 95.4%. Both are based on Attention-92.
May I ask, what is the highest CIFAR-10 test accuracy reported in the papers you currently know of?

@sankin1770

@PistonY , hi, I cannot find the paper AttentionResNeXt. Can you provide the paper's name? Or did you combine Attention and ResNeXt yourself as your own experiment, so the 97% is a target you set for yourself rather than the best result from some paper?

I feel that with something like mixup, even getting the result above 97% doesn't mean much, since everyone can push their results up with it.

@tengshaofeng
Owner

@sankin1770 , without mixup the accuracy is still 95.4%, which is higher than in the original paper. This project is only meant to reproduce the paper's results.

@PistonY

PistonY commented Dec 27, 2018

@sankin1770 Naive.
@tengshaofeng Finally reached 97%; it really wasn't easy.

@sankin1770

@sankin1770 Naive.
@tengshaofeng Finally reached 97%; it really wasn't easy.

OK, I accept your criticism, but I still can't figure out what is innovative about mixup, since everyone who uses it gets an improvement.

@PistonY

PistonY commented Dec 27, 2018

@sankin1770 Naive.
@tengshaofeng Finally reached 97%; it really wasn't easy.

OK, I accept your criticism, but I still can't figure out what is innovative about mixup, since everyone who uses it gets an improvement.

Run more experiments and you will see how hard it is to gain even 0.1 in accuracy. What matters about a method is not novelty but usefulness. Mixup actually counts as a big innovation, but its applicability is also limited.
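For readers who haven't used it, here is roughly what mixup does; this is a common implementation sketch following the mixup paper, not the exact code behind any result in this thread:

import numpy as np
import torch

def mixup_batch(x, y, alpha=0.2):
    # Sample a mixing coefficient and blend each example with a randomly chosen partner.
    lam = float(np.random.beta(alpha, alpha))
    index = torch.randperm(x.size(0)).to(x.device)
    mixed_x = lam * x + (1.0 - lam) * x[index]
    return mixed_x, y, y[index], lam

# In the training loop, the loss is the same convex combination of the two labels' losses:
#   mixed_x, y_a, y_b, lam = mixup_batch(images, labels)
#   outputs = model(mixed_x)
#   loss = lam * criterion(outputs, y_a) + (1 - lam) * criterion(outputs, y_b)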

@sankin1770

@sankin1770 Naive.
@tengshaofeng Finally reached 97%; it really wasn't easy.

OK, I accept your criticism, but I still can't figure out what is innovative about mixup, since everyone who uses it gets an improvement.

Run more experiments and you will see how hard it is to gain even 0.1 in accuracy. What matters about a method is not novelty but usefulness. Mixup actually counts as a big innovation, but its applicability is also limited.

Fair enough; I'm a beginner, please bear with me.

@tengshaofeng
Owner

@PistonY , what methods did you use to reach 97%? Please advise.

@PistonY

PistonY commented Jan 2, 2019

@sankin1770

@tengshaofeng https://arxiv.org/pdf/1812.01187.pdf

Thank you both for your help. After making my own improvements, I also reached 97%.

@tengshaofeng
Owner

@PistonY , you can always surprise me. Thanks.

@sankin1770

@PistonY , you can always surprise me. Thanks.

You two big shots are just complimenting each other, haha.

@tengshaofeng
Owner

@sankin1770 Thanks for your critical suggestions.

@PistonY

PistonY commented Jan 9, 2019

@sankin1770 Did you reproduce the methods from that paper in PyTorch? What did you use to reach 97%?

@PistonY

PistonY commented Jan 9, 2019

@tengshaofeng @sankin1770 And you are welcome to take a look at our new face recognition project, Gluon-Face.

@Hiiamein

I ran the code with python train.py and got the following errors. Do you know how to fix them?

model = ResidualAttentionModel().cuda()
  File "/cluster/home/it_stu19/ResidualAttentionNetwork-pytorch/Residual-Attention-Network/model/residual_attention_network.py", line 236, in __init__
    self.residual_block1 = ResidualBlock(32, 128)  # 32*32
  File "/cluster/home/it_stu19/ResidualAttentionNetwork-pytorch/Residual-Attention-Network/model/basic_layers.py", line 16, in __init__
    self.conv1 = nn.Conv2d(input_channels, output_channels/4, 1, 1, bias = False)
  File "/cluster/apps/anaconda3/5.3.0/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 315, in __init__
    False, _pair(0), groups, bias)
  File "/cluster/apps/anaconda3/5.3.0/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 38, in __init__
    out_channels, in_channels // groups, *kernel_size))
TypeError: new() received an invalid combination of arguments - got (float, int, int, int), but expected one of:
 * (torch.device device)
 * (torch.Storage storage)
 * (Tensor other)
 * (tuple of ints size, torch.device device)
 * (object data, torch.device device)

@skyguidance

@ArtechStark It looks like you are using Python 3. Division in Python 3 behaves differently from Python 2, which results in a float being passed as a channel count to nn.Conv2d.
Changing the ResidualBlock output-channel calculation in basic_layers.py will solve this problem.
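Concretely, the fix is to use Python 3's floor division so nn.Conv2d receives an integer channel count; a minimal sketch of the change to the line shown in the traceback (the same applies anywhere '/' produces a channel count):

import torch.nn as nn

output_channels = 128
# Python 3: '/' is true division and returns a float, which nn.Conv2d rejects;
# '//' is floor division and keeps the channel count an int.
conv1 = nn.Conv2d(32, output_channels // 4, 1, 1, bias=False)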

@Hiiamein

Hiiamein commented Mar 2, 2019

@skyguidance Thank you very much! The problem is solved now.

@gden138

gden138 commented May 20, 2019

@tengshaofeng https://arxiv.org/pdf/1812.01187.pdf

Thank you both for your help. After making my own improvements, I also reached 97%.

Could you explain how you improved it? Could you share the details? Thank you very much.

@November666

November666 commented Oct 23, 2019

Excuse me, your code has been a big help to my research, but when I run train.py the following error appears. Do you know how to fix it? Thank you!
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

ForkingPickler(file, protocol).dump(obj)

BrokenPipeError: [Errno 32] Broken pipe
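This error usually appears when the DataLoader spawns worker processes (for example on Windows) while train.py's training code runs at module top level. Wrapping the entry point in a main guard, as the message itself suggests, is the usual fix; train() below is only a placeholder for the script's existing training code, and setting num_workers=0 in the DataLoader is an alternative workaround:

def train():
    # placeholder for the existing training/evaluation loop in train.py
    pass

if __name__ == '__main__':
    # Worker processes created by torch.utils.data.DataLoader re-import this module;
    # the guard keeps them from re-running the training code at import time.
    train()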

@tengshaofeng
Owner

tengshaofeng commented Oct 25, 2019 via email
