
IUXray: train test validation split affecting token/id mappings #7

Open · schaefer30038 opened this issue Nov 17, 2022 · 5 comments

@schaefer30038

Hello, I've attempted to reproduce the results seen in the report, but after loading your weights I get a shape error caused by the size of the word mappings (token_to_id and id_to_token). I made my own annotations.json file to split the data (70-10-20, as indicated in the paper), but the random split changes how many words are mapped to meaningful tokens and how many fall back to <unk>. May I ask how you split the data, and whether it's possible to have access to the original annotations.json file?
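
For illustration, a minimal sketch of why a random 70-10-20 split changes token_to_id: the vocabulary is built only from the training reports with a minimum-frequency cutoff, so a different split keeps a different set of words. The annotation.json structure ("train"/"val"/"test" lists with a "report" field) and the cutoff of 3 are assumptions for this sketch, not the repo's actual code:

```python
import json
import random
from collections import Counter

def build_vocab(train_examples, min_freq=3):
    # Count word frequencies over the training reports only (assumed "report" field).
    counter = Counter()
    for ex in train_examples:
        counter.update(ex["report"].lower().split())
    # Words under the cutoff fall back to <unk> at tokenization time.
    return {w for w, c in counter.items() if c >= min_freq}

# Flatten all examples from an R2Gen-style annotation file (assumed structure).
with open("annotation.json") as f:
    examples = [ex for split in json.load(f).values() for ex in split]

for seed in (0, 1, 2):
    random.Random(seed).shuffle(examples)
    n = len(examples)
    train = examples[: int(0.7 * n)]  # 70% train; the rest would be 10% val / 20% test
    print(f"seed={seed}: vocab size = {len(build_vocab(train))}")
```

Each seed gives a different vocabulary size, which is exactly what makes the embedding shape disagree with the released weights.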

@Markin-Wang
Owner

Hi, thanks for your interest. For the data split, we follow the same random 70-10-20 split as R2Gen and R2GenCMN. I have updated the annotation file in the files/iu_xray directory of this repo and hope this helps you successfully reproduce the results.
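
As a sanity check, a hedged sketch for verifying that an annotation file is compatible with released weights: rebuild the vocabulary from its train split and compare it against the embedding rows in the checkpoint. The checkpoint filename, the state_dict key, and the cutoff of 3 are illustrative assumptions, not the repo's actual interface:

```python
import json
from collections import Counter

import torch

with open("files/iu_xray/annotation.json") as f:
    ann = json.load(f)

# Rebuild the vocabulary from the provided train split (frequency cutoff of 3 assumed).
counter = Counter()
for ex in ann["train"]:
    counter.update(ex["report"].lower().split())
vocab_size = sum(1 for c in counter.values() if c >= 3)

# Load the released checkpoint (filename is a placeholder) and inspect embedding shapes.
ckpt = torch.load("model_iu_xray.pth", map_location="cpu")
state_dict = ckpt["state_dict"] if "state_dict" in ckpt else ckpt
for name, tensor in state_dict.items():
    if torch.is_tensor(tensor) and "embed" in name and tensor.dim() == 2:
        # Embedding rows should equal vocab_size plus any special tokens (<pad>/<unk>, etc.).
        print(name, tuple(tensor.shape), "| vocab from annotation:", vocab_size)
```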

@Liqq1

Liqq1 commented Mar 1, 2023

Hi, I'm sorry to bother you. May I ask you some questions about the code in R2Gen?
In the https://github.com/cuhksz-nlp/R2Gen/blob/5a2e63f6d8da670725b1926317b4bfd1393ec51b/modules/encoder_decoder.py#L134

the return is
x + self.dropout(sublayer(self.norm(x, memory)))
Is this code correct? It doesn't seem to correspond to the diagram in the paper; I think it should be
self.norm((x + self.dropout(sublayer(x))), memory)
Could you help me work out which is right? Thanks a lot.

@Markin-Wang
Owner

Markin-Wang commented Mar 1, 2023

Hi, generally speaking you are right: the vanilla Transformer uses post-LN, as you described. However, as discussed in the paper On Layer Normalization in the Transformer Architecture, both placements of Layer Normalization appear in Transformer implementations, pre-LN and post-LN. For example, the Transformer-encoder-based BERT uses post-LN, while the Vision Transformer uses pre-LN.

Hope this helps you figure out the problem.

This may be a modification the R2Gen authors made to stabilize training, possibly based on their experiments.

I am not sure, though, and you may need to ask the R2Gen authors for the details.
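
For illustration, a minimal PyTorch sketch of the two orderings discussed above (this is not the R2Gen code itself, which additionally passes memory into its modified normalization layer):

```python
import torch
import torch.nn as nn

class PreLNSublayer(nn.Module):
    """x + Dropout(Sublayer(LN(x))) -- the ordering used at the linked line."""
    def __init__(self, size, dropout=0.1):
        super().__init__()
        self.norm = nn.LayerNorm(size)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, sublayer):
        return x + self.dropout(sublayer(self.norm(x)))

class PostLNSublayer(nn.Module):
    """LN(x + Dropout(Sublayer(x))) -- the vanilla Transformer diagram."""
    def __init__(self, size, dropout=0.1):
        super().__init__()
        self.norm = nn.LayerNorm(size)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, sublayer):
        return self.norm(x + self.dropout(sublayer(x)))

# Quick shape check with a linear layer standing in for the attention/FFN sublayer.
x = torch.randn(2, 10, 512)
ff = nn.Linear(512, 512)
print(PreLNSublayer(512)(x, ff).shape, PostLNSublayer(512)(x, ff).shape)
```

Both variants preserve the input shape; they differ only in where the normalization sits relative to the residual connection.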

@Liqq1

Liqq1 commented Mar 2, 2023

Thank you very much. Your reply completely cleared it up for me.
Since the R2Gen author does not seem to be following the issues on his project, I was unable to contact him to confirm.
All in all, thank you again!

@Liqq1

Liqq1 commented Mar 4, 2023

Hi! I'm sorry to bother you again! I hope to get your help.

  1. Have you tried to reproduce the results using the weights provided by the author? I tried this, but the results I got on IU X-ray were not the same as those shown in the paper; they were higher than the paper's.
     [screenshot: my IU X-ray evaluation results]

  2. Results on the MIMIC-CXR dataset: why are the results in the following two tables different? The author mentions "the number of memory slots is set to 3 by default" in the paper, but the results in Table 1 differ from the third row of Table 2. Do you know why?
     [screenshots: Table 1 and Table 2 from the paper]

Thanks again!
