
IUXray: train test validation split affecting token/id mappings #7

Open · schaefer30038 opened this issue Nov 17, 2022 · 5 comments

@schaefer30038

Hello, I've attempted to reproduce the results seen in the report, but after loading your weights I get a shape error caused by the size of the word mappings (token_to_id and id_to_token). I made my own annotations.json file to split the data (70-10-20, as indicated in the paper), but the random split changes how many words are mapped to meaningful tokens and how many fall back to <unk>. May I ask how you split the data, and whether it's possible to have access to the original annotations.json file?
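
For illustration, a minimal sketch of why a random 70-10-20 split changes token_to_id: the vocabulary is built only from the training reports with a minimum-frequency cutoff, so a different split keeps a different set of words. The annotation.json structure ("train"/"val"/"test" lists with a "report" field) and the cutoff of 3 are assumptions for this sketch, not the repo's actual code:

```python
import json
import random
from collections import Counter

def build_vocab(train_examples, min_freq=3):
    # Count word frequencies over the training reports only (assumed "report" field).
    counter = Counter()
    for ex in train_examples:
        counter.update(ex["report"].lower().split())
    # Words under the cutoff fall back to <unk> at tokenization time.
    return {w for w, c in counter.items() if c >= min_freq}

# Flatten all examples from an R2Gen-style annotation file (assumed structure).
with open("annotation.json") as f:
    examples = [ex for split in json.load(f).values() for ex in split]

for seed in (0, 1, 2):
    random.Random(seed).shuffle(examples)
    n = len(examples)
    train = examples[: int(0.7 * n)]  # 70% train; the rest would be 10% val / 20% test
    print(f"seed={seed}: vocab size = {len(build_vocab(train))}")
```

Each seed gives a different vocabulary size, which is exactly what makes the embedding shape disagree with the released weights.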

@Markin-Wang
Owner

Hi, thanks for your interest. For the data split, we follow the same random 70-10-20 split as R2Gen and R2GenCMN. I have updated the annotation file in the files/iu_xray directory of this repo and hope this helps you successfully reproduce the results.
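
As a sanity check, a hedged sketch for verifying that an annotation file is compatible with released weights: rebuild the vocabulary from its train split and compare it against the embedding rows in the checkpoint. The checkpoint filename, the state_dict key, and the cutoff of 3 are illustrative assumptions, not the repo's actual interface:

```python
import json
from collections import Counter

import torch

with open("files/iu_xray/annotation.json") as f:
    ann = json.load(f)

# Rebuild the vocabulary from the provided train split (frequency cutoff of 3 assumed).
counter = Counter()
for ex in ann["train"]:
    counter.update(ex["report"].lower().split())
vocab_size = sum(1 for c in counter.values() if c >= 3)

# Load the released checkpoint (filename is a placeholder) and inspect embedding shapes.
ckpt = torch.load("model_iu_xray.pth", map_location="cpu")
state_dict = ckpt["state_dict"] if "state_dict" in ckpt else ckpt
for name, tensor in state_dict.items():
    if torch.is_tensor(tensor) and "embed" in name and tensor.dim() == 2:
        # Embedding rows should equal vocab_size plus any special tokens (<pad>/<unk>, etc.).
        print(name, tuple(tensor.shape), "| vocab from annotation:", vocab_size)
```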

@Liqq1

Liqq1 commented Mar 1, 2023

Hi, I'm sorry to bother you. May I ask you some questions about the code in R2Gen?
In the https://github.com/cuhksz-nlp/R2Gen/blob/5a2e63f6d8da670725b1926317b4bfd1393ec51b/modules/encoder_decoder.py#L134

the return is
x + self.dropout(sublayer(self.norm(x, memory)))
Is this code correct? It doesn't seem to correspond to the diagram in the paper; I think it should be
self.norm((x + self.dropout(sublayer(x))), memory)
Could you help me work out which is right? Thanks a lot.

@Markin-Wang
Owner

Markin-Wang commented Mar 1, 2023

Hi, generally speaking you are right: the vanilla Transformer uses post-LN, as you described. However, as discussed in the paper On Layer Normalization in the Transformer Architecture, both placements of Layer Normalization appear in Transformer implementations, pre-LN and post-LN. For example, the Transformer-encoder-based BERT uses post-LN, while the Vision Transformer uses pre-LN.

Hope this helps you figure out the problem.

This may be a modification the R2Gen authors made to stabilize training, possibly based on their experiments.

I am not sure, though, and you may need to ask the R2Gen authors for the details.
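
For illustration, a minimal PyTorch sketch of the two orderings discussed above (this is not the R2Gen code itself, which additionally passes memory into its modified normalization layer):

```python
import torch
import torch.nn as nn

class PreLNSublayer(nn.Module):
    """x + Dropout(Sublayer(LN(x))) -- the ordering used at the linked line."""
    def __init__(self, size, dropout=0.1):
        super().__init__()
        self.norm = nn.LayerNorm(size)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, sublayer):
        return x + self.dropout(sublayer(self.norm(x)))

class PostLNSublayer(nn.Module):
    """LN(x + Dropout(Sublayer(x))) -- the vanilla Transformer diagram."""
    def __init__(self, size, dropout=0.1):
        super().__init__()
        self.norm = nn.LayerNorm(size)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, sublayer):
        return self.norm(x + self.dropout(sublayer(x)))

# Quick shape check with a linear layer standing in for the attention/FFN sublayer.
x = torch.randn(2, 10, 512)
ff = nn.Linear(512, 512)
print(PreLNSublayer(512)(x, ff).shape, PostLNSublayer(512)(x, ff).shape)
```

Both variants preserve the input shape; they differ only in where the normalization sits relative to the residual connection.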

@Liqq1

Liqq1 commented Mar 2, 2023

Thank you very much. Your reply completely cleared it up for me.
Since the R2Gen author does not seem to be following the issues on his project, I was unable to contact him to confirm.
All in all, thank you again!

@Liqq1

Liqq1 commented Mar 4, 2023

Hi! I'm sorry to bother you again! I hope to get your help.

  1. Have you tried to reproduce the results using the weights provided by the author? I tried this, but the results I got on IU X-ray were not the same as those shown in the paper; they were higher than the paper's.
     [screenshot: my IU X-ray evaluation results]

  2. Results on the MIMIC-CXR dataset: why are the results in the following two tables different? The author mentions "the number of memory slots is set to 3 by default" in the paper, but the results in Table 1 differ from the third row of Table 2. Do you know why?
     [screenshots: Table 1 and Table 2 from the paper]

Thanks again!
