Clarification Regarding "All Patch Representations" in the Pre-training Diagram #1689

Open
liruixinxinxin opened this issue Feb 10, 2025 · 1 comment

Comments

@liruixinxinxin
Hello, I am currently studying your pre-training procedure for masked audio prediction, and I have a question about the "All Patch Representations" shown in the figure of your paper.
Specifically, I am referring to the notation {M} in the "Label Predictor" block. Could you please clarify the following points?

- If {M} means the masked patch features are fed in directly as they are, I am concerned that the dimensions may not align properly with the label predictor's input.
- Alternatively, following the description in the paper, should {M} be all zeros as part of the masking process? I am unsure whether my understanding here is correct and would appreciate your confirmation (see the sketch below for what I mean).
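For concreteness, here is a minimal sketch of the two interpretations I have in mind. The shapes, variable names, and the 30% mask ratio are my own assumptions for illustration, not taken from the paper or this repository:

```python
import torch

B, N, D = 2, 100, 768            # batch, number of patches, embedding dim (assumed values)
patches = torch.randn(B, N, D)   # "All Patch Representations" from the encoder
mask = torch.rand(B, N) < 0.3    # boolean mask: True marks a masked patch (assumed ratio)

# Interpretation 1: {M} is the masked patch features used directly.
# This selects only the masked positions, which is why I worry the
# label predictor's input dimensions may not line up.
masked_feats = patches[mask]     # shape [num_masked, D]

# Interpretation 2: the masked positions are replaced with all zeros
# before the label predictor, as the masking description suggests.
zeroed = patches.clone()
zeroed[mask] = 0.0               # masked slots become zero vectors, shape stays [B, N, D]

print(masked_feats.shape, zeroed.shape)
```

Which of these (if either) corresponds to {M} in the diagram?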

Thank you for your time, and I look forward to your response.

@liruixinxinxin (Author)

[Image attachment]
