
Confusing notation/source for AttentionalAggregation #5400

Closed
RafiBrent opened this issue Sep 9, 2022 · 3 comments · Fixed by #5449


RafiBrent commented Sep 9, 2022

📚 Describe the documentation issue

I may be misunderstanding the documentation/code, and if so please correct me. However, I believe there are a few issues with the documentation for AttentionalAggregation.

The first confusing aspect is the use of the Hadamard product symbol between the softmaxed output of h_gate (shape [-1, 1]) and the output of h_theta (shape [-1, out_channels]). As I understand it, the mathematical convention is that this symbol is only used between arrays of the same size, so if what actually happens is that each row of the output of h_theta is scalar-multiplied by the corresponding entry of h_gate, I believe there is a clearer way to express this.

Secondly, and more importantly, I believe that this module performs a fundamentally different aggregation from the one in the paper cited in the documentation. Despite the superficial similarity of the formulas, Equation 3 in the “Gated Graph Sequence Neural Networks” paper applies a neural network to the feature vector of a single node, outputs a vector (rather than a scalar), and then softmaxes across this modified feature vector. Thus, instead of a single softmaxed vector of size num_nodes, the neural network from the paper generates num_nodes different softmaxed vectors, each of which is independently multiplied element-wise by the corresponding output of the second neural network. Fundamentally, the operation in the paper applies attentional weights to the channels of each (post-neural-network) feature vector, while the operation in PyG applies attentional weights to the set of nodes as a whole, since all channels of a given node are multiplied by the same scalar output of h_gate.

Please let me know if this interpretation is correct; if so, it would be helpful to modify the citation in some way to avoid the confusion. Thanks so much for your help.
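To make the distinction concrete, here is a rough sketch in plain PyTorch (the names h_gate and h_theta just stand in for the two networks, and the shapes are illustrative only):

```python
import torch

num_nodes, in_channels, out_channels = 5, 16, 8
x = torch.randn(num_nodes, in_channels)              # node features of one graph

h_gate = torch.nn.Linear(in_channels, 1)             # scalar score per node
h_theta = torch.nn.Linear(in_channels, out_channels)

# What I understand AttentionalAggregation to compute: one softmax over the
# node dimension, so every channel of a node is scaled by the same scalar.
alpha = torch.softmax(h_gate(x), dim=0)              # [num_nodes, 1]
out_node_level = (alpha * h_theta(x)).sum(dim=0)     # [out_channels]

# My reading of the cited paper: the gate network outputs a full vector per
# node, the softmax is taken within each node's vector, and the result is
# multiplied element-wise by the second network's output for that node
# (summed over nodes here for the graph-level readout).
h_gate_vec = torch.nn.Linear(in_channels, out_channels)
alpha_vec = torch.softmax(h_gate_vec(x), dim=-1)     # [num_nodes, out_channels]
out_feature_level = (alpha_vec * h_theta(x)).sum(dim=0)  # [out_channels]
```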

Suggest a potential alternative/fix

No response

rusty1s (Member) commented Sep 15, 2022

Really sorry for the late reply. Interestingly, we use the implementation from https://arxiv.org/pdf/1904.12787.pdf (Eq. 3), which cites the initial work of Li et al., 2015. I will change the reference in the documentation.
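For reference, the node-level readout the module currently documents is roughly

$$\mathbf{r}_i = \sum_{n=1}^{N_i} \mathrm{softmax}\left(h_{\mathrm{gate}}(\mathbf{x}_n)\right) \odot h_{\mathbf{\Theta}}(\mathbf{x}_n),$$

where the softmax is taken over the nodes of graph $i$ and $h_{\mathrm{gate}}$ maps each node to a single score, i.e., the node-level gating discussed above.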

rusty1s (Member) commented Sep 15, 2022

#5448

rusty1s (Member) commented Sep 15, 2022

Also added support for feature-level gating, see #5449. Thank you!
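For reference, a rough usage sketch of the two modes, assuming the post-#5449 behavior that gate_nn may map either to a single score (node-level gating) or to out_channels scores per node (feature-level gating):

```python
import torch
from torch_geometric.nn.aggr import AttentionalAggregation

in_channels, out_channels = 16, 8
x = torch.randn(6, in_channels)           # 6 nodes in total
index = torch.tensor([0, 0, 0, 1, 1, 1])  # two graphs with 3 nodes each

# Node-level gating: one attention score per node.
node_level = AttentionalAggregation(
    gate_nn=torch.nn.Linear(in_channels, 1),
    nn=torch.nn.Linear(in_channels, out_channels),
)
out = node_level(x, index)                # [2, out_channels]

# Feature-level gating: one attention score per node and channel.
feature_level = AttentionalAggregation(
    gate_nn=torch.nn.Linear(in_channels, out_channels),
    nn=torch.nn.Linear(in_channels, out_channels),
)
out = feature_level(x, index)             # [2, out_channels]
```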
