-
Notifications
You must be signed in to change notification settings - Fork 202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add the AMR graph constrction and RGCN in example #578
base: develop
Are you sure you want to change the base?
Conversation
52d81ee
to
dee086d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check the comments.
graph4nlp/pytorch/data/data.py
Outdated
@@ -195,6 +218,18 @@ def node_features(self) -> NodeFeatView: | |||
|
|||
return self.nodes[:].features | |||
|
|||
@property | |||
def ntypes(self) -> List[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the graph is homogeneous, it should be an exception.
graph4nlp/pytorch/data/data.py
Outdated
@@ -157,6 +168,12 @@ def add_nodes(self, node_num: int): | |||
node_num > 0 | |||
), "The number of nodes to be added should be greater than 0. (Got {})".format(node_num) | |||
|
|||
if not self.is_hetero: | |||
assert ntypes is None, "The graph is homogeneous, ntypes should be None." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please raise exceptions
graph4nlp/pytorch/data/data.py
Outdated
@@ -326,6 +361,18 @@ def edges(self) -> EdgeView: | |||
""" | |||
return EdgeView(self) | |||
|
|||
@property | |||
def etypes(self) -> List[Tuple[str, str, str]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When it is homogeneous, self._etypes is not assigned.
graph4nlp/pytorch/data/data.py
Outdated
@@ -641,6 +713,41 @@ def edge_attributes(self) -> List[Dict[str, torch.Tensor]]: | |||
return self._edge_attributes | |||
|
|||
# Conversion utility functions | |||
def make_data_dict(self) -> Dict[Tuple[str, str, str], Tuple[torch.Tensor, torch.Tensor]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could be a utility function.
graph4nlp/pytorch/data/dataset.py
Outdated
edge_token = graph.edge_attributes[edge_idx]["token"] | ||
s.add(edge_token) | ||
except Exception as e: | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please handle the catched exception
graph4nlp/pytorch/data/dataset.py
Outdated
@@ -705,6 +709,41 @@ def _process(self): | |||
self.test = self.build_topology(self.test) | |||
if "val" in self.__dict__: | |||
self.val = self.build_topology(self.val) | |||
# build_edge_vocab and save | |||
if self.init_edge_vocab: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be a new function similar to "build_vocab()". E.g., build_edge_vocab()
@@ -311,6 +311,7 @@ def __init__( | |||
for_inference=False, | |||
reused_vocab_model=None, | |||
nlp_processor_args=None, | |||
init_edge_vocab=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
init_edge_vocab is not clear. It covers 1.build edge vocab, 2. an indicator to use heterogeneous graph.
TODO: discuss
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- build_edge_vocab: bool
- is_hetero: bool
) | ||
data_item_collect.append(data_item) | ||
|
||
data_item_collect = self.dataset.build_topology(data_items=data_item_collect) | ||
for i in range(len(data_item_collect)): | ||
data_item_collect[i] = self.dataset._vectorize_one_dataitem( | ||
data_item_collect[i], self.vocab_model, use_ie=use_ie | ||
data_item_collect[i], self.vocab_model, use_ie=use_ie, edge_vocab=self.edge_vocab |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the api in dataset._vectorize_one_dataitem doesn't have the parameter: "edge_vocab", please check it.
_vectorize_one_dataitem
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- move the edge vocab to the library code
- unify edge_vocab with vocab_model
@@ -140,6 +140,9 @@ def __init__( | |||
"w2v_bert", | |||
"w2v_bert_bilstm", | |||
"w2v_bert_bigru", | |||
"w2v_amr", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why amr shoud be a general embedding strategy?
I don't think it should be regarded as a new general embedding strategy.
@@ -211,6 +222,17 @@ def __init__( | |||
else: | |||
rnn_input_size = word_emb_size | |||
|
|||
if "pos" in word_emb_type: | |||
self.word_emb_layers["pos"] = WordEmbedding(37, 50) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please don't use magic numbers.
dee086d
to
0d0400c
Compare
0d0400c
to
0dd8edb
Compare
Description
Checklist
Please feel free to remove inapplicable items for your PR.
or have been fixed to be compatible with this change
Changes