refine NER. #102
Conversation
README hasn't been updated yet.
Force-pushed from b682e7d to 9dbe1cd.
```python
forward_hidden, rnn_forward = stacked_rnn(word_caps_vector, hidden_dim,
                                          hidden_para_attr, rnn_para_attr)
backward_hidden, rnn_backward = stacked_rnn(
```
It seems that the two stacked_rnn branches use separate hidden layers at the first level, while the two RNN branches in the raw config share the same hidden layer. This matters if strict consistency is required.
This is a bug; these two RNNs should not share parameters. I will fix it. Thanks for your comment.
I have modified the network configuration to keep it consistent with the original one.
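For readers skimming the thread, here is a minimal, framework-free sketch of the distinction under discussion: whether the forward and backward RNN stacks reuse one first-level parameter object or each create their own. `make_param` and `build_stack` are hypothetical stand-ins, not PaddlePaddle APIs; in a real Paddle config, sharing is usually expressed by giving both layers the same parameter attribute/name.

```python
# Illustrative sketch only: shared vs. separate first-level parameters.
import random


def make_param(name, size):
    """Create a named parameter vector with random initial values."""
    return {"name": name, "values": [random.random() for _ in range(size)]}


def build_stack(direction, first_level_param, size=4):
    """Build a two-level 'RNN stack'; only the first level can be shared."""
    second_level_param = make_param("%s_level2" % direction, size)
    return [first_level_param, second_level_param]


size = 4

# Separate parameters: each stack gets its own first-level object.
fwd_sep = build_stack("fwd", make_param("fwd_level1", size), size)
bwd_sep = build_stack("bwd", make_param("bwd_level1", size), size)
print(fwd_sep[0] is bwd_sep[0])  # False: two independent parameter objects

# Shared parameters: both stacks reuse a single object.
shared = make_param("shared_level1", size)
fwd_sh = build_stack("fwd", shared, size)
bwd_sh = build_stack("bwd", shared, size)
print(fwd_sh[0] is bwd_sh[0])    # True: updates to one are seen by the other
```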
sequence_tagging_for_ner/infer.py
Outdated
pred_str = "" | ||
for w, tag in zip(test_sample[0], | ||
probs[start_id:start_id + len(test_sample[0])]): | ||
pred_str += "%s[%s] " % (id_2_word[w], id_2_label[tag]) |
Since the input is lowercased, the output may differ from the raw text. Keep this in mind if the raw text is needed.
This hasn't been fixed yet; during inference only the normalized text is printed.
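One possible way to preserve the raw text at inference time, sketched with illustrative placeholder data (`vocab`, `raw_tokens`, and `tags` are made up here; the real `infer.py` differs): keep the original tokens alongside the lowercased ids used for the vocabulary lookup, and build the prediction string from the originals.

```python
# Sketch: keep raw tokens next to the normalized ones used for lookup,
# so the prediction string can show the original text.
vocab = {"<unk>": 0, "stanford": 1, "is": 2, "in": 3, "california": 4}
id_2_label = {0: "O", 1: "B-ORG", 2: "B-LOC"}

raw_tokens = ["Stanford", "is", "in", "California"]
word_ids = [vocab.get(tok.lower(), vocab["<unk>"]) for tok in raw_tokens]
tags = [1, 0, 0, 2]  # stand-in for the model's predicted label ids

pred_str = ""
for raw, tag in zip(raw_tokens, tags):
    # Use the raw token for display instead of id_2_word[w],
    # which would only recover the lowercased vocabulary entry.
    pred_str += "%s[%s] " % (raw, id_2_label[tag])
print(pred_str)  # Stanford[B-ORG] is[O] in[O] California[B-LOC]
```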
Force-pushed from 0ddc309 to 670378c.
sequence_tagging_for_ner/README.md
Outdated
The first column is the original sentence sequence (the second and third columns are the part-of-speech tags and the chunk tags from syntactic parsing, which are not used here for now), and the fourth column is the NER label in I-TYPE format (the main difference between I-TYPE and BIO lies in how the chunk-beginning tag is used: I-TYPE marks an entity with B only when it immediately follows another entity of the same type, and uses I everywhere else). Sentences are separated by blank lines.
- The first column is the original sentence sequence.
- The second and third columns are the part-of-speech tags and the chunk tags from syntactic parsing; 本利不使用 ("not used in this example").
"本利不使用"应改为"本例不使用"
done
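Since the excerpt above defines I-TYPE only informally, here is a small illustrative converter from the more common BIO (IOB2) tags to the I-TYPE (IOB1-style) scheme it describes; the tag names are examples, not taken from `data/target.txt`.

```python
def bio_to_itype(tags):
    """Convert BIO (IOB2) tags to the I-TYPE scheme described above:
    B- is kept only when a chunk starts right after another chunk of the
    same type; all other chunk tags become I-."""
    converted = []
    prev = "O"
    for tag in tags:
        if tag.startswith("B-"):
            entity_type = tag[2:]
            # B is needed only to separate two adjacent same-type chunks.
            if prev != "O" and prev[2:] == entity_type:
                converted.append("B-" + entity_type)
            else:
                converted.append("I-" + entity_type)
        else:
            converted.append(tag)
        prev = tag
    return converted


# Two adjacent PER chunks: only the second keeps its B tag.
print(bio_to_itype(["B-PER", "I-PER", "B-PER", "O", "B-LOC"]))
# ['I-PER', 'I-PER', 'B-PER', 'O', 'I-LOC']
```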
sequence_tagging_for_ner/README.md
Outdated
1. a dictionary (vocabulary) of the input text
2. pre-trained word vectors for the words in that dictionary
3. a dictionary of the target labels

The label dictionary is already included in the `data` directory, as the file `data/target.txt`. The input-text dictionary and the pre-trained word vectors for its words come from the [Stanford CS224d](http://cs224d.stanford.edu/) course assignments. **To run this example, first run the `download.sh` script in the `data` directory to download the pre-trained word vectors.** Once it finishes, both files will be placed in the `data` directory; the input-text dictionary and the pre-trained word vectors correspond to `data/vocab.txt` and `data/wordVectors.txt`, respectively.
"下载预训练的词向量"是否应为"下载输入文本的词典和预训练的词向量"
done
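A rough sketch of how these files are commonly consumed, assuming one token per line in `vocab.txt`/`target.txt` and one row of whitespace-separated floats per word in `wordVectors.txt` (check the actual files fetched by `download.sh` before relying on this):

```python
# Sketch: load the vocabulary, label dictionary, and pre-trained vectors.
# File formats are assumed, not confirmed against the downloaded data.

def load_dict(path):
    """Map each line's token to its 0-based line index."""
    with open(path) as f:
        return {line.strip(): idx for idx, line in enumerate(f) if line.strip()}


def load_vectors(path):
    """Read one embedding row per line as a list of floats."""
    with open(path) as f:
        return [[float(x) for x in line.split()] for line in f if line.strip()]


word_dict = load_dict("data/vocab.txt")      # input-text dictionary
label_dict = load_dict("data/target.txt")    # label dictionary
embeddings = load_vectors("data/wordVectors.txt")

print(len(word_dict), len(label_dict), len(embeddings))
```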
I have followed the comments, thank you.
refactor NER demo.