refine NER. #102

Merged
merged 3 commits into from
Jun 26, 2017

lcy-seso
Collaborator

Refactor the NER demo.

@lcy-seso lcy-seso requested a review from guoshengCS June 16, 2017 10:27
@lcy-seso
Collaborator Author

The README hasn't been updated yet.

@lcy-seso lcy-seso force-pushed the refine_ner branch 2 times, most recently from b682e7d to 9dbe1cd Compare June 16, 2017 10:39

forward_hidden, rnn_forward = stacked_rnn(word_caps_vector, hidden_dim,
                                          hidden_para_attr, rnn_para_attr)
backward_hidden, rnn_backward = stacked_rnn(
Collaborator

It seems that the two stacked_rnn branches each create a separate hidden layer at the first level, whereas the two RNN branches in the original config share the same hidden layer. This matters if strict consistency with the original is required.

Collaborator Author

This is a bug; these two RNNs should not share parameters. I will fix this. Thanks for your comment.

Collaborator Author

I modified the network configuration to keep it consistent with the original one.
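
To make the distinction raised in this thread concrete, here is a minimal, framework-independent sketch of two branches that share one first-level hidden layer versus two branches that each own one. `Dense`, `build_shared`, `build_separate`, and `count_parameters` are hypothetical names invented for illustration; they are not the PaddlePaddle API and not the code in this PR.

```python
import random


class Dense:
    """Toy fully connected layer that owns its weight matrix (assumption: no bias)."""

    def __init__(self, in_dim, out_dim, seed):
        rng = random.Random(seed)
        # weights[i][j] connects input unit i to output unit j.
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(out_dim)]
                        for _ in range(in_dim)]

    def __call__(self, vec):
        out_dim = len(self.weights[0])
        return [sum(v * row[j] for v, row in zip(vec, self.weights))
                for j in range(out_dim)]


def build_shared(in_dim, hidden_dim):
    """Both branches read from the same hidden layer object."""
    hidden = Dense(in_dim, hidden_dim, seed=0)
    return hidden, hidden


def build_separate(in_dim, hidden_dim):
    """Each branch gets its own hidden layer with its own weights."""
    return Dense(in_dim, hidden_dim, seed=0), Dense(in_dim, hidden_dim, seed=1)


def count_parameters(layers):
    """Count weights, counting a layer object only once even if it is reused."""
    unique = {id(layer): layer for layer in layers}
    return sum(len(l.weights) * len(l.weights[0]) for l in unique.values())


shared = build_shared(in_dim=4, hidden_dim=3)
separate = build_separate(in_dim=4, hidden_dim=3)
print(count_parameters(shared), count_parameters(separate))
```

In the shared case an update coming from either branch changes the weights both branches see; in the separate case the branches are independent and the first-level parameter count doubles, which is why the two configurations are not strictly equivalent.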

pred_str = ""
for w, tag in zip(test_sample[0],
                  probs[start_id:start_id + len(test_sample[0])]):
    pred_str += "%s[%s] " % (id_2_word[w], id_2_label[tag])
Collaborator

Because the input is lowercased, the printed output may differ from the raw text. This should be noted if the raw text is wanted.

Collaborator Author

This hasn't been fixed yet. During inference, only the normalized text is printed.
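
One possible way to print the raw text, sketched under assumptions: keep the raw tokens alongside the normalized ones and pair the predicted labels with the raw tokens instead of looking words up again through `id_2_word`. The helpers `normalize` and `format_predictions`, the toy dictionaries, and the hard-coded `label_ids` are hypothetical stand-ins; the PR's actual reader and inference code may differ.

```python
def normalize(token):
    """Assumed normalization: lowercasing only, as mentioned in this thread."""
    return token.lower()


def format_predictions(raw_tokens, label_ids, id_2_label):
    """Pair each raw (un-normalized) token with its predicted label."""
    if len(raw_tokens) != len(label_ids):
        raise ValueError("token and label sequences differ in length")
    return " ".join("%s[%s]" % (token, id_2_label[label])
                    for token, label in zip(raw_tokens, label_ids))


# Toy data; a real run would take label_ids from the model's output.
raw_tokens = ["EU", "rejects", "German", "call"]
word_2_id = {"eu": 0, "rejects": 1, "german": 2, "call": 3}
id_2_label = {0: "O", 1: "I-ORG", 2: "I-MISC"}

# The model still sees normalized word ids ...
word_ids = [word_2_id[normalize(token)] for token in raw_tokens]
label_ids = [1, 0, 2, 0]  # hypothetical predictions, one per token

# ... but the printed string keeps the original casing.
print(format_predictions(raw_tokens, label_ids, id_2_label))
```

The design choice here is simply to carry the raw tokens through inference next to the ids, so that normalization affects only the model input and never the displayed text.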

@lcy-seso lcy-seso force-pushed the refine_ner branch 4 times, most recently from 0ddc309 to 670378c Compare June 21, 2017 05:33
```

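The first column is the original sentence sequence (the second and third columns are the part-of-speech tags and the syntactic chunk tags, which are not used here for now), and the fourth column is the NER tag in the I-TYPE scheme (I-TYPE differs from BIO mainly in how the chunk-start marker is used: I-TYPE uses a B tag only for the second of two adjacent entities of the same type, and an I tag everywhere else). Sentences are separated by blank lines.
- The first column is the original sentence sequence
- The second and third columns are the part-of-speech tags and the syntactic chunk tags, 本利不使用 [sic; intended meaning: "not used in this example"]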
Collaborator

"本利不使用" should be changed to "本例不使用" ("not used in this example"; 利 is a typo for 例).

Collaborator Author

done

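1. A dictionary for the input text
2. Pre-trained word vectors for the words in the dictionary
3. A dictionary of the tag labels
The tag-label dictionary is included in the `data` directory as `data/target.txt`. The input-text dictionary and the pre-trained word vectors for its words come from the [Stanford CS224d](http://cs224d.stanford.edu/) course assignment. **To run this example, first run the `download.sh` script in the `data` directory to download the pre-trained word vectors.** When it finishes, both files are placed in the `data` directory: the input-text dictionary is `data/vocab.txt` and the pre-trained word vectors are `data/wordVectors.txt`.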
Collaborator

Should "download the pre-trained word vectors" be "download the input-text dictionary and the pre-trained word vectors"?

Collaborator Author

done

Collaborator Author

@lcy-seso lcy-seso left a comment

I followed the comments, thank you.


```
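
The README text under review distinguishes I-TYPE tags from BIO tags. As a rough illustration of that difference, the sketch below converts an I-TYPE sequence into BIO by marking the first token of every entity with `B-`. `itype_to_bio` is a hypothetical helper, not part of this PR, and it assumes every tag is either `O` or a `PREFIX-TYPE` string such as `I-PER`.

```python
def itype_to_bio(tags):
    """Convert I-TYPE tags to BIO tags.

    In I-TYPE, B- appears only when an entity directly follows another
    entity of the same type; in BIO, every entity starts with B-.
    """
    bio = []
    prev_type = None  # entity type of the previous token, None after "O"
    for tag in tags:
        if tag == "O":
            bio.append(tag)
            prev_type = None
            continue
        prefix, entity_type = tag.split("-", 1)
        # A new entity starts on an explicit B- or when the type changes.
        if prefix == "B" or entity_type != prev_type:
            bio.append("B-" + entity_type)
        else:
            bio.append("I-" + entity_type)
        prev_type = entity_type
    return bio


example = ["I-ORG", "O", "I-PER", "I-PER", "B-PER", "O"]
print(itype_to_bio(example))
```

The only I-TYPE `B-` tag in the example marks the second of two adjacent `PER` entities; after conversion, every entity carries its own `B-` start marker.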


@lcy-seso lcy-seso merged commit 436f480 into PaddlePaddle:develop Jun 26, 2017
@lcy-seso lcy-seso deleted the refine_ner branch June 26, 2017 04:33