How to prepare a dataset to train "Quality Scorer" classifier? #449
kdcyberdude
-
Basically, the "Quality Scorer" is a fastText classifier trained to assign high scores to pages that resemble "high quality" content such as Wikipedia pages and books. The "Document Coherence Scorer" assigns high scores to pages whose paragraphs are more "consistent" with one another, based on the cosine similarity of their embeddings.
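To make the two ideas above a bit more concrete, here is a minimal sketch, not the project's actual implementation. It assumes the quality classifier is trained on fastText's supervised text format (one `__label__<name> <text>` example per line) and that coherence is the mean cosine similarity between consecutive paragraph embeddings. The label names `hq`/`lq`, the helper names, and the choice to compare only adjacent paragraphs are all hypothetical; the thread does not specify them.

```python
import math
import re


def to_fasttext_line(text: str, label: str) -> str:
    """Format one example as '__label__<label> <text>' on a single line.

    The label names used by callers (e.g. 'hq', 'lq') are hypothetical.
    """
    # fastText reads one example per line, so collapse newlines and runs of spaces.
    flat = re.sub(r"\s+", " ", text).strip()
    return f"__label__{label} {flat}"


def cosine(u, v) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def coherence_score(paragraph_embeddings) -> float:
    """Mean cosine similarity between consecutive paragraph embeddings.

    Assumption: only adjacent paragraphs are compared; the real scorer
    may aggregate differently.
    """
    pairs = list(zip(paragraph_embeddings, paragraph_embeddings[1:]))
    if not pairs:
        # Assumption: a page with fewer than two paragraphs counts as coherent.
        return 1.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Positive examples would come from "high quality" sources such as Wikipedia
# pages and books; what to use as negatives is an assumption (e.g. random web pages).
train_lines = [
    to_fasttext_line("An encyclopedia article\nabout photosynthesis.", "hq"),
    to_fasttext_line("click here   buy now best price", "lq"),
]
```

With the lines written to a file, a model could then be trained with the `fasttext` Python package via `fasttext.train_supervised(input="train.txt")` and queried with `model.predict(text)`. The paragraph embeddings passed to `coherence_score` would come from whatever embedding model you choose; the thread does not name one.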
-
I want to know the implementation details of the "Quality Scorer" and "Document Coherence Scorer" filters.