Journal suggestion bias towards certain journals #165

Open
jdagdelen opened this issue Nov 14, 2019 · 3 comments
Labels
backend Requires changes on the backend not directly solved by unibox Issue is unrelated to or not superceded by unibox

Comments

@jdagdelen
Contributor

I was just playing around with this and it seems like the Journal of Electroceramics is suggested for a ton of abstracts in our example set. @kevinyang8 It seems like there may be some biasing effects that we should work on. I don't think this disqualifies the app, but we should put a disclaimer on it and describe how the suggestions are calculated.

@jdagdelen
Contributor Author

We might want to do some benchmarking and see if there are any journals that should be downsampled. Maybe run the prediction for 10,000 abstracts from various journals and see if there are any journals that turn up more than we'd expect?
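A minimal sketch of that benchmark, assuming we already have each benchmark abstract's true journal and the model's top suggestion for it. The function name `overprediction_ratios` and the input format are hypothetical, not part of the app's code. It compares how often each journal is suggested with how often it is actually the source journal in the benchmark set.

```python
from collections import Counter


def overprediction_ratios(true_journals, top_predictions):
    """Return {journal: predicted share / true share} over a benchmark set.

    true_journals:   the journal each abstract actually came from
    top_predictions: the model's top suggestion for the same abstracts
    (Both input formats are assumptions made for this sketch.)
    """
    if len(true_journals) != len(top_predictions):
        raise ValueError("inputs must have the same length")
    n = len(true_journals)
    true_counts = Counter(true_journals)
    pred_counts = Counter(top_predictions)
    ratios = {}
    for journal in set(true_counts) | set(pred_counts):
        expected = true_counts.get(journal, 0) / n
        observed = pred_counts.get(journal, 0) / n
        # A journal that is predicted but never the true source gets inf.
        ratios[journal] = observed / expected if expected else float("inf")
    return ratios


# Toy data only: journal "A" is suggested more often than it appears.
example = overprediction_ratios(["A", "A", "B", "B"], ["A", "A", "A", "B"])
```

A ratio well above 1 would flag a journal that turns up more than expected and might be a candidate for downsampling; a ratio well below 1 would flag one that is under-suggested.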

@computron

@jdagdelen are there clear examples of abstracts that don't belong to that journal, but for which that journal is predicted?

@computron

@kevinyang8 one easy test for journal bias is to look at your test set and see whether certain journals have larger errors than others.

e.g., let's say that abstracts in the test set which should be in "Nature Materials" list Nature Materials as one of the top 3 predictions 80% of the time. But abstracts in the test set which should be in "Physical Review B" list that journal as one of the top 3 predictions only 20% of the time. This would be evidence of bias against Physical Review B.

One reason for journal bias could be imbalanced data classes. e.g., if you have 500 abstracts in the test set for Nature Materials, and 20 abstracts for Physical Review B, then the algorithm will get lower errors on the test set by guessing Nature Materials more frequently.
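The per-journal check described above could be sketched roughly as follows. The function name `top_k_hit_rate_by_journal` and the input format (a ranked list of suggested journals per abstract) are assumptions for illustration, not the app's actual interface. Returning the number of test abstracts alongside each hit rate also makes class imbalance easy to spot.

```python
from collections import defaultdict


def top_k_hit_rate_by_journal(true_journals, ranked_predictions, k=3):
    """Return {journal: (top-k hit rate, number of test abstracts)}.

    true_journals:      the correct journal for each test abstract
    ranked_predictions: for each abstract, suggested journals, best first
    (Both input formats are assumptions made for this sketch.)
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, ranked in zip(true_journals, ranked_predictions):
        totals[truth] += 1
        # Count a hit when the true journal is among the top k suggestions.
        if truth in ranked[:k]:
            hits[truth] += 1
    return {j: (hits[j] / totals[j], totals[j]) for j in totals}


# Toy data only, not real model output.
truth = ["Nature Materials", "Nature Materials",
         "Physical Review B", "Physical Review B"]
ranked = [
    ["Nature Materials", "X", "Y"],
    ["X", "Nature Materials", "Y"],
    ["Nature Materials", "X", "Y"],
    ["X", "Y", "Physical Review B"],
]
rates = top_k_hit_rate_by_journal(truth, ranked, k=3)
```

A journal with a much lower hit rate than the others, especially one with few test abstracts, would be the kind of evidence of bias described in the comment.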

@ardunn ardunn added backend Requires changes on the backend not directly solved by unibox Issue is unrelated to or not superceded by unibox labels Nov 20, 2019
@ardunn ardunn added not directly solved by unibox Issue is unrelated to or not superceded by unibox and removed not directly solved by unibox Issue is unrelated to or not superceded by unibox labels Feb 12, 2020

3 participants