Add Pubmed data source main table #31

Merged 1 commit into dspinellis:main on Jan 4, 2024

@BasVerlooy (Contributor) commented Dec 1, 2023

  • Add the main table for the Pubmed datasource
  • Create a new XML method that gets an element in an array by the value of its attribute
  • Move some code from CrossRef into data_source.py, as the code is now also used by Pubmed; this removes code duplication. Also add the option to check file extensions when loading files: macOS often creates .DS_Store files, which caused loading to fail.
  • Add tests for the new pubmed datasource
  • Add a test for the new XML method
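
For readers without the diff at hand, the two helpers described above might look roughly like the following. This is a minimal sketch, not the code merged in this PR: the names `sketch_getter_by_attribute` and `sketch_data_files`, the `.xml.gz` extension, and the sample XML with its DOI are all invented for illustration. Only the argument order (attribute name, attribute value, element path) follows the `getter_by_attribute` call quoted in the review.

```python
import xml.etree.ElementTree as ET


def sketch_getter_by_attribute(attribute, value, path):
    """Hypothetical helper: return a function that finds the first element
    at `path` whose `attribute` equals `value` and returns its text."""

    def getter(tree):
        for element in tree.findall(path):
            if element.get(attribute) == value:
                return element.text
        return None  # No element carries the requested attribute value

    return getter


def sketch_data_files(names, extension):
    """Hypothetical helper: keep only names ending in `extension`, so that
    stray files such as .DS_Store are skipped; sorted for a stable order."""
    return sorted(name for name in names if name.endswith(extension))


# Invented sample record; real PubMed records contain many more fields.
SAMPLE = """
<PubmedArticle>
  <PubmedData>
    <ArticleIdList>
      <ArticleId IdType="pubmed">12345</ArticleId>
      <ArticleId IdType="doi">10.1000/EXAMPLE.1</ArticleId>
    </ArticleIdList>
  </PubmedData>
</PubmedArticle>
"""

article = ET.fromstring(SAMPLE)
get_doi = sketch_getter_by_attribute(
    "IdType", "doi", "PubmedData/ArticleIdList/ArticleId"
)
doi = get_doi(article)
files = sketch_data_files([".DS_Store", "b.xml.gz", "a.xml.gz"], ".xml.gz")
```

Returning a closure keeps the table definition declarative: each column can pair a name with a getter, and the getter is only applied when a record is read.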

@dspinellis
Owner

Thank you for your contribution! I think it addresses two topics: PubMed data source and progress bar. In order to help me review it, please do the following.

  • Split the PR into one for each separate feature, so that they can be individually merged
  • Squash the commits into one commit for each PR (git-rebase is your friend)
  • Remember to start each commit message with an uppercase letter
  • Add an option to disable the progress bar output (make the program silent / quiet)
  • Ensure the progress bar works correctly with the files-read debug option and Python API invocation (it should not appear)

@BasVerlooy
Contributor Author

The progress bar has been removed from this branch and will have its own PR; it will also have a setting to disable it and will work correctly with the debug option and Python API invocation.

Furthermore, the commits have been squashed into one commit; for future commits I'll take into account that the messages should start with a capital letter.

@dspinellis closed this Dec 23, 2023
@dspinellis reopened this Dec 23, 2023
@dspinellis changed the title from "Add Pubmed datasource main table" to "Add Pubmed data source main table" Dec 23, 2023
Owner

@dspinellis left a comment

Excellent, thank you! I assume another PR will add detail tables?

"doi",
getter_by_attribute(
"IdType", "doi", "PubmedData/ArticleIdList/ArticleId"
),
Owner

Normalize this to lowercase.

Contributor Author

This is part of the source XML, so I don't think this can be lowercase

Owner

Sorry I wasn't clear enough. I am suggesting that you normalize the obtained DOI into lowercase to match the normalization done in other parts of a3k.
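
As an illustration of the suggestion, lowercasing the extracted value could be done with a small wrapper like the one below. This is only a sketch under assumptions: `sketch_normalize_doi` is a made-up name, the sample DOI is invented, and the actual normalization code elsewhere in a3k may differ.

```python
def sketch_normalize_doi(doi):
    """Hypothetical helper: lowercase a DOI so that records from different
    sources compare equal; missing values are passed through unchanged."""
    if doi is None:
        return None
    return doi.lower()


# Invented sample value, mixed case as it might appear in source XML.
normalized = sketch_normalize_doi("10.1000/EXAMPLE.1")
```

DOI names are case-insensitive, so storing one canonical case lets rows from different data sources be joined on the DOI column with a plain equality test.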

Contributor Author

Aaah I see, I'll look at the implementation for CrossRef and see what can be reused

Contributor Author

The DOI has been changed to lowercase.

tests/data_sources/test_pubmed.py (outdated, resolved)
src/alexandria3k/file_pubmed_cache.py (outdated, resolved)
src/alexandria3k/data_sources/pubmed.py (outdated, resolved)
test new pubmed datasource

test new xml method

avoid code duplication by generic datafiles class

remove unused imports

move constant out of class

create variable to avoid code duplication

small pubmed cleanup

improve how progress bar looks

remove crossref references from pubmed tests

run pre-commit on xml test file

Remove progress bar from datasource.py

Add default source constant back

Add own name as author for Pubmed

Lowercase DOI like in Crossref
@dspinellis merged commit e4645a1 into dspinellis:main Jan 4, 2024
5 checks passed
@dspinellis
Owner

Great, well done!
