Time to switch to BOLDv5 #110
Is there a time when it happens more often? I can try reaching out to them, because if it can happen when asking for only one species, it's definitely on their server side.
Oooooooooooh, the new API is finally out! 😮 Well, I'll check how much has changed and see how long it will take to update.
Yesterday afternoon was particularly bad, the evening not so much, and I managed to get most of what I was after (all COI-5p sequences for Animalia). But I'm pretty sure it's not all of it, which is annoying for reproducibility.
BOLD is really odd. When I use the web interface to get records for the order Diptera, it lists 6,660,909 records. When I try to get these via the R tool, it times out, but I can loop over all the families under Diptera and query them individually. When I do, I get 6,241,574 records, failures for two families (Heterocheilidae and Braulidae), and timeouts for three families (Perissommatidae, Neminidae, Teratomyzidae). Searching for the two failed families in the website search interface returns 1 and 3 sequences, respectively. Searching for the three families that time out returns no hits, even though I know there are some in the database. I think this is because there are no public records: following the "sub-taxa" links from Animalia > Arthropoda > Insecta > Diptera lists the families (e.g., https://v4.boldsystems.org/index.php/Taxbrowser_Taxonpage?taxid=532727), but the page says there is no public data available for this family. Using this tool to try and pull data for this family (where there is no public data) with
See, this is why I hate this restricted kind of database and semi-closed-source system.
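The family-by-family workaround described above can be made auditable by recording, for every family, whether it returned records, failed, or timed out, and then comparing the sum against the count the website reports. Below is a minimal Python sketch, not the package's API: `fetch` is a hypothetical stand-in for whatever function downloads one family's records, and `QueryTimeout` is a hypothetical exception that such a function would raise on a timeout.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Sequence


class QueryTimeout(Exception):
    """Hypothetical exception a fetch function raises when the server times out."""


@dataclass
class FamilyTally:
    counts: Dict[str, int] = field(default_factory=dict)  # family -> records returned
    failed: List[str] = field(default_factory=list)       # families that raised an error
    timed_out: List[str] = field(default_factory=list)    # families that timed out

    @property
    def total(self) -> int:
        return sum(self.counts.values())


def query_by_family(families: Iterable[str],
                    fetch: Callable[[str], Sequence[object]]) -> FamilyTally:
    """Query each family separately instead of one large order-level request."""
    tally = FamilyTally()
    for family in families:
        try:
            records = fetch(family)
        except QueryTimeout:
            tally.timed_out.append(family)
        except Exception:
            tally.failed.append(family)
        else:
            tally.counts[family] = len(records)
    return tally


def shortfall(expected_total: int, tally: FamilyTally) -> int:
    """Records missing from the per-family loop relative to the web count."""
    return expected_total - tally.total
```

With the Diptera numbers above (6,660,909 expected, 6,241,574 retrieved), the shortfall is 419,335 records, or about 6.3%. Keeping the failed and timed-out lists separate at least shows which families to re-query or check on the website.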
I see their taxonomy browser is still using v4. There was some inconsistency between the taxonomy API and the sequence/specimen one. They might still be working on fixing that, tbh.
Well, it seems they decided to make their own package: https://github.com/boldsystems-central/BOLDconnectR/ ...
Well, on their new website, there's only one specimen for that one.
That's partially why they redid the whole API.
I'll give it a go and let you know. The code works and eventually gives me useful sequences. I just have no idea whether the result is complete after iterating over different taxonomic levels and retrying multiple times to get around the HTTP errors from the BOLD database. And if it's not complete, I have no idea what's missing.
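One way to at least know what's missing is to wrap each per-taxon request in a retry with exponential backoff, and to keep an explicit list of taxa that never succeeded instead of silently dropping them. Below is a minimal Python sketch under assumptions: `fetch` is a hypothetical function that downloads one taxon's records and raises on HTTP errors or timeouts. The default `retry_on=(Exception,)` is deliberately broad; in practice you would narrow it to the error types your HTTP client actually raises.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def fetch_with_retry(taxon: str,
                     fetch: Callable[[str], T],
                     max_tries: int = 5,
                     base_delay: float = 2.0,
                     retry_on: Tuple[type, ...] = (Exception,),
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch(taxon), retrying with exponential backoff on the given errors."""
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    for attempt in range(max_tries):
        try:
            return fetch(taxon)
        except retry_on:
            if attempt == max_tries - 1:
                raise  # out of attempts: let the caller record the failure
            sleep(base_delay * 2 ** attempt)  # 2 s, 4 s, 8 s, ... with the defaults
    raise AssertionError("unreachable")


def fetch_all(taxa: Iterable[str], fetch: Callable[[str], T], **retry_kwargs):
    """Return (results, missing): records per taxon, plus taxa that never succeeded."""
    results: Dict[str, T] = {}
    missing: List[str] = []
    for taxon in taxa:
        try:
            results[taxon] = fetch_with_retry(taxon, fetch, **retry_kwargs)
        except Exception:
            # Non-retryable errors and exhausted retries both end up here.
            missing.append(taxon)
    return results, missing
```

Saving `missing` alongside the downloaded data makes a partial run reproducible: you can re-run only those taxa later, or report them as gaps.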
Update:
My conclusion is that, at the moment, it's practically impossible to get hold of data at scale from BOLD.
P.S. For reference, and in case it's useful to anyone else, the script I'm using to try to get the data is:
I'm trying to use this package to get a bunch of COI sequences from BOLD. I know the code works because it sometimes returns results, but mostly I'm plagued with intermittent
Warning: Content was type '' when it should've been type 'text/html; charset=utf-8'
errors or, for larger groups of sequences, occasionally
The request timed out, see 'If a request times out'.
errors, in which case it returns partial output.
For example:
Is this normal? Is something wrong with the BOLD servers? Can I set the timeout length or the maximum number of sequences to return?