-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add files via upload #558
Add files via upload #558
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @iechevarriaz. It would be great to have Bos taurus in the catalog! I've opened an QC issue over at #559, which is a checklist of things that we'll look at before merging the new species into stdpopsim. This is quite a new process, so please let us know if you have any comments or questions. I've also left some comments next to your code. If you have specific questions, please fire away, and we'll do our best to help out.
BosTau.py
Outdated
|
||
# Create a string of chromosome lengths for easy parsing | ||
_chromosome_data = """\ | ||
chr1 158534110 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment here about which reference assembly (and version) the chromosome lengths come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have a URL for where you got it from that would be nice to have in a comment as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reference assembly comes from Ensembl genome browser... Genome assembly: ARS-UCD1.2 (GCA_002263795.2) . https://www.ensembl.org/Bos_taurus/Info/Annotation
Worked out the chromosome lengths from each "chromosome summary" in Ensembl. Copied and pasted.
Eg. for Chr1 ....https://www.ensembl.org/Bos_taurus/Location/Chromosome?r=1:421347-158534110
Chr2... https://www.ensembl.org/Bos_taurus/Location/Chromosome?r=2:14572483-14672483
BosTau.py
Outdated
_genome = stdpopsim.Genome( | ||
chromosomes=_chromosomes, | ||
assembly_citations=[_assembly_citation]) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll also need citations for the mutation rate and recombination rate used above.
BosTau.py
Outdated
########################################################### | ||
|
||
def _iid(): | ||
id = "IonaInferredDemography" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To get the tests to pass, this id will need to follow a specific naming convention. See https://stdpopsim.readthedocs.io/en/latest/development.html#adding-a-new-demographic-model and https://stdpopsim.readthedocs.io/en/latest/development.html#naming-conventions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iechevarriaz this still needs to be fixed. Please use this format ${SomethingDescriptive}_${number_of_populations}${first_author_initial}${two_digit_date}
.
This is great, thanks @iechevarriaz! Welcome to stdpopsim, and thanks for taking the time to make this great contribution! It'll be fantastic to have cattle in the catalog. From a quick scan of the code, I think you're most of the way there. We'll need to get the QC process going for the species data, which we're still working out, so please bear with us there. |
BosTau.py
Outdated
|
||
# Create a string of chromosome lengths for easy parsing | ||
_chromosome_data = """\ | ||
chr1 158534110 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have a URL for where you got it from that would be nice to have in a comment as well
BosTau.py
Outdated
The Runs of Homozygosity-based infer of Demography from MacLeod et al. 2013. | ||
""" | ||
populations = [ | ||
stdpopsim.Population(id="FILL ME", description="FILL ME"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add population id and description. See other implemented models for examples.
BosTau.py
Outdated
initial_size=90, growth_rate=0.0166, | ||
metadata=populations[0].asdict()) | ||
], | ||
#migration_matrix=[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can just be migration_matrix = [0]
if you want to be explicit. Otherwise just delete.
BosTau.py
Outdated
_gen_time_citation = stdpopsim.Citation( | ||
doi="", | ||
year="", | ||
author="", | ||
reasons={stdpopsim.CiteReason.GEN_TIME}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You put 5 years as the generation time below, please add a citation for that number.
BosTau.py
Outdated
_pop_size_citation = stdpopsim.Citation( | ||
doi="", | ||
year="", | ||
author="", | ||
reasons={stdpopsim.CiteReason.POP_SIZE}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You put 62,000 below, please add a citation for that number here if you think it is a representative population size for cattle.
I'd like to add my greetings as well :) Thanks for contributing! |
Welcome @iechevarriaz! Since you are adding both a new species and a new demographic model we will doing the QC process for both the new species and new demographic model. Your BosTau.py file should be moved to |
Thank you all for your commentaries and help!!!! And such a warm welcome! In the next couple of days, I'll be trying to resolve these citations and change parameters if needed. For that, I'm contacting people to point me in the right direction. |
Here I've tried to put a descriptive word for the model, then the number of populations, the initial for the first author and the year of publication.
Added citations for mutation rates and recombination rates. Wrote the genome object with those citations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! There are some changes that need to be made based on some additional integration with Ensembl that we've added.
BosTau.py
Outdated
_chromosome_data = """\ | ||
chr1 158534110 | ||
chr2 136231102 | ||
chr3 121005158 | ||
chr4 120000601 | ||
chr5 120089316 | ||
chr6 117806340 | ||
chr7 110682743 | ||
chr8 113319770 | ||
chr9 105454467 | ||
chr10 103308737 | ||
chr11 106982474 | ||
chr12 87216183 | ||
chr13 83472345 | ||
chr14 82403003 | ||
chr15 85007780 | ||
chr16 81013979 | ||
chr17 73167244 | ||
chr18 65820629 | ||
chr19 63449741 | ||
chr20 71974595 | ||
chr21 69862954 | ||
chr22 60773035 | ||
chr23 52498615 | ||
chr24 62317253 | ||
chr25 42350435 | ||
chr26 51992305 | ||
chr27 45612108 | ||
chr28 45940150 | ||
chr29 51098607 | ||
chrX 139009144 | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the sudden change but this info should now be in an auto-generated file in this directory named genome_data.py
this file is autogenerated by querying the ensembl REST API. We have a script for updating chromosome info for all species here but you probably want to tweak it to just add cow. (We really need to add at least a simple CLI to that file to make this easier). Finally, you will need to add an from . import genome_data
to the top of this file and rename it __init__.py
. Look at some of the other species for an example if you need them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @ndukler , I'm trying to run the script for getting the chromosome info but it crashes, it cannot find module called "maintenance". Moreover, I don't know how to tweak it to get only cattle info. That should generate a file called genome_data.py , is that correct? I've already made some changes including rename of BosTau.py to init.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct. Also I suppose you can run it so it updates everything then discard all changes that are not to bosTau. I'll look into running it and get back to you. If I can just generate genome_data.py
I'll just send it to you.
BosTau.py
Outdated
_chromosomes.append(stdpopsim.Chromosome( | ||
id=name, length=int(length), | ||
mutation_rate=1.2e-8, | ||
recombination_rate=9.26e-9)) # 25.5 crossovers per meiosis in males and 23.2 crossovers per meiosis in females gives an average of 24.35 crossovers per meiosis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mutation and recombination rates look good! Relatively easy to see where you got them in the papers
BosTau.py
Outdated
########################################################### | ||
|
||
def _iid(): | ||
id = "IonaInferredDemography" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iechevarriaz this still needs to be fixed. Please use this format ${SomethingDescriptive}_${number_of_populations}${first_author_initial}${two_digit_date}
.
Also, I think we're far enough along that you can send me the files for the recombination map. They should be placed in a flat tarball with one file per chromosome. |
Hey @iechevarriaz, I just wanted to remind you that to start the demographic model QC process we'll need a new issue with the Model QC issue template. |
@iechevarriaz I've updated the script on downloading a new species genome info and updated the devdocs. I've also been working on the species QC and most of it is good to go except some of the assembly stuff (most of which is just handled by the automated download). If you pull master you should have everything you need to move forward. If possible can we wrap this up this week? |
Hello @ndukler ! Thank you for the update... I have run the update_ensembl_data.py script and it fetches assemby_accession and assemby_name for BosTau, but it doesn't fill in the chromosome lengths... |
Hey @iechevarriaz. It looks like a bug in |
Thank you @grahamgower and @jeromekelleher ... now it works!!!!
@ndukler Sorry but we don't have a recombination map for Bos taurus... Is there anything else I should be doing? |
I think you're good. Just push the updates and the chromosome info.
|
This looks like its basically done. I'm going to make some minor changes (in a new PR) so we can merge this. |
Closing in favour of #600. |
Hello, this is my first attempt to contribute to a github repository. The aim of this PR is to try to get help in order to add Bos taurus as new species to the catalog and then adding an inferred Demographic model from literature.
Thank you!