Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add files via upload #558

Conversation

iechevarriaz
Copy link
Contributor

Hello, this is my first attempt to contribute to a github repository. The aim of this PR is to try to get help in order to add Bos taurus as new species to the catalog and then adding an inferred Demographic model from literature.
Thank you!

@grahamgower grahamgower mentioned this pull request Jun 11, 2020
11 tasks
Copy link
Member

@grahamgower grahamgower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @iechevarriaz. It would be great to have Bos taurus in the catalog! I've opened an QC issue over at #559, which is a checklist of things that we'll look at before merging the new species into stdpopsim. This is quite a new process, so please let us know if you have any comments or questions. I've also left some comments next to your code. If you have specific questions, please fire away, and we'll do our best to help out.

BosTau.py Outdated
Comment on lines 19 to 22

# Create a string of chromosome lengths for easy parsing
_chromosome_data = """\
chr1 158534110
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment here about which reference assembly (and version) the chromosome lengths come from?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have a URL for where you got it from that would be nice to have in a comment as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reference assembly comes from Ensembl genome browser... Genome assembly: ARS-UCD1.2 (GCA_002263795.2) . https://www.ensembl.org/Bos_taurus/Info/Annotation
Worked out the chromosome lengths from each "chromosome summary" in Ensembl. Copied and pasted.
Eg. for Chr1 ....https://www.ensembl.org/Bos_taurus/Location/Chromosome?r=1:421347-158534110
Chr2... https://www.ensembl.org/Bos_taurus/Location/Chromosome?r=2:14572483-14672483

BosTau.py Outdated Show resolved Hide resolved
BosTau.py Outdated
Comment on lines 78 to 81
_genome = stdpopsim.Genome(
chromosomes=_chromosomes,
assembly_citations=[_assembly_citation])

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll also need citations for the mutation rate and recombination rate used above.

BosTau.py Outdated
###########################################################

def _iid():
id = "IonaInferredDemography"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iechevarriaz this still needs to be fixed. Please use this format ${SomethingDescriptive}_${number_of_populations}${first_author_initial}${two_digit_date}.

BosTau.py Outdated Show resolved Hide resolved
@jeromekelleher
Copy link
Member

This is great, thanks @iechevarriaz! Welcome to stdpopsim, and thanks for taking the time to make this great contribution! It'll be fantastic to have cattle in the catalog.

From a quick scan of the code, I think you're most of the way there. We'll need to get the QC process going for the species data, which we're still working out, so please bear with us there.

BosTau.py Outdated
Comment on lines 19 to 22

# Create a string of chromosome lengths for easy parsing
_chromosome_data = """\
chr1 158534110
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have a URL for where you got it from that would be nice to have in a comment as well

BosTau.py Outdated
The Runs of Homozygosity-based infer of Demography from MacLeod et al. 2013.
"""
populations = [
stdpopsim.Population(id="FILL ME", description="FILL ME"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add population id and description. See other implemented models for examples.

BosTau.py Outdated
initial_size=90, growth_rate=0.0166,
metadata=populations[0].asdict())
],
#migration_matrix=[
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can just be migration_matrix = [0] if you want to be explicit. Otherwise just delete.

BosTau.py Outdated Show resolved Hide resolved
BosTau.py Outdated
Comment on lines 83 to 87
_gen_time_citation = stdpopsim.Citation(
doi="",
year="",
author="",
reasons={stdpopsim.CiteReason.GEN_TIME})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You put 5 years as the generation time below, please add a citation for that number.

BosTau.py Outdated
Comment on lines 89 to 93
_pop_size_citation = stdpopsim.Citation(
doi="",
year="",
author="",
reasons={stdpopsim.CiteReason.POP_SIZE})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You put 62,000 below, please add a citation for that number here if you think it is a representative population size for cattle.

@ndukler
Copy link
Contributor

ndukler commented Jun 11, 2020

This is great, thanks @iechevarriaz! Welcome to stdpopsim, and thanks for taking the time to make this great contribution! It'll be fantastic to have cattle in the catalog.

From a quick scan of the code, I think you're most of the way there. We'll need to get the QC process going for the species data, which we're still working out, so please bear with us there.

I'd like to add my greetings as well :) Thanks for contributing!

@agladstein
Copy link
Contributor

Welcome @iechevarriaz! Since you are adding both a new species and a new demographic model we will doing the QC process for both the new species and new demographic model.
Could you open a new issue with the Model QC issue template to get that started? Thanks!

Your BosTau.py file should be moved to stdpopsim/catalog/.

@iechevarriaz
Copy link
Contributor Author

Thank you all for your commentaries and help!!!! And such a warm welcome! In the next couple of days, I'll be trying to resolve these citations and change parameters if needed. For that, I'm contacting people to point me in the right direction.

Here I've tried to put a descriptive word for the model, then the number of populations, the initial for the first author and the year of publication.
Added citations for mutation rates and recombination rates. Wrote the genome object with those citations.
Copy link
Contributor

@ndukler ndukler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! There are some changes that need to be made based on some additional integration with Ensembl that we've added.

BosTau.py Outdated
Comment on lines 22 to 53
_chromosome_data = """\
chr1 158534110
chr2 136231102
chr3 121005158
chr4 120000601
chr5 120089316
chr6 117806340
chr7 110682743
chr8 113319770
chr9 105454467
chr10 103308737
chr11 106982474
chr12 87216183
chr13 83472345
chr14 82403003
chr15 85007780
chr16 81013979
chr17 73167244
chr18 65820629
chr19 63449741
chr20 71974595
chr21 69862954
chr22 60773035
chr23 52498615
chr24 62317253
chr25 42350435
chr26 51992305
chr27 45612108
chr28 45940150
chr29 51098607
chrX 139009144
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the sudden change but this info should now be in an auto-generated file in this directory named genome_data.py this file is autogenerated by querying the ensembl REST API. We have a script for updating chromosome info for all species here but you probably want to tweak it to just add cow. (We really need to add at least a simple CLI to that file to make this easier). Finally, you will need to add an from . import genome_data to the top of this file and rename it __init__.py. Look at some of the other species for an example if you need them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @ndukler , I'm trying to run the script for getting the chromosome info but it crashes, it cannot find module called "maintenance". Moreover, I don't know how to tweak it to get only cattle info. That should generate a file called genome_data.py , is that correct? I've already made some changes including rename of BosTau.py to init.py

Copy link
Contributor

@ndukler ndukler Aug 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct. Also I suppose you can run it so it updates everything then discard all changes that are not to bosTau. I'll look into running it and get back to you. If I can just generate genome_data.py I'll just send it to you.

BosTau.py Outdated
_chromosomes.append(stdpopsim.Chromosome(
id=name, length=int(length),
mutation_rate=1.2e-8,
recombination_rate=9.26e-9)) # 25.5 crossovers per meiosis in males and 23.2 crossovers per meiosis in females gives an average of 24.35 crossovers per meiosis
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mutation and recombination rates look good! Relatively easy to see where you got them in the papers

BosTau.py Outdated
###########################################################

def _iid():
id = "IonaInferredDemography"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iechevarriaz this still needs to be fixed. Please use this format ${SomethingDescriptive}_${number_of_populations}${first_author_initial}${two_digit_date}.

@ndukler
Copy link
Contributor

ndukler commented Aug 16, 2020

Also, I think we're far enough along that you can send me the files for the recombination map. They should be placed in a flat tarball with one file per chromosome.

@agladstein
Copy link
Contributor

Hey @iechevarriaz, I just wanted to remind you that to start the demographic model QC process we'll need a new issue with the Model QC issue template.

@ndukler
Copy link
Contributor

ndukler commented Sep 2, 2020

@iechevarriaz I've updated the script on downloading a new species genome info and updated the devdocs. I've also been working on the species QC and most of it is good to go except some of the assembly stuff (most of which is just handled by the automated download). If you pull master you should have everything you need to move forward. If possible can we wrap this up this week?

@iechevarriaz
Copy link
Contributor Author

Hello @ndukler ! Thank you for the update... I have run the update_ensembl_data.py script and it fetches assemby_accession and assemby_name for BosTau, but it doesn't fill in the chromosome lengths...
Screenshot from 2020-09-02 19-26-04

@grahamgower
Copy link
Member

Hey @iechevarriaz. It looks like a bug in maintenance/enseml.py. I'll open a ticket.

@iechevarriaz
Copy link
Contributor Author

Thank you @grahamgower and @jeromekelleher ... now it works!!!!

Also, I think we're far enough along that you can send me the files for the recombination map. They should be placed in a flat tarball with one file per chromosome.

@ndukler Sorry but we don't have a recombination map for Bos taurus...

Is there anything else I should be doing?

@ndukler
Copy link
Contributor

ndukler commented Sep 3, 2020

I think you're good. Just push the updates and the chromosome info.

Thank you @grahamgower and @jeromekelleher ... now it works!!!!

Also, I think we're far enough along that you can send me the files for the recombination map. They should be placed in a flat tarball with one file per chromosome.

@ndukler Sorry but we don't have a recombination map for Bos taurus...

Is there anything else I should be doing?

@grahamgower
Copy link
Member

This looks like its basically done. I'm going to make some minor changes (in a new PR) so we can merge this.

@grahamgower
Copy link
Member

Closing in favour of #600.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants