
Basic ancestry functionality #143

Merged: 29 commits into apriha:develop on Oct 22, 2021

Conversation

arvkevi
Contributor

arvkevi commented Sep 21, 2021

This PR adds basic functionality to predict genetic ancestry using ezancestry. @apriha please feel free to make suggestions/direct edits as you see fit, this is just to get the concept moving forward. Here's how a user could utilize this functionality from snps.
[Screenshot: example of calling the new ancestry prediction functionality from snps]
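Since the screenshot did not survive, here is an illustrative stub of the usage it showed: calling ancestry prediction from a snps SNPs object. The real call requires `pip install snps[ezancestry]` and a genotype file; in this sketch SNPs is stubbed out, and the method name, filename, and returned keys are assumptions drawn from this thread, not the actual screenshot.

```python
# Illustrative stub only: SNPs here is a toy stand-in for snps.SNPs so the
# shape of the API is visible without ezancestry or a genotype file installed.
class SNPs:
    """Stand-in for snps.SNPs (toy implementation for illustration only)."""

    def __init__(self, file):
        self.file = file

    def predict_ancestry(self):
        # The real method delegates to ezancestry; values below are made up.
        return {
            "predicted_population_superpopulation": "EUR",
            "EUR": 0.90,
        }


s = SNPs("genotype.txt")  # hypothetical genotype filename
result = s.predict_ancestry()
print(result["predicted_population_superpopulation"])  # EUR
```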

.github/workflows/ci.yml (review comment; outdated, resolved)
@codecov

codecov bot commented Oct 5, 2021

Codecov Report

Merging #143 (d29743f) into develop (8ca5d75) will increase coverage by 0.07%.
The diff coverage is 100.00%.


@@             Coverage Diff             @@
##           develop     #143      +/-   ##
===========================================
+ Coverage    93.44%   93.52%   +0.07%     
===========================================
  Files            8        8              
  Lines         1540     1559      +19     
  Branches       273      274       +1     
===========================================
+ Hits          1439     1458      +19     
  Misses          54       54              
  Partials        47       47              
Impacted Files Coverage Δ
src/snps/snps.py 95.94% <100.00%> (+0.14%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8ca5d75...d29743f. Read the comment docs.

@apriha
Owner

apriha commented Oct 5, 2021

@arvkevi I think we're close to getting the initial tests working. However, pip is taking a long time to search for compatible packages. I can fix this via a two-step install, e.g.:

pip install ezancestry
pip install .

However, that defeats the simplicity of just pip install .[ezancestry]. Any ideas on how this can be improved?
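For context, the single-command install relies on the package declaring ezancestry as an optional extra. A hypothetical setup.cfg fragment (illustrative; not the actual snps packaging config) would look like:

```ini
# Hypothetical setup.cfg fragment declaring ezancestry as an optional extra,
# so that `pip install .[ezancestry]` pulls it in alongside snps itself.
[options.extras_require]
ezancestry =
    ezancestry
```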

@arvkevi
Contributor Author

arvkevi commented Oct 6, 2021

Thank you for hacking on this PR, Andrew! I cut a release to ezancestry that supports 3.7, which is why I triggered the build yesterday w/ an empty commit. I am confused as to why this is taking so long to resolve dependencies. I'll spend some more time with it.

@apriha
Owner

apriha commented Oct 8, 2021

Hi Kevin, same here. FYI, I tried running the test-extras job locally via act, and dependencies were resolved quickly and without any issues...

@apriha
Owner

apriha commented Oct 9, 2021

Hey @arvkevi , turns out pip couldn't find the correct version of snps since the tag version history was not available after checkout; 4582b51 fixed it! Pretty close now... looks like some issues with finding ezancestry data.
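A common fix for missing tag history in GitHub Actions (a guess at what the referenced commit did; this YAML is illustrative, not the actual ci.yml change) is to fetch the full history in the checkout step:

```yaml
# Hypothetical checkout step: fetch-depth: 0 fetches all history and tags,
# so tag-based version detection can find the correct snps version.
- uses: actions/checkout@v2
  with:
    fetch-depth: 0
```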

@apriha
Owner

apriha commented Oct 11, 2021

I did some more testing with act and listed the contents of the equivalent of the /home/runner/.ezancestry/data/ directory... It looks like the ezancestry Python code is looking up filenames with a different case to what's actually on the filesystem; e.g., aisnps/Kidd.AISNP.txt (Python) vs aisnps/KIDD.AISNP.txt (actual). Same for models/knn.PCA.Kidd.population.bin and models/knn.PCA.Kidd.superpopulation.bin.
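On a case-sensitive filesystem (like the Linux CI runners), `Kidd.AISNP.txt` and `KIDD.AISNP.txt` are different paths. One way to sketch the mismatch, and a tolerant lookup that would sidestep it, is below; this helper is hypothetical and not ezancestry's actual fix.

```python
# Sketch: resolve a filename against a directory listing, ignoring case.
# Illustrates the Kidd.AISNP.txt vs KIDD.AISNP.txt mismatch described above.
import tempfile
from pathlib import Path
from typing import Optional


def resolve_case_insensitive(directory: Path, wanted: str) -> Optional[Path]:
    """Return the entry in `directory` whose name matches `wanted`, ignoring case."""
    for candidate in directory.iterdir():
        if candidate.name.lower() == wanted.lower():
            return candidate
    return None


# Demonstrate with a throwaway directory containing only the uppercase file.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    (data_dir / "KIDD.AISNP.txt").touch()
    resolved = resolve_case_insensitive(data_dir, "Kidd.AISNP.txt")
    print(resolved.name)  # KIDD.AISNP.txt
```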

Hopefully that helps speed the troubleshooting along. 🙂

@arvkevi
Contributor Author

arvkevi commented Oct 13, 2021

Thanks, Andrew. I will cut a new release this weekend with a fix for the filenames. I'll also set up my own CI in ezancestry so we don't languish on this branch. Thanks for being so patient with this.

@arvkevi
Contributor Author

arvkevi commented Oct 19, 2021

I think I fixed the issue with the new release. The new errors are likely due to the newly trained models in the release. We can probably just update the asserted value.

@apriha
Owner

apriha commented Oct 19, 2021

I think we're good @arvkevi! What are your thoughts on also exposing the raw predictions dataframe?

@arvkevi
Contributor Author

arvkevi commented Oct 19, 2021

@apriha I think that's a good idea. I will put together some documentation with column descriptions.

@arvkevi
Contributor Author

arvkevi commented Oct 20, 2021

I'll leave this here and feel free to modify and incorporate wherever you like.

Populations described below are defined here.

'component1', 'component2', 'component3':
The coordinates of the sample in the dimensionality-reduced component space. Can be used as (x, y, z) coordinates for plotting in a 3D scatter plot.

'predicted_population_population':
The max predicted population for the sample.

'ACB', 'ASW', 'BEB', 'CDX', 'CEU', 'CHB', 'CHS', 'CLM', 'ESN', 'FIN', 'GBR', 'GIH', 'GWD', 'IBS', 'ITU', 'JPT', 'KHV', 'LWK', 'MSL', 'MXL', 'PEL', 'PJL', 'PUR', 'STU', 'TSI', 'YRI':
Predicted probabilities for each of the populations. These sum to 1.0.

'predicted_population_superpopulation':
The max predicted superpopulation (continental) for the sample.

'AFR', 'AMR', 'EAS', 'EUR', 'SAS':
Predicted probabilities for each of the superpopulations. These sum to 1.0.

'population_description', 'superpopulation_name':
Descriptive names of the population and superpopulation.
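A toy illustration of how these columns relate: the predicted (super)population is the argmax of the per-(super)population probabilities, which sum to 1.0. The probability values below are made up, not real ezancestry output.

```python
# Made-up superpopulation probabilities for a single sample; the predicted
# superpopulation column is simply the highest-probability key.
superpop_probs = {"AFR": 0.02, "AMR": 0.05, "EAS": 0.01, "EUR": 0.90, "SAS": 0.02}

# The probabilities sum to 1.0, as described above.
assert abs(sum(superpop_probs.values()) - 1.0) < 1e-9

predicted_population_superpopulation = max(superpop_probs, key=superpop_probs.get)
print(predicted_population_superpopulation)  # EUR
```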

@apriha
Owner

apriha commented Oct 22, 2021

@arvkevi updates incorporated. Please let me know what you think... If you agree, I think it's ready to merge. Thanks again for developing this awesome capability!

@arvkevi
Contributor Author

arvkevi commented Oct 22, 2021

LGTM @apriha, thank you for all your hard work on this PR!

@apriha apriha merged commit 2e8ccfe into apriha:develop Oct 22, 2021