-doSaf retains no site with -sites #385
Comments
Hey Kwi, Out of curiosity are you using By contrast, when using I'm not sure if this is because of the different HTSLIB versions (1.10.2 through Hope this helps! |
I am having this exact same problem as well! When I give -dosaf a sites file (of sites I identified with the exact same dataset), and no other filters, it returns 0 sites after filtering.
I just tried downloading a fresh version of angsd/htslib. I did not use conda; I used git clone etc. Unfortunately, I get the exact same result: 0 sites retained. Is this a recent problem? Might this work if I were to install/use an older version of angsd?
In the meantime, is there any other way to filter a saf file? E.g. to get a saf file for the entire genome and then filter that down to the sites that I need, using awk or something? Unfortunately, the fact that the saf file is in an unusual format means that I am not sure how I would do that, but if anyone knows a way it would be great to hear!
Also, I am using the saf file with the aim of making a site frequency spectrum (as I believe many people who use -dosaf are). Could I apply a sites filter at the realSFS step? Would that give me an SFS built with only my sites of interest even if the saf input contains all sites in the genome?
Thanks,
Hi James, I have used your Singularity container and it did solve the problems. Thanks a lot ! |
Dear Teresa, As I commented above, James' solution worked for me. For your other questions, I am not sure if you can do that. I hope you find your answers! |
@kwiyounghan Glad to hear the Singularity container worked! @TeresaPegan I'm not sure about filtering the SAF file. You could print it using James |
Thank you both for your feedback. I will look into trying to set up the singularity container you linked with my university's computing cluster support group. In the meantime, though, I tried just installing and using ANGSD v0.933 and HTSLIB v1.11 on my cluster account. This did not work, however, because -dosaf just had some other kind of error. I'll paste some of the output below. I wonder if this means the singularity container would not work for me anyway? It's great to hear that using the -sites filter on realSFS should work. It is too bad that you have to -dosaf on the whole genome first to use it, but I have enough flexibility in time and storage space that I think it's the best thing for me to do at this point. I'll keep an eye out for future updates to ANGSD that might fix this bug with the sites file and -dosaf! Thanks,
...and this continues for all of my chromosomes, so the program exits after about 10 seconds and nothing is done. When I ran the same code using the most up-to-date versions of ANGSD and HTSLIB I did not get this error about not finding data for the chromosomes (though of course I did get the 0 sites retained issue), so I assume this has something to do with the older versions I was trying out. |
The nspope_bandedDP branch has storage/memory requirements for SAF files that are far less than the master branch and is probably better if you're outputting the entire genome (at least, it'll be better if you have more than a few samples). @ANGSD given all these htslib issues that seem to be popping up, maybe need a container or at least a Dockerfile with well-behaved commits ... ? |
@James-S-Santangelo I was able to get your singularity container on my cluster and run dosaf with -sites successfully on it, thanks!! |
Hi it seems I am having the same problem as Kwi, and I am interested in using the singularity container version, but the link to that by James isn't working... I tried googling it and it got me here: https://github.com/ANGSD/angsd/blob/master/README.md Is this correct? Thanks! |
Hey Erica, Singularity seems to have changed how they build and store containers in the past few days so that link is now broken. However, the previously linked container is still available in their archived repository (see here). I haven't yet looked into what (if anything) has changed in terms of how these containers are now supposed to be maintained, but I have confirmed that the following command works for downloading the container locally:
Hope this helps! James |
Sweet, thanks James! |
This issue looks related to #348 |
Hi, so I was able to install with the singularity container, but now I am running into errors running angsd in my bash script. My script is: #!/bin/bash singularity exec singularity-recipes_angsd_v0.933.sif /opt/bin/angsd ./angsd -bam subsetbam.filelist -GL 1 -doMaf 1 -doMajorMinor 1 -nThreads 1 -out test When I try the command with './angsd' at the beginning I get the error "./angsd: Is a directory" And when I try with just "angsd" at the beginning I get the error "angsd: command not found" Could anyone let me know how they ran angsd with singularity on a cluster? Many thanks! |
In my cluster, I have a folder in my home directory called "angsd_sing." Within that is a script called "angsd_sing" that has this in it:
The .sif file referred to here is the singularity script that the cluster IT people helped me with. I also add this to my path:
I have to load a module called singularity. Once I do that, I just call the program with
Hope this helps! |
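The contents of the wrapper script described above were not preserved in this thread, so the following is only a minimal sketch of what such a wrapper could look like, not the actual script. It forwards all arguments to the angsd binary inside the image in a single `singularity exec` call, so that angsd is never looked up on the host. The function name `angsd_sing`, the `DRY_RUN` switch, and the default image location are assumptions made for illustration; `/opt/bin/angsd` is the in-container path used in the command quoted earlier in the thread and may differ for other images.

```shell
#!/bin/bash
# Hypothetical wrapper; ANGSD_SIF and ANGSD_BIN are assumptions, adjust to your cluster.
ANGSD_SIF="${ANGSD_SIF:-$HOME/angsd_sing/singularity-recipes_angsd_v0.933.sif}"
ANGSD_BIN="${ANGSD_BIN:-/opt/bin/angsd}"

angsd_sing() {
    if [ -n "${DRY_RUN:-}" ]; then
        # Print the command that would be run instead of running it.
        printf '%s\n' "singularity exec $ANGSD_SIF $ANGSD_BIN $*"
    else
        # Forward every argument unchanged to angsd inside the container.
        singularity exec "$ANGSD_SIF" "$ANGSD_BIN" "$@"
    fi
}
```

With this in place, a call such as `angsd_sing -bam subsetbam.filelist -GL 1 -doMaf 1 -doMajorMinor 1 -nThreads 1 -out test` passes the options straight to the containerized binary, and `DRY_RUN=1 angsd_sing ...` shows the full command first.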
Great, I'll try that- thanks Teresa! |
Hello, there is a lot of information in this thread. Some of it relates to a conda version with which I am not familiar. Here is how it is supposed to work with the github angsd version
Then this should be parsed as an option to angsd with
you can validate that angsd correctly interprets your original file with
There is more information here. If you have already filtered out the "bad" sites when making your sites file, then there will be no benefit in including these parameters again. Another trick would be to use -rf in combination with the -sites argument so that you only use the chromosomes/scaffolds/contigs that are in your sites file
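As a rough illustration of the -rf plus -sites trick, the sketch below derives a regions list (one chromosome name per line) from a two-column, tab-separated sites file. The helper name `sites_to_regions`, the file names, and the demo data (including the second contig name) are made up for this example; the angsd calls are shown only as comments and reuse options that already appear in this thread.

```shell
#!/bin/bash
# Print the unique chromosome/scaffold names found in column 1 of a sites file.
sites_to_regions() {
    cut -f1 "$1" | sort -u
}

# Demo with made-up data (tab-separated: chromosome, position).
tmp=$(mktemp -d)
printf 'NC_044048.1\t60\nNC_044048.1\t66\nNC_044049.1\t12\n' > "$tmp/sites.txt"
regions=$(sites_to_regions "$tmp/sites.txt")
printf '%s\n' "$regions"
rm -rf "$tmp"

# With real data, the two files would then be used along the lines of:
#   sites_to_regions sites.txt > regions.txt
#   angsd sites index sites.txt
#   angsd -b bams -GL 1 -anc "$REFGEN" -doSaf 1 -rf regions.txt -sites sites.txt -out out
```

Restricting the run to the listed chromosomes this way should avoid reading data for contigs that contribute no sites.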
It would seem that the installation issue might have to do with a problem with a specific older version of htslib, so if this is still relevant you should use the most recent versions of angsd and htslib. I will close this issue, but in case it is still causing problems you can reopen it. Best |
Add function aio::doAssert to replace asserts Did not use aio::assert as name since aio.h namespace complains due to assert being a macro Fixes the major bug explained in #527 Fixes issues #520 #474 #466 #420 #405 #396 #385 Possibly others; other issues should rerun the commands using the latest version.
Hi
I am trying to get SFS estimation for whole genome reseq data with 16 bams with very uneven coverages.
The ref sequence is around 0.7gb.
I am running with angsd version: 0.933 (htslib: 1.9) build(May 6 2020 21:25:11)
To get the SFS estimations, my workflow looked like
SFS1 select sites
$ FILTERS="-uniqueOnly 1 -minMapQ 30 -minQ 30 -minInd 12 -SNP_pval 1e-6 -skipTriallelic 1 -sb_pval 1e-5"
$ TODO="-doMajorMinor 1 -doMaf 1 -dosnpstat 1 -doHWE 1"
$ angsd -b bams -GL 1 -anc $REFGEN -P 32 -out SFS1_sites1 $TODO $FILTERS
-> Total number of sites analyzed: 601,095,263
-> Number of sites retained after filtering: 7,221,984
$ zcat SFS1_sites1.mafs.gz | cut -f 1,2 | tail -n +2 | grep "NC.044" > SFS1_sites1
then, to only get chr1,
$ grep 'NC_044048.1' SFS1_sites1 > SFS1_sites1_chr1
$ angsd sites index SFS1_sites1_chr1
$ head SFS1_sites1_chr1
NC_044048.1 60
NC_044048.1 66
NC_044048.1 114
NC_044048.1 116
SFS1_sites1_chr1 contains 311,175 sites. (only first chr from the ref seq and filtered sites)
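One possible pitfall in the grep-based selection above is that grep matches substrings and treats `.` as a wildcard, so a pattern like NC_044048.1 would also match a contig named, say, NC_044048.10. The sketch below is an alternative that compares the chromosome column exactly with awk. The function name `extract_chr_sites`, the header names, and the demo rows are assumptions for illustration; only the layout (header line, chromosome in column 1, position in column 2) is taken from the commands in this post.

```shell
#!/bin/bash
# Read a tab-separated table with a header line from stdin and print
# "chromosome<TAB>position" for rows whose first column equals $1 exactly.
extract_chr_sites() {
    awk -F'\t' -v chr="$1" 'NR > 1 && $1 == chr { print $1 "\t" $2 }'
}

# Demo with a made-up table; NC_044048.10 is a hypothetical contig name.
mafs=$(printf 'chromo\tposition\tmajor\tminor\nNC_044048.1\t60\tA\tG\nNC_044048.10\t7\tC\tT\nNC_044048.1\t66\tT\tC\n')
sites=$(printf '%s\n' "$mafs" | extract_chr_sites NC_044048.1)
printf '%s\n' "$sites"

# With real data this could be used as:
#   zcat SFS1_sites1.mafs.gz | extract_chr_sites NC_044048.1 > SFS1_sites1_chr1
```

In the demo, the NC_044048.10 row is dropped, whereas a plain `grep 'NC_044048.1'` would have kept it.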
$ FILTERS=""
$ TODO="-doSaf 1 -doMajorMinor 1 -doMaf 1 -dosnpstat 1 -doHWE 1"
$ angsd -b bams -GL 1 -anc $REFGEN -P 4 -sites SFS1_sites1_chr1 -out SFS3_SAF5 $TODO
The problem is that the result doesn't retain any sites in the end.
-> Total number of sites analyzed: 645625341
-> Number of sites retained after filtering: 0
Here you see that the number of sites retained after filtering is 0.
Also, when I try to print the result file with realSFS:
$ realSFS print SFS3_SAF5.saf.idx
-> Version of fname:SFS3_SAF5.saf.idx is:2
[safreader.cpp.persaf_init():112] Problem reading data: SFS3_SAF5.saf.idx
realSFS cannot read the output file.
I have played around with different options to figure out what the problem is:
other filters, with and without -r or -rf, with and without TODOs other than -doSaf, and different combinations of these elements.
The results all behave more or less the same way.
Then I realized that when I apply filters directly with -doSaf, without -sites, angsd gives me different results with different issues.
$ FILTERS="-uniqueOnly 1 -minMapQ 30 -minQ 30 -minInd 12 -SNP_pval 1e-6 -skipTriallelic 1 -sb_pval 1e-5"
$ TODO="-doSaf 1 -doMajorMinor 1 -doMaf 1 -dosnpstat 1 -doHWE 1"
$ angsd -b bams -GL 1 -anc $REFGEN -P 4 -r NC_044048.1 -out SFS3_SAF7 $TODO
-> Total number of sites analyzed: 29600590
-> Number of sites retained after filtering: 29584867
Here, barely any sites get filtered out, which shouldn't be the case.
Please note that, based on step 1) above, I know a lot more sites should be filtered out.
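To make the comparison between runs concrete, here is a small sketch that turns the two summary lines angsd prints at the end of a run into a retained percentage. The function name `retained_pct` is made up for this example, and it assumes the log lines have exactly the wording shown in this post (commas in the numbers are tolerated).

```shell
#!/bin/bash
# Read an angsd log on stdin and print the percentage of analyzed sites
# that were retained after filtering, with two decimals.
retained_pct() {
    awk -F': *' '
        /Total number of sites analyzed/           { t = $2; gsub(/[^0-9]/, "", t) }
        /Number of sites retained after filtering/ { r = $2; gsub(/[^0-9]/, "", r) }
        END {
            if (t == "" || r == "" || t + 0 == 0) { print "NA"; exit 1 }
            printf "%.2f\n", 100 * r / t
        }'
}

# Summary lines copied from the -r NC_044048.1 run in this post.
pct_chr1=$(printf '%s\n' \
    '-> Total number of sites analyzed: 29600590' \
    '-> Number of sites retained after filtering: 29584867' | retained_pct)
printf 'retained: %s%%\n' "$pct_chr1"
```

Feeding it the summary of the first, genome-wide run (7,221,984 of 601,095,263 sites) gives a far smaller percentage, which is the discrepancy described here.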
But also, when I try to read the output, realSFS throws an error:
$ realSFS print SFS3_SAF7.saf.idx
-> Version of fname:SFS3_SAF7.saf.idx is:2
[safreader.cpp->persaf_init():117] Problem reading data: SFS3_SAF7.saf.idx
There seem to be multiple issues here.
With -sites, -doSaf doesn't retain any sites in the end.
Without -sites, sites don't get filtered out based on the filters I applied.
In both cases, realSFS can't read the output files.
Are these caused by bugs in the program?
Or is there any way to fix this?
Thanks,
Kwi