-
Notifications
You must be signed in to change notification settings - Fork 269
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ATAC-seq settings #145
Comments
If you followed original protocol for ATAC-Seq, you should get Paired-End reads. If so, I would suggest you just use "--format BAMPE" to let MACS2 pileup the whole fragments in general. But if you want to focus on looking for where the 'cutting sites' are, then '--nomodel --shift -100 --extsize 200' should work. |
I have a question towards the BAM and BAMPE mode: The TLEN of a pair is of course determined by the insert size after the alignment. I understand that in BAMPE, MACS takes only the 5' of the first mate of the pair and uses TLEN for the fragment length. But how is it in BAM mode? Does it treat the two mates idependently, so that literally the coverage of each nt within the pair would be double as high as in BAMPE mode? |
@ATpoint, according to the manual:
It sounds like only one of the mates is used. |
Yes, but I was referring to what happens with paired-end BAMs when not BAMPE but BAM is specified as -f, as this currently seems to be the method of choice for ATAC-seq in the literature. |
Based on that description, it sounds like only one mate (left mate) is used when the reads are paired and BAM (not BAMPE) is specified. Essentially, half the reads are ignored. Hopefully @taoliu can confirm. |
Yes you are right. The more I think about it, it probably autodetects the bitwise flag for paired-end reads and discards the reverse mate for further analysis. |
so -f BAMPE is preferred for paired-end ATAC-seq? |
@crazyhottommy @igordot Yes. With -f BAMPE on, MACS2 read the left mate and the insertion length information from BAM file, and discard right mate. With -f BAM, MACS2 only keeps the left mate, so it's definitely not a feature you want for your paired-end data. |
Why do so many papers use --shift -100 --extsize 200 for MACS2 rather than -f BAMPE if this is not recommended for paired-end data? |
--shift -100 --extsize 200 will amplify the 'cutting sites' enrichment from ATAC-seq data. So in the end, the 'peak' is where Tn5 transposase likes to attack. The fact is that, although many information such as the insertion length and the other mate alignment is ignored, such result is still usable. Especially when the short fragment population is extremely dominant, the final output won't be off much. |
Ok thank you. But you would still recommend the -f BAMPE option? After peak calling with -f BAMPE, I then take the 5' end of the forward and reverse reads as the cut sites. Does this make sense? |
@taoliu > @crazyhottommy @igordot Yes. With -f BAMPE on, MACS2 read the left mate and the insertion length information from BAM file, and discard right mate. With -f BAM, MACS2 only keeps the left mate, so it's definitely not a feature you want for your paired-end data. How does --keep-dup 1 works when setting -f BAMPE ? will --keep-dup 1 still deduplicate the fragments ? |
Sorry for reopening this discussion, but I have another question regarding ATAC-seq. I am interested in the cutting sites, so I want to use the --shift and --extsize options. However, as discussed before, we then unfortunately ignore all the mates. I have heard of people just merging the fastqs and treating the paired-end data as single-end to avoid this, however we then have suboptimal alignment. @taoliu, I am wondering if the assumptions of MACS2 aren't broken if we align PE, and set all RNEXT to *, PNEXT to 0, and shift the mates position by - (minus) TLEN, and use this BAM file for peak calling (with --shift and --extsize). |
For recent alternatives have a look at https://github.com/LiuLabUB/HMMRATAC from the Liu lab for ATAC-seq or https://github.com/jsh58/Genrich If you want to use macs and keep both mates you could write the BAM file as BED with one interval for each of the mates, reduced to the 5' end to get the cutting site and then use |
@ATpoint, thanks! Converting from BAM to BED is probably a much simpler approach! |
Hi, did you solve your problems about parameter --keep-dup? When I set it as all or 1, the peak numbers are significantly different? Which one is better for ATAC-Seq? |
You can use HMMRATAC instead now, which is a substitution of MACS2 spcific designed to ATACseq. It is developed by Tao Liu too. |
Thanks, Sounds good! Btw, what bam files do you use? Can you use the bam files after removing the duplicates using Picard? |
Yes, HMMRATAC can run with BAM files after duplicate removal. However, by default, HMMRATAC will remove the duplicates for you using the same Picard process. It is also possible to turn off this behavior with the --rmDup option. |
@igordot Hi, I am using MACS2 to call peaks, I am interested in the parameter you used in your article. my data are paired end reads. |
Hi, I have quite a naive question regarding the ATAC-seq analysis. Why do we want to set the My guess: |
@yichao-cai |
For anyone still following this discussion (more than 4 years later), there is a related thread on Twitter that brings up some interesting points and suggests using "-f BED --shift -100 --extsize 200": https://twitter.com/XiChenUoM/status/1336658454866325506 |
Thanks, that's very helpful |
@igordot I just enabled the 'Discussions' function for MACS on Github: https://github.com/macs3-project/MACS/discussions Hope that the informative discussions in the 'Issues' won't go away because of age :) I can summarize our discussion in this issue and start a new discussion. But if you want to start it, let me know and feel free to do so :) Just don't want to take away your credit. |
Another related issue, as I mentioned in Xi Chen's Twitter thread, is #421. As for MACS, the goal is to enable possible ways to analyze the data, so I will add an option (to keep both ends of BAMPE) to make life easier for people who want to focus on the cutting/insertion sites enrichment in ATAC-seq. |
Thank you! @igordot |
Any update on this @taoliu ? I was just looking through the code, but didn't find the right place where the filtering of 2nd mates occurs. |
I am trying to call peaks in ATAC-seq data. According to the documentation:
Based on a brief literature review, people use both
--shift -100 --extsize 200
and--shift 37 --extsize 73
for ATAC-seq. Is one option more appropriate? Is there another setting I should be using?The text was updated successfully, but these errors were encountered: