Reads in Peaks Statistic is probably wrong #96

Open
alexg9010 opened this issue Jun 28, 2018 · 7 comments
Comments

@alexg9010
Member

I have the feeling that there is something wrong with the Reads in Peaks statistic.

@frenkiboy
Contributor

Can you create a test data set where you know the percentage, and see how it performs?
I went through the code and couldn't find the mistake.
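
One way such a controlled test set could look, as a minimal base R sketch: simulate read positions on a single made-up chromosome, place a known share of them inside two non-overlapping peaks, and check that the computed fraction matches. Everything here (`peaks`, `reads`, `frac_in_peaks`, the sizes, and treating each read as a single position) is an assumption for illustration, not the pipeline's code.

```r
set.seed(42)

# Two hypothetical, non-overlapping peaks on one chromosome of length 10,000
peaks <- data.frame(start = c(1000, 5000), end = c(1999, 5999))

n_total   <- 1000   # all mapped reads
n_in_peak <- 250    # reads deliberately placed inside peaks (25%)

# Reads inside peaks: pick a peak at random, then a position within it
idx      <- sample(nrow(peaks), n_in_peak, replace = TRUE)
in_reads <- peaks$start[idx] + sample(0:999, n_in_peak, replace = TRUE)

# Background reads: positions outside every peak
background <- setdiff(1:10000, unlist(Map(seq, peaks$start, peaks$end)))
out_reads  <- sample(background, n_total - n_in_peak, replace = TRUE)

reads <- c(in_reads, out_reads)

# Fraction of reads whose position falls in at least one peak
frac_in_peaks <- function(reads, peaks) {
  hit <- vapply(reads, function(r) any(r >= peaks$start & r <= peaks$end), logical(1))
  mean(hit)
}

frac <- frac_in_peaks(reads, peaks)
stopifnot(isTRUE(all.equal(frac, n_in_peak / n_total)))
```

If the statistic reported for a real BAM/BED pair built along these lines differs from the known 25%, the counting step is the place to look.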

@alexg9010
Member Author

If I just define one peak, then the resulting plot shows what I would expect:

[Screenshot: bildschirmfoto 2018-06-29 um 13 53 12]

@alexg9010
Member Author

This is how it looks when two peaks are defined.

[Screenshot: bildschirmfoto 2018-06-29 um 13 59 31]

alexg9010 added the bug label Oct 18, 2018
@messersc

Has this been fixed? I would need to know for the discussion of the experiment with our collaborators.

@alexg9010
Member Author

Hi @messersc ,

Sorry, but this is not fixed yet. I can have a closer look at this next week, but I need to create a controlled test set first.

What does your distribution look like? Do you get any bars?

In case your discussion is soon, you can use this code to get some rough values:

library(dplyr)
library(ggplot2)

# Load the summary data written by the pipeline
lstats <- readRDS("[/path/to/output]/Analysis/Summarized_Data_For_Report.RDS")

# Reshape to long format: one row per peak set and per-BAM count column
dd <- lstats$Peak_Statistics$peaks_sample %>%
  dplyr::select(-bed_file, -bw_files, -bam_file, -sample_id, -library, -genome_type) %>%
  tidyr::gather(sample_cnt, value, -sample_name, -bam_name, -mapped_total, -peak_number) %>%
  mutate(value = as.numeric(value)) %>%
  mutate(mapped_total = as.numeric(mapped_total)) %>%
  # Reads in peaks as a fraction (0 to 1) of all mapped reads
  mutate(value = value / mapped_total)

# Keep only the counts of each BAM file in its own peaks, then plot
g <- dd %>%
  dplyr::filter(bam_name == sample_cnt) %>%
  ggplot(aes(bam_name, value, fill = sample_name)) +
  geom_bar(stat = 'identity', position = 'dodge', show.legend = FALSE) +
  xlab('Sample name') +
  ylab('Fraction of reads in peaks') +
  coord_flip() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_fill_discrete('Peak Name')

print(g)

Best,
Alex

@messersc

Hi Alex,

wow, you're super responsive 👍

I just needed to know if we can rely on these numbers or not. I will try to run your code, maybe I can contribute a bit to find the bug.

Thanks for your help and hope you have a nice weekend.
Clemens

@alexg9010
Member Author

@messersc I figured out the bug that caused this issue. It had to do with some default settings in summarizeOverlaps that were messing with our counts.
I will soon draft a new release and then it should be available on guix quite fast.
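
The comment does not say which defaults were involved, so the following is only a guess at the kind of effect that can arise. By default `summarizeOverlaps` uses `mode = "Union"` with `inter.feature = TRUE`, and under that rule a read overlapping more than one feature is counted for none of them. The base R sketch below imitates that rule on made-up intervals; it does not call `summarizeOverlaps`, and `features`, `reads`, `count_union` and `count_any` are hypothetical names.

```r
# Hypothetical features (peaks) and reads as closed intervals on one chromosome
features <- data.frame(start = c(100, 150), end = c(200, 250))
reads    <- data.frame(start = c(110, 160, 220, 300), end = c(130, 180, 240, 320))

# Number of features each read overlaps
n_hits <- vapply(seq_len(nrow(reads)), function(i) {
  sum(reads$start[i] <= features$end & reads$end[i] >= features$start)
}, integer(1))

# Union-like rule: count a read only if it overlaps exactly one feature
count_union <- sum(n_hits == 1)

# Naive rule: count a read if it overlaps any feature
count_any <- sum(n_hits >= 1)
```

Here the second read overlaps both features, so the Union-like count is lower than the naive one. Whether this particular default was the cause in the pipeline is not confirmed by the thread.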

alexg9010 reopened this May 24, 2019