Plot differential analysis results #82

Open
cnluzon opened this issue Oct 7, 2020 · 2 comments · May be fixed by #83

Comments

@cnluzon
Collaborator

cnluzon commented Oct 7, 2020

After merging #80, a results table with adjusted p-values and log fold change across replicates can be generated.

This is a reminder that we can now implement a plotting function that selects such bins using a p-value and/or log fold change threshold (a scatterplot of mean values with the selected bins highlighted).
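
A minimal sketch of the thresholding step, in plain Python rather than the package's own code. The function name `select_bins`, the keys `bin`, `padj` and `log2fc`, and the choice to require every threshold that is set (so passing both means "and") are assumptions made for illustration, not the layout of the table produced after #80.

```python
def select_bins(results, padj_max=None, abs_lfc_min=None):
    """Return the ids of bins that pass every threshold that is set.

    results: list of dicts with hypothetical keys "bin", "padj", "log2fc".
    A threshold left as None is not applied.
    """
    selected = []
    for row in results:
        # Written as "not <=" so that NaN p-values are excluded too.
        if padj_max is not None and not row["padj"] <= padj_max:
            continue
        if abs_lfc_min is not None and not abs(row["log2fc"]) >= abs_lfc_min:
            continue
        selected.append(row["bin"])
    return selected


# Toy results table (made-up numbers, for illustration only).
results = [
    {"bin": "bin1", "padj": 0.001, "log2fc": 2.1},
    {"bin": "bin2", "padj": 0.200, "log2fc": 1.8},
    {"bin": "bin3", "padj": 0.010, "log2fc": 0.3},
    {"bin": "bin4", "padj": 0.030, "log2fc": -1.5},
]

by_pval = select_bins(results, padj_max=0.05)
by_both = select_bins(results, padj_max=0.05, abs_lfc_min=1.0)
```

The returned ids are what a scatterplot would then draw in a highlight colour on top of the per-bin mean values.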

@cnluzon
Collaborator Author

cnluzon commented Oct 12, 2020

When filtering by p-value or log fold change, the values currently used for both selecting and plotting are the coverage values. I haven't yet implemented a version that also takes input bigWig files into account.

However, if we want a scatter plot where bins are normalized to the corresponding input (logfc or not), would it be correct to calculate the significance over those normalized values?

My feeling is that the best approach is not to do that, but instead:

  1. Select significant bins on "raw" (just the scaled bigwig coverage value) bins.
  2. Plot the log(bin / input) values. Since these are replicates, the plotted value will be log(mean(bins) / mean(inputs)), which I think is more robust than aggregating the individual log values.
  3. Highlight bins that were significant in step (1).
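
The three steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation in the linked pull request: the function name `log_ratio_points`, the dict-based inputs, the log base 2, and the 0.05 cutoff are all hypothetical, and the values are assumed to be strictly positive (for example, after a pseudocount was added upstream).

```python
import math
from statistics import mean


def log_ratio_points(bins, inputs, padj, padj_max=0.05):
    """Build (bin_id, log2(mean(bins) / mean(inputs)), highlighted) tuples.

    bins, inputs: dict of bin_id -> list of per-replicate values.
    padj: dict of bin_id -> adjusted p-value computed on the raw coverage.
    """
    points = []
    for bin_id, reps in bins.items():
        # Step 1: significance comes from the raw-coverage test only.
        significant = padj[bin_id] <= padj_max
        # Step 2: aggregate the replicates first, then take the log ratio.
        value = math.log2(mean(reps) / mean(inputs[bin_id]))
        # Step 3: carry the flag along so the plot can highlight the bin.
        points.append((bin_id, value, significant))
    return points


# Toy data (made-up numbers): two bins, two replicates each.
bins = {"bin1": [8.0, 8.0], "bin2": [2.0, 2.0]}
inputs = {"bin1": [2.0, 2.0], "bin2": [2.0, 2.0]}
padj = {"bin1": 0.01, "bin2": 0.60}

points = log_ratio_points(bins, inputs, padj)
```

Keeping the significance flag separate from the plotted value is the point of the design: the test never sees the input-normalized numbers, they only affect where each dot lands.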

I would need some feedback on this issue @simonelsasser

@cnluzon cnluzon linked a pull request Oct 12, 2020 that will close this issue
2 tasks
@shaorray
Contributor

> However, if we want a scatter plot where bins are normalized to the corresponding input (logfc or not), would it be correct to calculate the significance over those normalized values?

I think it depends on whether the normalisation uses external size factors that represent the input sizes. If the aim is to remove the coverage variance over the input intervals, sample_bin / input_bin can "flatten the curve" on the IGV profile plot, and the differential expression result is likely to be the same.

log(mean(bins) / mean(inputs)) can make a nicer plot by removing extreme values from the GC repeats, if these are MINUTE-ChIP samples.
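
A toy comparison of the two aggregation orders, assuming log base 2 and made-up numbers where one input replicate has very low coverage in a bin. It only illustrates how a single small denominator affects each formula and is not a general result.

```python
import math
from statistics import mean

# One bin, two replicates (made-up values). The first input replicate
# has very low coverage, so its individual ratio is extreme.
sample = [4.0, 4.0]
inputs = [0.25, 4.0]

# Aggregate first, then take the log: log2(mean(bins) / mean(inputs)).
log_of_means = math.log2(mean(sample) / mean(inputs))

# Take each replicate's log ratio, then average the logs.
mean_of_logs = mean([math.log2(s / i) for s, i in zip(sample, inputs)])
```

Here the per-replicate log ratios are 4 and 0, so averaging them gives 2, while the aggregate-first value stays below 1 because the low-coverage replicate contributes little to mean(inputs).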
