[BUG] Include all possible bin sizes within GSEA (0 and 1) #102

Open
bvenn opened this issue Jul 17, 2020 · 0 comments

bvenn commented Jul 17, 2020

Describe the bug

For gene set enrichment analysis (GSEA), Fisher's exact test is applied to identify over- or underrepresented groups. The method is based on multiple hypergeometric distribution tests.

In BioFSharp.Stats.OntologyEnrichment.CalcHyperGeoPvalue, two cases are treated separately: when the number of differentially expressed genes in a given bin is 0 or 1, the pValue is reported as nan. While these cases might not be of interest, a true pValue can be calculated for them.
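To illustrate that both special cases have a well-defined p value, here is a minimal sketch in Python (standard library only). It is not the BioFSharp implementation; the function names and the example numbers are made up for illustration. It assumes N genes in total, K of them differentially expressed, a bin of size n, and k differentially expressed genes inside that bin.

```python
from math import comb

def hypergeo_pmf(k, N, K, n):
    """P(X = k) when drawing n genes without replacement from N genes,
    K of which are differentially expressed."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def over_representation_pvalue(k, N, K, n):
    """P(X >= k): probability of seeing at least k hits in the bin."""
    upper = min(K, n)
    return sum(hypergeo_pmf(i, N, K, n) for i in range(k, upper + 1))

def under_representation_pvalue(k, N, K, n):
    """P(X <= k): probability of seeing at most k hits in the bin."""
    lower = max(0, n - (N - K))
    return sum(hypergeo_pmf(i, N, K, n) for i in range(lower, k + 1))

# Hypothetical numbers: 1000 genes, 100 differentially expressed, bin of size 10.
for hits in (0, 1):
    print(hits,
          over_representation_pvalue(hits, 1000, 100, 10),
          under_representation_pvalue(hits, 1000, 100, 10))
```

For k = 0 the over-representation p value sums the whole distribution and is therefore 1 (up to rounding), and for k = 1 it equals 1 - P(X = 0). Neither case needs to be reported as nan.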

Further analysis usually involves a multiple testing correction. The Benjamini-Hochberg method calculates a false discovery rate (FDR) for every p value. nan values cannot be processed, so they are filtered out. Filtering out p values that could have been calculated distorts the FDR calculation, because the correction then sees fewer tests than were actually performed, and it keeps the p values flatter than expected.
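The effect of the silent filtering can be sketched with a plain Benjamini-Hochberg step-up adjustment in Python. This is a generic textbook version written for illustration, not the routine used by BioFSharp, and the p values are invented: the last three stand for bins that would currently be reported as nan and dropped.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p values in the same order as the input."""
    m = len(pvalues)
    # Indices of the p values sorted ascending.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Invented example: six bins were tested, three of them would be nan today.
all_bins = [0.001, 0.01, 0.03, 1.0, 1.0, 0.9]
without_nan_bins = all_bins[:3]

print(benjamini_hochberg(all_bins))          # roughly 0.006, 0.03, 0.06, 1, 1, 1
print(benjamini_hochberg(without_nan_bins))  # roughly 0.003, 0.015, 0.03
```

In this example the three remaining bins receive adjusted values half as large once the other bins are dropped, simply because the number of tests m shrinks from 6 to 3.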

Bins of size lower than 5 are often not of interest and are rejected anyway. However, this filtering should be controlled by the operator and has to be performed after the enrichment analysis and prior to the multiple testing correction.
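The intended order of operations could look like the following Python sketch. All names here (BinResult, filter_then_correct, the bin labels) are hypothetical, and the Bonferroni function is only a simple stand-in for whichever correction is actually used. The point is that every bin gets a p value first, and the size threshold is an explicit argument chosen by the operator.

```python
from typing import NamedTuple

class BinResult(NamedTuple):
    name: str       # hypothetical bin label
    bin_size: int   # number of genes annotated to the bin
    pvalue: float   # raw enrichment p value, computed for every bin

def filter_then_correct(results, min_bin_size, correct):
    """Drop bins below the operator-chosen size, then correct the rest."""
    kept = [r for r in results if r.bin_size >= min_bin_size]
    adjusted = correct([r.pvalue for r in kept])
    return [(r.name, q) for r, q in zip(kept, adjusted)]

def bonferroni(pvalues):
    """Stand-in correction used only to keep this sketch self-contained."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

# Invented enrichment results for three bins.
results = [
    BinResult("binA", 3, 0.20),
    BinResult("binB", 12, 0.004),
    BinResult("binC", 40, 0.03),
]
corrected = filter_then_correct(results, min_bin_size=5, correct=bonferroni)
print(corrected)
```

With this layout the number of tests entering the correction is a visible consequence of min_bin_size rather than a side effect of nan values.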

The current filter within the GSEA leads to results that cannot be easily interpreted.

Solution

  • Remove the if expression within CalcHyperGeoPvalue

  • Add additional context for filtering procedures to the documentation

  • Consider renaming the functions to lower case

@bvenn bvenn added the bug label Jul 17, 2020
@bvenn bvenn changed the title [BUG] Include all possible bin sized within GSEA (0 and 1) [BUG] Include all possible bin sizes within GSEA (0 and 1) Jul 17, 2020