5. P‐value aggregation for feature pre‐selection

Louis-Mael Gueguen edited this page Nov 11, 2024 · 2 revisions

This step ensures that the most discriminant unitigs are used to build the models and select the best features.

Aggregation of unitigs' p-values

The p-values of the k-mers belonging to a given unitig are aggregated into a single p-value for that unitig. The unitigs are then sorted by aggregated p-value (ascending, smallest first), with ties broken by length (descending, longest unitigs first). The aggregation is done using the following formula:

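$$T = \sum_{i} w_i \tan\big((0.5 - p_i)\,\pi\big), \qquad p_{\text{unitig}} = \frac{1}{2} - \frac{\arctan(T)}{\pi}, \qquad \sum_{i} w_i = 1$$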

Where w_i is the weight attributed to each k-mer p-value p_i (in our calculation, all k-mers, and thus all p-values, have the same weight). This test, the Cauchy combination test, is described by Liu and Xie.
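
As an illustration, the aggregation can be sketched in a few lines of Python. This is a minimal sketch of the Cauchy combination test as published by Liu and Xie, not the code used by this pipeline: the function name `cauchy_combine` and its parameters are hypothetical, and weights are assumed to be equal unless given.

```python
import math


def cauchy_combine(p_values, weights=None):
    """Combine k-mer p-values into one p-value (Cauchy combination test).

    `cauchy_combine` is a hypothetical helper written for illustration only.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("at least one p-value is required")
    if weights is None:
        # Assumption taken from the text: every k-mer has the same weight.
        weights = [1.0] * n
    total = sum(weights)
    # Normalise so that the weights sum to 1.
    weights = [w / total for w in weights]
    # Test statistic: weighted sum of Cauchy-transformed p-values.
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values))
    # Convert the statistic back to a p-value with the Cauchy CDF.
    return 0.5 - math.atan(t) / math.pi


kmer_p_values = [0.01, 0.01, 0.01]
unitig_p_value = cauchy_combine(kmer_p_values)
```

With equal weights, a unitig whose k-mers all share the same p-value keeps that p-value, while a single very small k-mer p-value pulls the aggregated value down strongly.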

Feature prefilter

The user selects a subset of features by keeping the top X unitigs from the previously sorted list. This number X is divided equally between the two conditions.
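
The sorting and prefilter steps can be sketched as follows. This is a minimal sketch under stated assumptions, not the pipeline's implementation: the `Unitig` record, its `condition` field, the labels "A" and "B", and the function names are all hypothetical, and "equally divided" is interpreted here as keeping the top X/2 unitigs of each condition (rounded down when X is odd).

```python
from typing import NamedTuple


class Unitig(NamedTuple):
    # Hypothetical record type, for illustration only.
    sequence: str
    p_value: float   # aggregated p-value of the unitig
    condition: str   # assumed label of the condition the unitig is associated with


def sort_unitigs(unitigs):
    """Sort by p-value ascending, then by length descending for equal p-values."""
    return sorted(unitigs, key=lambda u: (u.p_value, -len(u.sequence)))


def prefilter(unitigs, top_x, conditions=("A", "B")):
    """Keep the best top_x unitigs, split equally between the conditions."""
    per_condition = top_x // len(conditions)  # assumption: round down if odd
    ranked = sort_unitigs(unitigs)
    selected = []
    for condition in conditions:
        best = [u for u in ranked if u.condition == condition]
        selected.extend(best[:per_condition])
    return selected


unitigs = [
    Unitig("ACGT", 0.01, "A"),
    Unitig("ACGTAC", 0.01, "A"),
    Unitig("AC", 0.001, "B"),
    Unitig("ACG", 0.2, "B"),
    Unitig("ACGTT", 0.5, "A"),
]
selected = prefilter(unitigs, top_x=2)
```

In this toy input, the two unitigs of condition "A" with p-value 0.01 are tied, so the longer one ("ACGTAC") is ranked first and is the one retained for that condition.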