5. P‐value aggregation for feature pre‐selection
Louis-Mael Gueguen edited this page Nov 11, 2024 · 2 revisions
This step ensures that the most discriminant unitigs are used to build the models and select the best features.
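The unitigs are then sorted by aggregated p-value (ascending, smallest p-values first) and, for equal p-values, by size (descending, longest unitigs first). The p-values of the k-mers in a given unitig are aggregated into a single p-value for that unitig using the following formula:

$$T = \sum_{i} w_i \tan\left((0.5 - p_i)\,\pi\right)$$

where $w_i$ is the weight attributed to each k-mer p-value $p_i$ (in our calculation, all k-mers, and thus all p-values, have the same weight).
This is the Cauchy combination test described by Liu and Xie. With weights that sum to 1, $T$ approximately follows a standard Cauchy distribution under the null hypothesis, so the aggregated p-value is $0.5 - \arctan(T)/\pi$.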
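To make the aggregation concrete, here is a minimal Python sketch of the Cauchy combination test of Liu and Xie. It assumes the standard form of the statistic, T = Σ w_i tan((0.5 − p_i)π), converted to a p-value with 0.5 − arctan(T)/π. The function name `cauchy_combine` and its arguments are hypothetical and are not taken from the tool's code.

```python
import math


def cauchy_combine(p_values, weights=None):
    """Aggregate k-mer p-values into one unitig p-value (illustrative sketch).

    `weights` defaults to equal weights, as described on this page.
    This is an assumed implementation, not the tool's actual code.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("at least one p-value is required")
    if weights is None:
        weights = [1.0] * n
    if len(weights) != n:
        raise ValueError("weights and p_values must have the same length")

    # Normalise the weights so that they sum to 1.
    total = sum(weights)

    # Test statistic: weighted sum of Cauchy-transformed p-values.
    t = sum(
        (w / total) * math.tan((0.5 - p) * math.pi)
        for w, p in zip(weights, p_values)
    )

    # Upper tail of the standard Cauchy distribution at t.
    return 0.5 - math.atan(t) / math.pi


if __name__ == "__main__":
    # Three k-mers from the same unitig, equal weights.
    print(cauchy_combine([0.001, 0.02, 0.3]))
```

With equal weights, a unitig whose k-mers all share the same p-value keeps that p-value, and a single very small k-mer p-value pulls the aggregated value down strongly.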
The user then selects a subset of features by keeping the top X unitigs from the sorted list. This number is divided equally between the two conditions.
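The sorting and selection can be sketched as follows. This is an illustration only: the `Unitig` record, the `condition` label attached to each unitig, and the function names are assumptions made for the example, since this page does not specify how unitigs are assigned to a condition.

```python
from typing import NamedTuple


class Unitig(NamedTuple):
    sequence: str
    p_value: float   # aggregated p-value of the unitig
    condition: str   # hypothetical label: the condition the unitig is tied to


def rank_unitigs(unitigs):
    """Sort by p-value (ascending), then by length (descending) for ties."""
    return sorted(unitigs, key=lambda u: (u.p_value, -len(u.sequence)))


def select_top(unitigs, top_x, conditions):
    """Keep the top X unitigs, split equally between the conditions."""
    per_condition = top_x // len(conditions)
    ranked = rank_unitigs(unitigs)
    selected = []
    for condition in conditions:
        # Best-ranked unitigs carrying this condition label.
        matching = [u for u in ranked if u.condition == condition]
        selected.extend(matching[:per_condition])
    return selected


if __name__ == "__main__":
    example = [
        Unitig("ACGT", 0.01, "A"),
        Unitig("ACGTAC", 0.01, "A"),
        Unitig("AC", 0.001, "B"),
        Unitig("ACG", 0.2, "B"),
    ]
    for unitig in select_top(example, top_x=2, conditions=("A", "B")):
        print(unitig)
```

If X is not divisible by the number of conditions, this sketch rounds down, so slightly fewer than X unitigs are kept.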