Chore/fix confidence api #2

jspaezp · 2024-12-14T04:12:59Z

I want to re-enable this API (or equivalent), since those functions were removed:

conf = psms.assign_confidence()
conf.plot_qvalues()

Also, assign_confidence is a very messy function right now, so I am refactoring it so I can actually understand what it is doing.

What is not going to happen:

result_files = moka_conf.to_txt(dest_dir=out_dir)
# Right now `assign_confidence` does both the writing and the assignment, since there is
# the transition to arbitrary input lengths.

jspaezp · 2024-12-15T21:33:59Z

Right now only one function uses numba:

mokapot/mokapot/qvalues.py

Line 172 in d4d3e8c

prev_idx = 0

it could be replaced by:


import numpy as np


def _np_fdr2qvalue(fdr, num_total, indices):
    """Quickly turn a list of FDRs to q-values using vectorized operations.

    All of the inputs are assumed to be sorted.

    Parameters
    ----------
    fdr : numpy.ndarray
        A vector of FDR values, one per PSM.
    num_total : numpy.ndarray
        A vector of the cumulative number of PSMs at each score.
    indices : numpy.ndarray
        A vector of integer counts, where the count at index i is the
        number of consecutive PSMs that share the i-th unique score.

    Returns
    -------
    numpy.ndarray
        A vector of q-values.
    """
    # Calculate the cumulative sum of indices to get the end positions of
    # each group
    group_ends = np.cumsum(indices)
    # Calculate the start positions of each group
    group_starts = np.r_[0, group_ends[:-1]]

    # Create arrays of group slices
    group_slices = np.column_stack((group_starts, group_ends))

    # For each group, find the position of the largest num_total
    max_n_indices = np.array([
        start + np.argmax(num_total[start:end]) for start, end in group_slices
    ])

    # Get the FDR values at these positions
    curr_fdrs = fdr[max_n_indices]

    # Create expanded array of FDR values for each position
    expanded_fdrs = np.repeat(curr_fdrs, indices)

    # Calculate running minimum (cumulative minimum)
    qvals = np.minimum.accumulate(expanded_fdrs)
    # Clip to [0, 1]
    qvals.clip(0, 1, out=qvals)

    return qvals

but it's 10x slower (from fractions of a millisecond to low tens of ms)

Edit:

qvals_np = np.minimum.accumulate(fdr)

seems to give nearly identical results and it's 10x faster. Looking at the code, the two look equivalent: both walk the sorted FDR values and keep a running minimum that starts at 1, right?
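A toy comparison may help show where the two approaches can diverge. This is only a sketch: `qvalues_loop` is a hypothetical stand-in for a grouped running minimum (not the mokapot implementation), and the input values are made up. With no tied scores the two agree; with ties, the plain running minimum can pick up a lower FDR from inside a tie group.

```python
import numpy as np


def qvalues_loop(fdr, num_total, counts):
    """Grouped running minimum: one q-value per group of tied scores."""
    qvals = np.ones(len(fdr))
    min_q = 1.0
    start = 0
    for count in counts:
        end = start + count
        # Use the FDR at the largest cumulative count within the tie group.
        group_fdr = fdr[start:end][np.argmax(num_total[start:end])]
        min_q = min(min_q, group_fdr)
        qvals[start:end] = min_q
        start = end
    return qvals


# Toy inputs, ordered from worst to best score (made-up numbers).
fdr = np.array([0.5, 0.4, 0.45, 0.2, 0.25, 0.1])
num_total = np.array([6, 5, 4, 3, 2, 1])

# Case 1: groups of tied scores -> [0], [1, 2], [3], [4, 5]
tied_counts = np.array([1, 2, 1, 2])
grouped = qvalues_loop(fdr, num_total, tied_counts)

# Case 2: no ties, every PSM is its own group.
no_tie_counts = np.ones(len(fdr), dtype=int)
ungrouped = qvalues_loop(fdr, num_total, no_tie_counts)

# Plain running minimum over the per-PSM FDR values.
running_min = np.minimum.accumulate(fdr)

print(grouped)
print(running_min)
```

In this toy case the two results differ only in the last position, where the running minimum uses an FDR from the middle of a tie group.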

Results are very similar when there is a good distribution of FDRs

Looks a lot weirder when nothing is significant

Edit 2:

Replacing the function passes all unit tests despite the difference in plotted behavior. Some system tests fail due to small numeric differences.

jspaezp · 2024-12-16T00:19:31Z

assign_confidence right now takes way too many arguments, and they are not ordered by importance. We could make them keyword-only or simplify some of them.
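For reference, keyword-only arguments in Python are declared with a bare `*` in the signature. The sketch below uses a hypothetical function name and hypothetical parameters (`eval_fdr`, `dest_dir`, `prefix`); it is not the actual assign_confidence signature.

```python
def assign_confidence_sketch(psms, *, eval_fdr=0.01, dest_dir=None, prefix=None):
    """Hypothetical signature: everything after `*` must be passed by name."""
    return {
        "n_psms": len(psms),
        "eval_fdr": eval_fdr,
        "dest_dir": dest_dir,
        "prefix": prefix,
    }


# Passing options by name works.
result = assign_confidence_sketch([1, 2, 3], eval_fdr=0.05)

# Passing them positionally raises a TypeError, so the argument order
# stops being part of the public API.
try:
    assign_confidence_sketch([1, 2, 3], 0.05)
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

With this layout, options can be reordered, added, or dropped later without silently changing the meaning of existing positional calls.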

jspaezp · 2024-12-17T21:53:45Z

Note: I think right now there is a pretty rough edge where the user can very easily overwrite the confidence output location without a major warning. I could hash the output once the writer is finalized and raise an error if a file is later read and its hash no longer matches.
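A minimal sketch of that hash check, using only the standard library. The names `file_digest` and `FinalizedOutput` are hypothetical and do not exist in mokapot; SHA-256 is just one reasonable choice of hash.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


class FinalizedOutput:
    """Hypothetical wrapper that remembers a file's hash at finalize time."""

    def __init__(self, path):
        self.path = Path(path)
        # Record the digest once the writer is done with the file.
        self.digest = file_digest(self.path)

    def read_bytes(self):
        # Refuse to read if the file changed since it was finalized.
        if file_digest(self.path) != self.digest:
            raise RuntimeError(
                f"{self.path} was modified after the writer was finalized"
            )
        return self.path.read_bytes()
```

The trade-off is one extra pass over each output file at finalize time and another at read time, which may or may not be acceptable for large result files.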

jspaezp added 4 commits December 13, 2024 19:50

refactor: extracted output writer factory

1a95140

refactor: extracted level manager in confidence

0466b71

refactor: extracted level writer group

692a9f2

refactor: extracted more writer builder work to class

d4d3e8c

feat: score propagation and unscored confidence

789f0b5

jspaezp mentioned this pull request Dec 16, 2024

[WIP] v0.11.0 RC wfondrie/mokapot#132

Open

jspaezp added 3 commits December 16, 2024 15:35

feat(confidence): add data reading api

59e649d

feat,experiment: Experimental qvalue-fdr estimation

2e43ce2

chore,docs: updated basic docs to curr api and updated typing

be91528

jspaezp added 4 commits December 17, 2024 16:02

chore: updated basic n joint model docs code (md in progress)

680fc5b

chore: updated notebook

4ece548

chore,confidence: update docstrings

409d98d

chore,qvalue: removed commented out code

100ec58

jspaezp merged commit c246df7 into feature/auto_pin_handling2 Dec 19, 2024
