Chore/fix confidence api #2

Merged
merged 12 commits into from
Dec 19, 2024

Conversation

jspaezp
Owner

@jspaezp jspaezp commented Dec 14, 2024

I want to re-enable this API (or equivalent), since those functions were removed:

conf = psms.assign_confidence()
conf.plot_qvalues()

Also ... assign_confidence is a very messy function right now, so I am refactoring it so I can actually understand what it is doing.

What is not going to happen:

result_files = moka_conf.to_txt(dest_dir=out_dir)
# Right now `assign_confidence` does both the writing and the assignment, since there is
# the transition to arbitrary input lengths.

@jspaezp
Owner Author

jspaezp commented Dec 15, 2024

Right now only one function uses numba ...

prev_idx = 0

it could be replaced by:


def _np_fdr2qvalue(fdr, num_total, indices):
    """Quickly turn a list of FDRs to q-values using vectorized operations.

    All of the inputs are assumed to be sorted.

    Parameters
    ----------
    fdr : numpy.ndarray
        A vector of all unique FDR values.
    num_total : numpy.ndarray
        A vector of the cumulative number of PSMs at each score.
    indices : tuple of numpy.ndarray
        Tuple where the vector at index i indicates the PSMs that
        shared the unique FDR value in `fdr`.

    Returns
    -------
    numpy.ndarray
        A vector of q-values.
    """
    # Calculate the cumulative sum of indices to get the end positions of
    # each group
    group_ends = np.cumsum(indices)
    # Calculate the start positions of each group
    group_starts = np.r_[0, group_ends[:-1]]

    # Create arrays of group slices
    group_slices = np.column_stack((group_starts, group_ends))

    # For each group, find the index of max num_total
    slices = [slice(start, end) for start, end in group_slices]
    foo = np.argsort(-num_total)
    max_n_indices = np.array([foo[x].min() for x in slices])
    # max_n_indices = np.array([
    #     start + np.argmax(num_total[start:end]) for start, end in group_slices
    # ])

    # Get the FDR values at these positions
    curr_fdrs = fdr[max_n_indices]

    # Create expanded array of FDR values for each position
    expanded_fdrs = np.repeat(curr_fdrs, indices)

    # Calculate running minimum (cumulative minimum)
    qvals = np.minimum.accumulate(expanded_fdrs)
    # Clip to [0, 1]
    qvals.clip(0, 1, out=qvals)

    return qvals

but it's 10x slower (from fractions of a millisecond to low tens of milliseconds)

Edit:

qvals_np = np.minimum.accumulate(fdr)

seems to give nearly identical results and it's 10x faster. Looking at the code, the two appear to be
equivalent (iteratively keep the minimum value seen so far over the sorted FDR values, starting from 1, right?)
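
To make the "running minimum" reading concrete, here is a minimal sketch comparing an explicit loop with the vectorized call. The function names and the FDR values are made up for illustration, and it ignores tied scores. The grouped version above picks one FDR per group of tied scores, so ties are one place where the two approaches could differ; that is a guess, not something verified here.

```python
import numpy as np


def running_min_loop(fdr):
    """Reference loop: carry the smallest FDR seen so far, starting from 1."""
    min_q = 1.0
    qvals = np.empty(len(fdr), dtype=float)
    for i, value in enumerate(fdr):
        min_q = min(min_q, value)
        qvals[i] = min_q
    return qvals


def running_min_vectorized(fdr):
    """Vectorized version: cumulative minimum, capped at 1."""
    return np.minimum(np.minimum.accumulate(fdr), 1.0)


# Hypothetical FDRs, ordered from worst to best score.
fdr = np.array([1.2, 0.5, 0.6, 0.25, 0.3, 0.1])
loop_q = running_min_loop(fdr)
vec_q = running_min_vectorized(fdr)
assert np.allclose(loop_q, vec_q)
```

Without ties, both versions return the same monotonically non-increasing vector.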

Results are very similar when there is a good distribution of FDRs

image

Looks a lot weirder when nothing is significant

image

Edit 2

Replacing the function passes all unit tests despite the difference in plotted behavior. Some system tests fail due to small numeric differences.

@jspaezp
Owner Author

jspaezp commented Dec 16, 2024

assign_confidence right now takes way too many arguments, and they are usually not ordered by importance ... we can make them keyword-only or simplify some of them ...
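
A minimal sketch of what keyword-only arguments would look like. The function name and every parameter below are hypothetical placeholders, not the real `assign_confidence` signature; the point is only the bare `*` in the argument list.

```python
def assign_confidence_sketch(psms, *, eval_fdr=0.01, dest_dir=None, write=True):
    """Hypothetical signature: everything after `*` must be passed by name."""
    return {
        "n_psms": len(psms),
        "eval_fdr": eval_fdr,
        "dest_dir": dest_dir,
        "write": write,
    }


# Passing options by name works as usual.
result = assign_confidence_sketch([1, 2, 3], eval_fdr=0.05)

# Passing them positionally fails loudly instead of silently
# binding a value to the wrong parameter.
try:
    assign_confidence_sketch([1, 2, 3], 0.05)
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

With this layout the argument order stops mattering to callers, so options can be reordered or removed later without silently changing behavior.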

@jspaezp
Owner Author

jspaezp commented Dec 17, 2024

Note: I think right now there is a pretty rough edge where the user can very easily overwrite the confidence output location without a major warning. I could hash the output when the writer is finalized and raise an error if a later read finds a file whose hash no longer matches.
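
A rough sketch of the hashing idea, using only the standard library. `HashedOutput`, `finalize` and `read_bytes` are hypothetical names for illustration, not anything that exists in the codebase.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large outputs are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class HashedOutput:
    """Hypothetical helper: store the hash at finalize, verify it before reads."""

    def __init__(self, path):
        self.path = Path(path)
        self._expected = None

    def finalize(self):
        # Call once the writer has finished writing to `path`.
        self._expected = file_sha256(self.path)

    def read_bytes(self):
        if self._expected is None:
            raise RuntimeError("finalize() must be called before reading")
        if file_sha256(self.path) != self._expected:
            raise RuntimeError(f"{self.path} changed after it was finalized")
        return self.path.read_bytes()
```

The cost is one extra pass over the file per read, which may or may not be acceptable for large outputs.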

@jspaezp jspaezp merged commit c246df7 into feature/auto_pin_handling2 Dec 19, 2024