feat(protein-prospector): Add TMT converter for protein prospector #97

Merged: 7 commits merged into devel from protein-prospector-feature on Aug 28, 2024

tonywu1999 (Contributor) commented Jul 25, 2024

Motivation and Context

In this issue, one of our users requested a new PTM+TMT converter for Protein Prospector. To build that converter, we first need a TMT converter.

Changes

  • Add a new internal function, .cleanRawProteinProspector, to clean up the Protein Prospector output
  • Create a new ProteinProspectortoMSstatsTMTFormat function with basic functionality

Testing

  • Examples from .Rd files run successfully
  • Tested locally on a 45 MB dataset, which processed successfully
  • Added unit tests

Checklist Before Requesting a Review

  • I have read the MSstats contributing guidelines
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

Review thread on R/clean_ProteinProspector.R (resolved)
Review thread on R/clean_ProteinProspector.R (outdated, resolved)

tonywu1999 (Contributor, Author) commented Aug 5, 2024

TODO:

  • Determine how to handle 0s, negative values, and values between 0 and 1
  • Add unit tests on 0-1, 0s, & negative value handling

tonywu1999 requested a review from mstaniak on August 12, 2024 19:06

mstaniak (Contributor) commented Aug 13, 2024

Hi @tonywu1999, adding 1:n() to PSM is meant to help retain the identity of PSMs when melting, right?
The single underscore will/might be a problem for dataProcess(), "_[PSM ID]" should be removed after aggregating PSMs.
Also the := will work a bit better, data.table might have a faster ifelse, too

tonywu1999 (Contributor, Author) commented

@mstaniak

adding 1:n() to PSM is meant to help retain the identity of PSMs when melting, right?

Yes, and I believe retaining the identity of PSMs is needed when aggregating them here within the same run/sequence/charge combination.

The single underscore will/might be a problem for dataProcess(), "_[PSM ID]" should be removed after aggregating PSMs.

It looks like the single underscore from adding 1:n is removed here.

Also the := will work a bit better, data.table might have a faster ifelse, too

I ended up updating the code to use := when adding the PSM column
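
For readers following the thread, here is a minimal sketch of the idea under discussion. It is not the converter's actual code. The column names (`Run`, `PeptideSequence`, `PrecursorCharge`, `tmt_126`, `tmt_127`), the toy values, and the `PSMBase` helper column are all hypothetical. It shows a per-row identifier appended to the PSM label by reference with `:=`, so that PSMs sharing a run/sequence/charge combination stay distinguishable after `melt()`, and how the trailing "_[PSM ID]" suffix could be stripped afterwards.

```r
library(data.table)

# Hypothetical wide-format input: the first two rows are distinct PSMs
# that share the same run / sequence / charge combination.
psms <- data.table(
  Run = c("run1", "run1", "run2"),
  PeptideSequence = c("PEPTIDEK", "PEPTIDEK", "PEPTIDEK"),
  PrecursorCharge = c(2L, 2L, 2L),
  tmt_126 = c(100, 150, 120),
  tmt_127 = c(200, 175, 130)
)

# Add the PSM label by reference with `:=`; seq_len(.N) plays the role of
# 1:n() and gives every row its own numeric suffix.
psms[, PSM := paste(PeptideSequence, PrecursorCharge, seq_len(.N), sep = "_")]

# Melt the channel columns into long format; PSM travels along as an id
# column, so each intensity can still be traced back to its original row.
long <- melt(
  psms,
  id.vars = c("Run", "PeptideSequence", "PrecursorCharge", "PSM"),
  variable.name = "Channel",
  value.name = "Intensity"
)

# Once PSMs have been aggregated, the trailing "_[PSM ID]" suffix can be
# dropped again (illustrative only; aggregation itself is not shown here).
long[, PSMBase := sub("_[0-9]+$", "", PSM)]
```

Without the suffix, the first two rows would carry the same PSM label and could no longer be told apart in the long table.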

tonywu1999 dismissed mstaniak’s stale review on August 19, 2024 15:20

addressed feedback

tonywu1999 merged commit 6dc8dde into devel on Aug 28, 2024 (1 check passed)
tonywu1999 deleted the protein-prospector-feature branch on August 28, 2024 15:16