Request specification clarification on sequence truncations #4

Open
dtabb73 opened this issue Aug 19, 2021 · 2 comments

@dtabb73

dtabb73 commented Aug 19, 2021

I would like to see a paragraph in the specification indicating how proteoform sequence truncations are to be specified. N-terminal truncations may be biological, as in the removal of the initial Met (perhaps with PTM), the cleavage of a signal peptide, or the action of a viral protease. The truncations may instead be related to sample treatment, such as a rare cutter like CNBr for middle-down proteomics, or due to a "hot" ion source. I believe ProForma should specify how a proteoform sequence compares to the sequence described by the accession, for example by indicating the positions of the first and last amino acids in the accession's sequence. Are the amino acids preceding and succeeding the proteoform sequence expected to be included?

@javizca
Contributor

javizca commented Aug 19, 2021

In my view, this is "metadata" information on top of the actual protein sequence. In the current version of the specification, we decided to handle those issues using the INFO tag, providing the metadata there as free text.

Standardising every single annotation at this point is unfeasible, in my view.
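
To illustrate the free-text approach, here is a minimal sketch that attaches an INFO tag to one residue of a plain sequence. It assumes the `[INFO:...]` tag syntax of ProForma 2.0; the helper name `with_info_tag`, the wording of the note, and the rule rejecting square brackets in the note are assumptions made for this example, not part of the specification. It only handles sequences without existing modifications.

```python
def with_info_tag(sequence: str, position: int, note: str) -> str:
    """Attach a free-text INFO tag after the residue at 1-based `position`.

    Hypothetical helper: works on plain, unmodified sequences only.
    """
    if not 1 <= position <= len(sequence):
        raise ValueError("position is outside the sequence")
    # Assumption for this sketch: keep brackets out of the note so the
    # tag cannot be closed early or nested by accident.
    if "[" in note or "]" in note:
        raise ValueError("note must not contain square brackets")
    return f"{sequence[:position]}[INFO:{note}]{sequence[position:]}"


# The note text is free-form; this wording is only an illustration.
print(with_info_tag("ELVIS", 1, "initiator Met removed from hypothetical parent"))
```

Because the note is free text, software reading it cannot rely on any particular structure, which is the trade-off of this approach.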

@edeutsch

I think all this is beyond the scope of ProForma 2.0. ProForma is designed to describe the molecule that (someone claims) yielded a spectrum. Information about:

  • what the parent accession is
  • where in the parent the peptidoform is located
  • what is missing in the peptidoform that might be expected based on the assumed parent
  • what the preceding and following amino acids are

is not designed to be encoded in ProForma 2.0. There could be 20 parent accessions, each with a different offset and different preceding and following amino acids. Trying to capture that in ProForma 2.0 would be hideous.

Basically, ProForma is about what you think you have observed, not about what you infer about the context of that observation (with a minor exception: there is some ambiguity between I/L and a few other isobaric alternatives, where ProForma allows the user to express one, but it is implied that the isobaric alternatives are possible).
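
A small sketch of why the parent context is one-to-many: the same observed sequence can occur in several parents, and even several times in one parent, each time with a different offset and different flanking residues. The accessions `HYPO_A` and `HYPO_B`, their sequences, and the names `ParentMatch` and `locate_in_parents` are all hypothetical, and the exact string matching here ignores I/L and other isobaric ambiguity.

```python
from typing import Dict, List, NamedTuple, Optional


class ParentMatch(NamedTuple):
    accession: str
    start: int                 # 1-based position of the first residue
    end: int                   # 1-based position of the last residue
    preceding: Optional[str]   # None at the parent's N-terminus
    following: Optional[str]   # None at the parent's C-terminus


def locate_in_parents(observed: str, parents: Dict[str, str]) -> List[ParentMatch]:
    """Find every exact occurrence of `observed` in every parent sequence."""
    matches = []
    for accession, parent in parents.items():
        idx = parent.find(observed)
        while idx != -1:
            end = idx + len(observed)
            matches.append(ParentMatch(
                accession,
                idx + 1,
                end,
                parent[idx - 1] if idx > 0 else None,
                parent[end] if end < len(parent) else None,
            ))
            # Continue one residue later so overlapping hits are also found.
            idx = parent.find(observed, idx + 1)
    return matches


# Hypothetical parents, invented for illustration only.
parents = {"HYPO_A": "MELVISK", "HYPO_B": "AAELVISRELVIS"}
for match in locate_in_parents("ELVIS", parents):
    print(match)
```

Each match is an inference about context rather than part of the observed molecule, so a single ProForma string would need to carry a whole list of such records to represent it.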
