Recommend always using the MM ? . encoding.
The '.' code is the default interpretation, but historically tools
omitting '?' and '.' have used both styles.  An explicit definition in
the MM string removes any ambiguity.

See samtools#654 comments for background.

Co-authored-by: John Marshall <jmarshall@hey.com>
jkbonfield and jmarshall committed Aug 15, 2022
1 parent 6e70f3a commit 34137f2
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion SAMtags.tex
@@ -491,9 +491,11 @@ \subsection{Base modifications}
Note `{\tt N}' may be used to match any base rather than specifically an `{\tt N}' call by the sequencing instrument.
This may be used in situations where the base modification is not a derivation of a standard base type.
This is followed by either plus or minus indicating the strand the modification was observed on (relative to the original sequenced strand of {\sf SEQ} with plus meaning same orientation),\footnote{Hence a tool that may reverse complement sequences does not need to understand how to manipulate the {\tt MM} and {\tt ML} tags.} and one or more base modification codes.
-Following the base modification codes is an optional `{\tt .}' or `{\tt ?}' describing how skipped seq bases of the stated base type should be interpreted by downstream tools.
+
+Following the base modification codes is a recommended but optional `{\tt .}' or `{\tt ?}' describing how skipped seq bases of the stated base type should be interpreted by downstream tools.
When this flag is `{\tt ?}' there is no information about the modification status of the skipped bases provided.
When this flag is not present, or it is `{\tt .}', these bases should be assumed to have low probability of modification.\footnote{The decision whether a base is assumed to be unmodified or has a probability explicitly provided is up to the modification calling program. Some programs will elide calls with modification probabilities below a threshold to provide a more compact modification tag.}
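
The flag handling described above could be sketched roughly as follows. This is only an illustration, not code from the specification or from any tool: the function name `parse_mm_header`, the regular expression, and the assumption that codes are either lowercase letters or a numeric identifier are all hypothetical simplifications.

```python
import re

# Hypothetical pattern for the part of an MM entry before the first comma:
# one base letter, a strand sign, the modification code(s), and the
# optional '.' or '?' flag. The accepted character sets are assumptions.
_MM_HEADER = re.compile(r"([A-Z])([+-])([a-z]+|[0-9]+)([.?]?)")


def parse_mm_header(header):
    """Split e.g. 'C+m?' into its parts, applying the default flag."""
    m = _MM_HEADER.fullmatch(header)
    if m is None:
        raise ValueError(f"unrecognised MM header: {header!r}")
    base, strand, codes, flag = m.groups()
    return {
        "base": base,
        "strand": strand,
        "codes": codes,
        # An absent flag is interpreted as '.', per the text above.
        "flag": flag or ".",
        # Records whether the writer stated the flag explicitly,
        # which is what this commit recommends.
        "explicit": flag != "",
    }
```

A reader built this way treats `C+m` and `C+m.` identically, while still being able to warn when a writer relied on the implicit default.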

This is then followed by a comma separated list of how many seq bases of the stated base type to skip, stored as a delta to the last and starting with 0 as the first (or next) base, starting from the uncomplemented 5' end of the {\sf SEQ} field.
This number series is comparable to the numbers in an {\tt MD} tag,
albeit counting specific base types only and potentially reverse-complemented.
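
As a rough illustration of the delta encoding described above, the sketch below turns a list of skip counts into 0-based indices into SEQ. It is a minimal example under stated assumptions: `mm_positions` is a hypothetical helper, `seq` is assumed to already be in the original sequenced orientation (that is, uncomplemented), and the header and flag are assumed to have been parsed elsewhere.

```python
def mm_positions(seq, base, skips):
    """Return 0-based SEQ indices selected by MM-style skip counts.

    seq   -- read sequence in its original sequenced orientation (assumed)
    base  -- base type named in the MM entry; 'N' matches any base
    skips -- how many bases of that type to skip before each call
    """
    # Indices of every base of the stated type, counted from the 5' end.
    candidates = [i for i, b in enumerate(seq) if base == "N" or b == base]

    positions = []
    idx = -1  # index into candidates of the previous call; none yet
    for skip in skips:
        # Each count is a delta from the previous call, with 0 meaning
        # the very next base of this type.
        idx += skip + 1
        if idx >= len(candidates):
            raise ValueError("skip counts run past the end of SEQ")
        positions.append(candidates[idx])
    return positions
```

For example, with `seq = "ACCGCTC"` the C bases sit at indices 1, 2, 4 and 6, so skip counts `[1, 0]` select indices 2 and 4. The C at index 1 is skipped, and its interpretation is what the `.` or `?` flag governs.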