Salmon 1.5.1

@rob-p released this 14 Jun 04:03

Note: If you downloaded the pre-compiled Linux binary for v1.5.1 from this release page before 19:47 UTC on June 14, please check your version with salmon -v. For a short period of time, the executable posted here was actually v1.5.0. Other distribution mechanisms (e.g. bioconda, Docker Hub) were not affected.
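
For scripted environments, the check suggested above could be automated along these lines. This is a minimal sketch, not part of salmon: the helper names are hypothetical, and the exact text printed by salmon -v is assumed only to contain a version of the form X.Y.Z.

```python
import re
import subprocess


def parse_salmon_version(text):
    """Return the first X.Y.Z version found in `text` as a tuple of ints, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def installed_salmon_version(executable="salmon"):
    """Run `<executable> -v` and parse the version from whatever it prints.

    Assumption: the version may appear on stdout or stderr, so both are searched.
    """
    proc = subprocess.run([executable, "-v"], capture_output=True, text=True)
    return parse_salmon_version(proc.stdout + proc.stderr)


def is_affected_binary(version):
    """True if a binary downloaded as v1.5.1 actually reports v1.5.0."""
    return version == (1, 5, 0)
```

Calling is_affected_binary(installed_salmon_version()) on a machine where the binary was downloaded from this page would indicate whether it needs to be downloaded again.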

New features (in 1.5.0)

This release introduces an --ont flag that is designed to improve quantification from Oxford Nanopore Technologies (ONT) long reads (both cDNA and direct RNA). This flag has two main effects:

  • First, it enables an alignment error model designed to work with long-read alignments. Until this point, the recommendation when using salmon to quantify aligned long reads had been to disable the error model, since salmon's default error model is designed for short reads and did not work well with long-read alignments. The error model enabled with the --ont flag, however, is designed specifically for the alignment characteristics of long reads. It should improve the quantification estimates produced for this data by providing a better estimate of the conditional probability of a read arising from a particular transcript given its alignment to that transcript. Testing for this feature has been done mostly using minimap2.

  • Second, it disables the length effect in the generative model when computing the conditional probability of observing a fragment given that it arises from a specific transcript. This is because in long-read sequencing, we do not expect to observe (i.e. sequence) multiple fragments from the same molecule, and thus we do not expect the transcript length to directly affect the observed fragment count. A consequence of this change is that the "EffectiveLength" of transcripts is not currently computed or used in the model in this mode, and this field in the output will be populated with a sentinel value of 100.
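
Because the sentinel value is easy to mistake for a real effective length, downstream scripts may want to detect it before doing any length-based normalization. The following is a minimal sketch, assuming the usual tab-separated quant.sf layout with the columns Name, Length, EffectiveLength, TPM and NumReads. The function name and the toy rows are made up for illustration.

```python
import csv

# Placeholder EffectiveLength written in --ont mode, per the notes above.
ONT_SENTINEL = 100.0


def effective_length_is_sentinel(quant_sf_lines):
    """Return True if every EffectiveLength in a quant.sf equals the sentinel.

    `quant_sf_lines` is any iterable of lines, e.g. an open file handle.
    An empty table returns False.
    """
    reader = csv.DictReader(quant_sf_lines, delimiter="\t")
    values = [float(row["EffectiveLength"]) for row in reader]
    return bool(values) and all(value == ONT_SENTINEL for value in values)


# Toy data (not real output): two transcripts quantified in --ont mode.
toy_quant = [
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n",
    "tx1\t1500\t100.000\t600000.0\t30.0\n",
    "tx2\t900\t100.000\t400000.0\t20.0\n",
]
print(effective_length_is_sentinel(toy_quant))  # True
```

If the check returns True, the EffectiveLength column should be treated as a placeholder rather than as an estimate.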

Other improvements (in 1.5.0)

  • When running alevin to generate a RAD file for alevin-fry (specifically when using --sketch mode), the sensitivity of mapping has been improved by allowing for reads that have only highly-repetitive seeds and map to a large number of loci.

  • It is no longer necessary to provide a transcript-to-gene map (--tgMap) to the alevin command if alevin is being run with the --rad and/or --sketch flags.

  • Automatically detect and exit if alevin is run with an index including decoy sequences when using the --rad and/or --sketch flags. This functionality is not currently supported, and mapping against such an index can cause (cryptic) errors in downstream processing. Now, if such an index is passed when using these flags, an informative error message is printed and the program will exit with a return code of 1.

  • Support for using the custom single-cell feature flags (--end, --barcodeLength, --umiLength) simultaneously with the --citeseq command-line flag has been dropped, although they can still be used independently. A user has to either use the --citeseq flag with its predefined set of features (CB: 16, UMI: 10) or use the --umi-geometry, --bc-geometry and --read-geometry flags for a customized extraction of the barcode sequences. Note that in the geometry mode, the user has to explicitly provide --keepCBFraction 1.0 and a --tgMap file, while it is not necessary to provide either in the citeseq-based mode.
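
For pipelines that assemble alevin command lines programmatically, the either/or rule in the last item could be enforced before launching a run. The sketch below is illustrative only: the helper is hypothetical and not part of salmon, it covers only the feature-extraction flags named above, and the geometry strings are passed through unchecked.

```python
def alevin_feature_args(citeseq=False, bc_geometry=None, umi_geometry=None,
                        read_geometry=None, tg_map=None):
    """Build the feature-extraction part of an alevin command line.

    Exactly one mode is allowed: the predefined --citeseq features, or the
    three custom geometry flags (which also need --keepCBFraction 1.0 and
    a --tgMap file, as described in the notes above).
    """
    geometry = [bc_geometry, umi_geometry, read_geometry]
    if citeseq:
        if any(value is not None for value in geometry):
            raise ValueError("--citeseq cannot be combined with custom geometry")
        return ["--citeseq"]
    if any(value is None for value in geometry):
        raise ValueError("geometry mode needs bc, umi and read geometry")
    if tg_map is None:
        raise ValueError("geometry mode needs a tgMap file")
    return [
        "--bc-geometry", bc_geometry,
        "--umi-geometry", umi_geometry,
        "--read-geometry", read_geometry,
        "--keepCBFraction", "1.0",
        "--tgMap", tg_map,
    ]
```

For example, alevin_feature_args(citeseq=True) yields only the --citeseq flag, while passing three geometry strings together with tg_map="txp2gene.tsv" (a hypothetical file name) yields the geometry flags along with --keepCBFraction 1.0 and --tgMap.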

Bug fixes

  • Fix an issue where the sizes of the representations used for the barcode length and the UMI length were mistakenly linked when writing output to a RAD file. As most current protocols use a 32-bit integer for both, most runs are not affected.

  • Fix an issue where the barcode and UMI length may not be properly set when using the custom geometry format (addresses #670).