Martin Asser Hansen edited this page Oct 2, 2015 · 6 revisions

Biopiece: mean_scores

Description

mean_scores calculates either the global or the local mean of the quality SCORES of each record in the stream. The quality SCORES are encoded Phred-style as a character string.

The global (default) behaviour calculates SCORES_MEAN as the sum of all the scores divided by the length of the SCORES string.
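A minimal Python sketch of the global mean follows. It assumes the Phred+64 offset used in the Fastq example on this page, where 'a' decodes to 33. The function names are hypothetical and are not part of Biopieces.

```python
PHRED_OFFSET = 64  # assumption: Phred+64 encoding, as in the example on this page


def decode_scores(scores: str, offset: int = PHRED_OFFSET) -> list[int]:
    """Convert a Phred-encoded quality string to integer scores."""
    return [ord(ch) - offset for ch in scores]


def mean_scores_global(scores: str) -> float:
    """Sum of all scores divided by the length of the SCORES string."""
    values = decode_scores(scores)
    if not values:
        raise ValueError("empty SCORES string")
    return sum(values) / len(values)


if __name__ == "__main__":
    print(f"{mean_scores_global('abcdefghhhhhhh[[[KKKKKheehhhhhhhhhhh'):.2f}")  # 33.94
```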

With --local, a window of --window_size scores is slid along the SCORES string and the mean is calculated at each window position. The smallest of these window means is reported as SCORES_MEAN_LOCAL.
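The sliding-window minimum can be sketched in Python as below. It again assumes Phred+64 encoding and a window that advances one position at a time. How a SCORES string shorter than the window is handled is also an assumption. A running sum keeps the scan linear in the string length instead of re-summing every window.

```python
def decode_scores(scores: str, offset: int = 64) -> list[int]:
    """Convert a Phred-encoded quality string to integer scores."""
    # assumption: Phred+64 encoding, as in the example on this page
    return [ord(ch) - offset for ch in scores]


def mean_scores_local(scores: str, window_size: int = 5) -> float:
    """Return the smallest mean over all windows of window_size consecutive scores."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    values = decode_scores(scores)
    if not values:
        raise ValueError("empty SCORES string")
    # assumption: a string shorter than the window is averaged as a single window
    if len(values) <= window_size:
        return sum(values) / len(values)
    window_sum = sum(values[:window_size])
    smallest = window_sum
    for i in range(window_size, len(values)):
        # slide one position: add the incoming score, drop the outgoing one
        window_sum += values[i] - values[i - window_size]
        smallest = min(smallest, window_sum)
    return smallest / window_size


if __name__ == "__main__":
    print(mean_scores_local("abcdefghhhhhhh[[[KKKKKheehhhhhhhhhhh"))  # 11.0
```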

Thus, substandard records, with either too low an overall mean quality or a local dip in quality, can then be filtered out using grab.
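grab is the Biopieces tool for this filtering step. Its options are not described on this page, so the sketch below only illustrates the filtering idea in Python. It assumes the stream layout seen in the examples: "KEY: VALUE" lines, with each record ended by "---". The thresholds (30 and 15) and the function names are hypothetical.

```python
from typing import Iterable, Iterator


def parse_records(lines: Iterable[str]) -> Iterator[dict[str, str]]:
    """Yield records built from 'KEY: VALUE' lines separated by '---'."""
    record: dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\n")
        if line == "---":
            if record:
                yield record
            record = {}
        elif ": " in line:
            key, value = line.split(": ", 1)
            record[key] = value
    if record:  # tolerate a final record without a trailing '---'
        yield record


def passes_quality(record: dict[str, str], min_mean: float = 30.0,
                   min_local: float = 15.0) -> bool:
    """Keep a record only if neither mean falls below its (hypothetical) threshold."""
    if "SCORES_MEAN" in record and float(record["SCORES_MEAN"]) < min_mean:
        return False
    if "SCORES_MEAN_LOCAL" in record and float(record["SCORES_MEAN_LOCAL"]) < min_local:
        return False
    return True
```

For the example entry on this page, SCORES_MEAN 33.94 would pass these thresholds, while SCORES_MEAN_LOCAL 11.0 would not.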

Usage

... | mean_scores [options]

Options

[-?          | --help]                #  Print full usage description.
[-l          | --local]               #  Calculate local means.
[-w <uint>   | --window_size=<uint>]  #  Window size                  -  Default=5
[-I <file!>  | --stream_in=<file!>]   #  Read input from stream file  -  Default=STDIN
[-O <file>   | --stream_out=<file>]   #  Write output to stream file  -  Default=STDOUT
[-v          | --verbose]             #  Verbose output.

Examples

Consider the following Fastq entry in the file test.fastq:

@HWI-EAS157_20FFGAAXX:2:1:888:434
TTGGTCGCTCGCTCCGCGACCTCAGATCAGACGTGG
+HWI-EAS157_20FFGAAXX:2:1:888:434
abcdefghhhhhhh[[[KKKKKheehhhhhhhhhhh

The values of the scores in decimal are:

SCORES: 33;34;35;36;37;38;39;40;40;40;40;40;40;40;27;27;27;11;
        11;11;11;11;40;37;37;40;40;40;40;40;40;40;40;40;40;40;
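The listed values follow from subtracting an offset of 64 from the ASCII code of each character. This is the Phred+64 scheme of Illumina 1.3+ reads. The offset is inferred from the example, since the description above only says "Phred style".

```latex
Q = \operatorname{ASCII}(c) - 64, \qquad
\texttt{a}: 97 - 64 = 33,\quad
\texttt{h}: 104 - 64 = 40,\quad
\texttt{[}: 91 - 64 = 27,\quad
\texttt{K}: 75 - 64 = 11,\quad
\texttt{e}: 101 - 64 = 37
```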

To read in the entries and calculate the mean quality score of each entry, do:

read_fastq -i test.fastq | mean_scores

SEQ_NAME: HWI-EAS157_20FFGAAXX:2:1:888:434
SEQ: TTGGTCGCTCGCTCCGCGACCTCAGATCAGACGTGG
SEQ_LEN: 36
SCORES: abcdefghhhhhhh[[[KKKKKheehhhhhhhhhhh
SCORES_MEAN: 33.94
---
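As a check, SCORES_MEAN can be recomputed from the decimal scores listed above. The terms are the runs of identical scores, taken in order along the string:

```latex
\text{SCORES\_MEAN}
= \frac{1}{36}\sum_{i=1}^{36} Q_i
= \frac{(33+34+\dots+39) + 7\cdot 40 + 3\cdot 27 + 5\cdot 11 + 40 + 2\cdot 37 + 11\cdot 40}{36}
= \frac{252 + 280 + 81 + 55 + 40 + 74 + 440}{36}
= \frac{1222}{36} \approx 33.94
```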

To calculate local means for a sliding window, do:

read_fastq -i test.fastq | mean_scores -l

SEQ_NAME: HWI-EAS157_20FFGAAXX:2:1:888:434
SEQ: TTGGTCGCTCGCTCCGCGACCTCAGATCAGACGTGG
SEQ_LEN: 36
SCORES: abcdefghhhhhhh[[[KKKKKheehhhhhhhhhhh
SCORES_MEAN_LOCAL: 11.0
---

This indicates that the local minimum is located at the stretch KKKKK: (11+11+11+11+11) / 5 = 11.0

See also

read_fastq

mean_vals

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.

mail@maasha.dk

June 2010

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

mean_scores is part of the Biopieces framework.

http://www.biopieces.org
