Given the importance of these scores in both FASTQ and MAQ (for me, specifically using alignment quality scores from Illumina sequencing runs to monitor run and sample quality), I was a bit surprised not to find a complete work-up of the meanings, the scores, the glyphs assigned to the scores, and the encoding interpretations of these scores in one location. The two (three) tables shown here hopefully provide a meaningful summary.

I should qualify that much of the background for this page was taken from four key places. First is the Wikipedia entry for FASTQ. Second is the Wikipedia entry for Phred quality score. Third is the Rosetta Stone of Phred score interpretation, in the form of the open access article: P. J. A. Cock, C. J. Fields, N. Goto, M. L. Heuer and P. M. Rice, "The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants." Nucleic Acids Research, 2010, Vol. 38, No. 6, 1767-1771, doi:10.1093/nar/gkp1137. Fourth is seqanswers.com in various forms.

## (Sanger) Phred Quality Scores

I refer you to the two Wikipedia articles on FASTQ and Phred quality scores for historical content (and for a brief discussion of how chromatogram data are processed to produce quality scores). Table 1 lists the Q[Phred] values (**Phred Q**) alongside the P[Phred] values they correspond to (**Probability (P) Of Wrong Base**). It then adds the ASCII codes (**Sanger "Q + 33" Shift**) and characters (**Sanger "Q + 33" ASCII GLYPH**) for the original Phred scores; in the Sanger method, Phred scores 0-to-93 use ASCII characters 33-to-126 so that each score maps to a single readable character. The last two columns give the Illumina 1.3+ codes (**Illumina 1.3+ "Q + 64" Shift**, using ASCII characters 64-to-126 to cover scores 0-to-62 on the Phred scale) and the corresponding ASCII glyphs (**Illumina 1.3+ "Q + 64" ASCII GLYPH**). This is all likely completely self-explanatory (or hopefully will be by the bottom of the post). For review, the relationship between the Phred quality score **Q[Sanger]** and the base-calling error probability **P** is

$$Q_{\text{Sanger}} = -10 \log_{10} P$$

or, re-written for the logarithmically challenged…

$$P = 10^{-Q_{\text{Sanger}}/10}$$

Table 1. Phred Quality Scores (Q), Wrong Base Probabilities, And Sanger And Illumina 1.3+ ASCII Glyphs.

| Phred Q | Probability (P) Of Wrong Base | Sanger "Q + 33" Shift | Sanger "Q + 33" ASCII GLYPH | Illumina 1.3+ "Q + 64" Shift | Illumina 1.3+ "Q + 64" ASCII GLYPH |
|---|---|---|---|---|---|

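The columns of Table 1 follow directly from the Phred relationship and the two ASCII offsets, so individual rows can be regenerated on demand. The following is a minimal Python sketch under that assumption; the function names (`phred_to_prob`, `table1_row`, `decode_quality`) are hypothetical helpers made up for illustration, not part of any FASTQ library.

```python
def phred_to_prob(q):
    """Error probability P for a Phred score Q, using P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def table1_row(q):
    """Return one Table 1 style row for an integer Phred score q (0 to 93).

    Columns: Q, P, Sanger code (Q + 33), Sanger glyph,
    Illumina 1.3+ code (Q + 64), Illumina 1.3+ glyph.
    The Illumina columns are None when Q + 64 would pass ASCII 126.
    """
    if not 0 <= q <= 93:
        raise ValueError("Sanger encoding covers Phred scores 0 to 93")
    sanger_code = q + 33
    illumina_code = q + 64 if q + 64 <= 126 else None
    illumina_glyph = chr(illumina_code) if illumina_code is not None else None
    return (q, phred_to_prob(q), sanger_code, chr(sanger_code),
            illumina_code, illumina_glyph)


def decode_quality(quality_string, offset=33):
    """Turn a FASTQ quality string back into integer scores.

    offset=33 assumes Sanger encoding; pass offset=64 for Illumina 1.3+.
    """
    return [ord(ch) - offset for ch in quality_string]


if __name__ == "__main__":
    # Print a few representative rows.
    for q in (0, 10, 20, 30, 40):
        print(table1_row(q))
```

For example, a Phred score of 40 corresponds to the glyph `I` under the "Q + 33" shift and to `h` under the "Q + 64" shift, which is why the same quality string decodes to very different scores if the wrong offset is assumed.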
An assumption going in, when I was producing plots from the Q[Sanger] and Q[Solexa] data, was that "P" was the same value in both and that the Solexa system simply opted to use the odds (P/(1-P)) as its metric. A proper two-second consideration of the shapes of P and P/(1-P) would have led to the immediate conclusion that something was afoot. The table columns on the left of the black bar in Table 2 (2A) are the Q[Solexa] values computed from the Q[Sanger] probabilities. They are here simply to show that the two are, in fact, not the same; if you've spent any time wondering why you can't adequately… manipulate Excel's rounding tools to reproduce the Q[Solexa] integer values, this is why.

The probabilities obtained for Q[Solexa] were, in fact, worked backwards from the integer values of Q[Solexa] (having found no table online that gives a number-by-number summary of the probability or odds). For background, the Q[Solexa] values are obtained from:

$$Q_{\text{Solexa}} = -10 \log_{10}\left(\frac{P}{1-P}\right)$$
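To see why Phred probabilities do not land on integer Solexa scores, the odds-based formula can be applied to the probability implied by each Phred score. This is a rough Python sketch of that check; `solexa_q_from_prob` and `phred_to_solexa` are hypothetical names chosen for illustration.

```python
import math


def solexa_q_from_prob(p):
    """Solexa quality from an error probability: Q = -10 * log10(P / (1 - P)).

    Requires 0 < p < 1; p = 1 (Phred score 0) has undefined odds.
    """
    return -10 * math.log10(p / (1 - p))


def phred_to_solexa(q_phred):
    """Solexa score for the probability implied by a Phred score (q_phred > 0)."""
    p = 10 ** (-q_phred / 10)
    return solexa_q_from_prob(p)


if __name__ == "__main__":
    # Low Phred scores map to clearly non-integer Solexa scores;
    # the two scales only converge at high quality.
    for q in (1, 5, 10, 20, 30, 40):
        print(q, round(phred_to_solexa(q), 3))
```

A Phred score of 10 (P = 0.1) gives odds of 1/9 and a Solexa score of about 9.54, not 10. The gap shrinks as quality rises, because P/(1-P) approaches P when P is small.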

Table 2A (first three columns): Q[Solexa] from P[Sanger]. Table 2B (last five columns): Q[Solexa] and associated odds (P/(1-P)).

| Probability (P) Of Wrong Base | Associated Sanger Odds [P/(1-P)] | Q[Solexa] Based On Phred Probability | Solexa Q [-5 to 62] | Solexa Probability (P) Of Wrong Base | Solexa Odds [P/(1-P)] | Solexa "Q + 64" Q Shift | Solexa "Q + 64" ASCII GLYPH |
|---|---|---|---|---|---|---|---|
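The Solexa-side (2B) columns can be regenerated by working backwards from each integer Q[Solexa]: invert the formula to get the odds, then solve the odds for P. Here is a minimal Python sketch of that calculation, assuming the -5 to 62 range and the "Q + 64" shift named in the column headers; `solexa_row` is a hypothetical helper name.

```python
def solexa_row(q):
    """Return one Table 2B style row for an integer Solexa score q (-5 to 62).

    Columns: Q, P, odds P/(1-P), ASCII code (Q + 64), ASCII glyph.
    """
    if not -5 <= q <= 62:
        raise ValueError("Solexa scores in this table run from -5 to 62")
    odds = 10 ** (-q / 10)   # invert Q = -10 * log10(odds)
    p = odds / (1 + odds)    # solve odds = P / (1 - P) for P
    code = q + 64
    return (q, p, odds, code, chr(code))


if __name__ == "__main__":
    # Print the two ends of the range and a few scores in between.
    for q in (-5, 0, 10, 40, 62):
        print(solexa_row(q))
```

A Solexa score of 0 corresponds to even odds (P = 0.5), and the lowest score, -5, maps to the glyph `;` (ASCII 59).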

The presentation of this data is likely complete overkill, but I have found it useful in discussion. Hopefully having the tables in front of someone during an explanation will help clarify that explanation.