Given the importance of these scores in both FASTQ and MAQ (for me, specifically using alignment quality scores from Illumina sequencing runs to monitor run and sample quality), I was a bit surprised not to find a complete work-up of the meanings, the scores, the glyphs coordinated to the scores, and the encoding interpretations of these scores in one location. The two (three) tables shown here hopefully provide a meaningful summary.
I should qualify that much of the background for this page was taken from four key places. First is the Wikipedia entry for FASTQ. Second is the Wikipedia entry for Phred quality score. Third is the Rosetta Stone of Phred score interpretation in the form of the open access article: P. J. A. Cock, C. J. Fields, N. Goto, M. L. Heuer and P. M. Rice, "The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants." Nucleic Acids Research, 2010, Vol. 38, No. 6, 1767-1771, doi:10.1093/nar/gkp1137. Fourth is seqanswers.com in various forms.
(Sanger) Phred Quality Scores
I refer you to the two Wikipedia articles on FASTQ and Phred quality scores for historical content (and for a brief discussion of the processing of chromatogram data for the production of quality scores). Table 1 shows the Phred quality score (Phred Q) for each error probability (Probability (P) Of Wrong Base), then adds the ASCII codes (Sanger "Q + 33" Shift) and characters (Sanger "Q + 33" ASCII GLYPH) for the original Phred scores. In the Sanger method, Phred scores 0-to-93 use ASCII characters 33-to-126, which keeps every single-character score a printable glyph. The last two columns give the Illumina 1.3+ codes (Illumina 1.3+ "Q + 64" Shift, using ASCII codes 64-to-126 to score from 0-to-62 on the Phred scale) and the corresponding ASCII glyphs (Illumina 1.3+ "Q + 64" ASCII GLYPH). This is all likely completely self-explanatory (or hopefully will be by the bottom of the post). For review, the relationship between the Phred quality score Q[Sanger] and the base-calling error probability P is
Q[Sanger] = -10 * log10(P)
or, re-written for the logarithmically challenged…
P = 10^[-Q/10]
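As a quick check of the two relations above, here is a minimal Python sketch that converts in both directions. The function names are illustrative only and do not come from any FASTQ library.

```python
import math

def phred_to_prob(q):
    """Error probability P for a Phred quality score Q: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def prob_to_phred(p):
    """Phred quality score Q for an error probability P: Q = -10 * log10(P)."""
    return -10 * math.log10(p)

# Print a few rows in the same style as the probability column of Table 1.
for q in (10, 20, 30, 40):
    print(q, f"{phred_to_prob(q):.10f}")
```

Each step of 10 in Q is one more factor of 10 in confidence, so Q = 20 is a 1-in-100 chance of a wrong base and Q = 30 is 1-in-1000.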
Table 1. Phred Quality Scores (Q), Wrong Base Probabilities, And Sanger And Illumina 1.3+ ASCII Glyphs.
Phred  Probability (P)  Sanger "Q + 33"  Sanger "Q + 33"  Illumina 1.3+ "Q + 64"  Illumina 1.3+ "Q + 64"
Q      Of Wrong Base    Shift            ASCII GLYPH      Shift                   ASCII GLYPH
00     1.0000000000     033              !                064                     @
01     0.7943282347     034              "                065                     A
02     0.6309573445     035              #                066                     B
03     0.5011872336     036              $                067                     C
04     0.3981071706     037              %                068                     D
05     0.3162277660     038              &                069                     E
06     0.2511886432     039              '                070                     F
07     0.1995262315     040              (                071                     G
08     0.1584893192     041              )                072                     H
09     0.1258925412     042              *                073                     I
10     0.1000000000     043              +                074                     J
11     0.0794328235     044              ,                075                     K
12     0.0630957344     045              -                076                     L
13     0.0501187234     046              .                077                     M
14     0.0398107171     047              /                078                     N
15     0.0316227766     048              0                079                     O
16     0.0251188643     049              1                080                     P
17     0.0199526231     050              2                081                     Q
18     0.0158489319     051              3                082                     R
19     0.0125892541     052              4                083                     S
20     0.0100000000     053              5                084                     T
21     0.0079432823     054              6                085                     U
22     0.0063095734     055              7                086                     V
23     0.0050118723     056              8                087                     W
24     0.0039810717     057              9                088                     X
25     0.0031622777     058              :                089                     Y
26     0.0025118864     059              ;                090                     Z
27     0.0019952623     060              <                091                     [
28     0.0015848932     061              =                092                     \
29     0.0012589254     062              >                093                     ]
30     0.0010000000     063              ?                094                     ^
31     0.0007943282     064              @                095                     _
32     0.0006309573     065              A                096                     `
33     0.0005011872     066              B                097                     a
34     0.0003981072     067              C                098                     b
35     0.0003162278     068              D                099                     c
36     0.0002511886     069              E                100                     d
37     0.0001995262     070              F                101                     e
38     0.0001584893     071              G                102                     f
39     0.0001258925     072              H                103                     g
40     0.0001000000     073              I                104                     h
41     0.0000794328     074              J                105                     i
42     0.0000630957     075              K                106                     j
43     0.0000501187     076              L                107                     k
44     0.0000398107     077              M                108                     l
45     0.0000316228     078              N                109                     m
46     0.0000251189     079              O                110                     n
47     0.0000199526     080              P                111                     o
48     0.0000158489     081              Q                112                     p
49     0.0000125893     082              R                113                     q
50     0.0000100000     083              S                114                     r
51     0.0000079433     084              T                115                     s
52     0.0000063096     085              U                116                     t
53     0.0000050119     086              V                117                     u
54     0.0000039811     087              W                118                     v
55     0.0000031623     088              X                119                     w
56     0.0000025119     089              Y                120                     x
57     0.0000019953     090              Z                121                     y
58     0.0000015849     091              [                122                     z
59     0.0000012589     092              \                123                     {
60     0.0000010000     093              ]                124                     |
61     0.0000007943     094              ^                125                     }
62     0.0000006310     095              _                126                     ~
63     0.0000005012     096              `
64     0.0000003981     097              a
65     0.0000003162     098              b
66     0.0000002512     099              c
67     0.0000001995     100              d
68     0.0000001585     101              e
69     0.0000001259     102              f
70     0.0000001000     103              g
71     0.0000000794     104              h
72     0.0000000631     105              i
73     0.0000000501     106              j
74     0.0000000398     107              k
75     0.0000000316     108              l
76     0.0000000251     109              m
77     0.0000000200     110              n
78     0.0000000158     111              o
79     0.0000000126     112              p
80     0.0000000100     113              q
81     0.0000000079     114              r
82     0.0000000063     115              s
83     0.0000000050     116              t
84     0.0000000040     117              u
85     0.0000000032     118              v
86     0.0000000025     119              w
87     0.0000000020     120              x
88     0.0000000016     121              y
89     0.0000000013     122              z
90     0.0000000010     123              {
91     0.0000000008     124              |
92     0.0000000006     125              }
93     0.0000000005     126              ~
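The two shift columns in Table 1 amount to adding a fixed offset to Q and looking up the ASCII character. A minimal Python sketch of that mapping follows; the constant and function names are illustrative, not taken from an existing package.

```python
SANGER_OFFSET = 33      # Sanger FASTQ: glyph code = Q + 33
ILLUMINA13_OFFSET = 64  # Illumina 1.3+ FASTQ: glyph code = Q + 64

def decode_quality(qual_string, offset):
    """Turn a FASTQ quality string into a list of integer Q scores."""
    return [ord(ch) - offset for ch in qual_string]

def encode_quality(scores, offset):
    """Turn a list of integer Q scores into a FASTQ quality string."""
    return "".join(chr(q + offset) for q in scores)

scores = [0, 10, 20, 30, 40]
print(encode_quality(scores, SANGER_OFFSET))      # !+5?I
print(encode_quality(scores, ILLUMINA13_OFFSET))  # @JT^h
```

The same five scores produce two different strings, which is why decoding with the wrong offset shifts every score by 31.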
An assumption going in when I was producing plots from the Q[Sanger] and Q[Solexa] data was that the "P" was the same value and the Solexa system simply opted to use the odds (P/(1-P)) as their metric. A proper two-second consideration of the shapes of P and P/(1-P) would have led to the immediate conclusion that something was afoot. The table columns on the left of the black bar in Table 2 (2A) are the Q[Solexa] values based on the use of the Q[Sanger] probabilities. They are here simply to show that the two are, in fact, not the same, and if you've spent any time wondering why you can't adequately… manipulate Excel's rounding tools to reproduce the Q[Solexa] integer values, this is why.
The probabilities obtained for Q[Solexa] were, in fact, worked backwards from the integer values of Q[Solexa] (having found no table online that gives a number-by-number summary of the probability or odds). For background, the Q[Solexa] values are obtained from:
Q[Solexa] = -10 * log10[P/(1-P)]
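To see the mismatch numerically, here is a small Python sketch (the function name is illustrative) that pushes a few Phred probabilities through the Solexa relation, which is the calculation behind the Table 2A columns.

```python
import math

def solexa_q_from_prob(p):
    """Solexa quality from an error probability: -10 * log10(P / (1 - P))."""
    return -10 * math.log10(p / (1 - p))

# Phred probabilities for Q[Sanger] = 10, 20, 30 pushed through the Solexa
# relation: the results are close to, but never exactly, integers.
for p in (0.1, 0.01, 0.001):
    print(p, round(solexa_q_from_prob(p), 7))
```

For P = 0.1 this gives 9.5424251 rather than 10, and no amount of rounding trickery turns the whole column into the integer Q[Solexa] series.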
Table 2A: Q[Solexa] from P[Sanger]
Table 2B: Q[Solexa] and associated odds (P/(1-P)).
Table 2A columns (left of the bar)                      ||  Table 2B columns (right of the bar)
Probability (P)  Associated Sanger  Q[Solexa] Based On  ||  Solexa Q    Solexa Probability  Solexa Odds  Solexa "Q + 64"  Solexa "Q + 64"
Of Wrong Base    Odds [P/(1-P)]     Phred Probability   ||  [-5 to 62]  (P) Of Wrong Base   [P/(1-P)]    Q Shift          ASCII GLYPH
0.7943282        3.8621161          -5.8682532          ||  -5          0.7597469           3.1622774    59               ;
0.6309573        1.7097139          -2.3292343          ||  -4          0.7152527           2.5118860    60               <
0.5011872        1.0047602          -0.0206244          ||  -3          0.6661394           1.9952619    61               =
0.3981072        0.6614253          1.7951917           ||  -2          0.6131368           1.5848929    62               >
0.3162278        0.4624753          3.3491146           ||  -1          0.5573117           1.2589255    63               ?
0.2511886        0.3354498          4.7437242           ||  0           0.5000000           1.0000000    64               @
0.1995262        0.2492602          6.0334710           ||  1           0.4426884           0.7943284    65               A
0.1584893        0.1883390          7.2505963           ||  2           0.3868632           0.6309575    66               B
0.1258925        0.1440241          8.4156483           ||  3           0.3338606           0.5011873    67               C
0.1000000        0.1111111          9.5424251           ||  4           0.2847473           0.3981072    68               D
0.0794328        0.0862868          10.6405549          ||  5           0.2402531           0.3162278    69               E
0.0630957        0.0673449          11.7169522          ||  6           0.2007600           0.2511887    70               F
0.0501187        0.0527631          12.7766933          ||  7           0.1663376           0.1995263    71               G
0.0398107        0.0414613          13.8235685          ||  8           0.1368069           0.1584893    72               H
0.0316228        0.0326554          14.8604457          ||  9           0.1118158           0.1258926    73               I
0.0251189        0.0257661          15.8895167          ||  10          0.0909091           0.1000000    74               J
0.0199526        0.0203588          16.9124707          ||  11          0.0735876           0.0794328    75               K
0.0158489        0.0161042          17.9306177          ||  12          0.0593509           0.0630957    76               L
0.0125893        0.0127498          18.9449785          ||  13          0.0477267           0.0501187    77               M
0.0100000        0.0101010          19.9563519          ||  14          0.0382865           0.0398107    78               N
0.0079433        0.0080069          20.9653650          ||  15          0.0306534           0.0316228    79               O
0.0063096        0.0063496          21.9725111          ||  16          0.0245034           0.0251189    80               P
0.0050119        0.0050371          22.9781790          ||  17          0.0195623           0.0199526    81               Q
0.0039811        0.0039970          23.9826759          ||  18          0.0156017           0.0158489    82               R
0.0031623        0.0031723          24.9862446          ||  19          0.0124327           0.0125893    83               S
0.0025119        0.0025182          25.9890773          ||  20          0.0099010           0.0100000    84               T
0.0019953        0.0019993          26.9913260          ||  21          0.0078807           0.0079433    85               U
0.0015849        0.0015874          27.9931114          ||  22          0.0062700           0.0063096    86               V
0.0012589        0.0012605          28.9945291          ||  23          0.0049869           0.0050119    87               W
0.0010000        0.0010010          29.9956549          ||  24          0.0039653           0.0039811    88               X
0.0007943        0.0007950          30.9965489          ||  25          0.0031523           0.0031623    89               Y
0.0006310        0.0006314          31.9972589          ||  26          0.0025056           0.0025119    90               Z
0.0005012        0.0005014          32.9978228          ||  27          0.0019913           0.0019953    91               [
0.0003981        0.0003983          33.9982707          ||  28          0.0015824           0.0015849    92               \
0.0003162        0.0003163          34.9986264          ||  29          0.0012573           0.0012589    93               ]
0.0002512        0.0002513          35.9989090          ||  30          0.0009990           0.0010000    94               ^
0.0001995        0.0001996          36.9991334          ||  31          0.0007937           0.0007943    95               _
0.0001585        0.0001585          37.9993116          ||  32          0.0006306           0.0006310    96               `
0.0001259        0.0001259          38.9994532          ||  33          0.0005009           0.0005012    97               a
0.0001000        0.0001000          39.9995657          ||  34          0.0003979           0.0003981    98               b
0.0000794        0.0000794          40.9996550          ||  35          0.0003161           0.0003162    99               c
0.0000631        0.0000631          41.9997260          ||  36          0.0002511           0.0002512    100              d
0.0000501        0.0000501          42.9997823          ||  37          0.0001995           0.0001995    101              e
0.0000398        0.0000398          43.9998271          ||  38          0.0001585           0.0001585    102              f
0.0000316        0.0000316          44.9998627          ||  39          0.0001259           0.0001259    103              g
0.0000251        0.0000251          45.9998909          ||  40          0.0001000           0.0001000    104              h
0.0000200        0.0000200          46.9999133          ||  41          0.0000794           0.0000794    105              i
0.0000158        0.0000158          47.9999312          ||  42          0.0000631           0.0000631    106              j
0.0000126        0.0000126          48.9999453          ||  43          0.0000501           0.0000501    107              k
0.0000100        0.0000100          49.9999566          ||  44          0.0000398           0.0000398    108              l
0.0000079        0.0000079          50.9999655          ||  45          0.0000316           0.0000316    109              m
0.0000063        0.0000063          51.9999726          ||  46          0.0000251           0.0000251    110              n
0.0000050        0.0000050          52.9999782          ||  47          0.0000200           0.0000200    111              o
0.0000040        0.0000040          53.9999827          ||  48          0.0000158           0.0000158    112              p
0.0000032        0.0000032          54.9999863          ||  49          0.0000126           0.0000126    113              q
0.0000025        0.0000025          55.9999891          ||  50          0.0000100           0.0000100    114              r
0.0000020        0.0000020          56.9999913          ||  51          0.0000079           0.0000079    115              s
0.0000016        0.0000016          57.9999931          ||  52          0.0000063           0.0000063    116              t
0.0000013        0.0000013          58.9999945          ||  53          0.0000050           0.0000050    117              u
0.0000010        0.0000010          59.9999957          ||  54          0.0000040           0.0000040    118              v
0.0000008        0.0000008          60.9999966          ||  55          0.0000032           0.0000032    119              w
0.0000006        0.0000006          61.9999973          ||  56          0.0000025           0.0000025    120              x
0.0000005        0.0000005          62.9999978          ||  57          0.0000020           0.0000020    121              y
0.0000004        0.0000004          63.9999983          ||  58          0.0000016           0.0000016    122              z
0.0000003        0.0000003          64.9999986          ||  59          0.0000013           0.0000013    123              {
0.0000003        0.0000003          65.9999989          ||  60          0.0000010           0.0000010    124              |
0.0000002        0.0000002          66.9999991          ||  61          0.0000008           0.0000008    125              }
0.0000002        0.0000002          67.9999993          ||  62          0.0000006           0.0000006    126              ~
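Working backwards from an integer Solexa score, as was done for Table 2B, can be sketched in a few lines of Python. The function name is hypothetical, and the last digit or two of the odds may differ slightly from the table, presumably because of rounding in the original spreadsheet.

```python
def table_2b_row(q):
    """Return (Q, P, odds, Q + 64, glyph) for an integer Solexa score."""
    odds = 10 ** (-q / 10)      # Solexa odds P/(1-P), from Q = -10*log10(odds)
    p = odds / (1 + odds)       # probability of a wrong base
    code = q + 64               # ASCII code of the encoded score
    return q, round(p, 7), round(odds, 7), code, chr(code)

for q in (-5, 0, 10, 40):
    print(table_2b_row(q))
```

Solving odds = P/(1-P) for P gives P = odds/(1 + odds), which is why Q[Solexa] = 0 corresponds to a coin-flip P of 0.5 rather than the P of 1.0 that Q[Sanger] = 0 implies.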
With all three data sets, I reproduce below a plot familiar to the FASTQ community, showing the asymptotic behavior of the Q[Solexa] and Q[Sanger] values at high Q (which represents the lowest read errors; the curves approach one another because the numbers are simply too damn small on the plot). Also obvious from the plot is that the curves show poor agreement with each other in the range where the error probability is highest (so the entire analysis goes to pot as the data quality goes to pot [ed. note for the international reader: "pot" refers to the device found in the water-closet]). The grey line is a good plot of the wrong data (that in Table 2A).
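The convergence at high Q can also be checked without a plot. Combining the two relations above (P = odds/(1 + odds), then Q[Sanger] = -10 * log10(P)) gives Q[Sanger] = 10 * log10(10^(Q[Solexa]/10) + 1), which is also the conversion given in the Cock et al. article. A minimal Python sketch, with an illustrative function name:

```python
import math

def solexa_to_phred(q_solexa):
    """Convert a Solexa score to the Phred scale: 10 * log10(10^(Q/10) + 1)."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)

# The gap between the two scales is largest for poor data and fades at high Q.
for q in (-5, 0, 10, 20, 40):
    gap = solexa_to_phred(q) - q
    print(f"Q[Solexa] = {q:3d}   Q[Phred] = {solexa_to_phred(q):8.4f}   gap = {gap:.4f}")
```

The gap is about 3 at Q[Solexa] = 0 and well under 0.001 by Q[Solexa] = 40, which is the asymptotic behavior described above.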
The presentation of this data is likely complete overkill, but I have found it useful in discussion. Hopefully having the tables in front of someone during an explanation will help clarify that explanation.
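One practical use of the tables is guessing which encoding a FASTQ file uses from the lowest glyph it contains: codes below 59 appear only in the Sanger column, and codes 59-to-63 appear in the Solexa column but not the Illumina 1.3+ one. The Python sketch below is a heuristic under those assumptions, with a hypothetical function name and labels; it cannot rule out a Sanger file that simply has no low scores.

```python
def guess_quality_encoding(qual_strings):
    """Guess the FASTQ quality encoding from the lowest glyph code observed.

    Heuristic only; raises ValueError if no quality characters are supplied.
    """
    lowest = min(ord(ch) for qual in qual_strings for ch in qual)
    if lowest < 59:
        # Only the Sanger "Q + 33" shift reaches codes below 59.
        return "sanger"
    if lowest < 64:
        # Codes 59-63 are Solexa scores -5 to -1, or Sanger scores 26 to 30.
        return "solexa or high-quality sanger"
    return "illumina 1.3+, solexa, or high-quality sanger"

print(guess_quality_encoding(["!+5?I"]))
print(guess_quality_encoding([";JT^h"]))
print(guess_quality_encoding(["@JT^h"]))
```

The more reads fed in, the more likely a genuinely low score shows up and settles the question.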
So, with the BclConverter installation complete and a small QSEQ-to-FASTQ script available to convert the QSEQ output, the next step is the alignment of your lane-worth of sequenced DNA. The Maq program is used by the Cornell Sequencing Center (and was recommended as the workhorse tool for this task) and is available by link from the Illumina third-party tools list. In keeping with my no-interest-in-installing-another-distro run of Ubuntu luck, the procedure below explains the process of building Maq using as much apt-get as possible. In the case of Maq, there is one small busy step in the installation process because we need a local copy of libstdc++.so.5 that is NOT available by some easy package install (although what one has to do isn't terribly difficult either, and I've linked local copies of the two .deb files below).
Installation Procedure
The process begins with apt-get, continues to dpkg, and then is finished with an easy make.
1. apt-get Install List
The official package list, I am quite sure, is below. From a Terminal window:
I say this because (1) I have installed several other packages on the machines I've been working on prior to the Maq builds and (2) I've no interest in wiping machines to perfect a super-clean install. If there is an error in the Maq make, it is possible an additional package is missing (although I suspect this will not be the case, as there is little needed for the Maq build). If there is an error, the solution may simply be to blindly add the following additional packages (and, if you installed the BclConverter, you have this all installed anyway).
YOU LIKELY DON'T NEED THE FOLLOWING, BUT JUST IN CASE:
2. Adding 32-bit (Needed For Both) And 64-bit (If Running 64-bit) libstdc++.so.5
The following process assumes you know where the two .deb files are sitting and that you have access to this folder (I assume you've downloaded to Downloads or Desktop; drive your Terminal window in that direction with cd ~/Downloads or cd ~/Desktop). The two .deb files in question that contain (I believe) the most recent versions of libstdc++.so.5 are linked below (and sitting on my website; you'll have to unzip them with a double-click or a gunzip *.zip in the download directory):
A. For the 32-bit version, the installation is simple:
sudo dpkg -i libstdc++5_3.3.6-18_i386.deb
B. For the 64-bit version, the installation is also simple:
sudo dpkg -i libstdc++5_3.3.6-18_amd64.deb
Output as below:
Selecting previously deselected package libstdc++5.
(Reading database ... 169294 files and directories currently installed.)
Unpacking libstdc++5 (from libstdc++5_3.3.6-18_amd64.deb) ...
Setting up libstdc++5 (1:3.3.6-18) ...
Processing triggers for libc-bin ...
ldconfig deferred processing now taking place
The second step is only mildly more involved. These five steps (1) extract the contents of libstdc++5_3.3.6-18_i386.deb without installing the library (so no over-writing), (2) enter the usr/lib directory you just extracted, (3) copy libstdc++.so.5.0.7 to /usr/lib32, (4) cd into /usr/lib32, and (5) make a symbolic link for libstdc++.so.5.
dpkg --extract libstdc++5_3.3.6-18_i386.deb ./
cd usr/lib
sudo cp libstdc++.so.5.0.7 /usr/lib32
cd /usr/lib32/
sudo ln -s libstdc++.so.5.0.7 libstdc++.so.5
cd ~/
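If you want to confirm the result of the copy-and-link steps above, a short Python check will do. This is a minimal sketch; describe_library is a hypothetical helper (not part of dpkg or Maq) whose defaults are simply the paths used above.

```python
import os

def describe_library(lib_dir="/usr/lib32", name="libstdc++.so.5"):
    """Report whether `name` exists in `lib_dir` and what it resolves to."""
    path = os.path.join(lib_dir, name)
    if not os.path.lexists(path):
        return f"{path}: missing"
    if os.path.islink(path):
        # Follow the link and check that its target is really there.
        target = os.path.realpath(path)
        state = "ok" if os.path.exists(target) else "dangling"
        return f"{path}: symlink -> {target} ({state})"
    return f"{path}: regular file"

print(describe_library())
```

A "dangling" report would mean the ln -s step ran but the copy of libstdc++.so.5.0.7 did not land in the same directory.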
And that is all.
3. Installing Maq
sudo mv maq-0.7.1.tar.bz2 /opt/
cd /opt
sudo tar xvjf maq-0.7.1.tar.bz2
Mapass2 is a software that builds mapping assemblies from short reads
generated by the next-generation sequencing machines. It is particularly
designed for Illumina-Solexa 1G Genetic Analyzer, which typically
generates reads 25-35bp in length.
Mapass2 first aligns reads to reference sequences and then calls the
consensus. At the mapping stage, maq performs ungapped alignment. For
single-end reads, maq is able to find all hits with up to 2 or 3
mismatches, depending on a command-line option; for paired-end reads, it
always finds all paired hits with one of the two reads containing up to
1 mismatch. At the assembling stage, maq calls the consensus based on a
statistical model. It calls the base which maximizes the posterior
probability and calculates a phred quality at each position along the
consensus. Heterozygotes are also called in this process.
For more information, see also maq website:
http://mapass.sourceforge.net
INSTALL Contents
There are two ways to compile maq. The first way is to use the GNU
building systems. Simply type './configure; make; make install' to
compile and to install maq. Three executables 'maq', 'maq.pl' and
'farm-run.pl' will be copied to '/usr/local/bin' by default.
Alternatively, one could compile with 'make -f Makefile.generic' and
manually copy the three executables to the destination directory.
Modification to 'Makefile.generic' is sometimes needed for different
architectures.
As I'm running this from /opt, we'll be doing the first way to compile Maq but using "sudo" in each case.
USERID@MACHINE:/opt/maq-0.7.1$ sudo ./configure
Produces…
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking for gcc... gcc
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking for g++... g++
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes
checking if gcc accepts -m64... yes
checking how to run the C preprocessor... gcc -E
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking zlib.h usability... yes
checking zlib.h presence... yes
checking for zlib.h... yes
configure: creating ./config.status
config.status: creating Makefile
config.status: creating config.h
USERID@MACHINE:/opt/maq-0.7.1$ sudo make
Produces…
cd . && /bin/bash /opt/maq-0.7.1/missing --run autoheader
/opt/maq-0.7.1/missing: line 54: autoheader: command not found
WARNING: `autoheader' is missing on your system. You should only need it if
you modified `acconfig.h' or `configure.ac'. You might want
to install the `Autoconf' and `GNU m4' packages. Grab them
from any GNU archive site.
rm -f stamp-h1
touch config.h.in
cd . && /bin/bash ./config.status config.h
config.status: creating config.h
config.status: config.h is unchanged
make all-am
make[1]: Entering directory `/opt/maq-0.7.1'
gcc -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c main.c
gcc -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c const.c
gcc -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c seq.c
gcc -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c bfa.c
bfa.c: In function ‘nst_load_bfa1':
bfa.c:31: warning: ignoring return value of ‘fread', declared with attribute warn_unused_result
bfa.c:32: warning: ignoring return value of ‘fread', declared with attribute warn_unused_result
bfa.c:33: warning: ignoring return value of ‘fread', declared with attribute warn_unused_result
bfa.c:35: warning: ignoring return value of ‘fread', declared with attribute warn_unused_result
bfa.c:37: warning: ignoring return value of ‘fread', declared with attribute warn_unused_result
bfa.c: In function ‘nst_bfa_len':
bfa.c:46: warning: ignoring return value of ‘fread', declared with attribute warn_unused_result
bfa.c:48: warning: ignoring return value of ‘fread', declared with attribute warn_unused_result
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o read.o read.cc
gcc -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c fasta2bfa.c
gcc -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c fastq2bfq.c
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o merge.o merge.cc
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o match_aux.o
match_aux.cc
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o match.o match.cc
match.cc: In function ‘int alt_cal_mm(bit64_t)':
match.cc:58: warning: suggest parentheses around ‘+' in operand of ‘&'
match.cc:61: warning: suggest parentheses around ‘+' in operand of ‘&'
match.cc: In function ‘int alt_cal_err(bit64_t, bit64_t)':
match.cc:67: warning: suggest parentheses around ‘+' in operand of ‘&'
match.cc:70: warning: suggest parentheses around ‘+' in operand of ‘&'
match.cc: In function ‘int ma_match(int, char**)':
match.cc:525: warning: ignoring return value of ‘int fscanf(FILE*, const char*, ...)', declared
with attribute
warn_unused_result
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o sort_mapping.o
sort_mapping.cc
sort_mapping.cc: In function ‘int ma_make_pair(const match_aux_t*, const match_info_t*, const
match_info_t*,
pair_info_t*)':
sort_mapping.cc:59: warning: suggest parentheses around arithmetic in operand of ‘^'
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o assemble.o
assemble.cc
assemble.cc: In function ‘base_call_aux_t* assemble_cns_collect(assemble_pos_t*, const
assemble_aux_t*)':
assemble.cc:106: warning: suggest parentheses around arithmetic in operand of ‘|'
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o
pileup.o pileup.cc
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o
mapcheck.o mapcheck.cc
gcc -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c get_pos.c
gcc -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c assopt.c
gcc -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c aux_utils.c
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o rbcc.o rbcc.cc
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o subsnp.o
subsnp.cc
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o pair_stat.o
pair_stat.cc
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o indel_soa.o
indel_soa.cc
indel_soa.cc: In function ‘void fill_counter(bit32_t*, int, nst_bfa1_t*, void*)':
indel_soa.cc:42: warning: suggest parentheses around ‘-' inside ‘<<'
indel_soa.cc:56: warning: suggest parentheses around ‘-' inside ‘<<'
gcc -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c maqmap.c
gcc -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c maqmap_conv.c
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o altchr.o
altchr.cc
gcc -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c submap.c
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o rmdup.o
rmdup.cc
gcc -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c simulate.c
In file included from /usr/include/string.h:640,
from maqmap.h:23,
from simulate.c:11:
In function ‘memset',
inlined from ‘simustat_core' at simulate.c:386:
/usr/include/bits/string3.h:86: warning: call to __builtin___memset_chk will always overflow
destination buffer
gcc -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c genran.c
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o indel_pe.o
indel_pe.cc
gcc -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c stdaln.c
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o indel_call.o
indel_call.cc
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o
eland2maq.o eland2maq.cc
eland2maq.cc: In function ‘hash_map_char* read_list(FILE*)':
eland2maq.cc:33: warning: ignoring return value of ‘int fscanf(FILE*, const char*, ...)',
declared with attribute warn_unused_result
eland2maq.cc: In function ‘void eland2maq_core(FILE*, FILE*, void*)':
eland2maq.cc:88: warning: ignoring return value of ‘int fscanf(FILE*, const char*, ...)',
declared with attribute warn_unused_result
eland2maq.cc:96: warning: ignoring return value of ‘int fscanf(FILE*, const char*, ...)',
declared with attribute warn_unused_result
eland2maq.cc:99: warning: ignoring return value of ‘int fscanf(FILE*, const char*, ...)',
declared with attribute warn_unused_result
eland2maq.cc: In function ‘void novo2maq_core(FILE*, FILE*, void*)':
eland2maq.cc:323: warning: ignoring return value of ‘char* fgets(char*, int, FILE*)', declared
with attribute warn_unused_result
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o
csmap2ntmap.o csmap2ntmap.cc
gcc -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c break_pair.c
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o
glfgen.o glfgen.cc
glfgen.cc: In function ‘glf1_t* glfgen1_core(assemble_pos_t*, const assemble_aux_t*, bit8_t)':
glfgen.cc:43: warning: suggest parentheses around arithmetic in operand of ‘|'
g++ -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -o maq main.o const.o seq.o bfa.o read.o
fasta2bfa.o fastq2bfq.o merge.o match_aux.o match.o sort_mapping.o assemble.o pileup.o
mapcheck.o
get_pos.o assopt.o aux_utils.o rbcc.o subsnp.o pair_stat.o indel_soa.o maqmap.o maqmap_conv.o
altchr.o submap.o rmdup.o simulate.o genran.o indel_pe.o stdaln.o indel_call.o eland2maq.o
csmap2ntmap.o break_pair.o glfgen.o -lm -lz
make[1]: Leaving directory `/opt/maq-0.7.1'
USERID@MACHINE:/opt/maq-0.7.1$ sudo make install
Produces…
make[1]: Entering directory `/opt/maq-0.7.1'
test -z "/usr/local/bin" || /bin/mkdir -p "/usr/local/bin"
/usr/bin/install -c 'maq' '/usr/local/bin/maq'
test -z "/usr/local/bin" || /bin/mkdir -p "/usr/local/bin"
/usr/bin/install -c 'scripts/maq.pl' '/usr/local/bin/maq.pl'
/usr/bin/install -c 'scripts/farm-run.pl' '/usr/local/bin/farm-run.pl'
/usr/bin/install -c 'scripts/maq_plot.pl' '/usr/local/bin/maq_plot.pl'
/usr/bin/install -c 'scripts/maq_eval.pl' '/usr/local/bin/maq_eval.pl'
make[1]: Nothing to be done for `install-data-am'.
make[1]: Leaving directory `/opt/maq-0.7.1'
The Maq package is installed in /usr/local/bin and should be available immediately without any path calls. In the interest of running a brief test, I've provided a fastq file for phi-X174 and the "easy" command line run to run an alignment (rendered movie from the virusworld website, including a second half featuring the lovely QuteMol program, is below). While I wouldn't object to hosting a full lane of phi-X174, the 1.9 GB of fragments = unbearably long server upload. Suffice it to say, if you have a phi-X174 lane and you've run the BCL-to-QSEQ-to-FASTQ procedure in the BclConverter post, you have a properly formatted PhiXSequence.fastq for running this example.
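A quick way to confirm that the executables really are reachable without any path calls is to ask Python where they live. This is a minimal sketch; find_tools is a hypothetical helper, and the three default names are the executables listed in the INSTALL text above.

```python
import shutil

def find_tools(names=("maq", "maq.pl", "farm-run.pl")):
    """Map each executable name to its location on PATH (None if absent)."""
    return {name: shutil.which(name) for name in names}

for name, location in find_tools().items():
    print(f"{name}: {location or 'not found on PATH'}")
```

If any of them come back as not found, check that /usr/local/bin is on your PATH before suspecting the build.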
You can download the phi-X174 sequence at phi_X174_sequence.fastq.gz, a local version of the file you can find at the National Center for Biotechnology Information. The sequence is below because, well, I wanted to have a sequence present on the blog post (and it is absolutely fascinating to me that this is the instruction manual for something).
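Before committing to a long alignment run, it can be worth a quick sanity check that a FASTQ file is well formed. The Python sketch below assumes the common four-line record layout (header, sequence, plus line, quality) with no wrapped lines; read_fastq is a hypothetical helper and not part of Maq.

```python
from itertools import zip_longest

def read_fastq(lines):
    """Yield (name, sequence, quality) tuples from an iterable of FASTQ lines.

    Assumes four lines per record; wrapped sequences are not handled.
    """
    stripped = (line.rstrip("\r\n") for line in lines)
    # Passing the same iterator four times groups the stream into 4-line chunks.
    for number, record in enumerate(zip_longest(*[stripped] * 4), start=1):
        header, seq, plus, qual = record
        if qual is None:
            raise ValueError(f"record {number} is truncated")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"record {number} is malformed")
        if len(seq) != len(qual):
            raise ValueError(f"record {number}: sequence/quality lengths differ")
        yield header[1:], seq, qual

example = ["@read1\n", "GATTACA\n", "+\n", "IIIIIII\n"]
print(list(read_fastq(example)))  # [('read1', 'GATTACA', 'IIIIIII')]
```

Because it reads lazily, the same function can be handed an open file object for a multi-gigabyte fragment file without loading it all into memory.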
With this file downloaded and your phi-X174 fragment collection sitting in a file [I will assume is named] phi_X174_seq_fragments.fastq in the same directory, the command line run is simple:
drwxr-xr-x 2 user user 4096 2010-12-11 17:18 phi_X174
-rw-r--r-- 1 user user 3524 2010-12-11 17:18 phi_X174.log
-rw-r--r-- 1 user user 5529 2010-12-11 16:39 phi_X174_seq.fastq
-rw-r--r-- 1 user user 1921502305 2010-12-11 16:40 phi_X174_seq_fragments.fastq
This will produce the phi_X174.log results file (check for errors. Log contents below)…
…and a "phi_X174" directory (from the "-d") containing (hopefully) the following file list:
drwxr-xr-x 2 user user 4096 2010-12-11 17:18 .
drwxr-xr-x 19 user user 4096 2010-12-11 17:09 ..
-rw-r--r-- 1 user user 337296876 2010-12-11 17:16 all.map
-rw-r--r-- 1 user user 41635667 2010-12-11 17:13 aln1@10000001.map
-rw-r--r-- 1 user user 18493 2010-12-11 17:13 aln1@10000001.map.log
-rw-r--r-- 1 user user 42219864 2010-12-11 17:15 aln1@12000001.map
-rw-r--r-- 1 user user 18493 2010-12-11 17:15 aln1@12000001.map.log
-rw-r--r-- 1 user user 42885466 2010-12-11 17:14 aln1@14000001.map
-rw-r--r-- 1 user user 18493 2010-12-11 17:14 aln1@14000001.map.log
-rw-r--r-- 1 user user 5808963 2010-12-11 17:13 aln1@16000001.map
-rw-r--r-- 1 user user 18484 2010-12-11 17:13 aln1@16000001.map.log
-rw-r--r-- 1 user user 42616782 2010-12-11 17:14 aln1@1.map
-rw-r--r-- 1 user user 18493 2010-12-11 17:14 aln1@1.map.log
-rw-r--r-- 1 user user 41452684 2010-12-11 17:12 aln1@2000001.map
-rw-r--r-- 1 user user 18493 2010-12-11 17:12 aln1@2000001.map.log
-rw-r--r-- 1 user user 41223383 2010-12-11 17:14 aln1@4000001.map
-rw-r--r-- 1 user user 18493 2010-12-11 17:14 aln1@4000001.map.log
-rw-r--r-- 1 user user 41423788 2010-12-11 17:13 aln1@6000001.map
-rw-r--r-- 1 user user 18493 2010-12-11 17:13 aln1@6000001.map.log
-rw-r--r-- 1 user user 42709777 2010-12-11 17:12 aln1@8000001.map
-rw-r--r-- 1 user user 18493 2010-12-11 17:12 aln1@8000001.map.log
-rw-r--r-- 1 user user 8704 2010-12-11 17:18 assemble.log
lrwxrwxrwx 1 user user 13 2010-12-11 17:18 cns.filter.snp -> cns.final.snp
-rw-r--r-- 1 user user 509 2010-12-11 17:18 cns.final.snp
-rw-r--r-- 1 user user 10983 2010-12-11 17:18 cns.fq
-rw-r--r-- 1 user user 0 2010-12-11 17:18 cns.indelse
-rw-r--r-- 1 user user 571 2010-12-11 17:18 cns.snp
-rw-r--r-- 1 user user 452 2010-12-11 17:18 cns.win
-rw-r--r-- 1 user user 3555 2010-12-11 17:18 consensus.cns
-rw-r--r-- 1 user user 4525 2010-12-11 17:17 mapcheck.txt
-rw-r--r-- 1 user user 51219471 2010-12-11 17:11 read1@10000001.bfq
-rw-r--r-- 1 user user 51533445 2010-12-11 17:11 read1@12000001.bfq
-rw-r--r-- 1 user user 52014489 2010-12-11 17:12 read1@14000001.bfq
-rw-r--r-- 1 user user 6777154 2010-12-11 17:12 read1@16000001.bfq
-rw-r--r-- 1 user user 51764856 2010-12-11 17:09 read1@1.bfq
-rw-r--r-- 1 user user 51064770 2010-12-11 17:10 read1@2000001.bfq
-rw-r--r-- 1 user user 50938162 2010-12-11 17:10 read1@4000001.bfq
-rw-r--r-- 1 user user 51025084 2010-12-11 17:10 read1@6000001.bfq
-rw-r--r-- 1 user user 51803458 2010-12-11 17:11 read1@8000001.bfq
-rw-r--r-- 1 user user 2744 2010-12-11 17:09 ref.bfa
-rw-r--r-- 1 user user 3579605 2010-12-11 17:13 unmap1@10000001.txt
-rw-r--r-- 1 user user 3592541 2010-12-11 17:15 unmap1@12000001.txt
-rw-r--r-- 1 user user 3790240 2010-12-11 17:14 unmap1@14000001.txt
-rw-r--r-- 1 user user 492518 2010-12-11 17:13 unmap1@16000001.txt
-rw-r--r-- 1 user user 3768747 2010-12-11 17:14 unmap1@1.txt
-rw-r--r-- 1 user user 3668574 2010-12-11 17:12 unmap1@2000001.txt
-rw-r--r-- 1 user user 3608445 2010-12-11 17:13 unmap1@4000001.txt
-rw-r--r-- 1 user user 3429470 2010-12-11 17:13 unmap1@6000001.txt
-rw-r--r-- 1 user user 3410453 2010-12-11 17:12 unmap1@8000001.txt
-rw-r--r-- 1 user user 0 2010-12-11 17:18 unmap.indel
I've included the file sizes so you can see the amount of data generated from a 4 Kb phi_X174_sequence.fastq file and a 1.9 GB phi_X174_seq_fragments.fastq file.
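To put a number on how the output breaks down, the sizes in a listing like the one above can be totalled by file suffix. This is a minimal Python sketch with a hypothetical helper name; it assumes the output directory is named phi_X174, as in the listing, and sits in the current directory.

```python
import os
from collections import defaultdict

def sizes_by_suffix(directory):
    """Total the sizes (in bytes) of regular files in `directory` by suffix."""
    totals = defaultdict(int)
    for entry in os.scandir(directory):
        # Skip sub-directories and symlinks such as cns.filter.snp.
        if entry.is_file(follow_symlinks=False):
            suffix = os.path.splitext(entry.name)[1] or "(none)"
            totals[suffix] += entry.stat(follow_symlinks=False).st_size
    return dict(totals)

if os.path.isdir("phi_X174"):
    for suffix, total in sorted(sizes_by_suffix("phi_X174").items()):
        print(f"{suffix:8s} {total / 1e6:10.1f} MB")
```

Note that files such as aln1@1.map.log are counted under .log, since only the final suffix is used.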