With the BclConverter installation complete and a small QSEQ-to-FASTQ script available to convert the QSEQ output, a logical next step is aligning your lane's worth of sequenced DNA. Maq is the program used by the Cornell Sequencing Center (where it was recommended as the workhorse tool for this task), and it is linked from the Illumina third-party tools list. In keeping with my no-interest-in-installing-another-distro run of Ubuntu luck, the procedure below builds Maq using as much apt-get as possible. There is one small fiddly step in the installation: Maq needs a local copy of libstdc++.so.5, which is NOT available through an easy package install (although what one has to do isn't terribly difficult either, and I've linked local copies of the two .deb files below).
Installation Procedure
The process begins with apt-get, continues to dpkg, and then is finished with an easy make.
1. apt-get Install List
I am fairly sure the complete package list is the one below. From a Terminal window:
sudo apt-get install zlib1g-dev libssl-dev build-essential gcc g++ rpm ia32-libs
I hedge because I (1) had installed several other packages on these machines before the Maq builds and (2) have no interest in wiping machines to perfect a super-clean install. If make fails, an additional package may be missing (although I suspect not, as the Maq build needs very little). In that case, the solution may simply be to blindly add the following packages (if you installed the BclConverter, you already have all of them).
YOU LIKELY DON'T NEED THE FOLLOWING, BUT JUST IN CASE:
sudo apt-get install build-essential mercurial cmake python2.6-dev python3.1-dev gettext libopenal1 libopenexr-dev libavdevice52 freeglut3-dev libglew1.5-dev libxmu-dev libxi-dev libfreeimage-dev doxygen libqt4-dev bison flex libbz2-dev libpng12-dev libxml-simple-perl ia32-libs lib32asound2 lib32ncurses5 lib32nss-mdns lib32z1 lib32gfortran3 gcc-4.3-multilib gcc-multilib lib32gomp1 libc6-dev-i386 lib32mudflap0 lib32gcc1 lib32gcc1-dbg lib32stdc++6 lib32stdc++6-4.3-dbg libc6-i386 csh g++ g++-4.3 libstdc++6-4.3-dev g++-multilib g++-4.3-multilib gcc-4.3-doc libstdc++6-4.3-dbg libstdc++6-4.3-doc nfs-common nfs-kernel-server portmap ssh gnuplot
2. Adding 32-bit (Needed For Both) And 64-bit (If Running 64-bit) libstdc++.so.5
The following process assumes you know where the two .deb files are and that you have access to that folder (I assume you've downloaded them to Downloads or Desktop; point your Terminal window there with cd ~/Downloads or cd ~/Desktop). The two .deb files, which contain (I believe) the most recent versions of libstdc++.so.5, are linked below and hosted on my website. They are zipped, so unpack them with a double-click or by running unzip on each .zip file in the download directory:
libstdc++5_3.3.6-18_i386.deb (as libstdc5_3.3.6_18_i386.deb.zip)
libstdc++5_3.3.6-18_amd64.deb (as libstdc5_3.3.6_18_amd64.deb.zip)
These are an "additional runtime library for C++ programs built with the GNU compiler." The i386 is the 32-bit version. You definitely need this one. The amd64 is needed if you installed the 64-bit Ubuntu distro. You'll STILL need to install the i386 version.
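If you are not sure which case applies to your machine, the architecture reported by uname -m settles it. The following is only a sketch: debs_for_arch is a helper name I made up for illustration, and the file names are the two packages listed above.

```shell
#!/bin/sh
# debs_for_arch ARCH: print the libstdc++5 .deb files this procedure uses
# for a machine architecture string as reported by `uname -m`.
# (debs_for_arch is a hypothetical helper, not part of any package.)
debs_for_arch() {
    case "$1" in
        x86_64|amd64)
            # 64-bit Ubuntu: both packages are used in the steps that follow
            echo "libstdc++5_3.3.6-18_i386.deb libstdc++5_3.3.6-18_amd64.deb"
            ;;
        i386|i486|i586|i686)
            # 32-bit Ubuntu: only the i386 package
            echo "libstdc++5_3.3.6-18_i386.deb"
            ;;
        *)
            echo "unsupported architecture: $1" >&2
            return 1
            ;;
    esac
}

# Report for the current machine; do not abort on an unexpected architecture.
debs_for_arch "$(uname -m)" || true
```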
A. For the 32-bit version, the installation is simple:
sudo dpkg -i libstdc++5_3.3.6-18_i386.deb
B. For the 64-bit version, the installation is also simple:
sudo dpkg -i libstdc++5_3.3.6-18_amd64.deb
Output as below:
Selecting previously deselected package libstdc++5.
(Reading database ... 169294 files and directories currently installed.)
Unpacking libstdc++5 (from libstdc++5_3.3.6-18_amd64.deb) ...
Setting up libstdc++5 (1:3.3.6-18) ...
Processing triggers for libc-bin ...
ldconfig deferred processing now taking place
The second step is only mildly more involved. These five steps (1) extract the contents of libstdc++5_3.3.6-18_i386.deb without installing the library (so nothing is overwritten), (2) enter the usr/lib directory you just extracted, (3) copy libstdc++.so.5.0.7 to /usr/lib32, (4) cd into /usr/lib32, and (5) make a symbolic link for libstdc++.so.5.
dpkg --extract libstdc++5_3.3.6-18_i386.deb ./
cd usr/lib
sudo cp libstdc++.so.5.0.7 /usr/lib32
cd /usr/lib32/
sudo ln -s libstdc++.so.5.0.7 libstdc++.so.5
cd ~/
And that is all.
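To double-check the result, you can confirm that the new link exists and resolves to a real file. This is a minimal sketch; check_lib_link is my own illustrative helper, and the directory and library name are the ones used in the steps above.

```shell
#!/bin/sh
# check_lib_link DIR NAME: succeed if DIR/NAME is a symbolic link
# whose target actually exists, and print where it points.
# (check_lib_link is a hypothetical helper name.)
check_lib_link() {
    dir=$1
    name=$2
    if [ -L "$dir/$name" ] && [ -e "$dir/$name" ]; then
        echo "ok: $dir/$name -> $(readlink "$dir/$name")"
    else
        echo "missing or broken: $dir/$name" >&2
        return 1
    fi
}

# Check the link created above; report rather than abort if it is absent.
check_lib_link /usr/lib32 libstdc++.so.5 || true
```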
3. Installing Maq
sudo mv maq-0.7.1.tar.bz2 /opt/
cd /opt
sudo tar xvjf maq-0.7.1.tar.bz2
Produces…
tar: Record size = 8 blocks
maq-0.7.1/
maq-0.7.1/AUTHORS
maq-0.7.1/COPYING
maq-0.7.1/ChangeLog
maq-0.7.1/FUTURES
maq-0.7.1/INSTALL
maq-0.7.1/Makefile.am
maq-0.7.1/Makefile.generic
maq-0.7.1/Makefile.in
maq-0.7.1/NEWS
maq-0.7.1/PROBLEMS
maq-0.7.1/README
maq-0.7.1/aclocal.m4
maq-0.7.1/algo.hh
maq-0.7.1/altchr.cc
maq-0.7.1/assemble.cc
maq-0.7.1/assemble.h
maq-0.7.1/assopt.c
maq-0.7.1/autogen.sh
maq-0.7.1/aux_utils.c
maq-0.7.1/bfa.c
maq-0.7.1/bfa.h
maq-0.7.1/break_pair.c
maq-0.7.1/cleanup.sh
maq-0.7.1/config.guess
maq-0.7.1/config.h.in
maq-0.7.1/config.sub
maq-0.7.1/configure
maq-0.7.1/configure.ac
maq-0.7.1/const.c
maq-0.7.1/const.h
maq-0.7.1/csmap2ntmap.cc
maq-0.7.1/dword.hh
maq-0.7.1/eland2maq.cc
maq-0.7.1/fasta2bfa.c
maq-0.7.1/fastq2bfq.c
maq-0.7.1/genran.c
maq-0.7.1/genran.h
maq-0.7.1/get_pos.c
maq-0.7.1/glf.h
maq-0.7.1/glfgen.cc
maq-0.7.1/indel_call.cc
maq-0.7.1/indel_pe.cc
maq-0.7.1/indel_soa.cc
maq-0.7.1/install-sh
maq-0.7.1/main.c
maq-0.7.1/main.h
maq-0.7.1/mapcheck.cc
maq-0.7.1/maq.1
maq-0.7.1/maq.pdf
maq-0.7.1/maq.pod
maq-0.7.1/maqmap.c
maq-0.7.1/maqmap.h
maq-0.7.1/maqmap_conv.c
maq-0.7.1/match.cc
maq-0.7.1/match.hh
maq-0.7.1/match_aux.cc
maq-0.7.1/merge.cc
maq-0.7.1/missing
maq-0.7.1/pair_stat.cc
maq-0.7.1/pileup.cc
maq-0.7.1/rbcc.cc
maq-0.7.1/read.cc
maq-0.7.1/read.h
maq-0.7.1/rmdup.cc
maq-0.7.1/scripts/
maq-0.7.1/scripts/asub
maq-0.7.1/scripts/farm-run.pl
maq-0.7.1/scripts/fq_all2std.pl
maq-0.7.1/scripts/maq.pl
maq-0.7.1/scripts/maq_eval.pl
maq-0.7.1/scripts/maq_plot.pl
maq-0.7.1/scripts/maq_post.pl
maq-0.7.1/scripts/maq_sanger.pl
maq-0.7.1/scripts/paf_utils.pl
maq-0.7.1/scripts/solid2fastq.pl
maq-0.7.1/seq.c
maq-0.7.1/seq.h
maq-0.7.1/simulate.c
maq-0.7.1/sort_mapping.cc
maq-0.7.1/stdaln.c
maq-0.7.1/stdaln.h
maq-0.7.1/stdhash.hh
maq-0.7.1/submap.c
maq-0.7.1/subsnp.cc
cd maq-0.7.1/
README Contents
Mapass2 is a software that builds mapping assemblies from short reads
generated by the next-generation sequencing machines. It is particularly
designed for Illumina-Solexa 1G Genetic Analyzer, which typically
generates reads 25-35bp in length.

Mapass2 first aligns reads to reference sequences and then calls the
consensus. At the mapping stage, maq performs ungapped alignment. For
single-end reads, maq is able to find all hits with up to 2 or 3
mismatches, depending on a command-line option; for paired-end reads, it
always finds all paired hits with one of the two reads containing up to
1 mismatch. At the assembling stage, maq calls the consensus based on a
statistical model. It calls the base which maximizes the posterior
probability and calculates a phred quality at each position along the
consensus. Heterozygotes are also called in this process.

For more information, see also maq website:
http://mapass.sourceforge.net
INSTALL Contents
There are two ways to compile maq. The first way is to use the GNU
building systems. Simply type './configure; make; make install' to
compile and to install maq. Three executables 'maq', 'maq.pl' and
'farm-run.pl' will be copied to '/usr/local/bin' by default.

Alternatively, one could compile with 'make -f Makefile.generic' and
manually copy the three executables to the destination directory.
Modification to 'Makefile.generic' is sometimes needed for different
architectures.
Because I'm building in /opt (where the files were unpacked with sudo), I use the first way to compile Maq but prefix each command with "sudo".
USERID@MACHINE:/opt/maq-0.7.1$ sudo ./configure
Produces…
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking for gcc... gcc
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking for g++... g++
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes
checking if gcc accepts -m64... yes
checking how to run the C preprocessor... gcc -E
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking zlib.h usability... yes
checking zlib.h presence... yes
checking for zlib.h... yes
configure: creating ./config.status
config.status: creating Makefile
config.status: creating config.h
USERID@MACHINE:/opt/maq-0.7.1$ sudo make
Produces…
cd . && /bin/bash /opt/maq-0.7.1/missing --run autoheader
/opt/maq-0.7.1/missing: line 54: autoheader: command not found
WARNING: `autoheader' is missing on your system. You should only need it if you modified `acconfig.h' or `configure.ac'. You might want to install the `Autoconf' and `GNU m4' packages. Grab them from any GNU archive site.
rm -f stamp-h1
touch config.h.in
cd . && /bin/bash ./config.status config.h
config.status: creating config.h
config.status: config.h is unchanged
make all-am
make[1]: Entering directory `/opt/maq-0.7.1'
gcc -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c main.c
gcc -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c const.c
gcc -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c seq.c
gcc -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c bfa.c
bfa.c: In function ‘nst_load_bfa1’:
bfa.c:31: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result
bfa.c:32: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result
bfa.c:33: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result
bfa.c:35: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result
bfa.c:37: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result
bfa.c: In function ‘nst_bfa_len’:
bfa.c:46: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result
bfa.c:48: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o read.o read.cc
gcc -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c fasta2bfa.c
gcc -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c fastq2bfq.c
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o merge.o merge.cc
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o match_aux.o match_aux.cc
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o match.o match.cc
match.cc: In function ‘int alt_cal_mm(bit64_t)’:
match.cc:58: warning: suggest parentheses around ‘+’ in operand of ‘&’
match.cc:61: warning: suggest parentheses around ‘+’ in operand of ‘&’
match.cc: In function ‘int alt_cal_err(bit64_t, bit64_t)’:
match.cc:67: warning: suggest parentheses around ‘+’ in operand of ‘&’
match.cc:70: warning: suggest parentheses around ‘+’ in operand of ‘&’
match.cc: In function ‘int ma_match(int, char**)’:
match.cc:525: warning: ignoring return value of ‘int fscanf(FILE*, const char*, ...)’, declared with attribute warn_unused_result
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o sort_mapping.o sort_mapping.cc
sort_mapping.cc: In function ‘int ma_make_pair(const match_aux_t*, const match_info_t*, const match_info_t*, pair_info_t*)’:
sort_mapping.cc:59: warning: suggest parentheses around arithmetic in operand of ‘^’
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o assemble.o assemble.cc
assemble.cc: In function ‘base_call_aux_t* assemble_cns_collect(assemble_pos_t*, const assemble_aux_t*)’:
assemble.cc:106: warning: suggest parentheses around arithmetic in operand of ‘|’
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o pileup.o pileup.cc
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o mapcheck.o mapcheck.cc
gcc -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c get_pos.c
gcc -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c assopt.c
gcc -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c aux_utils.c
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o rbcc.o rbcc.cc
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o subsnp.o subsnp.cc
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o pair_stat.o pair_stat.cc
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o indel_soa.o indel_soa.cc
indel_soa.cc: In function ‘void fill_counter(bit32_t*, int, nst_bfa1_t*, void*)’:
indel_soa.cc:42: warning: suggest parentheses around ‘-’ inside ‘<<’
indel_soa.cc:56: warning: suggest parentheses around ‘-’ inside ‘<<’
gcc -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c maqmap.c
gcc -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c maqmap_conv.c
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o altchr.o altchr.cc
gcc -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c submap.c
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o rmdup.o rmdup.cc
gcc -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c simulate.c
In file included from /usr/include/string.h:640,
                 from maqmap.h:23,
                 from simulate.c:11:
In function ‘memset’,
    inlined from ‘simustat_core’ at simulate.c:386:
/usr/include/bits/string3.h:86: warning: call to __builtin___memset_chk will always overflow destination buffer
gcc -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c genran.c
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o indel_pe.o indel_pe.cc
gcc -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c stdaln.c
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o indel_call.o indel_call.cc
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o eland2maq.o eland2maq.cc
eland2maq.cc: In function ‘hash_map_char* read_list(FILE*)’:
eland2maq.cc:33: warning: ignoring return value of ‘int fscanf(FILE*, const char*, ...)’, declared with attribute warn_unused_result
eland2maq.cc: In function ‘void eland2maq_core(FILE*, FILE*, void*)’:
eland2maq.cc:88: warning: ignoring return value of ‘int fscanf(FILE*, const char*, ...)’, declared with attribute warn_unused_result
eland2maq.cc:96: warning: ignoring return value of ‘int fscanf(FILE*, const char*, ...)’, declared with attribute warn_unused_result
eland2maq.cc:99: warning: ignoring return value of ‘int fscanf(FILE*, const char*, ...)’, declared with attribute warn_unused_result
eland2maq.cc: In function ‘void novo2maq_core(FILE*, FILE*, void*)’:
eland2maq.cc:323: warning: ignoring return value of ‘char* fgets(char*, int, FILE*)’, declared with attribute warn_unused_result
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o csmap2ntmap.o csmap2ntmap.cc
gcc -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c break_pair.c
g++ -DHAVE_CONFIG_H -I. -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -c -o glfgen.o glfgen.cc
glfgen.cc: In function ‘glf1_t* glfgen1_core(assemble_pos_t*, const assemble_aux_t*, bit8_t)’:
glfgen.cc:43: warning: suggest parentheses around arithmetic in operand of ‘|’
g++ -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -o maq main.o const.o seq.o bfa.o read.o fasta2bfa.o fastq2bfq.o merge.o match_aux.o match.o sort_mapping.o assemble.o pileup.o mapcheck.o get_pos.o assopt.o aux_utils.o rbcc.o subsnp.o pair_stat.o indel_soa.o maqmap.o maqmap_conv.o altchr.o submap.o rmdup.o simulate.o genran.o indel_pe.o stdaln.o indel_call.o eland2maq.o csmap2ntmap.o break_pair.o glfgen.o -lm -lz
make[1]: Leaving directory `/opt/maq-0.7.1'
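The make output contains a fair number of compiler warnings, and none of them stop the build. If you would rather not read through a wall of text to be sure, one option is to capture the output in a file and count warnings and errors. This is only a sketch: build_summary and the log file name make.log are my own choices, and it assumes gcc's usual "warning:" and "error:" message prefixes.

```shell
#!/bin/sh
# build_summary LOGFILE: count lines containing gcc-style "warning:" and
# "error:" markers in a captured make log.
# (build_summary is a hypothetical helper name.)
build_summary() {
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true"
    warnings=$(grep -c 'warning:' "$1" || true)
    errors=$(grep -c 'error:' "$1" || true)
    echo "warnings=$warnings errors=$errors"
}

# Example use (commented out because it needs the Maq source tree):
#   sudo make 2>&1 | tee make.log
#   build_summary make.log
```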
USERID@MACHINE:/opt/maq-0.7.1$ sudo make install
Produces…
make[1]: Entering directory `/opt/maq-0.7.1'
test -z "/usr/local/bin" || /bin/mkdir -p "/usr/local/bin"
  /usr/bin/install -c 'maq' '/usr/local/bin/maq'
test -z "/usr/local/bin" || /bin/mkdir -p "/usr/local/bin"
 /usr/bin/install -c 'scripts/maq.pl' '/usr/local/bin/maq.pl'
 /usr/bin/install -c 'scripts/farm-run.pl' '/usr/local/bin/farm-run.pl'
 /usr/bin/install -c 'scripts/maq_plot.pl' '/usr/local/bin/maq_plot.pl'
 /usr/bin/install -c 'scripts/maq_eval.pl' '/usr/local/bin/maq_eval.pl'
make[1]: Nothing to be done for `install-data-am'.
make[1]: Leaving directory `/opt/maq-0.7.1'
The Maq package is now installed in /usr/local/bin and should be available immediately, without typing the full path. In the interest of running a brief test, I've provided a fastq file for phi-X174 and the "easy" command line that runs an alignment (a rendered movie from the virusworld website, including a second half featuring the lovely QuteMol program, is below). While I wouldn't object to hosting a full lane of phi-X174, the 1.9 GB of fragments would make for an unbearably long server upload. Suffice it to say, if you have a phi-X174 lane and you've run the BCL-to-QSEQ-to-FASTQ procedure in the BclConverter post, you have a properly formatted PhiXSequence.fastq for running this example.
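Before running the example, it may be worth confirming that the shell really does find the installed programs without a path. The following is a minimal sketch; report_programs is a name I made up, and the three program names are the ones the INSTALL file says get copied to /usr/local/bin.

```shell
#!/bin/sh
# report_programs NAME...: for each program name, print where the shell
# finds it on the PATH, or say that it was not found.
# (report_programs is a hypothetical helper name.)
report_programs() {
    for prog in "$@"; do
        if path=$(command -v "$prog"); then
            echo "$prog: $path"
        else
            echo "$prog: not found on PATH"
        fi
    done
}

report_programs maq maq.pl farm-run.pl
```

If any of the three comes back as not found, check that /usr/local/bin is in your PATH before suspecting the build.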
You can download the phi-X174 sequence at phi_X174_sequence.fastq.gz, a local version of the file you can find at the National Center for Biotechnology Information. The sequence is below because, well, I wanted to have a sequence present on the blog post (and it is absolutely fascinating to me that this is the instruction manual for something).
>gi|216019|gb|J02482.1|PX1CG Coliphage phi-X174, complete genome GAGTTTTATCGCTTCCATGACGCAGAAGTTAACACTTTCGGATATTTCTGATGAGTCGAAAAATTATCTT GATAAAGCAGGAATTACTACTGCTTGTTTACGAATTAAATCGAAGTGGACTGCTGGCGGAAAATGAGAAA ATTCGACCTATCCTTGCGCAGCTCGAGAAGCTCTTACTTTGCGACCTTTCGCCATCAACTAACGATTCTG TCAAAAACTGACGCGTTGGATGAGGAGAAGTGGCTTAATATGCTTGGCACGTTCGTCAAGGACTGGTTTA GATATGAGTCACATTTTGTTCATGGTAGAGATTCTCTTGTTGACATTTTAAAAGAGCGTGGATTACTATC TGAGTCCGATGCTGTTCAACCACTAATAGGTAAGAAATCATGAGTCAAGTTACTGAACAATCCGTACGTT TCCAGACCGCTTTGGCCTCTATTAAGCTCATTCAGGCTTCTGCCGTTTTGGATTTAACCGAAGATGATTT CGATTTTCTGACGAGTAACAAAGTTTGGATTGCTACTGACCGCTCTCGTGCTCGTCGCTGCGTTGAGGCT TGCGTTTATGGTACGCTGGACTTTGTGGGATACCCTCGCTTTCCTGCTCCTGTTGAGTTTATTGCTGCCG TCATTGCTTATTATGTTCATCCCGTCAACATTCAAACGGCCTGTCTCATCATGGAAGGCGCTGAATTTAC GGAAAACATTATTAATGGCGTCGAGCGTCCGGTTAAAGCCGCTGAATTGTTCGCGTTTACCTTGCGTGTA CGCGCAGGAAACACTGACGTTCTTACTGACGCAGAAGAAAACGTGCGTCAAAAATTACGTGCGGAAGGAG TGATGTAATGTCTAAAGGTAAAAAACGTTCTGGCGCTCGCCCTGGTCGTCCGCAGCCGTTGCGAGGTACT AAAGGCAAGCGTAAAGGCGCTCGTCTTTGGTATGTAGGTGGTCAACAATTTTAATTGCAGGGGCTTCGGC CCCTTACTTGAGGATAAATTATGTCTAATATTCAAACTGGCGCCGAGCGTATGCCGCATGACCTTTCCCA TCTTGGCTTCCTTGCTGGTCAGATTGGTCGTCTTATTACCATTTCAACTACTCCGGTTATCGCTGGCGAC TCCTTCGAGATGGACGCCGTTGGCGCTCTCCGTCTTTCTCCATTGCGTCGTGGCCTTGCTATTGACTCTA CTGTAGACATTTTTACTTTTTATGTCCCTCATCGTCACGTTTATGGTGAACAGTGGATTAAGTTCATGAA GGATGGTGTTAATGCCACTCCTCTCCCGACTGTTAACACTACTGGTTATATTGACCATGCCGCTTTTCTT GGCACGATTAACCCTGATACCAATAAAATCCCTAAGCATTTGTTTCAGGGTTATTTGAATATCTATAACA ACTATTTTAAAGCGCCGTGGATGCCTGACCGTACCGAGGCTAACCCTAATGAGCTTAATCAAGATGATGC TCGTTATGGTTTCCGTTGCTGCCATCTCAAAAACATTTGGACTGCTCCGCTTCCTCCTGAGACTGAGCTT TCTCGCCAAATGACGACTTCTACCACATCTATTGACATTATGGGTCTGCAAGCTGCTTATGCTAATTTGC ATACTGACCAAGAACGTGATTACTTCATGCAGCGTTACCATGATGTTATTTCTTCATTTGGAGGTAAAAC CTCTTATGACGCTGACAACCGTCCTTTACTTGTCATGCGCTCTAATCTCTGGGCATCTGGCTATGATGTT GATGGAACTGACCAAACGTCGTTAGGCCAGTTTTCTGGTCGTGTTCAACAGACCTATAAACATTCTGTGC CGCGTTTCTTTGTTCCTGAGCATGGCACTATGTTTACTCTTGCGCTTGTTCGTTTTCCGCCTACTGCGAC 
TAAAGAGATTCAGTACCTTAACGCTAAAGGTGCTTTGACTTATACCGATATTGCTGGCGACCCTGTTTTG TATGGCAACTTGCCGCCGCGTGAAATTTCTATGAAGGATGTTTTCCGTTCTGGTGATTCGTCTAAGAAGT TTAAGATTGCTGAGGGTCAGTGGTATCGTTATGCGCCTTCGTATGTTTCTCCTGCTTATCACCTTCTTGA AGGCTTCCCATTCATTCAGGAACCGCCTTCTGGTGATTTGCAAGAACGCGTACTTATTCGCCACCATGAT TATGACCAGTGTTTCCAGTCCGTTCAGTTGTTGCAGTGGAATAGTCAGGTTAAATTTAATGTGACCGTTT ATCGCAATCTGCCGACCACTCGCGATTCAATCATGACTTCGTGATAAAAGATTGAGTGTGAGGTTATAAC GCCGAAGCGGTAAAAATTTTAATTTTTGCCGCTGAGGGGTTGACCAAGCGAAGCGCGGTAGGTTTTCTGC TTAGGAGTTTAATCATGTTTCAGACTTTTATTTCTCGCCATAATTCAAACTTTTTTTCTGATAAGCTGGT TCTCACTTCTGTTACTCCAGCTTCTTCGGCACCTGTTTTACAGACACCTAAAGCTACATCGTCAACGTTA TATTTTGATAGTTTGACGGTTAATGCTGGTAATGGTGGTTTTCTTCATTGCATTCAGATGGATACATCTG TCAACGCCGCTAATCAGGTTGTTTCTGTTGGTGCTGATATTGCTTTTGATGCCGACCCTAAATTTTTTGC CTGTTTGGTTCGCTTTGAGTCTTCTTCGGTTCCGACTACCCTCCCGACTGCCTATGATGTTTATCCTTTG AATGGTCGCCATGATGGTGGTTATTATACCGTCAAGGACTGTGTGACTATTGACGTCCTTCCCCGTACGC CGGGCAATAACGTTTATGTTGGTTTCATGGTTTGGTCTAACTTTACCGCTACTAAATGCCGCGGATTGGT TTCGCTGAATCAGGTTATTAAAGAGATTATTTGTCTCCAGCCACTTAAGTGAGGTGATTTATGTTTGGTG CTATTGCTGGCGGTATTGCTTCTGCTCTTGCTGGTGGCGCCATGTCTAAATTGTTTGGAGGCGGTCAAAA AGCCGCCTCCGGTGGCATTCAAGGTGATGTGCTTGCTACCGATAACAATACTGTAGGCATGGGTGATGCT GGTATTAAATCTGCCATTCAAGGCTCTAATGTTCCTAACCCTGATGAGGCCGCCCCTAGTTTTGTTTCTG GTGCTATGGCTAAAGCTGGTAAAGGACTTCTTGAAGGTACGTTGCAGGCTGGCACTTCTGCCGTTTCTGA TAAGTTGCTTGATTTGGTTGGACTTGGTGGCAAGTCTGCCGCTGATAAAGGAAAGGATACTCGTGATTAT CTTGCTGCTGCATTTCCTGAGCTTAATGCTTGGGAGCGTGCTGGTGCTGATGCTTCCTCTGCTGGTATGG TTGACGCCGGATTTGAGAATCAAAAAGAGCTTACTAAAATGCAACTGGACAATCAGAAAGAGATTGCCGA GATGCAAAATGAGACTCAAAAAGAGATTGCTGGCATTCAGTCGGCGACTTCACGCCAGAATACGAAAGAC CAGGTATATGCACAAAATGAGATGCTTGCTTATCAACAGAAGGAGTCTACTGCTCGCGTTGCGTCTATTA TGGAAAACACCAATCTTTCCAAGCAACAGCAGGTTTCCGAGATTATGCGCCAAATGCTTACTCAAGCTCA AACGGCTGGTCAGTATTTTACCAATGACCAAATCAAAGAAATGACTCGCAAGGTTAGTGCTGAGGTTGAC TTAGTTCATCAGCAAACGCAGAATCAGCGGTATGGCTCTTCTCATATTGGCGCTACTGCAAAGGATATTT CTAATGTCGTCACTGATGCTGCTTCTGGTGTGGTTGATATTTTTCATGGTATTGATAAAGCTGTTGCCGA 
TACTTGGAACAATTTCTGGAAAGACGGTAAAGCTGATGGTATTGGCTCTAATTTGTCTAGGAAATAACCG TCAGGATTGACACCCTCCCAATTGTATGTTTTCATGCCTCCAAATCTTGGAGGCTTTTTTATGGTTCGTT CTTATTACCCTTCTGAATGTCACGCTGATTATTTTGACTTTGAGCGTATCGAGGCTCTTAAACCTGCTAT TGAGGCTTGTGGCATTTCTACTCTTTCTCAATCCCCAATGCTTGGCTTCCATAAGCAGATGGATAACCGC ATCAAGCTCTTGGAAGAGATTCTGTCTTTTCGTATGCAGGGCGTTGAGTTCGATAATGGTGATATGTATG TTGACGGCCATAAGGCTGCTTCTGACGTTCGTGATGAGTTTGTATCTGTTACTGAGAAGTTAATGGATGA ATTGGCACAATGCTACAATGTGCTCCCCCAACTTGATATTAATAACACTATAGACCACCGCCCCGAAGGG GACGAAAAATGGTTTTTAGAGAACGAGAAGACGGTTACGCAGTTTTGCCGCAAGCTGGCTGCTGAACGCC CTCTTAAGGATATTCGCGATGAGTATAATTACCCCAAAAAGAAAGGTATTAAGGATGAGTGTTCAAGATT GCTGGAGGCCTCCACTATGAAATCGCGTAGAGGCTTTGCTATTCAGCGTTTGATGAATGCAATGCGACAG GCTCATGCTGATGGTTGGTTTATCGTTTTTGACACTCTCACGTTGGCTGACGACCGATTAGAGGCGTTTT ATGATAATCCCAATGCTTTGCGTGACTATTTTCGTGATATTGGTCGTATGGTTCTTGCTGCCGAGGGTCG CAAGGCTAATGATTCACACGCCGACTGCTATCAGTATTTTTGTGTGCCTGAGTATGGTACAGCTAATGGC CGTCTTCATTTCCATGCGGTGCACTTTATGCGGACACTTCCTACAGGTAGCGTTGACCCTAATTTTGGTC GTCGGGTACGCAATCGCCGCCAGTTAAATAGCTTGCAAAATACGTGGCCTTATGGTTACAGTATGCCCAT CGCAGTTCGCTACACGCAGGACGCTTTTTCACGTTCTGGTTGGTTGTGGCCTGTTGATGCTAAAGGTGAG CCGCTTAAAGCTACCAGTTATATGGCTGTTGGTTTCTATGTGGCTAAATACGTTAACAAAAAGTCAGATA TGGACCTTGCTGCTAAAGGTCTAGGAGCTAAAGAATGGAACAACTCACTAAAAACCAAGCTGTCGCTACT TCCCAAGAAGCTGTTCAGAATCAGAATGAGCCGCAACTTCGGGATGAAAATGCTCACAATGACAAATCTG TCCACGGAGTGCTTAATCCAACTTACCAAGCTGGGTTACGACGCGACGCCGTTCAACCAGATATTGAAGC AGAACGCAAAAAGAGAGATGAGATTGAGGCTGGGAAAAGTTACTGTAGCCGACGTTTTGGCGGCGCAACC TGTGACGACAAATCTGCTCAAATTTATGCGCGCTTCGATAAAAATGATTGGCGTATCCAACCTGCA
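Once the reference file is downloaded and uncompressed, a quick sanity check is to count its bases; the published phi-X174 genome is 5,386 bases long, so a complete copy should report that number. This is a sketch under my own assumptions: count_bases is an illustrative helper, and phi_X174_seq.fastq is the file name used for the reference in the run further down.

```shell
#!/bin/sh
# count_bases FILE: count sequence characters in a FASTA-style file,
# skipping header lines (those starting with ">") and line endings.
# (count_bases is a hypothetical helper name.)
count_bases() {
    grep -v '^>' "$1" | tr -d '\n\r' | wc -c | tr -d ' '
}

# Only run the check if the reference file is actually present.
if [ -f phi_X174_seq.fastq ]; then
    count_bases phi_X174_seq.fastq
fi
```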
With this reference file downloaded, uncompressed, and saved as phi_X174_seq.fastq, and with your phi-X174 fragment collection in a file (which I will assume is named phi_X174_seq_fragments.fastq) in the same directory, the command-line run is simple:
maq.pl easyrun -d phi_X174 phi_X174_seq.fastq phi_X174_seq_fragments.fastq >& phi_X174.log
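A note on the ">&" in that command: it is shorthand, understood by bash and csh, for sending both standard output and standard error to the log file. If you put the command in a script run by plain sh (dash on Ubuntu), that shorthand may not be accepted, and the portable spelling is "> file 2>&1". A small sketch of the portable form follows; run_logged is my own illustrative wrapper, not something Maq provides.

```shell
#!/bin/sh
# run_logged LOGFILE CMD [ARGS...]: run CMD and send both its standard
# output and standard error to LOGFILE, using the portable redirection.
# (run_logged is a hypothetical helper name.)
run_logged() {
    log=$1
    shift
    "$@" > "$log" 2>&1
}

# Equivalent of the run above (commented out; needs Maq and both input files):
#   run_logged phi_X174.log maq.pl easyrun -d phi_X174 \
#       phi_X174_seq.fastq phi_X174_seq_fragments.fastq
```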
drwxr-xr-x 2 user user       4096 2010-12-11 17:18 phi_X174
-rw-r--r-- 1 user user       3524 2010-12-11 17:18 phi_X174.log
-rw-r--r-- 1 user user       5529 2010-12-11 16:39 phi_X174_seq.fastq
-rw-r--r-- 1 user user 1921502305 2010-12-11 16:40 phi_X174_seq_fragments.fastq
This will produce the phi_X174.log results file (check it for errors; the log contents are below)…
-- CMD: /usr/local/bin/maq fasta2bfa /home/user/phi_X174_seq.fastq \
   phi_X174/ref.bfa 2> /dev/null
-- CMD: /usr/local/bin/maq fastq2bfq -n 2000000 /home/user/phi_X174_seq_fragments.fastq \
   phi_X174/read1
-- finish writing file 'phi_X174/read1@1.bfq'
-- finish writing file 'phi_X174/read1@2000001.bfq'
-- finish writing file 'phi_X174/read1@4000001.bfq'
-- finish writing file 'phi_X174/read1@6000001.bfq'
-- finish writing file 'phi_X174/read1@8000001.bfq'
-- finish writing file 'phi_X174/read1@10000001.bfq'
-- finish writing file 'phi_X174/read1@12000001.bfq'
-- finish writing file 'phi_X174/read1@14000001.bfq'
-- finish writing file 'phi_X174/read1@16000001.bfq'
-- 16259703 sequences were loaded.
-- CMD: (cd phi_X174; /usr/local/bin/maq map -n 2 -e 70 -u unmap1@8000001.txt \
   aln1@8000001.map ref.bfa read1@8000001.bfq 2> aln1@8000001.map.log)
-- CMD: (cd phi_X174; /usr/local/bin/maq map -n 2 -e 70 -u unmap1@2000001.txt \
   aln1@2000001.map ref.bfa read1@2000001.bfq 2> aln1@2000001.map.log)
-- CMD: (cd phi_X174; /usr/local/bin/maq map -n 2 -e 70 -u unmap1@6000001.txt \
   aln1@6000001.map ref.bfa read1@6000001.bfq 2> aln1@6000001.map.log)
-- CMD: (cd phi_X174; /usr/local/bin/maq map -n 2 -e 70 -u unmap1@10000001.txt \
   aln1@10000001.map ref.bfa read1@10000001.bfq 2> aln1@10000001.map.log)
-- CMD: (cd phi_X174; /usr/local/bin/maq map -n 2 -e 70 -u unmap1@16000001.txt \
   aln1@16000001.map ref.bfa read1@16000001.bfq 2> aln1@16000001.map.log)
-- CMD: (cd phi_X174; /usr/local/bin/maq map -n 2 -e 70 -u unmap1@4000001.txt \
   aln1@4000001.map ref.bfa read1@4000001.bfq 2> aln1@4000001.map.log)
-- CMD: (cd phi_X174; /usr/local/bin/maq map -n 2 -e 70 -u unmap1@14000001.txt \
   aln1@14000001.map ref.bfa read1@14000001.bfq 2> aln1@14000001.map.log)
-- CMD: (cd phi_X174; /usr/local/bin/maq map -n 2 -e 70 -u unmap1@1.txt aln1@1.map \
   ref.bfa read1@1.bfq 2> aln1@1.map.log)
-- CMD: (cd phi_X174; /usr/local/bin/maq map -n 2 -e 70 -u unmap1@12000001.txt \
   aln1@12000001.map ref.bfa read1@12000001.bfq 2> aln1@12000001.map.log)
-- CMD: (cd phi_X174; /usr/local/bin/maq mapmerge all.map aln1@8000001.map \
   aln1@2000001.map aln1@6000001.map aln1@10000001.map aln1@16000001.map \
   aln1@4000001.map aln1@14000001.map aln1@1.map aln1@12000001.map)
-- CMD: (cd phi_X174; /usr/local/bin/maq mapcheck ref.bfa all.map > mapcheck.txt)
[ma_mapcheck] processing gi|216019|gb|J02482.1|PX1CG...
-- CMD: (cd phi_X174; /usr/local/bin/maq assemble -N 2 -Q 60 consensus.cns ref.bfa \
   all.map 2> assemble.log)
-- CMD: /usr/local/bin/maq cns2fq phi_X174/consensus.cns > phi_X174/cns.fq
-- CMD: /usr/local/bin/maq cns2snp phi_X174/consensus.cns > phi_X174/cns.snp
-- CMD: /usr/local/bin/maq cns2win phi_X174/consensus.cns > phi_X174/cns.win
-- CMD: /usr/local/bin/maq indelsoa phi_X174/ref.bfa phi_X174/all.map > phi_X174/cns.indelse
-- CMD: (cd phi_X174; touch unmap.indel)
-- CMD: /usr/local/bin/maq.pl SNPfilter -q 40 -w 5 -N 2 -f phi_X174/cns.indelse -d 3 \
   -D 256 -n 20 phi_X174/cns.snp > phi_X174/cns.final.snp
-- 0 potential soa-indels pass the filter.
-- CMD: (cd phi_X174; ln -s cns.final.snp cns.filter.snp)
-- CMD: /usr/local/bin/maq.pl statmap phi_X174/*.map.log
-- == statmap report ==
-- # single end (SE) reads: 16259703
-- # mapped SE reads: 16011454 (/ 16259703 = 98.47%)
-- # paired end (PE) reads: 0
-- # mapped PE reads: 0 (/ 0 = NA%)
-- # reads that are mapped in pairs: 0 (/ 0 = NA%)
-- # Q>=30 reads that are moved to meet mate-pair requirement: 0 (/ 0 = NA%)
-- # Q<30 reads that are moved to meet mate-pair requirement: 0 (NA%)
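The number most people want from this log is the fraction of reads that mapped, which sits in the statmap report at the end. If you run several lanes, pulling it out by hand gets old quickly, so here is a small sketch that extracts it. mapped_se_percent is my own helper name, and the pattern assumes the "mapped SE reads" line looks exactly like the one in the log above.

```shell
#!/bin/sh
# mapped_se_percent LOGFILE: print the percentage from the
# "# mapped SE reads: N (/ TOTAL = P%)" line of a maq.pl easyrun log.
# (mapped_se_percent is a hypothetical helper name.)
mapped_se_percent() {
    sed -n 's/.*# mapped SE reads: [0-9]* (\/ [0-9]* = \([0-9.]*\)%).*/\1/p' "$1"
}

# Only run against the log if it exists in the current directory.
if [ -f phi_X174.log ]; then
    mapped_se_percent phi_X174.log
fi
```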
…and a "phi_X174" directory (from the "-d") containing (hopefully) the following file list:
drwxr-xr-x  2 user user      4096 2010-12-11 17:18 .
drwxr-xr-x 19 user user      4096 2010-12-11 17:09 ..
-rw-r--r--  1 user user 337296876 2010-12-11 17:16 all.map
-rw-r--r--  1 user user  41635667 2010-12-11 17:13 aln1@10000001.map
-rw-r--r--  1 user user     18493 2010-12-11 17:13 aln1@10000001.map.log
-rw-r--r--  1 user user  42219864 2010-12-11 17:15 aln1@12000001.map
-rw-r--r--  1 user user     18493 2010-12-11 17:15 aln1@12000001.map.log
-rw-r--r--  1 user user  42885466 2010-12-11 17:14 aln1@14000001.map
-rw-r--r--  1 user user     18493 2010-12-11 17:14 aln1@14000001.map.log
-rw-r--r--  1 user user   5808963 2010-12-11 17:13 aln1@16000001.map
-rw-r--r--  1 user user     18484 2010-12-11 17:13 aln1@16000001.map.log
-rw-r--r--  1 user user  42616782 2010-12-11 17:14 aln1@1.map
-rw-r--r--  1 user user     18493 2010-12-11 17:14 aln1@1.map.log
-rw-r--r--  1 user user  41452684 2010-12-11 17:12 aln1@2000001.map
-rw-r--r--  1 user user     18493 2010-12-11 17:12 aln1@2000001.map.log
-rw-r--r--  1 user user  41223383 2010-12-11 17:14 aln1@4000001.map
-rw-r--r--  1 user user     18493 2010-12-11 17:14 aln1@4000001.map.log
-rw-r--r--  1 user user  41423788 2010-12-11 17:13 aln1@6000001.map
-rw-r--r--  1 user user     18493 2010-12-11 17:13 aln1@6000001.map.log
-rw-r--r--  1 user user  42709777 2010-12-11 17:12 aln1@8000001.map
-rw-r--r--  1 user user     18493 2010-12-11 17:12 aln1@8000001.map.log
-rw-r--r--  1 user user      8704 2010-12-11 17:18 assemble.log
lrwxrwxrwx  1 user user        13 2010-12-11 17:18 cns.filter.snp -> cns.final.snp
-rw-r--r--  1 user user       509 2010-12-11 17:18 cns.final.snp
-rw-r--r--  1 user user     10983 2010-12-11 17:18 cns.fq
-rw-r--r--  1 user user         0 2010-12-11 17:18 cns.indelse
-rw-r--r--  1 user user       571 2010-12-11 17:18 cns.snp
-rw-r--r--  1 user user       452 2010-12-11 17:18 cns.win
-rw-r--r--  1 user user      3555 2010-12-11 17:18 consensus.cns
-rw-r--r--  1 user user      4525 2010-12-11 17:17 mapcheck.txt
-rw-r--r--  1 user user  51219471 2010-12-11 17:11 read1@10000001.bfq
-rw-r--r--  1 user user  51533445 2010-12-11 17:11 read1@12000001.bfq
-rw-r--r--  1 user user  52014489 2010-12-11 17:12 read1@14000001.bfq
-rw-r--r--  1 user user   6777154 2010-12-11 17:12 read1@16000001.bfq
-rw-r--r--  1 user user  51764856 2010-12-11 17:09 read1@1.bfq
-rw-r--r--  1 user user  51064770 2010-12-11 17:10 read1@2000001.bfq
-rw-r--r--  1 user user  50938162 2010-12-11 17:10 read1@4000001.bfq
-rw-r--r--  1 user user  51025084 2010-12-11 17:10 read1@6000001.bfq
-rw-r--r--  1 user user  51803458 2010-12-11 17:11 read1@8000001.bfq
-rw-r--r--  1 user user      2744 2010-12-11 17:09 ref.bfa
-rw-r--r--  1 user user   3579605 2010-12-11 17:13 unmap1@10000001.txt
-rw-r--r--  1 user user   3592541 2010-12-11 17:15 unmap1@12000001.txt
-rw-r--r--  1 user user   3790240 2010-12-11 17:14 unmap1@14000001.txt
-rw-r--r--  1 user user    492518 2010-12-11 17:13 unmap1@16000001.txt
-rw-r--r--  1 user user   3768747 2010-12-11 17:14 unmap1@1.txt
-rw-r--r--  1 user user   3668574 2010-12-11 17:12 unmap1@2000001.txt
-rw-r--r--  1 user user   3608445 2010-12-11 17:13 unmap1@4000001.txt
-rw-r--r--  1 user user   3429470 2010-12-11 17:13 unmap1@6000001.txt
-rw-r--r--  1 user user   3410453 2010-12-11 17:12 unmap1@8000001.txt
-rw-r--r--  1 user user         0 2010-12-11 17:18 unmap.indel
I've included the file sizes so you can see the amount of data generated from a 5,529-byte phi_X174_seq.fastq reference file and a 1.9 GB phi_X174_seq_fragments.fastq file.
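If you want a single total rather than a column of numbers, you can add up the size column of the listing. This is a rough sketch rather than a careful disk-usage tool: sum_ls_sizes is my own helper name, and it assumes the default ls -l layout where the size is the fifth field.

```shell
#!/bin/sh
# sum_ls_sizes: read `ls -l` output on standard input and print the total
# size in bytes of the regular files (lines whose mode starts with "-").
# Directories and symbolic links are skipped.
# (sum_ls_sizes is a hypothetical helper name.)
sum_ls_sizes() {
    awk '$1 ~ /^-/ { total += $5 } END { printf "%.0f\n", total }'
}

# Only run if the output directory from the example exists.
if [ -d phi_X174 ]; then
    ls -l phi_X174 | sum_ls_sizes
fi
```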