UNAFold 3.8, MFold Utilities 4.5/4.6 And Additional Component Installation (Using XCode Tools 3 And Fink 0.29.21) For OSX 10.6.x

NOTE: The version numbers for everything are given specifically because aspects of the installation process may change between versions. If a major version change alters the steps below, I will not necessarily know the answer to the resulting problems (and that should clear up the "qualifications" section).

The UNAFold (UNified Nucleic Acid Fold(ing)) nucleic acid folding and hybridization prediction program set (here using version 3.8) can by itself be built with few (and unimportant) errors in OSX with Xcode Tools 3. Actually running UNAFold.pl produces several errors that do not affect the run but do affect the amount and format of the output. It is my assumption that any OS running a less-than "kitchen sink" installation of Linux/Unix (Ubuntu, Gentoo, and Damn Small Linux come to mind) will show these errors and will require the subsequent installation of the programs and libraries that pieces of UNAFold rely on for processing output into, specifically, images and PDF files. OSX has the same issue, which is easy to handle using Fink (and much less easy if you try to install otherwise completely unrelated programs just to make these "dependencies" available to UNAFold). Once Fink is installed, it is a few-step process to build UNAFold, move the MFold Utilities contents to their proper folders (and there is a small trick here as well), and generate a complete UNAFold install for all your DNA/RNA needs.

1. UNAFold 3.8 Installation

To begin, download (currently at mfold.rna.albany.edu/?q=DINAMelt/software), extract, open a terminal, cd into the unafold_3.8 directory (likely ~/Downloads/unafold_3.8), and run ./configure.

[prompt]$ cd ~/Downloads/unafold_3.8
[prompt]$ ./configure

On my machine (MacBook Pro, 10.6.x OSX + XCode Tools 3), this produces the output found in the local file 2011june_unafold_configure_output.txt.

You will likely note two sets of errors in the ./configure output:

./configure: line 8579: sort: No such file or directory
./configure: line 8576: sed: No such file or directory
./configure: line 10077: sort: No such file or directory
./configure: line 10074: sed: No such file or directory

The 10077 and 10074 errors are a bit odd because there are only 10039 lines in the configure file.

Are these errors important? No, you can build UNAFold just fine. I have run into these two "sort" and "sed" problems with a few other build attempts in OSX but have no good answer as to how to get around them. In case you're wondering, sort and sed are most certainly installed on the machine. The "sort" error can be removed by specifying the path explicitly in the configure file (in line 8579, change "sort" to "/usr/bin/sort"), but the sed error persisted through the few workarounds I tried. It doesn't appear to be a simple PATH issue. I'm not yet interested enough to find a proper solution but, if you know one, please post a comment or send a message. Is it just a character issue as discussed at itmercenary.livejournal.com/1585.html?

Running "make" produces the output found in the local file 2011june_unafold_make_output.txt.

[prompt]$ make

No issues. To install UNAFold, which will default to putting components into /usr/local/bin and /usr/local/share/, run sudo make install, which produces the output found in the local file 2011june_unafold_sudo_make_install_output.txt.

[prompt]$ sudo make install

Again, no issues. You will now have a populated /usr/local/bin folder.

2. MFold Utilities 4.5 (and, currently, the source for 4.6)

The next (optional) step is the inclusion of the mfold_util-4.5-Mac binaries (currently available at mfold.rna.albany.edu/?q=mfold/download-mfold), which I've also placed into the /usr/local/bin folder by extracting the contents of this file, then performing a cp * /usr/local/bin from within the MacBin directory.

[prompt]$ cd ~/Downloads/MacBin/
[prompt]$ sudo cp * /usr/local/bin

The processing of the data into plots with these programs requires that a set of *.col files be placed in the folder /usr/local/shared/mfold_util. Furthermore, these *.col files are NOT provided in the mfold_util-4.5-Mac binary package. To get them, you need only download the mfold_util-4.6.tar.gz file (currently at mfold.rna.albany.edu/?q=mfold/download-mfold), extract it, cd your way into src, make the /usr/local/shared/mfold_util folder, and copy the *.col files to /usr/local/shared/mfold_util.

[prompt]$ sudo mkdir -p /usr/local/shared/mfold_util
[prompt]$ cd ~/Downloads/mfold_util-4.6/src
[prompt]$ sudo cp *.col /usr/local/shared/mfold_util
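
If you repeat this step often (or just want it to fail loudly when the source folder is wrong), the copy can be wrapped in a small shell function. This is only a sketch: install_col_files is a name I made up, not something shipped with mfold_util, and both directories are passed in by you.

```shell
# Hypothetical helper (not part of mfold_util): copy every *.col file from an
# unpacked source directory ($1) into a destination directory ($2).
install_col_files() {
    src_dir=$1
    dest_dir=$2

    # Expand the glob into the positional parameters and stop if nothing matched.
    set -- "$src_dir"/*.col
    if [ ! -e "$1" ]; then
        echo "no .col files found in $src_dir" >&2
        return 1
    fi

    # -p creates missing parent directories and is harmless if they exist.
    mkdir -p "$dest_dir" || return 1
    cp "$@" "$dest_dir"/
}
```

sudo cannot see shell functions, so define and call this from a shell that is allowed to write to the destination, passing the src folder first and /usr/local/shared/mfold_util second.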

3. Fink 0.29.21 Install From Scratch

The first indication that other work was required came from trying to run mutplot randomly, which produced the following error:

dyld: Library not loaded: /sw/lib/libpng12.0.dylib
  Referenced from: /usr/local/bin/mutplot
  Reason: image not found
Trace/BPT trap
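
The dyld message names the missing library outright. As a rough sketch (missing_dylibs is a hypothetical helper name, and it assumes the error text has exactly the "dyld: Library not loaded:" form shown above), the following pulls every missing library path out of captured output so you can see the whole list at once:

```shell
# Hypothetical helper: read saved terminal output on stdin and print each
# library that dyld reported as missing, once per library.
missing_dylibs() {
    # Keep only lines containing the dyld message and strip everything up to
    # and including it, leaving the library path; sort -u removes repeats.
    sed -n 's/^.*dyld: Library not loaded: //p' | sort -u
}
```

Fed the error text above, it prints /sw/lib/libpng12.0.dylib.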

As digging around for libraries is not as straightforward as it would be for a Linux distro, I chose instead to solve the many problems by installing dependencies through the Fink program (currently fink-0.29.21). There is no Fink binary available for 10.6.x, so you must build it from source, which, with Xcode Tools 3 installed, occurs without error. If you don't have Xcode Tools 3 installed, Xcode Tools 4 is now obtained through the App Store, a mechanism that is less than ideal (to me, anyway. $4.99?).

Download the fink source (fink 0.29.21), extract, cd into the fink-0.29.21 directory, and run bootstrap. Upon completion, you run pathsetup.sh, source your .profile, and update fink.

[prompt]$ cd ~/Downloads/fink-0.29.21
[prompt]$ ./bootstrap
[prompt]$ . /sw/bin/pathsetup.sh 
[prompt]$ cd ~/
[prompt]$ source .profile
[prompt]$ fink selfupdate-rsync
[prompt]$ fink update-all

The output for my installation can be found in 2011june_fink_install_output.txt. The rsync output can be found in 2011june_fink_selfupdate_rsync_output.txt. NOTE: You will be asked several questions about the installation process. Be prepared to blindly select the default settings with [enter], but don't just walk away from the screen.
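
A quick way to confirm that pathsetup.sh and the .profile change took effect is to check that Fink's bin directory is on your PATH. A minimal sketch, assuming the default /sw prefix used in the commands above (path_contains is a made-up helper name):

```shell
# Hypothetical helper: succeed if directory $1 is one of the colon-separated
# entries of $2 (which defaults to the current PATH).
path_contains() {
    # Wrapping both sides in colons makes the match whole-entry only,
    # so /sw does not match /sw/bin.
    case ":${2-$PATH}:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}
```

For example, path_contains /sw/bin && echo "fink is on the PATH" prints the message only when the entry is there.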

This completes the UNAFold install, MacBin install, and Fink install, meaning now we can walk through the dependencies.

4. Installing UNAFold (well, MFold Utils) Dependencies

The first UNAFold.pl run attempt, before installing any dependencies, produces the following error:

[prompt]$ UNAFold.pl seqtest.txt 
Checking for boxplot_ng... dyld: Library not loaded: /sw/lib/libpng12.0.dylib
  Referenced from: /usr/local/bin/boxplot_ng
  Reason: image not found
found, supports Postscript
Checking for hybrid-plot-ng... found, supports Postscript
Checking for sir_graph_ng or sir_graph... dyld: Library not loaded: /sw/lib/libpng12.0.dylib
  Referenced from: /usr/local/bin/sir_graph
  Reason: image not found
found, supports Postscript
Checking for ps2pdfwr... not found
Calculating for seqtest.txt, t = 37

As the UNAFold install page states, you need glut, the GD library, and gnuplot installed (and all of the many libraries therein).

[prompt]$ fink install libjpeg tetex gd2 gnuplot

For gnuplot, you will be required to make a few selections during the build process. Blindly hitting the enter key at these questions will do, but this is not just a "type and go" install process, and it took about two hours on a MBP.

A final working error-free run looks as below, leaving you to process the data with the MFold Utilities as you like:

[prompt]$ UNAFold.pl seqtest.txt 
Checking for boxplot_ng... found, supports Postscript
Checking for hybrid-plot-ng... found, supports Postscript
Checking for sir_graph_ng or sir_graph... found, supports Postscript
Checking for ps2pdfwr... found
Calculating for seqtest.txt, t = 37
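
To check for the four helper programs without running a full fold, something like the following sketch works. missing_programs is a hypothetical helper name, and it only tests whether each program is on the PATH; as the earlier failed run shows, a program can be "found" and still fail to load its libraries.

```shell
# Hypothetical helper: print each named program that cannot be found on PATH.
# Returns non-zero if at least one is missing.
missing_programs() {
    rc=0
    for prog in "$@"; do
        if ! command -v "$prog" >/dev/null 2>&1; then
            echo "$prog"
            rc=1
        fi
    done
    return $rc
}
```

With the names from the output above, that would be missing_programs boxplot_ng hybrid-plot-ng sir_graph ps2pdfwr, which prints nothing when all four are present.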

BclConverter-1.7.1 Installation In Ubuntu 10.04 LTS (And Related)

What follows is the procedure for successfully building and running BclConverter-1.7.1 under Ubuntu (specifically 10.04, but this will likely be generic for other versions) using only apt-get to install missing programs and libraries, thereby trying to keep the install process as build-friendly as possible to the general (non-coding) user.

So, What's BCL And Why Does It Need Converting?

The newest version of the Illumina sequencing software no longer uses the QSEQ format during the sequencing run, relying now on BCL files. This 12 January 2010 post snip from www.politigenomics.com covers the intro nicely.

Gone are the QSEQ files, they are replaced by BCL files which are binary, per image, per cycle files that contain the base call and quality information. Because they are per image, per cycle files, they can be transferred cycle by cycle as they are generated (as opposed to QSEQ files which are read based). The BCL files are also more compact, requiring only 1 byte/base (B/b) as compared to QSEQ files which require about 2.5 B/b. In addition, the intensity files are also not transferred by default, so RTA output goes from 10 B/b to just 1 B/b. Thus, even though you are generating five times more sequence data than a GA, your RTA directory will actually be smaller (about 250 GB).

That is all well and good, but most of the open source programs of relevance (to me, anyway) require FASTQ format as input. As there is no one-stop conversion from BCL to FASTQ from the Illumina downloads, the generation of QSEQ files is still a necessary (although not significantly difficult) evil. QSEQ files are generate-able from the BCL data (with maintenance of the Illumina directory structure!) with the BclConverter code.

Unfortunately, the conversion from BCL format to QSEQ format (which is a file format for which many scripts exist online for conversion into the ever-familiar FASTQ format) requires an additional installation on your network machine, this installation being the BclConverter (v1.7.1) program available from the Illumina iCom website (registration required). This BclConverter program is not a pre-compiled binary, .rpm, .deb, .etc package, meaning the build is done by you from scratch. For many Linux distributions, this is non-problematic, as the build uses fairly standard tools. If you're running Ubuntu, you will find yourself compiling (and running) with a host of show-stopping (or eye candy-stopping) errors. What lies below takes care of these errors.

Quick Summary

If you walk through the following steps, you'll have no issue installing BclConverter. The more exhaustive discussion is below.

> sudo aptitude update

> sudo aptitude upgrade

> sudo apt-get install build-essential mercurial cmake python2.6-dev python3.1-dev gettext
libopenal1 libopenexr-dev libavdevice52 freeglut3-dev libglew1.5-dev libxmu-dev libxi-dev
libfreeimage-dev doxygen libqt4-dev bison flex libbz2-dev libpng12-dev libxml-simple-perl
ia32-libs lib32asound2 lib32ncurses5 lib32nss-mdns lib32z1 lib32gfortran3 gcc-4.3-multilib
gcc-multilib lib32gomp1 libc6-dev-i386 lib32mudflap0 lib32gcc1 lib32gcc1-dbg lib32stdc++6
lib32stdc++6-4.3-dbg libc6-i386 csh g++ g++-4.3 libstdc++6-4.3-dev g++-multilib
g++-4.3-multilib gcc-4.3-doc libstdc++6-4.3-dbg libstdc++6-4.3-doc nfs-common
nfs-kernel-server portmap ssh gnuplot

> sudo tar xvf BclConverter-1.7.1.tar.gz

> cd BclConverter-1.7.1

> sudo make install

Installation – The Long And Sometimes Error-Filled Version

There's a lot of error message duplication and step-wise discussion below because I assume that you found this page by searching against errors as they came up in the build process.

NOTE: The first installation attempt failed with the following packages installed additionally during the initial setup of the machine:

sudo apt-get install ia32-libs lib32asound2 lib32ncurses5 lib32nss-mdns lib32z1 lib32gfortran3
gcc-4.3-multilib gcc-multilib lib32gomp1 libc6-dev-i386 lib32mudflap0 lib32gcc1 lib32gcc1-dbg
lib32stdc++6 lib32stdc++6-4.3-dbg libc6-i386 csh g++ g++-4.3 libstdc++6-4.3-dev g++-multilib
g++-4.3-multilib gcc-4.3-doc libstdc++6-4.3-dbg libstdc++6-4.3-doc nfs-common nfs-kernel-server
portmap ssh

You may or may not need some of these (especially if you're running a 32-bit version of Ubuntu), but I can't say definitively that something above is NOT ALSO required beyond the apt-get list provided below, so just install them anyway (the NFS stuff may be overkill, but if you're going to mount this machine for sequencer file transfer, you'll need this and/or SAMBA anyway).

My first build attempt of BclConverter with a mostly fresh Ubuntu install provided the following error:

...failed updating 2 targets...
...skipped 3 targets...
...updated 7846 targets...
boost.sh: build failed: Terminating...
CMake Error at c++/CMakeLists.txt:177 (message):
  Failed to build Boost


-- Configuring incomplete, errors occurred!
make: *** [build/Makefile] Error 1

So, we know that the Boost 1.42 libraries are not installed. Part of the BclConverter build process involves building a copy of these libraries (what failed above). The problem above was not a missing Boost as much as it was missing build tools for the whole program.

If the problem is Boost 1.42, why not just install the Ubuntu package? I'm not entirely sure, but there may or may not be something about the BclConverter build that requires something in Boost 1.42 to be findable by the BclConverter in its local directory (not too likely, but I didn't diagnose it). Also, the problem may be version-specific (more likely than not), as the Boost build one can apt-get is 2.0-m12-2. Which is to say, installing the Ubuntu package…

sudo apt-get install boost-build

…did not solve the Boost problem. The possible solutions are to (1) build Boost 1.42 yourself or (2) simply let the BclConverter build take care of this (since the Boost 1.42 library is included in the BclConverter package for building). A Boost 1.42 build attempt external to the BclConverter program did not, in fact, solve the Boost problem in a subsequent BclConverter build attempt (I'll spare you a repeat of the same error), making the successful apt-get-based approach all the easier.

We begin by updating the aptitude database and upgrading the machine (this is a skip-able step, but I prefer keeping everything up-to-date).

sudo aptitude update
sudo aptitude upgrade

The required build programs and libraries for BclConverter-1.7.1 (that are not part of my standard lib32 et al. install-ables listed above) are install-able as below:

sudo apt-get install build-essential mercurial cmake python2.6-dev python3.1-dev gettext
libopenal1 libopenexr-dev libavdevice52 freeglut3-dev libglew1.5-dev libxmu-dev libxi-dev
libfreeimage-dev doxygen libqt4-dev bison flex libbz2-dev libpng12-dev libxml-simple-perl
gnuplot
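
If you want to see which packages from this list are still absent before (or after) running the install, a sketch along these lines will do. missing_packages is a made-up helper name. It relies on dpkg -s, which succeeds for any package dpkg holds a status record for (that can include removed packages whose config files remain), so treat the result as a rough check.

```shell
# Hypothetical helper: print each named package for which "dpkg -s" fails,
# i.e. packages dpkg has no status record for.
missing_packages() {
    for pkg in "$@"; do
        dpkg -s "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}
```

For example, missing_packages build-essential cmake libxml-simple-perl gnuplot prints one package name per line for anything still to be installed, and nothing at all when everything is in place.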

My routine setup preference is to place installed programs into /opt (purely for organizational purposes. It really doesn't matter where within reason). With BclConverter-1.7.1 downloaded from iCom, we'll move the .gz/.zip file to /opt, extract, untar, and install. With a Terminal window open and cd'ed to the BclConverter-1.7.1 download location (likely ~/Downloads, maybe ~/Desktop):

sudo mv BclConverter-1.7.1.tar.gz /opt
cd /opt
sudo tar xvf BclConverter-1.7.1.tar.gz
cd BclConverter-1.7.1
sudo make install

If, for any reason, you wish to see what the install log looks like, you can download mine for this session (in the 2010dec7__bclconverter_1_7_1_logs.zip file, see 2010dec7__bclconverter_1_7_1_build3b__successful__BUILD).

The last piece of the puzzle is to add the /opt/BclConverter-1.7.1/bin directory to your path, which we do in .profile as follows:

cd ~/
pico .profile

In .profile, add the following to the bottom somewhere…

PATH="/opt/BclConverter-1.7.1/bin/:$PATH"

Save and exit.

source .profile
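
If you would rather not open an editor, the same line can be appended from the command line. The sketch below (add_path_line is a made-up name, not part of BclConverter) adds the line only when it is not already in the file, so running it twice does no harm:

```shell
# Hypothetical helper: append a PATH line for directory $1 to profile file $2
# unless that exact line is already present.
add_path_line() {
    # \$PATH is escaped so the literal text $PATH lands in the file.
    line="PATH=\"$1:\$PATH\""
    # -F: fixed string, -x: match the whole line, -q: no output.
    if ! grep -Fxq "$line" "$2" 2>/dev/null; then
        printf '%s\n' "$line" >> "$2"
    fi
}
```

Called as add_path_line /opt/BclConverter-1.7.1/bin/ ~/.profile, it writes the same line shown above, after which you still need to source .profile.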

Potential Errors Along The Way

This section is the most important part, as it's likely how you found this post. Below are the few problems (and messages) that might arise and that are solved by the installation of specific packages.

1. Boost Error And Attempted sudo apt-get install boost-build

The error with and without a boost-build install is the same.

...failed updating 2 targets...
...skipped 3 targets...
...updated 7846 targets...
boost.sh: build failed: Terminating...
CMake Error at c++/CMakeLists.txt:177 (message):
  Failed to build Boost


-- Configuring incomplete, errors occurred!
make: *** [build/Makefile] Error 1

The full output from the build attempts for both cases can be viewed in 2010dec7__bclconverter_1_7_1_logs.zip:

* 2010dec7__bclconverter_1_7_1_build1__boosterror__FAILED.txt – initial error
* 2010dec7__bclconverter_1_7_1_build2__aptgetboost__FAILED.txt – after boost-build install

Running the full apt-get (see the contents of 2010dec7__bclconverter_1_7_1_logs.zip, with the results in 2010dec7__bclconverter_1_7_1_build3a__aptgetlist__RESULTS) produces a successful BclConverter build. The log for my build is available in 2010dec7__bclconverter_1_7_1_build3b__successful__BUILD.txt.

2. XML::Simple-Related Error And libxml-simple-perl

Without either gnuplot or the libxml-simple-perl installation, a setupBclToQseq.py run will successfully generate QSEQ files. The additional tools provide you with some statistical and visual analyses of your results (so are definitely worth installing).

If you don't have libxml-simple-perl installed, you'll see the following error after running the BaseCalls "make":

/opt/BclConverter-1.7.1/bin/plotIntensity_tiles.pl tiles.txt s_8 SignalMeans _all.txt 
_all.png && echo `date` > s_8_all_pngs.txt
Can't locate XML/Simple.pm in @INC (@INC contains: /opt/BclConverter-1.7.1/lib/perl /etc/perl
/usr/local/lib/perl/5.10.1 /usr/local/share/perl/5.10.1 /usr/lib/perl5 /usr/share/perl5
/usr/lib/perl/5.10 /usr/share/perl/5.10 /usr/local/lib/site_perl .) at 
/opt/BclConverter-1.7.1/lib/perl/Gerald/Common.pm line 128.
BEGIN failed--compilation aborted at /opt/BclConverter-1.7.1/lib/perl/Gerald/Common.pm
line 128.
Compilation failed in require at /opt/BclConverter-1.7.1/bin/plotIntensity_tiles.pl line 22.
...
make: *** [s_1_all_pngs.txt] Error 2
make: *** [s_2_all_pngs.txt] Error 2
make: *** [s_3_all_pngs.txt] Error 2
make: *** [s_4_all_pngs.txt] Error 2
make: *** [s_5_all_pngs.txt] Error 2
make: *** [s_6_all_pngs.txt] Error 2
make: *** [s_7_all_pngs.txt] Error 2
make: *** [s_8_all_pngs.txt] Error 2

The fix is trivial (if you're doing it incrementally and don't already have XML::Simple installed):

sudo apt-get install libxml-simple-perl
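
To confirm the module is visible to Perl without waiting for the BaseCalls "make" to get that far, you can ask perl to load it directly. A minimal sketch (perl_has_module is a hypothetical helper name. The -M and -e switches are standard perl options):

```shell
# Hypothetical helper: succeed if perl can load the module named in $1.
perl_has_module() {
    # -M loads the module before running the one-line program "1";
    # perl exits non-zero if the module cannot be located.
    perl -M"$1" -e 1 >/dev/null 2>&1
}
```

For example, perl_has_module XML::Simple || echo "XML::Simple is missing" stays silent once libxml-simple-perl is installed.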

3. gnuplot Errors And Fix

If you don't have gnuplot already installed (and why wouldn't you?), you'll receive the following error during the BCL-to-QSEQ "make" process:

sh: gnuplot: not found
/opt/BclConverter-1.7.1/share/makefiles/bclToQseq/FlowCellTargets.mk:76: [IVC.htm
(s_1_all_pngs.txt s_2_all_pngs.txt s_3_all_pngs.txt s_4_all_pngs.txt s_5_all_pngs.txt
s_6_all_pngs.txt s_7_all_pngs.txt s_8_all_pngs.txt plotIntensity_for_IVC_finished.txt)
(s_1_all_pngs.txt s_2_all_pngs.txt s_3_all_pngs.txt s_4_all_pngs.txt s_5_all_pngs.txt
s_6_all_pngs.txt s_7_all_pngs.txt s_8_all_pngs.txt plotIntensity_for_IVC_finished.txt)]

/opt/BclConverter-1.7.1/bin/create_IVC_thumbnail.pl . > IVC.htm.tmp && mv IVC.htm.tmp
IVC.htm
/opt/BclConverter-1.7.1/share/makefiles/bclToQseq/FlowCellTargets.mk:82: [All.htm
(s_1_all_pngs.txt s_2_all_pngs.txt s_3_all_pngs.txt s_4_all_pngs.txt s_5_all_pngs.txt
s_6_all_pngs.txt s_7_all_pngs.txt s_8_all_pngs.txt plotIntensity_for_IVC_finished.txt)
(s_1_all_pngs.txt s_2_all_pngs.txt s_3_all_pngs.txt s_4_all_pngs.txt s_5_all_pngs.txt
s_6_all_pngs.txt s_7_all_pngs.txt s_8_all_pngs.txt plotIntensity_for_IVC_finished.txt)]

/opt/BclConverter-1.7.1/bin/create_tile_thumbnails.pl all > FullAll.htm && \
	/opt/BclConverter-1.7.1/bin/create_tile_thumbnails.pl all --maxTiles=20 --link='
_a href="FullAll.htm"_Full output (Warning: may overload your browser!)_/a_' > All.htm.tmp 
&& mv All.htm.tmp All.htm

/opt/BclConverter-1.7.1/share/makefiles/bclToQseq/FlowCellTargets.mk:109:
[BustardSummary.xml (IVC.htm All.htm tiles.txt) (IVC.htm All.htm tiles.txt)]
/opt/BclConverter-1.7.1/bin/produceIntensityStats.pl .
Unable to find file /LOCATION_OF_INTENSITIES_FOLDER/Intensities/BaseCalls/../samples.xml
at /opt/BclConverter-1.7.1/lib/perl/Gerald/Jerboa.pm line 387.

...

/opt/BclConverter-1.7.1/share/makefiles/bclToQseq/FlowCellTargets.mk:58: [finished.txt
(Matrix Phasing s_1 s_2 s_3 s_4 s_5 s_6 s_7 s_8 BustardSummary.xml BustardSummary.xsl
IVC.htm All.htm) (Matrix Phasing s_1 s_2 s_3 s_4 s_5 s_6 s_7 s_8 BustardSummary.xml
BustardSummary.xsl IVC.htm All.htm)]
touch finished.txt.tmp && mv finished.txt.tmp finished.txt

With a:

sudo apt-get install gnuplot

All remaining errors in the BCL-to-QSEQ "make" process should be disposed of, leaving you with a Plots directory containing multiple .png files after the QSEQ generation process.
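
A quick sanity check after the run is to count the .png files that landed in the Plots directory. This is just a sketch: count_pngs is a made-up helper name, and you pass it the path to wherever your Plots directory ended up.

```shell
# Hypothetical helper: print how many .png files sit directly inside
# directory $1 (subdirectories are not searched).
count_pngs() {
    n=0
    for f in "$1"/*.png; do
        # The -f test also covers the case where the glob matched nothing
        # and was left as a literal pattern.
        if [ -f "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

A result of 0 suggests the plotting steps still failed and the "make" output is worth another look.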

4. Just Running "make" For BCL-to-QSEQ

The successful setupBclToQseq.py run:

setupBclToQseq.py -i /LOCATION_OF_FILES/Intensities/BaseCalls -p /LOCATION_OF_FILES/Intensities
 -o /LOCATION_OF_FILES/Intensities/BaseCalls --in-place --overwrite

ends with (also in setupBclToQseq.log):

setupBclToQseq.py version 1.7.1

Configuring /opt/BclConverter-1.7.1/share/makefiles/bclToQseq/Makefile to 
/LOCATION_OF_FILES/Intensities/BaseCalls/Makefile

Creating the 'Makefile.config'

Output directory succesfully initialized. Type 'make' in 
/LOCATION_OF_FILES/Intensities/BaseCalls to start the conversion
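
All three paths in the setup command hang off the same Intensities directory. As a small sketch (bcl_setup_command is a hypothetical helper, and it only prints the command rather than running it), you can generate the line from that one path, using just the options shown in the command above:

```shell
# Hypothetical helper: print the setupBclToQseq.py command line for the
# Intensities directory given in $1. Nothing is executed.
bcl_setup_command() {
    printf 'setupBclToQseq.py -i %s/BaseCalls -p %s -o %s/BaseCalls --in-place --overwrite\n' \
        "$1" "$1" "$1"
}
```

Printing first lets you eyeball the paths before committing to an --overwrite run. Paths containing spaces would need quoting before the printed line is pasted back into a shell.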

And if you simply type "make," you get the following error:

/opt/BclConverter-1.7.1/bin/plotIntensity_tiles.pl tiles.txt s_1 SignalMeans _all.txt
_all.png && echo `date` > s_1_all_pngs.txt
Can't locate XML/Simple.pm in @INC (@INC contains: /opt/BclConverter-1.7.1/lib/perl /etc/perl 
/usr/local/lib/perl/5.10.1 /usr/local/share/perl/5.10.1 /usr/lib/perl5 /usr/share/perl5 
/usr/lib/perl/5.10 /usr/share/perl/5.10 /usr/local/lib/site_perl .) at 
/opt/BclConverter-1.7.1/lib/perl/Gerald/Common.pm line 128.
BEGIN failed--compilation aborted at /opt/BclConverter-1.7.1/lib/perl/Gerald/Common.pm 
line 128.
Compilation failed in require at /opt/BclConverter-1.7.1/bin/plotIntensity_tiles.pl line 22.
BEGIN failed--compilation aborted at /opt/BclConverter-1.7.1/bin/plotIntensity_tiles.pl
line 22.
make: *** [s_1_all_pngs.txt] Error 2

The complete log is available in 2010dec7__bclconverter_1_7_1_build3c__make_error_wo_j8.txt in
2010dec7__bclconverter_1_7_1_logs.zip.

A brief User Guide read will hip you to a proper run command in the BaseCalls directory (obvious, read the User Guide):

make -j 8

For a simple test (and I assume that your network directory structure for the Illumina is something like /LOCATION/TO/NETWORK/DATA/Data/Intensities and /LOCATION/TO/NETWORK/DATA/Data/Intensities/BaseCalls, which it should be), we'll use the example in the BclConverter User Guide (and be sure to download the .PDF).

This will produce a sizable logfile. You can check out a successful run in 2010dec7__bclconverter_1_7_1_build3i__with_gnuplot.txt in 2010dec7__bclconverter_1_7_1_logs.zip.

5. QSEQ-to-FASTQ Script

Not really an error, just a last little help to convert your QSEQ files into generic FASTQ format.

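#!/bin/bash
# Convert the s_<lane>_1_*_qseq.txt files for lanes 1 through 8 into one FASTQ file per lane.
for ((x=1;x<=8;x+=1)); do
# "." base calls become "N"; only records whose field 11 is above 0 are written.
cat s_"$x"_1_*_qseq.txt | awk -F '\t' '{gsub(/\./,"N", $9); if ($11 > 0) printf("@%s_%04d:%s:%s:%s:%s#%s/%s\n%s\n+%s_%04d:%s:%s:%s:%s#%s/%s\n%s\n",$1,$2,$3,$4,$5,$6,$7,$8,$9,$1,$2,$3,$4,$5,$6,$7,$8,$10)}' > s_"$x"_sequence.fastq;
done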

NOTE: the "cat" command has to be all on one line! If it wraps when you copy it, paste the script into a text editor and rejoin the line (or download a copy – 2010dec7__qseq_to_fastq.script).
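
To see what the conversion does to a single record, here is the same awk logic wrapped as a filter that reads QSEQ lines on standard input. Treat it as a sketch: qseq_to_fastq is a name I made up, and the sample record in the usage line below is invented purely for illustration.

```shell
# Hypothetical helper: apply the awk conversion from the script above to
# tab-separated QSEQ records arriving on stdin.
qseq_to_fastq() {
    awk -F '\t' '{
        gsub(/\./, "N", $9)   # replace "." base calls with N
        if ($11 > 0)          # skip records whose field 11 is not above 0
            printf("@%s_%04d:%s:%s:%s:%s#%s/%s\n%s\n+%s_%04d:%s:%s:%s:%s#%s/%s\n%s\n",
                   $1,$2,$3,$4,$5,$6,$7,$8,$9,$1,$2,$3,$4,$5,$6,$7,$8,$10)
    }'
}
```

Running printf 'HWI-ST100\t1\t8\t1\t1000\t2000\t0\t1\tACGT.A\thhhhBh\t1\n' | qseq_to_fastq gives the four FASTQ lines @HWI-ST100_0001:8:1:1000:2000#0/1, ACGTNA, +HWI-ST100_0001:8:1:1000:2000#0/1, and hhhhBh. The same record with a final field of 0 produces no output.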