For The Windows-Specific: Sed For Windows And A .bat File To Get Gaussian09 Files Working With aClimax

Provided you've installed Sed For Windows and know its proper path, the .bat file below should make all the modifications you need to your Gaussian09 .out files (in differently-named files at that) to get them properly loading in aClimax (see the previous post for all the details). A few simple steps:

1. Download and install Sed for Windows. Currently available at: gnuwin32.sourceforge.net/packages/sed.htm

2. Find its location on your machine. Under XP (where I'm using aClimax), this should be C:\Program Files\GnuWin32\bin

3. Copy + paste the text below into Notepad and save that as "aClimax_converter.bat" or something. NOTE: The quotes are IMPORTANT! You risk saving the file as an aClimax_converter.bat.txt file otherwise. The pause is optional. If there's something wrong with the conversion, keeping the pause will let you see the error. If, by some miracle, your Sed is installed elsewhere, change the PATH statement below. The .aclimaxconversion_step1 file will be deleted (just there for doing sequential Sed'ing in case additional modifications are needed in the future).

PATH=C:\Program Files\GnuWin32\bin;%PATH%
sed.exe "s/  Atom  AN/ Atom AN /g" "%~1" > "%~1.aclimaxconversion_step1"
sed.exe "s/ Atom   / Atom/g" "%~1.aclimaxconversion_step1" > "%~1.aClimaxable.out"
del "%~1.aclimaxconversion_step1"
pause

4. If the path is right, just drag + drop your .out files onto the .bat file (or onto a shortcut to it, or place a copy of the .bat file in your working directory and drop them there).

5. Finally, try opening one of the .aClimaxable.out files in aClimax and report back if you've any problems.

Stupid-Simple (*nix-Specific) Sed Scripts To Get (All Current) Gaussian09 Output Files Working With aClimax

The following three snippets of Gaussian output are for an optimization and normal mode analysis of simple olde methane (CH4).

...
 ******************************************
 Gaussian 03:  EM64L-G03RevE.01 11-Sep-2007
                31-Aug-2014 
 ******************************************
...
 incident light, reduced masses (AMU), force constants (mDyne/A),
 and normal coordinates:
                     1                      2                      3
                     T                      T                      T
 Frequencies --  1356.0070              1356.0070              1356.0070
 Red. masses --     1.1789                 1.1789                 1.1789
 Frc consts  --     1.2771                 1.2771                 1.2771
 IR Inten    --    14.1122                14.1122                14.1122
 Atom AN      X      Y      Z        X      Y      Z        X      Y      Z
   1   1     0.02  -0.42   0.43    -0.34  -0.13  -0.08    -0.36  -0.23  -0.23
   2   6     0.00   0.08  -0.09     0.00   0.09   0.08     0.12   0.00   0.00
...
 -------------------
 - Thermochemistry -
 -------------------
 Temperature   298.150 Kelvin.  Pressure   1.00000 Atm.
 Atom  1 has atomic number  1 and mass   1.00783
...
...
 ******************************************
 Gaussian 09:  EM64L-G09RevA.02 11-Jun-2009
                31-Aug-2014 
 ******************************************
...
 incident light, reduced masses (AMU), force constants (mDyne/A),
 and normal coordinates:
                     1                      2                      3
                     T                      T                      T
 Frequencies --  1356.0058              1356.0058              1356.0058
 Red. masses --     1.1789                 1.1789                 1.1789
 Frc consts  --     1.2771                 1.2771                 1.2771
 IR Inten    --    14.1123                14.1123                14.1123
  Atom  AN      X      Y      Z        X      Y      Z        X      Y      Z
     1   1    -0.03   0.42   0.43    -0.34  -0.14   0.07    -0.36  -0.23   0.23
     2   6     0.00  -0.08  -0.10     0.01   0.10  -0.08     0.12   0.00   0.00
...
-------------------
 - Thermochemistry -
 -------------------
 Temperature   298.150 Kelvin.  Pressure   1.00000 Atm.
 Atom     1 has atomic number  1 and mass   1.00783
...
...
 ******************************************
 Gaussian 09:  EM64L-G09RevD.01 24-Apr-2013
                31-Aug-2014 
 ******************************************
...
 incident light, reduced masses (AMU), force constants (mDyne/A),
 and normal coordinates:
                      1                      2                      3
                     ?A                     ?A                     ?A
 Frequencies --   1356.0132              1356.0132              1356.0132
 Red. masses --      1.1789                 1.1789                 1.1789
 Frc consts  --      1.2771                 1.2771                 1.2771
 IR Inten    --     14.1119                14.1119                14.1119
  Atom  AN      X      Y      Z        X      Y      Z        X      Y      Z
     1   1     0.02   0.42   0.43     0.34  -0.14   0.08    -0.36   0.23  -0.23
     2   6     0.00  -0.08  -0.09    -0.01   0.09  -0.08     0.12   0.00   0.00
...
 -------------------
 - Thermochemistry -
 -------------------
 Temperature   298.150 Kelvin.  Pressure   1.00000 Atm.
 Atom     1 has atomic number  1 and mass   1.00783
...

Two of these things are not like the other. The data's nearly identical, and thank heavens. (Unfortunately, Gaussian09 D.01 didn't see the fully-optimized methane as belonging to the Td point group, despite all three versions being run with the same exact input file, but a rigorous re-symmetrization would have taken care of that.) There are, however, some subtle formatting differences between all three versions (including differences between both Gaussian09 versions) that cause the venerable, all-encompassing aClimax program (developed by Timmy, the venerable, all-encompassing A. J. Ramirez-Cuesta) to throw out the following errors for all three cases when you use *.log files from a *nix (UNIX, Linux) machine.

Serious Error: A-CLIMAX has encountered an unhanded error. Please Save your data and contact support
aClimax: Quote Error Number 9
Error Loading File: Error reading data. Please check and try again.
aClimax: WARNING loaded file containing no frequencies

Problem number 1 is the existence of *nix newlines (a bare line feed, where DOS expects a carriage return + line feed pair) in the *.log files coming off a *nix machine. Performing a conversion from *nix to DOS (for myself, using LineBreak in OSX, but tofrodos works just as well), the Gaussian03 file now opens just fine in aClimax:

File Loaded: Data Loaded Succesfully [sic].
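If you're not sure which kind of line endings a given file has, a quick check along the following lines should tell you. This is only a sketch: the function name line_ending_type and the two throwaway test files are my own inventions, not anything that ships with Gaussian or aClimax.

```shell
#!/bin/sh

# Hypothetical helper: prints DOS if any line ends in a carriage return,
# nix otherwise.
line_ending_type() {
    cr=$(printf '\r')
    if grep -q "${cr}\$" "$1"; then
        echo DOS
    else
        echo nix
    fi
}

# Two tiny throwaway files, one per line-ending style.
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/unix.log"
printf 'a\r\nb\r\n' > "$tmpdir/dos.log"

unix_kind=$(line_ending_type "$tmpdir/unix.log")
dos_kind=$(line_ending_type "$tmpdir/dos.log")
echo "$unix_kind $dos_kind"    # prints: nix DOS

rm -r "$tmpdir"
```

Only the files that report nix need the newline conversion before aClimax will take them.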

This, unfortunately, does not improve the matter with the Gaussian09 files, which produce the following error:

Error: One of the numbers you have entered is of the wrong type.Please recheck and try again
Error Loading File: Error reading data. Please check and try again.

Given how little of the .log file aClimax actually needs to produce simulated inelastic neutron scattering (INS) spectra, I ran the methane normal mode analyses in three different Gaussian versions to determine what, in G09, was changed to make it just un-G03 enough to fail to load. With those changes figured out, I had a Perl script drafted up that would have converted everything back to the original G03 format. It was awesome. That said, after a small amount of testing to see where aClimax's sensitivities lay, I discovered that very little of the .log file contents needed to be changed out, meaning that simple sed scripts would work just as well for those of us using our Windows boxes (or VirtualBox emulations) only for that "one stupid program" that keeps us having to log in (and, by that, I mean that we have sed already on our computers).

So, the problems between G09 and aClimax not related to carriage returns lie in two places.

1. The spacing of "Atom AN" – at the top of the eigenvector lists are the column labels, beginning with "Atom AN" – or something very close to "Atom AN" (the "|" in the boxes below mark the left edge of the output):

G03 E01 | Atom AN
G09 A02 |   Atom  AN
G09 D01 |  Atom  AN

Yes, the addition of a space or two results in a read error by aClimax. I would call this an… aggressive stringency in aClimax. That said, what was the original G03 spacing failing to do that required the extra spaces in G09?

2. The spacing of "Atom N" – In the "Thermochemistry" section below the eigenvectors, atomic masses are listed as "Atom N" – or something very close to "Atom N" (again, the "|" in the boxes below mark the left edge of the output):

G03 E01 |  Atom  1
G09 A02 |    Atom     1
G09 D01 |   Atom     1

This change in spacing is also enough to cause aClimax to error out.
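To see which spacing your own files carry, something like the following will pull out just the two kinds of lines in question and mark the left edge with a "|", as in the boxes above. Consider it a sketch: show_atom_spacing is a made-up name, and the sample file is a few lines trimmed from the G09 A.02 output shown earlier.

```shell
#!/bin/sh

# Hypothetical helper: print each distinct "Atom AN" header line and
# "Atom N has atomic number" line once, with "|" marking the left edge.
show_atom_spacing() {
    grep -E 'Atom +AN|Atom +[0-9]+ has atomic number' "$1" |
        sed 's/^/|/' |
        awk '!seen[$0]++'
}

# Small sample trimmed from the G09 A.02 output shown earlier.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.log" <<'EOF'
 Frequencies --  1356.0058
  Atom  AN      X      Y      Z
     1   1    -0.03   0.42   0.43
  Atom  AN      X      Y      Z
 Atom     1 has atomic number  1 and mass   1.00783
EOF

spacing=$(show_atom_spacing "$tmpdir/sample.log")
printf '%s\n' "$spacing"

rm -r "$tmpdir"
```

The awk filter only drops repeated lines, so a file with one spacing style throughout prints exactly one header line and one line per atom.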

The Solution

A small sed script performs the necessary conversions on your *nix box (including OSX) for all .log files in a directory without issue:

#!/bin/sh

# This section converts all .log files to aClimax-friendly G03-ish format
# (file names are quoted so that paths containing spaces survive)
find . -type f -name '*.log' -print | while IFS= read -r i
do
sed 's|  Atom  AN| Atom AN |g' "$i" > "$i.aclimaxconversion_step1"
sed 's| Atom   | Atom|g' "$i.aclimaxconversion_step1" > "$i.aClimaxable.log"
rm "$i.aclimaxconversion_step1"
done

# This section converts all .out files to aClimax-friendly G03-ish format
find . -type f -name '*.out' -print | while IFS= read -r i
do
sed 's|  Atom  AN| Atom AN |g' "$i" > "$i.aclimaxconversion_step1"
sed 's| Atom   | Atom|g' "$i.aclimaxconversion_step1" > "$i.aClimaxable.out"
rm "$i.aclimaxconversion_step1"
done
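To convince yourself of what the two substitutions do before letting them loose on a whole directory, the same pair can be fed single lines. The sketch below wraps both in one sed call (g09_to_g03_spacing is a made-up name) and runs the G09 header and atomic mass lines through it. The header is shortened to a single X Y Z block for brevity.

```shell
#!/bin/sh

# Hypothetical helper: the same two substitutions as the script above,
# applied in order within a single sed call, reading from stdin.
g09_to_g03_spacing() {
    sed -e 's|  Atom  AN| Atom AN |g' -e 's| Atom   | Atom|g'
}

header=$(printf '%s\n' '  Atom  AN      X      Y      Z' | g09_to_g03_spacing)
mass=$(printf '%s\n' ' Atom     1 has atomic number  1 and mass   1.00783' | g09_to_g03_spacing)

# Brackets make the leading spaces visible.
printf '[%s]\n[%s]\n' "$header" "$mass"
```

The mass line comes out as " Atom  1 has atomic number  1 and mass   1.00783", which is the G03 E.01 form shown in the snippets above.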

But Wait! Running G0* Jobs Under *nix? Convert To DOS Carriage Returns

The final problem halting your aClimax spectrum generation is the missing DOS carriage return (^M). For those running DOS-based Gaussian calculations (likely with a .out suffix), your conversion with the short script above (under *nix) likely (hopefully) worked just fine. For those running under *nix, you performed the conversion and still received the following aClimax error:

Serious Error: A-CLIMAX has encountered an unhanded error. Please Save your data and contact support
aClimax: Quote Error Number 9
Error Loading File: Error reading data. Please check and try again.
aClimax: WARNING loaded file containing no frequencies

The solution is an additional line in the sed script that appends a proper DOS carriage return to the end of every *nix line. The .out section remains the same.

#!/bin/sh

# define the Carriage Return once (printf behaves the same in every sh,
# which echo "\0015" does not)
CR=$(printf '\r')

# This section converts all .log files to aClimax-friendly G03-ish format
find . -type f -name '*.log' -print | while IFS= read -r i
do
sed 's|  Atom  AN| Atom AN |g' "$i" > "$i.aclimaxconversion_step1"
sed 's| Atom   | Atom|g' "$i.aclimaxconversion_step1" > "$i.aclimaxconversion_step2"
# This line appends a DOS carriage return to the end of every *nix line
sed -e "s/\$/${CR}/" "$i.aclimaxconversion_step2" > "$i.aClimaxable.log"
# this cleans up the temp files, in whichever folder find located the original
rm "$i.aclimaxconversion_step1" "$i.aclimaxconversion_step2"
done

# This section converts all .out files to aClimax-friendly G03-ish format
find . -type f -name '*.out' -print | while IFS= read -r i
do
sed 's|  Atom  AN| Atom AN |g' "$i" > "$i.aclimaxconversion_step1"
sed 's| Atom   | Atom|g' "$i.aclimaxconversion_step1" > "$i.aClimaxable.out"
rm "$i.aclimaxconversion_step1"
done

Q. But what if I run the *nix-to-DOS version of the script on an already DOS-output file?

A1. The simple answer is that you'll make your text file double-spaced (which is bad enough). aClimax will then provide the following error when you try to open it:

Error Reading File: Unexpected File End. File May be incorrect or corrupt.
Error Loading File: Error reading data. Please check and try again.

A2. I will assume that your problem is that you're running the script in DOS to try to get your G09 to read more like G03. In this case (assuming you're generating .out files), you'll want to use a text editor to make the replacements described above (which is to say, that Perl script might make its way to this page eventually. If you write a DOS .bat file or similar script for all OS's, I'd be happy to link to it).
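One way around the double-conversion problem in A1 is to strip whatever carriage returns already sit at the end of each line before adding exactly one back. The following is a sketch of that idea, not something I've run against aClimax itself. The to_dos name and the test file are assumptions for illustration.

```shell
#!/bin/sh

# Hypothetical helper: replace zero or more trailing carriage returns on
# each line with exactly one, so running it twice changes nothing.
to_dos() {
    cr=$(printf '\r')
    sed -e "s/${cr}*\$/${cr}/" "$1"
}

# One *nix line and one DOS line in the same throwaway file.
tmpdir=$(mktemp -d)
printf 'one\ntwo\r\n' > "$tmpdir/mixed.log"

to_dos "$tmpdir/mixed.log" > "$tmpdir/pass1.log"
to_dos "$tmpdir/pass1.log" > "$tmpdir/pass2.log"

if cmp -s "$tmpdir/pass1.log" "$tmpdir/pass2.log"; then
    rerun_safe=yes
else
    rerun_safe=no
fi
pass1_bytes=$(wc -c < "$tmpdir/pass1.log" | tr -d ' ')
echo "$rerun_safe $pass1_bytes"    # prints: yes 10

rm -r "$tmpdir"
```

Both lines end up with a single CR LF pair (10 bytes in total), whichever style they started with.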

sed-Based Script For Converting NAMOT And NAMOT2 DNA Output To ffAMBER Format For GROMACS Topology Generation v1

In continuing efforts to streamline the simulation of atomistic DNA structures in GROMACS using the ffAMBER force field (the port of AMBER for GROMACS), the following script takes the .pdb output of NAMOT or NAMOT2 and does all of the atom label and atom label position conversions, correct 3' and 5' terminal H atom assignments, and random changes throughout the .pdb file to provide something that should flow seamlessly into GROMACS.

"Did you need to post the entire script and not just provide the downloadable text file as a link?" Of course, as I suspect no small number of people looking for how to convert a NAMOT pdb file into ffAMBER-speak will begin by searching based on GROMACS errors, which occur one missing residue label at a time. Hopefully, having the entire script readable by Google and Yahoo will cause it to pop up high in the search rankings.

Less searching, more simulating.

Now, there is one problem. NAMOT and NAMOT2 do not include the methyl group hydrogen atoms on the thymine residue, defaulting to a single C5M. In most of the GROMACS force fields, this is just fine, as the hydrogen atoms are subsumed into the methyl carbon (all non-polar C-H bonds are treated this way). For AMBER (ffAMBER, that is), all hydrogens are included. This fix is performed on the ffAMBER/GROMACS side in a modification to the .hdb and .rtp files that I will describe in an upcoming post.

How to use:

As a series of sed operations, you obviously need sed, which is available for all platforms and "pre-installed" with any self-respecting Linux/UNIX distro (which, of course, means OSX, the OS under which the script was generated).

To run this script, have the script and your NAMOT/NAMOT2-generated .pdb in the same directory and type:

./NAMOT_to_ffAMBER_in_GROMACS.script FILENAME.pdb NN

Where:

NAMOT_to_ffAMBER_in_GROMACS.script is the name of the script

FILENAME.pdb is the .pdb file (include the .pdb)

NN is the number of bases in each strand. This number is required in order to correctly change the atom types on the 3' end of each strand.
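Why the number matters: the residue number sits right-justified in a four-character field in the .pdb file, so the 3' match has to carry the right count of leading spaces. The script further down handles this by trying the pattern at each of the four possible widths. A more compact sketch of the same idea pads NN with printf first. The name rename_3prime and the truncated sample lines are made up for illustration.

```shell
#!/bin/sh

# Hypothetical helper: $1 is NN, PDB lines arrive on stdin.
# " DC a  35" becomes "DC3 a  35" (likewise for DA, DG, DT and chain b).
rename_3prime() {
    resnum=$(printf '%4s' "$1")    # right-justify NN in four columns
    sed -e "s/ D\([ACGT]\) \([ab]\)${resnum}/D\13 \2${resnum}/"
}

# Made-up, truncated PDB-style lines for residues 34 and 35 of chain a.
renamed=$(printf '%s\n' \
    'ATOM      1  P    DC a  34' \
    'ATOM      2  P    DC a  35' | rename_3prime 35)
printf '%s\n' "$renamed"
```

Only the residue 35 line changes. Its label shifts one column to the left to make room for the trailing 3, exactly as in the full script.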

This script is downloadable from the following link: NAMOT_to_ffAMBER_in_GROMACS.script

I also include a 35-base C-G double helix NAMOT .pdb file at C_G_NAMOT.pdb. To test the script on your machine, type the following in a Terminal window:

./NAMOT_to_ffAMBER_in_GROMACS.script C_G_NAMOT.pdb 35
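The script itself does no checking of its two arguments, and it rewrites FILENAME.pdb in place as it goes, so a guard at the top is cheap insurance. What follows is a sketch of what such a guard might look like. check_args is a hypothetical function, not part of the posted script.

```shell
#!/bin/sh

# Hypothetical guard: succeed only for an existing file plus a whole number.
check_args() {
    if [ "$#" -ne 2 ]; then
        echo "usage: NAMOT_to_ffAMBER_in_GROMACS.script FILENAME.pdb NN" >&2
        return 1
    fi
    if [ ! -f "$1" ]; then
        echo "no such file: $1" >&2
        return 1
    fi
    case $2 in
        ''|*[!0-9]*) echo "NN must be a whole number: $2" >&2; return 1 ;;
    esac
    return 0
}

tmpdir=$(mktemp -d)
: > "$tmpdir/C_G_NAMOT.pdb"    # empty stand-in for the real file

if check_args "$tmpdir/C_G_NAMOT.pdb" 35 2>/dev/null; then good=ok; else good=rejected; fi
if check_args "$tmpdir/C_G_NAMOT.pdb" 3x 2>/dev/null; then bad=ok; else bad=rejected; fi
if check_args "$tmpdir/C_G_NAMOT.pdb" 2>/dev/null; then missing=ok; else missing=rejected; fi
echo "$good $bad $missing"    # prints: ok rejected rejected

rm -r "$tmpdir"
```

In the real script this would be called as check_args "$@" || exit 1 before the backup copy is made.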

As usual, if you have problems, comments, questions, concerns, etc. please either make an account and post a comment for this post or send me an email and I'll keep the running tally.

[Image: the C-G double helix .pdb rendered in QuteMolX, May 2008]

Also, this same scripting procedure works just fine for the GROMOS96 force fields (ffG53a5, ffG53a6, etc.) and I'll be posting the one I use for those calculations in short order (they are, in fact, easier to work with for GROMACS, as they also neglect the methyl group hydrogen atoms on the Thymine. In fact, they neglect ALL of the non-polar C-H bonds, so you end up deleting atoms from the NAMOT/NAMOT2 .pdb files).

################################################################################
#
# Questions?  Problems?  Complaints?  Better Ideas?
# Damian Allis, damian@somewhereville.com, www.somewhereville.com
#
# This script takes the double helix output from NAMOT and NAMOT2 (a and b
# strands) and converts them into a format that the current ffAMBER
# implementation for GROMACS can use in the generation of the GROMACS .top file.
#
################################################################################
#
# Generally, the following list of GROMACS runs should get you through an
# energy minimization without problem.  Note only 10 cations are added
# to your structure.  Change accordingly (or don't.  It doesn't matter for
# the test).
#
# Run these in order:
#
# pdb2gmx -nomerge -f DNA.pdb -o DNA_pdb2gmx.gro -p DNA_pdb2gmx.top
# editconf -f DNA_pdb2gmx.gro -o DNA_editconf.gro -d 1.0 -bt triclinic
# genbox -cp DNA_editconf.gro -cs -o DNA_genbox.gro -p DNA_pdb2gmx.top
# grompp -f em -c DNA_genbox.gro -p DNA_pdb2gmx.top -o DNA_gromppem.tpr
# genion -np 10 -norandom -pname Na -o DNA_genion.gro -s DNA_gromppem.tpr
#   -p DNA_pdb2gmx.top (this .top goes in the same line as the genion)
# grompp -f em -c DNA_genion.gro -p DNA_pdb2gmx.top -o DNA_grompp2em.tpr
# mdrun -s DNA_grompp2em.tpr -o DNA_md_em.trr -c DNA_md_em.pdb -v
#
################################################################################
#
# In case you don't have one handy, here's the contents of an em.mdp file
# for use in the energy minimization test.
#
# Copy this content below, remove the "#", save as a text file named
# -> em.mdp
#
# cpp                 =  /usr/bin/cpp
# define              =  -DFLEXIBLE
# integrator          =  steep
# nsteps              =  5000
# emtol               =  10.0
# emstep              =  0.01
# nstcgsteep          =  100
# coulombtype         = PME
# rvdw                = 1.0
# rlist               = 1.1
# rcoulomb            = 1.1
# pme_order           = 4
# ewald_rtol          = 1e-5
# vdwtype             = shift
# ns_type             = grid
# nstlist             = 10
#
################################################################################
#
# Here's the command line:
#
# ./NAMOT_to_ffAMBER_in_GROMACS.script $1 $2
#
# $1 = file name (including the .pdb, as I often forget to not include it)
# $2 = number of the 3' base for conversion into Dn3 (n = A,T,G,C)
# the number in $2 will automatically do the 3' and 5' conversion (keep the
# terminal hydrogens on the PO4- groups)
#
################################################################################
################################################################################
#
# The magic happens below.
#
################################################################################
################################################################################
#
# First thing first, make a backup of the original pdb file in case you goof.
#
cp $1 $1_original
#
################################################################################
#
# This section replaces the "*" in each atom label with "z" so that you're not
# using the asterisk during the editing.  Replacing with the ffAMBER-requisite
# "single-quote" (') makes the sed script more complicated than it needs to be.
#
sed 's/*/z/' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# This section changes the nitrogen hydrogen (NH2) labels to those expected by
# ffAMBER.  Hn2/1, where n is the atom number in the formal labeling scheme.
#
# THYMINE is its own problem, as the methyl carbon needs modification.
# This is addressed by a separate ffAMBER modification.  Check
# https://www.somewhereville.com/?cat=74 (my AMBER category) for details.
#
#
# HN2A/B occurs in GUANINE.
#
sed 's/HN2A/ H21/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/HN2B/ H22/' $1 > $1_temp
rm $1
mv $1_temp $1
#
#
# HN4A/B occurs in CYTOSINE.
#
sed 's/HN4A/ H41/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/HN4B/ H42/' $1 > $1_temp
rm $1
mv $1_temp $1
#
#
# HN6A/B occurs in ADENINE.
#
sed 's/HN6A/ H61/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/HN6B/ H62/' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# This section converts ADE, CYT, GUA, THY in the NAMOT output to DA, DC, DG, DT
# in accord with the topology labels used by ffAMBER in GROMACS for the nucleic
# acids.
#
#
sed 's/ADE / DA /' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/CYT / DC /' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/GUA / DG /' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/THY / DT /' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# This is the bulk of the conversion, moving atoms around and formatting.
# Mostly, just moving atom labels over one column to the left.  This doesn't
# necessarily have to be done, but conforms to pdb format better and some
# programs may need the atom labels in the columns as defined below.
#
# This section changes all of the hydrogen atom labels.
#
sed 's/1H5z/H5z1/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/2H5z/H5z2/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/H2Az/H2z1/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/H2Bz/H2z2/' $1 > $1_temp
rm $1
mv $1_temp $1
#
# These are global changes in column position and fix all nucleic acids.
#
sed 's/  P  D/P    D/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ N9  D/N9   D/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ N7  D/N7   D/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ N6  D/N6   D/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ N3  D/N3   D/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ N1  D/N1   D/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ C8  D/C8   D/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ C6  D/C6   D/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ C5  D/C5   D/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ C4  D/C4   D/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ C2  D/C2   D/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ H8  D/H8   D/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ H2  D/H2   D/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ H3  D/H3   D/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ H6  D/H6   D/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ O3  D/O3   D/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ O4  D/O4   D/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ O2  D/O    D/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/C5M  D/C7   D/' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# This section converts the 5' end of both chains into ffAMBER format.  Always
# begins with "1".  If you deleted some of the double strand at the 5' end and
# the first base number is NOT 1, this script will still run but give you
# a final structure that will require additional modification before running
# the pdb2gmx topology generator.
#
################################################################################
#
# chain a ADENINE adjustment
#
sed 's/  DA a   1/ DA5 a   1/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ HB DA5/H5T DA5/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/O5T DA5/O5z DA5/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ HE  DA/H3T DA3/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/O3T  DA/O3z DA3/' $1 > $1_temp
rm $1
mv $1_temp $1
#
# chain b ADENINE adjustment
#
sed 's/  DA b   1/ DA5 b   1/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ HB DA5/H5T DA5/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/O5T DA5/O5z DA5/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ HE  DA/H3T DA3/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/O3T  DA/O3z DA3/' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# chain a THYMINE adjustment
#
sed 's/  DT a   1/ DT5 a   1/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ HB DT5/H5T DT5/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/O5T DT5/O5z DT5/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ HE  DT/H3T DT3/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/O3T  DT/O3z DT3/' $1 > $1_temp
rm $1
mv $1_temp $1
#
# chain b THYMINE adjustment
#
sed 's/  DT b   1/ DT5 b   1/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ HB DT5/H5T DT5/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/O5T DT5/O5z DT5/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ HE  DT/H3T DT3/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/O3T  DT/O3z DT3/' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# chain a GUANINE adjustment
#
sed 's/  DG a   1/ DG5 a   1/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ HB DG5/H5T DG5/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/O5T DG5/O5z DG5/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ HE  DG/H3T DG3/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/O3T  DG/O3z DG3/' $1 > $1_temp
rm $1
mv $1_temp $1
#
# chain b GUANINE adjustment
#
sed 's/  DG b   1/ DG5 b   1/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ HB DG5/H5T DG5/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/O5T DG5/O5z DG5/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ HE DG3/H3T DG3/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/O3T DG3/O3z DG3/' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# chain a CYTOSINE adjustment
#
sed 's/  DC a   1/ DC5 a   1/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ HB DC5/H5T DC5/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/O5T DC5/O5z DC5/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ HE  DC/H3T DC3/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/O3T  DC/O3z DC3/' $1 > $1_temp
rm $1
mv $1_temp $1
#
# chain b CYTOSINE adjustment
#
sed 's/  DC b   1/ DC5 b   1/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ HB DC5/H5T DC5/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/O5T DC5/O5z DC5/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ HE DC3/H3T DC3/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/O3T DC3/O3z DC3/' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# This section changes the last base in each chain (a and b) from the default
# "Dn" to "Dn3" so that the topology generation gets the 3' end correct.
# Goes by units, tens, hund, thou and searches specifically for the pattern
# in question (taking care to follow the standard format for base number).
#
# NOTE: We do the junction, crossover, etc. generation outside of NAMOT.
# Therefore, each file output by NAMOT only has chain "a" and chain "b".
#
################################################################################
#
# changes the 3' strand if the length is from 1 to 9 (units)
# strand 1/a
#
sed 's/ DA a   '$2'/DA3 a   '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DC a   '$2'/DC3 a   '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DG a   '$2'/DG3 a   '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DT a   '$2'/DT3 a   '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
#
# changes the 3' strand if the length is from 1 to 9 (units)
# strand 2/b
#
sed 's/ DA b   '$2'/DA3 b   '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DC b   '$2'/DC3 b   '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DG b   '$2'/DG3 b   '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DT b   '$2'/DT3 b   '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# changes the 3' strand if the length is from 10 to 99 (tens)
# strand 1/a
#
sed 's/ DA a  '$2'/DA3 a  '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DC a  '$2'/DC3 a  '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DG a  '$2'/DG3 a  '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DT a  '$2'/DT3 a  '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
#
# changes the 3' strand if the length is from 10 to 99 (tens)
# strand 2/b
#
sed 's/ DA b  '$2'/DA3 b  '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DC b  '$2'/DC3 b  '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DG b  '$2'/DG3 b  '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DT b  '$2'/DT3 b  '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# changes the 3' strand if the length is from 100 to 999 (hund)
# strand 1/a
#
sed 's/ DA a '$2'/DA3 a '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DC a '$2'/DC3 a '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DG a '$2'/DG3 a '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DT a '$2'/DT3 a '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
#
# changes the 3' strand if the length is from 100 to 999 (hund)
# strand 2/b
#
sed 's/ DA b '$2'/DA3 b '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DC b '$2'/DC3 b '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DG b '$2'/DG3 b '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DT b '$2'/DT3 b '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# changes the 3' strand if the length is from 1000 to 9999 (thou)
# strand 1/a
#
sed 's/ DA a'$2'/DA3 a'$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DC a'$2'/DC3 a'$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DG a'$2'/DG3 a'$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DT a'$2'/DT3 a'$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
#
# changes the 3' strand if the length is from 1000 to 9999 (thou)
# strand 2/b
#
sed 's/ DA b'$2'/DA3 b'$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DC b'$2'/DC3 b'$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DG b'$2'/DG3 b'$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DT b'$2'/DT3 b'$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# Home stretch.  Changes all of the "z" atoms in the pdb file to ' (single-
# quotes) for ffAMBER.
#
sed "s/z/'/g" $1 > $1_temp
rm $1
mv $1_temp $1_proper_pdb
#
#
################################################################################
#
# Questions?  Problems?  Complaints?  Better Ideas?
# Damian Allis, damian@somewhereville.com, www.somewhereville.com
#
################################################################################
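The sed / rm / mv triplet repeated throughout the script keeps every step easy to read and to comment out, at the cost of rewriting the file once per substitution. If that ever becomes a bother, several substitutions can be chained into one sed call with repeated -e options. A sketch using the six NH2 relabeling steps follows. The name relabel_amino_hydrogens and the truncated sample lines are made up for illustration.

```shell
#!/bin/sh

# Hypothetical helper: the six NH2 relabeling substitutions from the script
# above, applied in a single pass over stdin.
relabel_amino_hydrogens() {
    sed -e 's/HN2A/ H21/' -e 's/HN2B/ H22/' \
        -e 's/HN4A/ H41/' -e 's/HN4B/ H42/' \
        -e 's/HN6A/ H61/' -e 's/HN6B/ H62/'
}

# Made-up, truncated PDB-style lines, just to exercise the function.
relabeled=$(printf '%s\n' \
    'ATOM     20 HN4A CYT a   1' \
    'ATOM     21 HN4B CYT a   1' | relabel_amino_hydrogens)
printf '%s\n' "$relabeled"
```

On a real file this would be run as relabel_amino_hydrogens < FILENAME.pdb > FILENAME.pdb_temp, with a single mv afterwards in place of six.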

www.somewhereville.com/?p=114
en.wikipedia.org/wiki/DNA
www.gromacs.org
chemistry.csulb.edu/ffamber
amber.scripps.edu
namot.lanl.gov
en.wikipedia.org/wiki/Thymine
en.wikipedia.org/wiki/Sed
en.wikipedia.org/wiki/Linux
en.wikipedia.org/wiki/Unix
www.apple.com/macosx