sed-Based Script For Converting NAMOT And NAMOT2 DNA Output To GROMOS96 Format For GROMACS Topology Generation v1

NOTE: This script works with additions to the ffG53a5/6.rtp (residue topology) files. This information is available at Modifications To The ffG53a6.rtp And ffG53a5.rtp Residue Topology Files Required For Using GROMOS96-NAMOT-GROMACS v1.

The script below is the precursor to the ffAMBER/NAMOT/GROMACS script posted previously. This script takes the output of a NAMOT or NAMOT2 DNA structure generation (.pdb) and does all of the atom label and atom label position conversions, correct 3'and 5'terminal H atom assignments, and makes changes throughout the .pdb file to provide something that should flow seamlessly into the GROMACS pdb2gmx .top generator for the GROMOS96 force field.

To reiterate a previous point: Did you need to post the entire script and not just provide the downloadable text file as a link? Of course, as I suspect no small number of people looking for how to convert a NAMOT pdb file into GROMOS96-speak will begin by searching based on GROMACS errors, which occur one missing residue label at a time. Hopefully, having the entire script readable by google and yahoo will cause it to pop up high in the search ranking.

The problem in the ffAMBER script with thymine (the lack of methyl hydrogens in the NAMOT and NAMOT2 pdb output) isn't a problem for GROMOS96, as these hydrogen atoms (and all non-polar (C-H) hydrogen atoms) are subsumed into their associated carbons. That is, only DNA O-H, N-H, and pi-system C-H hydrogens are considered in the GROMOS96 force field.

How to use:

As a series of sed operations, you obviously need sed, which is available for all platforms and "pre-installed" with any self-respecting Linux/UNIX distro (which, of course, means OSX (the OS under which the script was generated).

To run this script, have the script and your NAMOT/NAMOT2-generated .pdb in the same directory and type:

./NAMOT_to_GROMOS96_in_GROMACS.script FILENAME.pdb NN

Where:

NAMOT_to_GROMOS96_in_GROMACS.script is the name of the script

FILENAME.pdb is the .pdb file (include the .pdb)

NN is the number of bases in each strand. This number is required in order to correctly change the atom types on the 3' end of each strand.

This script is downloadable form the following link: NAMOT_to_GROMOS96_in_GROMACS.script

I also include a 35-base C-G double helix NAMOT .pdb file at C_G_NAMOT.pdb. To test the script on your machine, type the following in a Terminal window:

./NAMOT_to_GROMOS96_in_GROMACS.script C_G_NAMOT.pdb 35

As usual, if you have problems, comments, questions, concerns, etc. please either make an account and post a comment for this post or send me an email and I'll keep the running tally.

##############################################################################
#
# Questions?  Problems?  Complaints?  Better Ideas?
# Damian Allis, damian@somewhereville.com, www.somewhereville.com
#
# This script takes the double helix output from NAMOT and NAMOT2 (a and b
# strands) and converts them into a format that the current GROMOS96 (53a6/5)
# force field in GROMACS can use in the generation of the GROMACS .top file.
#
################################################################################
#
# Generally, the following list of GROMACS runs should get you through an
# energy minimization without problem.  Note only 10 cations are added
# to your structure.  Change accordingly (or don't.  It doesn't matter for
# the test).
#
# Run these in order:
#
# pdb2gmx -nomerge -f DNA.pdb -o DNA_pdb2gmx.gro -p DNA_pdb2gmx.top
# editconf -f DNA_pdb2gmx.gro -o DNA_editconf.gro -d 1.0 -bt triclinic
# genbox -cp DNA_editconf.gro -cs -o DNA_genbox.gro -p DNA_pdb2gmx.top
# grompp -f em -c DNA_genion.gro -p DNA_pdb2gmx.top -o DNA_grompp2em.tpr
# genion -np 10 -norandom -pname Na -o DNA_genion.gro -s DNA_gromppem.tpr
#   -p DNA_pdb2gmx.top (this .top goes in the same line as the genion)
# grompp -f em -c DNA_genion.gro -p DNA_pdb2gmx.top -o DNA_grompp2em.tpr2
# mdrun -s DNA_grompp2em.tpr -o DNA_md_em.trr -c DNA_md_em.pdb -v
#
################################################################################
#
# In case you don't have one handy, here's the contents of an em.mpd file
# for use in the energy minimization test.
#
# Copy this content below, remove the "#", save as a text filed named
# -> em.mpd
#
# cpp                 =  /usr/bin/cpp
# define              =  -DFLEXIBLE
# integrator          =  steep
# nsteps              =  5000
# emtol               =  10.0
# emstep              =  0.01
# nstcgsteep          =  100
# coulombtype         = PME
# rvdw                = 1.0
# rlist               = 1.1
# rcoulomb            = 1.1
# pme_order           = 4
# ewald_rtol          = 1e-5
# vdwtype             = shift
# ns_type             = grid
# nstlist             = 10
#
################################################################################
#
# Here's the command line:
#
# ./NAMOT_to_ffAMBER_in_GROMACS.sed $1 $2
#
# $1 = file name (including the .pdb, as I often forget to not include it)
# $2 = number of the 3' base for conversion into Dn3 (n = A,T,G,C)
# the number in $2 will automatically do the 3' and 5' conversion (keep the
# terminal hydrogens on the PO4- groups)
#
################################################################################
################################################################################
#
# The magic happens below.
#
################################################################################
################################################################################
#
# First thing first, make a backup of the original pdb file in case you goof.
#
cp $1 $1_original
#
################################################################################
#
# This section converts all of the "*" with "z" so that you're not using the
# asterisk during the editing.  Replacing with the ffAMBER-requisite
# "single-quote" (') makes the sed script more complicated than it needs to be.
#
sed 's/*/z/' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# The section below deletes all of the non-polar CH2 and CH hydrogen atoms from
# the nucleic acids. In GROMACS, these are mass-subsumed into the carbon atom,
# so are ignored in the toplogy.
#
#
# deletes from ADE
#
sed '/H2Az ADE/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/H2Bz ADE/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/1H5z ADE/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/2H5z ADE/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/ H1z ADE/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/ H3z ADE/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/ H4z ADE/d' $1 > $1_temp
rm $1
mv $1_temp $1
#
# done deleting from ADE
#
# deletes from CYT
#
sed '/H2Az CYT/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/H2Bz CYT/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/1H5z CYT/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/2H5z CYT/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/ H1z CYT/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/ H3z CYT/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/ H4z CYT/d' $1 > $1_temp
rm $1
mv $1_temp $1
#
# done deleting from CYT
#
# deletes from GUA
#
sed '/H2Az GUA/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/H2Bz GUA/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/1H5z GUA/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/2H5z GUA/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/ H1z GUA/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/ H3z GUA/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/ H4z GUA/d' $1 > $1_temp
rm $1
mv $1_temp $1
#
# done deleting from GUA
#
# deletes from THY
#
sed '/H2Az THY/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/H2Bz THY/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/1H5z THY/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/2H5z THY/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/ H1z THY/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/ H3z THY/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/ H4z THY/d' $1 > $1_temp
rm $1
mv $1_temp $1
#
# done deleting from THY
#
################################################################################
#
# This section changes the two nitrogen hydrogen labels to those expected by
# GROMACS.  Hn2/1, where n is the atom number in the formal labeling scheme.
#
#
# renames the NH2 hydrogen atoms for DADE
#
sed 's/HN6A/ H61/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/HN6B/ H62/' $1 > $1_temp
rm $1
mv $1_temp $1
#
# done renaming the NH2 hydrogen atoms for DADE
#
# renames the NH2 hydrogen atoms for DCYT
#
sed 's/HN4A/ H41/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/HN4B/ H42/' $1 > $1_temp
rm $1
mv $1_temp $1
#
# done renaming the NH2 hydrogen atoms for DCYT
#
# renames the NH2 hydrogen atoms for DGUA
#
sed 's/HN2A/ H21/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/HN2B/ H22/' $1 > $1_temp
rm $1
mv $1_temp $1
#
# done renaming the NH2 hydrogen atoms for DGUA
#
################################################################################
#
# This section renames the O3' and O5' oxygen atoms from the NAMOT output
# (O5T to O5* and O3T to O3*) to a PDB format so that the terminal H atoms
# can be added on with the T/F_NA topologies.
#
#
sed 's/O5T/O5z/' $1 > $1_temp
rm $1
mv $1_temp $1
#
#
sed 's/O3T/O3z/' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# This section converts ADE, CYT, GUA, THY to DADE, DCYT, DGUA, DTHY in accord
# with the topology labels used by GROMACS for the nucleic acids.
#
#
sed 's/ADE /DADE/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/CYT /DCYT/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/GUA /DGUA/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/THY /DTHY/' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# This section moves the column 23 chain labels to column 22 so that grompp
# can generate unique topology files for each unique chain.  As long as only
# lower-case letters are used for the chain labels (and this is the NAMOT
# default) the below moves everything and not any single-atom labels (which
# are generally all uppercase).
#
#
sed 's/ [a-z] /[a-z]  /' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# This section only replaces the FIRST nucleic acid for each chain with the F
# (5' end) labels for use with the F_NA topology.
#
# Yes, NAMOT starts its structures at the 5' end.
#
#
sed 's/DADEa   1/FADEa   1/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DCYTa   1/FCYTa   1/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DGUAa   1/FGUAa   1/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DTHYa   1/FTHYa   1/' $1 > $1_temp
rm $1
mv $1_temp $1
#
#
sed 's/DADEb   1/FADEb   1/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DCYTb   1/FCYTb   1/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DGUAb   1/FGUAb   1/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DTHYb   1/FTHYb   1/' $1 > $1_temp
rm $1
mv $1_temp $1
#
#
################################################################################
#
# This section changes the last base in the series to a "T" from the default
# "D" so that the topology corrects the 3' end.  Goes by units, tens, hun, thou
# and searches specifically for the pattern in question (taking care to follow
# the standard  format for base number.
#
################################################################################
#
# changes the 3' strand if the length is from 1 to 9 (units)
# strand 1/a
#
sed 's/DADEa   '$2'/TADEa   '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DCYTa   '$2'/TCYTa   '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DGUAa   '$2'/TGUAa   '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DTHYa   '$2'/TTHYa   '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
#
# changes the 3' strand if the length is from 1 to 9 (units)
# strand 2/b
#
sed 's/DADEb   '$2'/TADEb   '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DCYTb   '$2'/TCYTb   '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DGUAb   '$2'/TGUAb   '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DTHYb   '$2'/TTHYb   '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# changes the 3' strand if the length is from 10 to 99 (tens)
# strand 1/a
#
sed 's/DADEa  '$2'/TADEa  '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DCYTa  '$2'/TCYTa  '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DGUAa  '$2'/TGUAa  '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DTHYa  '$2'/TTHYa  '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
#
# changes the 3' strand if the length is from 10 to 99 (tens)
# strand 2/b
#
sed 's/DADEb  '$2'/TADEb  '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DCYTb  '$2'/TCYTb  '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DGUAb  '$2'/TGUAb  '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DTHYb  '$2'/TTHYb  '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# changes the 3' strand if the length is from 100 to 999 (hund)
# strand 1/a
#
sed 's/DADEa '$2'/TADEa '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DCYTa '$2'/TCYTa '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DGUAa '$2'/TGUAa '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DTHYa '$2'/TTHYa '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
#
# changes the 3' strand if the length is from 100 to 999 (hund)
# strand 2/b
#
sed 's/DADEb '$2'/TADEb '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DCYTb '$2'/TCYTb '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DGUAb '$2'/TGUAb '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DTHYb '$2'/TTHYb '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# changes the 3' strand if the length is from 1000 to 9999 (thou)
# strand 1/a
#
sed 's/DADEa'$2'/TADEa'$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DCYTa'$2'/TCYTa'$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DGUAa'$2'/TGUAa'$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DTHYa'$2'/TTHYa'$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
#
# changes the 3' strand if the length is from 1000 to 9999 (thou)
# strand 2/b
#
sed 's/DADEb'$2'/TADEb'$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DCYTb'$2'/TCYTb'$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DGUAb'$2'/TGUAb'$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DTHYb'$2'/TTHYb'$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
#
#
################################################################################
#
# Home stretch.  Changes all of the "z" atoms in the pdb file to * for GROMACS.
#
sed 's/z/*/' $1 > $1_temp
rm $1
cp $1_temp $1
cp $1_temp $1_postscript
rm $1_temp
#
################################################################################
#
# Questions?  Problems?  Complaints?  Better Ideas?
# Damian Allis, damian@somewhereville.com, www.somewhereville.com
#
################################################################################

chemistry.csulb.edu/ffamber
namot.lanl.gov
www.gromacs.org
en.wikipedia.org/wiki/DNA
www.rcsb.org/pdb/home/home.do
en.wikipedia.org/wiki/GROMOS
en.wikipedia.org/wiki/Thymine
en.wikipedia.org/wiki/Sed
en.wikipedia.org/wiki/Linux
en.wikipedia.org/wiki/Unix
www.apple.com/macosx

sed-Based Script For Converting NAMOT And NAMOT2 DNA Output To ffAMBER Format For GROMACS Topology Generation v1

In continuing efforts to streamline the simulation of atomistic DNA structures in GROMACS using the ffAMBER force field (the port of AMBER for GROMACS), the following script takes the .pdb output of NAMOT or NAMOT2 and does all of the atom label and atom label position conversions, correct 3' and 5' terminal H atom assignments, and random changes throughout the .pdb file to provide something that should flow seamlessly into GROMACS.

"Did you need to post the entire script and not just provide the downloadable text file as a link?" Of course, as I suspect no small number of people looking for how to convert a NAMOT pdb file into ffAMBER-speak will begin by searching based on GROMACS errors, which occur one missing residue label at a time. Hopefully, having the entire script readable by google and yahoo will cause it to pop up high in the search ranking.

Less searching, more simulating.

Now, there is one problem. NAMOT and NAMOT2 do not include the methyl group hydrogen atoms on the thymine residue, defaulting to a single C5M. In most of the GROMACS force fields, this is just fine, as the hydrogen atoms are subsumed into the methyl carbon (all non-polar C-H bonds are treated this way). For AMBER (ffAMBER, that is), all hydrogens are included. This fix is performed on the ffAMBER/GROMACS side in a modification to the .hdb and .rtp files that I will describe in an upcoming post.

How to use:

As a series of sed operations, you obviously need sed, which is available for all platforms and "pre-installed" with any self-respecting Linux/UNIX distro (which, of course, means OSX (the OS under which the script was generated).

To run this script, have the script and your NAMOT/NAMOT2-generated .pdb in the same directory and type:

./NAMOT_to_ffAMBER_in_GROMACS.script FILENAME.pdb NN

Where:

NAMOT_to_ffAMBER_in_GROMACS.script is the name of the script

FILENAME.pdb is the .pdb file (include the .pdb)

NN is the number of bases in each strand. This number is required in order to correctly change the atom types on the 3' end of each strand.

This script is downloadable form the following link: NAMOT_to_ffAMBER_in_GROMACS.script

I also include a 35-base C-G double helix NAMOT .pdb file at C_G_NAMOT.pdb. To test the script on your machine, type the following in a Terminal window:

./NAMOT_to_ffAMBER_in_GROMACS.script C_G_NAMOT.pdb 35

As usual, if you have problems, comments, questions, concerns, etc. please either make an account and post a comment for this post or send me an email and I'll keep the running tally.

C_Gpdb_QuteMolX_image_may2008

Also, this same scripting procedure works just fine for the GROMOS96 force fields (ffG53a5, ffG53a6, etc.) and I'll be posting the one I use for those calculations in short order (they are, in fact, easier to work with for GROMACS, as they also neglect the methyl group hydrogen atoms on the Thymine. In fact, they neglect ALL of the non-polar C-H bonds, so you end up deleting atoms from the NAMOT/NAMOT2 .pdb files).

################################################################################
#
# Questions?  Problems?  Complaints?  Better Ideas?
# Damian Allis, damian@somewhereville.com, www.somewhereville.com
#
# This script takes the double helix output from NAMOT and NAMOT2 (a and b
# strands) and converts them into a format that the current ffAMBER
# implementation for GROMACS can use in the generation of the GROMACS .top file.
#
################################################################################
#
# Generally, the following list of GROMACS runs should get you through an
# energy minimization without problem.  Note only 10 cations are added
# to your structure.  Change accordingly (or don't.  It doesn't matter for
# the test).
#
# Run these in order:
#
# pdb2gmx -nomerge -f DNA.pdb -o DNA_pdb2gmx.gro -p DNA_pdb2gmx.top
# editconf -f DNA_pdb2gmx.gro -o DNA_editconf.gro -d 1.0 -bt triclinic
# genbox -cp DNA_editconf.gro -cs -o DNA_genbox.gro -p DNA_pdb2gmx.top
# grompp -f em -c DNA_genion.gro -p DNA_pdb2gmx.top -o DNA_grompp2em.tpr
# genion -np 10 -norandom -pname Na -o DNA_genion.gro -s DNA_gromppem.tpr
#   -p DNA_pdb2gmx.top (this .top goes in the same line as the genion)
# grompp -f em -c DNA_genion.gro -p DNA_pdb2gmx.top -o DNA_grompp2em.tpr2
# mdrun -s DNA_grompp2em.tpr -o DNA_md_em.trr -c DNA_md_em.pdb -v
#
################################################################################
#
# In case you don't have one handy, here's the contents of an em.mpd file
# for use in the energy minimization test.
#
# Copy this content below, remove the "#", save as a text filed named
# -> em.mpd
#
# cpp                 =  /usr/bin/cpp
# define              =  -DFLEXIBLE
# integrator          =  steep
# nsteps              =  5000
# emtol               =  10.0
# emstep              =  0.01
# nstcgsteep          =  100
# coulombtype         = PME
# rvdw                = 1.0
# rlist               = 1.1
# rcoulomb            = 1.1
# pme_order           = 4
# ewald_rtol          = 1e-5
# vdwtype             = shift
# ns_type             = grid
# nstlist             = 10
#
################################################################################
#
# Here's the command line:
#
# ./NAMOT_to_ffAMBER_in_GROMACS.sed $1 $2
#
# $1 = file name (including the .pdb, as I often forget to not include it)
# $2 = number of the 3' base for conversion into Dn3 (n = A,T,G,C)
# the number in $2 will automatically do the 3' and 5' conversion (keep the
# terminal hydrogens on the PO4- groups)
#
################################################################################
################################################################################
#
# The magic happens below.
#
################################################################################
################################################################################
#
# First thing first, make a backup of the original pdb file in case you goof.
#
cp $1 $1_original
#
################################################################################
#
# This section converts all of the "*" with "z" so that you're not using the
# asterisk during the editing.  Replacing with the ffAMBER-requisite
# "single-quote" (') makes the sed script more complicated than it needs to be.
#
sed 's/*/z/' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# This section changes the nitrogen hydrogen (NH2) labels to those expected by
# ffMABER.  Hn2/1, where n is the atom number in the formal labeling scheme.
#
# THYIME is its own problem, as the methyl carbon needs modification.
# This is addressed by a separate ffAMBER modification.  Check
# http://www.somewhereville.com/?cat=74 (my AMBER category) for details.
#
#
# HN2A/B occurs in GUANINE.
#
sed 's/HN2A/ H21/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/HN2B/ H22/' $1 > $1_temp
rm $1
mv $1_temp $1
#
#
# HN4A/B occurs in CYTOSINE.
#
sed 's/HN4A/ H41/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/HN4B/ H42/' $1 > $1_temp
rm $1
mv $1_temp $1
#
#
# HN6A/B occurs in ADENINE.
#
sed 's/HN6A/ H61/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/HN6B/ H62/' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# This section converts ADE, CYT, GUA, THY in the NAMOT output to DA, DC, DG, DT
# in accord with the topology labels used by ffAMBER in GROMACS for the nucleic
# acids.
#
#
sed 's/ADE / DA /' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/CYT / DC /' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/GUA / DG /' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/THY / DT /' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# This is the bulk of the conversion, moving atoms around and formatting.
# Mostly, just moving atom labels over one column to the left.  This doesn't
# necessarily have to be done, but conforms to pdb format better and some
# program may need the atom labels in the columns as defined below.
#
# This section changes all of the hydrogen atom labels.
#
sed 's/1H5z/H5z1/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/2H5z/H5z2/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/H2Az/H2z1/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/H2Bz/H2z2/' $1 > $1_temp
rm $1
mv $1_temp $1
#
# These are global changes in column position and fix all nucleic acids.
#
sed 's/  P  D/P    D/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ N9  D/N9   D/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ N7  D/N7   D/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ N6  D/N6   D/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ N3  D/N3   D/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ N1  D/N1   D/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ C8  D/C8   D/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ C6  D/C6   D/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ C5  D/C5   D/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ C4  D/C4   D/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ C2  D/C2   D/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ H8  D/H8   D/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ H2  D/H2   D/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ H3  D/H3   D/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ H6  D/H6   D/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ O3  D/O3   D/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ O4  D/O4   D/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ O2  D/O    D/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/C5M  D/C7   D/' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# This section converts the 5' end of both chains into ffAMBER format.  Always
# begins with "1".  If you deleted some of the double strand at the 5' end and
# the first base number is NOT 1, this script will still run but give you
# a final structure that will require additional modification before running
# the pdb2gmx topology generator.
#
################################################################################
#
# chain a ADENINE adjustment
#
sed 's/  DA a   1/ DA5 a   1/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ HB DA5/H5T DA5/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/O5T DA5/O5z DA5/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ HE  DA/H3T DA3/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/O3T  DA/O3z DA3/' $1 > $1_temp
rm $1
mv $1_temp $1
#
# chain b ADENINE adjustment
#
sed 's/  DA b   1/ DA5 b   1/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ HB DA5/H5T DA5/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/O5T DA5/O5z DA5/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ HE  DA/H3T DA3/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/O3T  DA/O3z DA3/' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# chain a THYMINE adjustment
#
sed 's/  DT a   1/ DT5 a   1/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ HB DT5/H5T DT5/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/O5T DT5/O5z DT5/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ HE  DT/H3T DT3/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/O3T  DT/O3z DT3/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/  DG a   1/ DG5 a   1/' $1 > $1_temp
rm $1
mv $1_temp $1
#
# chain b THYMINE adjustment
#
sed 's/  DT b   1/ DT5 b   1/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ HB DT5/H5T DT5/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/O5T DT5/O5z DT5/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ HE  DT/H3T DT3/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/O3T  DT/O3z DT3/' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# chain a GUANINE adjustment
#
sed 's/ HB DG5/H5T DG5/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/O5T DG5/O5z DG5/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ HE  DG/H3T DG3/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/O3T  DG/O3z DG3/' $1 > $1_temp
rm $1
mv $1_temp $1
#
# chain b GUANINE adjustment
#
sed 's/  DG b   1/ DG5 b   1/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ HB DG5/H5T DG5/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/O5T DG5/O5z DG5/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ HE DG3/H3T DG3/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/O3T DG3/O3z DG3/' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# chain a CYTOSINE adjustment
#
sed 's/  DC a   1/ DC5 a   1/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ HB DC5/H5T DC5/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/O5T DC5/O5z DC5/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ HE  DC/H3T DC3/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/O3T  DC/O3z DC3/' $1 > $1_temp
rm $1
mv $1_temp $1
#
# chain b CYTOSINE adjustment
#
sed 's/  DC b   1/ DC5 b   1/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ HB DC5/H5T DC5/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/O5T DC5/O5z DC5/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ HE DC3/H3T DC3/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/O3T DC3/O3z DC3/' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# This section changes the last base in each chain (a and b) from the default
# "Dn" to "Dn3" so that the topology generation gets the 3' end correct.
# Goes by units, tens, hun, thou and searches specifically for the pattern
# in question (taking care to follow the standard  format for base number.
#
# NOTE: We do the junction, crossover, etc. generation outside of NAMOT.
# Therefore, each file output by NAMOT only has chain "a" and chain "b".
#
################################################################################
#
# changes the 3' strand if the length is from 1 to 9 (units)
# strand 1/a
#
sed 's/ DA a   '$2'/DA3 a   '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DC a   '$2'/DC3 a   '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DG a   '$2'/DG3 a   '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DT a   '$2'/DT3 a   '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
#
# changes the 3' strand if the length is from 1 to 9 (units)
# strand 2/b
#
sed 's/ DA b   '$2'/DA3 b  '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DC b   '$2'/DC3 b   '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DG b   '$2'/DG3 b   '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DT b   '$2'/DT3 b   '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# changes the 3' strand if the length is from 10 to 99 (tens)
# strand 1/a
#
sed 's/ DA a  '$2'/DA3 a  '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DC a  '$2'/DC3 a  '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DG a  '$2'/DG3 a  '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DT a  '$2'/DT3 a  '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
#
# changes the 3' strand if the length is from 10 to 99 (tens)
# strand 2/b
#
sed 's/ DA b  '$2'/DA3 b  '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DC b  '$2'/DC3 b  '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DG b  '$2'/DG3 b  '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DT b  '$2'/DT3 b  '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# changes the 3' strand if the length is from 100 to 999 (hund)
# strand 1/a
#
sed 's/ DA a '$2'/DA3 a '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DC a '$2'/DC3 a '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DG a '$2'/DG3 a '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DT a '$2'/DT3 a '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
#
# changes the 3' strand if the length is from 100 to 999 (hund)
# strand 2/b
#
sed 's/ DA b '$2'/DA3 b '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DC b '$2'/DC3 b '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DG b '$2'/DG3 b '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DT b '$2'/DT3 b '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# changes the 3' strand if the length is from 1000 to 9999 (thou)
# strand 1/a
#
sed 's/ DA a'$2'/DA3 a'$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DC a'$2'/DC3 a'$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DG a'$2'/DG3 a'$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DT a'$2'/DT3 a'$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
#
# changes the 3' strand if the length is from 1000 to 9999 (thou)
# strand 2/b
#
sed 's/ DA b'$2'/DA3 b'$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DC b'$2'/DC3 b'$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DG b'$2'/DG3 b'$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/ DT b'$2'/DT3 b'$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# Home stretch.  Changes all of the "z" atoms in the pdb file to ' (single-
# quotes) for ffMABER.
#
sed s/\z/\'/g $1 > $1_temp
rm $1
mv $1_temp $1_proper_pdb
#
#
################################################################################
#
# Questions?  Problems?  Complaints?  Better Ideas?
# Damian Allis, damian@somewhereville.com, www.somewhereville.com
#
################################################################################

www.somewhereville.com/?p=114
en.wikipedia.org/wiki/DNA
www.gromacs.org
chemistry.csulb.edu/ffamber
amber.scripps.edu
namot.lanl.gov
en.wikipedia.org/wiki/Thymine
en.wikipedia.org/wiki/Sed
en.wikipedia.org/wiki/Linux
en.wikipedia.org/wiki/Unix
www.apple.com/macosx