home

sed-Based Script For Converting NAMOT And NAMOT2 DNA Output To GROMOS96 Format For GROMACS Topology Generation v1

NOTE: This script works with additions to the ffG53a5/6.rtp (residue topology) files. This information is available at Modifications To The ffG53a6.rtp And ffG53a5.rtp Residue Topology Files Required For Using GROMOS96-NAMOT-GROMACS v1.

The script below is the precursor to the ffAMBER/NAMOT/GROMACS script posted previously. This script takes the output of a NAMOT or NAMOT2 DNA structure generation (.pdb) and does all of the atom label and atom label position conversions, correct 3′and 5′terminal H atom assignments, and makes changes throughout the .pdb file to provide something that should flow seamlessly into the GROMACS pdb2gmx .top generator for the GROMOS96 force field.

To reiterate a previous point: Did you need to post the entire script and not just provide the downloadable text file as a link? Of course, as I suspect no small number of people looking for how to convert a NAMOT pdb file into GROMOS96-speak will begin by searching based on GROMACS errors, which occur one missing residue label at a time. Hopefully, having the entire script readable by google and yahoo will cause it to pop up high in the search ranking.

The problem in the ffAMBER script with thymine (the lack of methyl hydrogens in the NAMOT and NAMOT2 pdb output) isn’t a problem for GROMOS96, as these hydrogen atoms (and all non-polar (C-H) hydrogen atoms) are subsumed into their associated carbons. That is, only DNA O-H, N-H, and pi-system C-H hydrogens are considered in the GROMOS96 force field.

How to use:

As a series of sed operations, you obviously need sed, which is available for all platforms and “pre-installed” with any self-respecting Linux/UNIX distro (which, of course, means OSX (the OS under which the script was generated).

To run this script, have the script and your NAMOT/NAMOT2-generated .pdb in the same directory and type:

./NAMOT_to_GROMOS96_in_GROMACS.script FILENAME.pdb NN

Where:

NAMOT_to_GROMOS96_in_GROMACS.script is the name of the script

FILENAME.pdb is the .pdb file (include the .pdb)

NN is the number of bases in each strand. This number is required in order to correctly change the atom types on the 3′ end of each strand.

This script is downloadable form the following link: NAMOT_to_GROMOS96_in_GROMACS.script

I also include a 35-base C-G double helix NAMOT .pdb file at C_G_NAMOT.pdb. To test the script on your machine, type the following in a Terminal window:

./NAMOT_to_GROMOS96_in_GROMACS.script C_G_NAMOT.pdb 35

As usual, if you have problems, comments, questions, concerns, etc. please either make an account and post a comment for this post or send me an email and I’ll keep the running tally.

##############################################################################
#
# Questions?  Problems?  Complaints?  Better Ideas?
# Damian Allis, damian@somewhereville.com, www.somewhereville.com
#
# This script takes the double helix output from NAMOT and NAMOT2 (a and b
# strands) and converts them into a format that the current GROMOS96 (53a6/5)
# force field in GROMACS can use in the generation of the GROMACS .top file.
#
################################################################################
#
# Generally, the following list of GROMACS runs should get you through an
# energy minimization without problem.  Note only 10 cations are added
# to your structure.  Change accordingly (or don't.  It doesn't matter for
# the test).
#
# Run these in order:
#
# pdb2gmx -nomerge -f DNA.pdb -o DNA_pdb2gmx.gro -p DNA_pdb2gmx.top
# editconf -f DNA_pdb2gmx.gro -o DNA_editconf.gro -d 1.0 -bt triclinic
# genbox -cp DNA_editconf.gro -cs -o DNA_genbox.gro -p DNA_pdb2gmx.top
# grompp -f em -c DNA_genion.gro -p DNA_pdb2gmx.top -o DNA_grompp2em.tpr
# genion -np 10 -norandom -pname Na -o DNA_genion.gro -s DNA_gromppem.tpr
#   -p DNA_pdb2gmx.top (this .top goes in the same line as the genion)
# grompp -f em -c DNA_genion.gro -p DNA_pdb2gmx.top -o DNA_grompp2em.tpr2
# mdrun -s DNA_grompp2em.tpr -o DNA_md_em.trr -c DNA_md_em.pdb -v
#
################################################################################
#
# In case you don't have one handy, here's the contents of an em.mpd file
# for use in the energy minimization test.
#
# Copy this content below, remove the "#", save as a text filed named
# -> em.mpd
#
# cpp                 =  /usr/bin/cpp
# define              =  -DFLEXIBLE
# integrator          =  steep
# nsteps              =  5000
# emtol               =  10.0
# emstep              =  0.01
# nstcgsteep          =  100
# coulombtype         = PME
# rvdw                = 1.0
# rlist               = 1.1
# rcoulomb            = 1.1
# pme_order           = 4
# ewald_rtol          = 1e-5
# vdwtype             = shift
# ns_type             = grid
# nstlist             = 10
#
################################################################################
#
# Here's the command line:
#
# ./NAMOT_to_ffAMBER_in_GROMACS.sed $1 $2
#
# $1 = file name (including the .pdb, as I often forget to not include it)
# $2 = number of the 3' base for conversion into Dn3 (n = A,T,G,C)
# the number in $2 will automatically do the 3' and 5' conversion (keep the
# terminal hydrogens on the PO4- groups)
#
################################################################################
################################################################################
#
# The magic happens below.
#
################################################################################
################################################################################
#
# First thing first, make a backup of the original pdb file in case you goof.
#
cp $1 $1_original
#
################################################################################
#
# This section converts all of the "*" with "z" so that you're not using the
# asterisk during the editing.  Replacing with the ffAMBER-requisite
# "single-quote" (') makes the sed script more complicated than it needs to be.
#
sed 's/*/z/' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# The section below deletes all of the non-polar CH2 and CH hydrogen atoms from
# the nucleic acids. In GROMACS, these are mass-subsumed into the carbon atom,
# so are ignored in the toplogy.
#
#
# deletes from ADE
#
sed '/H2Az ADE/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/H2Bz ADE/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/1H5z ADE/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/2H5z ADE/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/ H1z ADE/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/ H3z ADE/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/ H4z ADE/d' $1 > $1_temp
rm $1
mv $1_temp $1
#
# done deleting from ADE
#
# deletes from CYT
#
sed '/H2Az CYT/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/H2Bz CYT/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/1H5z CYT/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/2H5z CYT/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/ H1z CYT/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/ H3z CYT/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/ H4z CYT/d' $1 > $1_temp
rm $1
mv $1_temp $1
#
# done deleting from CYT
#
# deletes from GUA
#
sed '/H2Az GUA/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/H2Bz GUA/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/1H5z GUA/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/2H5z GUA/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/ H1z GUA/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/ H3z GUA/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/ H4z GUA/d' $1 > $1_temp
rm $1
mv $1_temp $1
#
# done deleting from GUA
#
# deletes from THY
#
sed '/H2Az THY/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/H2Bz THY/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/1H5z THY/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/2H5z THY/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/ H1z THY/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/ H3z THY/d' $1 > $1_temp
rm $1
mv $1_temp $1
sed '/ H4z THY/d' $1 > $1_temp
rm $1
mv $1_temp $1
#
# done deleting from THY
#
################################################################################
#
# This section changes the two nitrogen hydrogen labels to those expected by
# GROMACS.  Hn2/1, where n is the atom number in the formal labeling scheme.
#
#
# renames the NH2 hydrogen atoms for DADE
#
sed 's/HN6A/ H61/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/HN6B/ H62/' $1 > $1_temp
rm $1
mv $1_temp $1
#
# done renaming the NH2 hydrogen atoms for DADE
#
# renames the NH2 hydrogen atoms for DCYT
#
sed 's/HN4A/ H41/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/HN4B/ H42/' $1 > $1_temp
rm $1
mv $1_temp $1
#
# done renaming the NH2 hydrogen atoms for DCYT
#
# renames the NH2 hydrogen atoms for DGUA
#
sed 's/HN2A/ H21/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/HN2B/ H22/' $1 > $1_temp
rm $1
mv $1_temp $1
#
# done renaming the NH2 hydrogen atoms for DGUA
#
################################################################################
#
# This section renames the O3' and O5' oxygen atoms from the NAMOT output
# (O5T to O5* and O3T to O3*) to a PDB format so that the terminal H atoms
# can be added on with the T/F_NA topologies.
#
#
sed 's/O5T/O5z/' $1 > $1_temp
rm $1
mv $1_temp $1
#
#
sed 's/O3T/O3z/' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# This section converts ADE, CYT, GUA, THY to DADE, DCYT, DGUA, DTHY in accord
# with the topology labels used by GROMACS for the nucleic acids.
#
#
sed 's/ADE /DADE/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/CYT /DCYT/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/GUA /DGUA/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/THY /DTHY/' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# This section moves the column 23 chain labels to column 22 so that grompp
# can generate unique topology files for each unique chain.  As long as only
# lower-case letters are used for the chain labels (and this is the NAMOT
# default) the below moves everything and not any single-atom labels (which
# are generally all uppercase).
#
#
sed 's/ [a-z] /[a-z]  /' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# This section only replaces the FIRST nucleic acid for each chain with the F
# (5' end) labels for use with the F_NA topology.
#
# Yes, NAMOT starts its structures at the 5' end.
#
#
sed 's/DADEa   1/FADEa   1/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DCYTa   1/FCYTa   1/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DGUAa   1/FGUAa   1/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DTHYa   1/FTHYa   1/' $1 > $1_temp
rm $1
mv $1_temp $1
#
#
sed 's/DADEb   1/FADEb   1/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DCYTb   1/FCYTb   1/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DGUAb   1/FGUAb   1/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DTHYb   1/FTHYb   1/' $1 > $1_temp
rm $1
mv $1_temp $1
#
#
################################################################################
#
# This section changes the last base in the series to a "T" from the default
# "D" so that the topology corrects the 3' end.  Goes by units, tens, hun, thou
# and searches specifically for the pattern in question (taking care to follow
# the standard  format for base number.
#
################################################################################
#
# changes the 3' strand if the length is from 1 to 9 (units)
# strand 1/a
#
sed 's/DADEa   '$2'/TADEa   '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DCYTa   '$2'/TCYTa   '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DGUAa   '$2'/TGUAa   '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DTHYa   '$2'/TTHYa   '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
#
# changes the 3' strand if the length is from 1 to 9 (units)
# strand 2/b
#
sed 's/DADEb   '$2'/TADEb   '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DCYTb   '$2'/TCYTb   '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DGUAb   '$2'/TGUAb   '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DTHYb   '$2'/TTHYb   '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# changes the 3' strand if the length is from 10 to 99 (tens)
# strand 1/a
#
sed 's/DADEa  '$2'/TADEa  '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DCYTa  '$2'/TCYTa  '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DGUAa  '$2'/TGUAa  '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DTHYa  '$2'/TTHYa  '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
#
# changes the 3' strand if the length is from 10 to 99 (tens)
# strand 2/b
#
sed 's/DADEb  '$2'/TADEb  '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DCYTb  '$2'/TCYTb  '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DGUAb  '$2'/TGUAb  '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DTHYb  '$2'/TTHYb  '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# changes the 3' strand if the length is from 100 to 999 (hund)
# strand 1/a
#
sed 's/DADEa '$2'/TADEa '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DCYTa '$2'/TCYTa '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DGUAa '$2'/TGUAa '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DTHYa '$2'/TTHYa '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
#
# changes the 3' strand if the length is from 100 to 999 (hund)
# strand 2/b
#
sed 's/DADEb '$2'/TADEb '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DCYTb '$2'/TCYTb '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DGUAb '$2'/TGUAb '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DTHYb '$2'/TTHYb '$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
#
################################################################################
#
# changes the 3' strand if the length is from 1000 to 9999 (thou)
# strand 1/a
#
sed 's/DADEa'$2'/TADEa'$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DCYTa'$2'/TCYTa'$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DGUAa'$2'/TGUAa'$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DTHYa'$2'/TTHYa'$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
#
# changes the 3' strand if the length is from 1000 to 9999 (thou)
# strand 2/b
#
sed 's/DADEb'$2'/TADEb'$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DCYTb'$2'/TCYTb'$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DGUAb'$2'/TGUAb'$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
sed 's/DTHYb'$2'/TTHYb'$2'/' $1 > $1_temp
rm $1
mv $1_temp $1
#
#
################################################################################
#
# Home stretch.  Changes all of the "z" atoms in the pdb file to * for GROMACS.
#
sed 's/z/*/' $1 > $1_temp
rm $1
cp $1_temp $1
cp $1_temp $1_postscript
rm $1_temp
#
################################################################################
#
# Questions?  Problems?  Complaints?  Better Ideas?
# Damian Allis, damian@somewhereville.com, www.somewhereville.com
#
################################################################################

chemistry.csulb.edu/ffamber
namot.lanl.gov
www.gromacs.org
en.wikipedia.org/wiki/DNA
www.rcsb.org/pdb/home/home.do
en.wikipedia.org/wiki/GROMOS
en.wikipedia.org/wiki/Thymine
en.wikipedia.org/wiki/Sed
en.wikipedia.org/wiki/Linux
en.wikipedia.org/wiki/Unix
www.apple.com/macosx

Tags:

Leave a Reply

You must be logged in to post a comment.

Obligatory

  • CNYO

  • Sol. Sys. Amb.

  • Salt City Miners

  • Ubuntu 4 Nano

  • NMT Review

  • N-Fact. Collab.

  • T R P Nanosys

  • Nano Gallery

  • nano gallery
  • Aerial Photos

    More @ flickr.com

    Syracuse Scenes

    More @ flickr.com