The Binding Of Vitamin B12 To Transcobalamin(II); Structural Considerations For Bioconjugate Design – A Molecular Dynamics Study

In press, in the journal Molecular Biosystems. This is my first official foray into molecular dynamics-only (MD-only) computational work, and I am pleased to report that the computational results not only make sense with respect to the experimental results but also indicate a possible new way to use vitamin B12 for the oral delivery of bio-active molecules more complicated than the binary bioconjugates considered to date.

The Interesting Result

The conclusion from the previous study was that the insulin B Chain (figure below) acts as a tether to separate the structured region of insulin (the region with the largest inflexible steric bulk, see below) from the region of transcobalamin II (TCII) that binds vitamin B12. It was then determined that the approach employed for the B12-insulin bioconjugate, simply linking one biomolecule onto another with known binding and transport properties (a common theme in all bioconjugate design), worked because the last nine residues in the insulin B Chain (B22 to B30) are flexible in solution (they, in fact, cover the insulin binding region in the crystal form, then uncover this region in the biologically active form).

As a general procedure for B12 bioconjugate design, one of the key requirements for a functional product is a tether length that provides sufficient separation between B12 and any molecular structure large enough to affect B12 binding within its transport proteins (this makes sense, as a tethered structure that prevents B12 binding in its transport proteins will leave the B12 bioconjugate unprotected in the gut, where acids and digestive enzymes will degrade it). This leads to the question, “How long must a tether be to meet this rather general criterion?” This is only partly the right question, as the retention of B12 binding within its transport proteins is a function of both proper tether length and [transport protein]-[“other molecule”] interaction (in this first case, “other molecule” = insulin).

Saving the exhaustive analysis for the paper, this new study used the flexible region of human insulin (that is, B22 to B30, with the B12 linkage occurring on the B29 lysine side chain) as a proxy for any arbitrary tether, then used MD simulations to consider how the flexibility of this tether might lead to changes in B12 binding within its TCII pocket (TCII being the transport protein for which we have the best crystal structure). The simulations identified the lysine side chain itself as being just long enough to separate the B Chain tether region from the TCII protein surface. This does not mean that lysine will always serve as a perfect linkage. It means that, if the tether structure is effectively non-interacting with TCII (so not sterically demanding by itself), the lysine side chain is long enough to span the solvent-accessible hole produced by the encapsulation of B12 in (in this case) TCII.

The result is a design constraint when using lysine that is quite fortuitous! If the target peptide (insulin or whatnot) has a surface-accessible lysine side chain within a region that is flexible in solution, some simple amide chemistry may produce a viable B12 bioconjugate for delivering that peptide orally (thereby avoiding complete peptide degradation in the G.I. tract).

The More Interesting Result

Buried deep within the bottom of the Discussion section. If you watch the dynamics simulation of the TCII-[B12-tether] complex (shown below for a 300 K 50 ns simulation with 1.5 fs time steps in 14,000 waters (not shown)), you see that the binding of B12 within TCII and the geometry of the encapsulation complex are strongly linked. That is, TCII (and, presumably, its cohorts in the B12 transport pathway) can be thought of as two quite rigid fragments (Red and Blue in the animation) connected by a long tether (Green) that are separated in solution but brought into contact by the binding of vitamin B12 (Gold). The B12 is a glue that holds the fragments together, and a simple tabulation of hydrogen-bonding interactions in the crystal structure reveals that the B12 has more interactions with each of the A and B fragments of TCII individually than A and B have with each other (which is to say, the B12-A Segment interaction and B12-B Segment interaction are stronger than the A-B Segment interaction). From a biological perspective, this should make perfect sense. B12 is a large, extremely important biomolecule that, since we do not make it ourselves, is to be captured and transported as effectively as possible. The best way to bind this molecule is not to wait for it to burrow into a binding pocket, but rather to encapsulate it in a “clam shell” maneuver that provides “maximum embedding.” The tether between the A and B Segments technically would not have to be present if the A and B fragments were present in large quantities (although, as you might expect, the A-B tether does considerably reduce the time to complete encapsulation by keeping these fragments within close proximity).

According to the crystal structure, the B12 is entirely embedded within TCII, with only the solvent-accessible hole at the 5′-ribose position readily accessible for bioconjugate formation. If the overall structure were as rigid as a crystal structure might lead one to believe, functionalization at the cobalt position in the corrin ring would be out of the question.

As I just stated that such a binding mode would otherwise be unlikely, you can guess that there are B12 bioconjugates linked at the cobalt position that are bio-active.

If you watch the dynamics simulation of the TCII-[B12-tether] complex, you see that the clam shell binding mode of TCII is one with a “loose hinge.” This loose hinge is really a result of the flexibility of the two protein fragments (typical protein motion) and flexibility in the short propionamide side chains of vitamin B12 that provide a bit of “spring” in the complete complex. In effect, the flexibility within the structure provides a means for cobalt to be coordinated to something without loss of B12 binding provided that the tether linking the cobalt and the “other” molecule is small enough that it does not require a large change in the A-B binding arrangement (that is, does not affect B12-A and B12-B binding).

And Then There Were Three…

The expectation/prediction/untested hypothesis is that vitamin B12 may be able to happily accommodate two additional molecules at the 5’-ribose and cobalt positions (properly designed) that then provide for the transport of two molecules and/or the delivery of three molecules (one being vitamin B12). This opens the door to a wealth of possibilities, from trinary delivery to combined drug delivery + radiopharma characterization. This is the possibility I’m most interested in pursuing in the next rounds of calculations, with the theory (presumably) providing a very good initial guess about the ideal tether designs to use with B12 for enabling delivery and bio-activity.

And Now For The Hard Work

Stepping back from the theoretical analysis for a moment, the most difficult obstacles to overcome in this study were the generation AND incorporation of force field parameters for vitamin B12 and a B12-Lysine mini-bioconjugate into GROMACS, a problem that I’ve addressed only in passing in several previous posts. What I won’t do in this post is explain the procedure (a single blog post will not do the procedure justice given the complexity of force field parameter generation). What I will do is provide the files for the topology for these systems and a short list of the modifications one needs to make in order to get these systems working. For additional reference, the same topology files are provided in the Supplemental Material for the paper (so, if you find yourself using these, obviously cite the paper and not my humble blog).

Files And Contents:

These are not files to be placed in a single directory, but are segments of files that are going to be placed directly into pre-existing topology files. This is not the best way to do it, but it is the procedure I began with and will not be changing until I find a very simple tutorial on how to do it properly (which, if you have one, I’d be happy to read).

The topology entries (which I assume you will be using with ffG53a6, but which should work generally) are provided in the file below:

ffG53a6_B12_BCN_LYB_LCB_topology.txt

This file contains the topology specifications for vitamin B12 (nothing bound to the cobalt in the corrin ring), cyanocobalamin (CN-B12, with a cyanide bound to the cobalt), B12 with a lysine residue attached to the 5’-ribose hydroxyl position (the tether linkage for the GROMACS prep programs), and CN-B12 with a lysine residue attached to the 5’-ribose hydroxyl position.

I am assuming that you’re using the ffG53a6 force field, meaning you add the topology sets to the bottom of the ffG53a6.rtp file.

GROMACS Modifications:

GROMACS force field and topology files must be modified slightly in order to read the topologies generated above and, depending on where you got the B12 structure, add/correct the hydrogen atoms in the B12 molecule.

In a typical UNIX/Linux installation (which I have provided compilation instructions for in a previous post), the files to be modified can be found in /usr/local/gromacs. And, if you’re using Ubuntu like I am, you’ll need to “sudo” these modifications.

1. aminoacids.dat

If you open this file, you see a list of three- and four-letter codes in the format:

50
ABU
ACE
...
VAL
PGLU

The “50” refers to the number of codes. As we’re going to be adding the codes B12, BCN, LYB, and LCB into GROMACS, we first change 50 to 54, then just list the four codes at the bottom of the file:

54
ABU
ACE
...
VAL
PGLU
B12
BCN
LYB
LCB

You’ll note that B12 and BCN aren’t like the others, LYB is not LYS, and LCB is also nowhere to be seen. The codes already in this file are STANDARD, so make sure you don’t inadvertently give your inserted structure the name of one of the structures in the list.
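
If you would rather script this edit than do it by hand, the following is a minimal Python sketch of the same two steps (bump the count, append the codes). It is not part of the original procedure: the function name add_residue_codes is hypothetical, and it assumes the file layout shown above (a count on the first line, then one code per line).

```python
def add_residue_codes(text, new_codes):
    """Return aminoacids.dat-style text with new_codes appended.

    Assumes the layout shown above: the first line is the number of codes,
    followed by one residue code per line.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    declared, codes = int(lines[0]), lines[1:]
    if declared != len(codes):
        raise ValueError(f"header says {declared} codes, file lists {len(codes)}")

    # Refuse to reuse a STANDARD code (for example, naming the new residue LYS).
    clashes = [code for code in new_codes if code in codes]
    if clashes:
        raise ValueError(f"already defined: {clashes}")

    codes.extend(new_codes)
    return "\n".join([str(len(codes))] + codes) + "\n"


if __name__ == "__main__":
    # Toy three-code file for illustration only; the real file lists 50 codes.
    toy = "3\nABU\nACE\nVAL\n"
    print(add_residue_codes(toy, ["B12", "BCN", "LYB", "LCB"]))
```

Read the real file into a string, pass it through the function, and write the result back (with “sudo” where needed, and after making a backup copy).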

2. ffG53a6.hdb

I specifically used the ffG53a6 force field for the TCII-B12 work, meaning I only made modifications to these force field files. The ffG53a6.hdb file is responsible for adding/correcting hydrogen atoms in your structure (just because the crystallographers do not see them does not mean they aren’t there) and contains hydrogen-beautification information for all of the three/four-letter codes recognized in aminoacids.dat. The content below is the hydrogen-correcting data for the B12, BCN, LYB, and LCB structures. Simply paste this into the bottom of the ffG53a6.hdb file.

B12     19
1    2    HAO    N62    C61    O63
1    2    HAN    N62    C61    C60
1    2    HAM    N52    C50    O51
1    2    HAL    N52    C50    C49
1    2    HAK    N45    C43    O44
1    2    HAJ    N45    C43    C42
1    2    HAI    N40    C38    O39
1    2    HAH    N40    C38    C37
1    2    HAE    N29    C27    O28
1    2    HAD    N29    C27    C26
1    2    HAG    N33    C32    O34
1    2    HAF    N33    C32    C31
1    2    HAA    O7R    C2R    C1R
1    2    HAB    O8R    C5R    C4R
1    2    HAC    N59    C57    O58
1    1    H2B    C2B    N1B    N3B 
1    1    H4B    C4B    C5B    C9B
1    1    H7B    C7B    C8B    C6B
1    1    H10    C10    C9     C11
LYB     20       
1    1    H      N      -C     CA
1    4    HZ1    NZ     CE     CD
1    2    HAO    N62    C61    O63
1    2    HAN    N62    C61    C60
1    2    HAM    N52    C50    O51
1    2    HAL    N52    C50    C49
1    2    HAK    N45    C43    O44
1    2    HAJ    N45    C43    C42
1    2    HAI    N40    C38    O39
1    2    HAH    N40    C38    C37
1    2    HAE    N29    C27    O28
1    2    HAD    N29    C27    C26
1    2    HAG    N33    C32    O34
1    2    HAF    N33    C32    C31
1    2    HAA    O7R    C2R    C1R
1    2    HAC    N59    C57    O58
1    1    H2B    C2B    N1B    N3B
1    1    H4B    C4B    C5B    C9B
1    1    H7B    C7B    C8B    C6B
1    1    H10    C10    C9     C11
BCN     19
1    2    HAO    N62    C61    O63
1    2    HAN    N62    C61    C60
1    2    HAM    N52    C50    O51
1    2    HAL    N52    C50    C49
1    2    HAK    N45    C43    O44
1    2    HAJ    N45    C43    C42
1    2    HAI    N40    C38    O39
1    2    HAH    N40    C38    C37
1    2    HAE    N29    C27    O28
1    2    HAD    N29    C27    C26
1    2    HAG    N33    C32    O34
1    2    HAF    N33    C32    C31
1    2    HAA    O7R    C2R    C1R
1    2    HAB    O8R    C5R    C4R
1    2    HAC    N59    C57    O58
1    1    H2B    C2B    N1B    N3B 
1    1    H4B    C4B    C5B    C9B
1    1    H7B    C7B    C8B    C6B
1    1    H10    C10    C9     C11
LCB     20       
1    1    H      N      -C     CA
1    4    HZ1    NZ     CE     CD
1    2    HAO    N62    C61    O63
1    2    HAN    N62    C61    C60
1    2    HAM    N52    C50    O51
1    2    HAL    N52    C50    C49
1    2    HAK    N45    C43    O44
1    2    HAJ    N45    C43    C42
1    2    HAI    N40    C38    O39
1    2    HAH    N40    C38    C37
1    2    HAE    N29    C27    O28
1    2    HAD    N29    C27    C26
1    2    HAG    N33    C32    O34
1    2    HAF    N33    C32    C31
1    2    HAA    O7R    C2R    C1R
1    2    HAC    N59    C57    O58
1    1    H2B    C2B    N1B    N3B
1    1    H4B    C4B    C5B    C9B
1    1    H7B    C7B    C8B    C6B
1    1    H10    C10    C9     C11

As a brief explanation, the three-letter code is followed by the number of hydrogen atoms that are to be added (here, one per line that follows). Each line can be read:

First Column – The number of hydrogen atoms added (so all of these entries on the far left mean “add ONE hydrogen”)

Second Column – The manner by which the hydrogen atom is to be added (this is listed in section 5.5 of the GROMACS 3.3 Manual (page 93))

Third Column – The name of the Hydrogen atom to be added

Fourth Column – The atom to which the H is going to be directly linked in the topology file

Fifth Column Onward – The atoms (two in each entry above) that define how the Hydrogen is added with respect to (1) the code in Column 2 and (2) the atom to which the Hydrogen is added.
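
The column layout just described can be checked mechanically before pasting a block into ffG53a6.hdb. The following is a minimal Python sketch, assuming only the layout described above; the function name parse_hdb and the dictionary keys are my own labels, not GROMACS terminology. It reads each residue header, confirms that the promised number of entries follows, and splits each entry into its columns.

```python
def parse_hdb(text):
    """Parse .hdb-style text into {residue code: [entry, ...]}.

    Raises ValueError if a header count does not match the entries under it.
    """
    rows = [ln.split() for ln in text.splitlines() if ln.strip()]
    residues = {}
    i = 0
    while i < len(rows):
        # A header is exactly two fields: residue code and entry count.
        if len(rows[i]) != 2:
            raise ValueError(f"expected 'CODE count' header, got: {rows[i]}")
        code, n_entries = rows[i][0], int(rows[i][1])
        block = rows[i + 1 : i + 1 + n_entries]
        # Every entry needs count, method, H name, bonded atom, and control atoms.
        if len(block) != n_entries or any(len(r) < 5 for r in block):
            raise ValueError(f"{code}: header promises {n_entries} entries")
        residues[code] = [
            {
                "n_h": int(r[0]),      # first column: hydrogens added
                "method": int(r[1]),   # second column: how they are added
                "h_name": r[2],        # third column: name of the new H
                "bonded_to": r[3],     # fourth column: atom the H attaches to
                "control": r[4:],      # remaining columns: defining atoms
            }
            for r in block
        ]
        i += 1 + n_entries
    return residues


if __name__ == "__main__":
    # Two-entry excerpt for illustration; the full B12 block above has 19.
    excerpt = "B12 2\n1 2 HAO N62 C61 O63\n1 1 H2B C2B N1B N3B\n"
    for entry in parse_hdb(excerpt)["B12"]:
        print(entry)
```

A miscounted header is an easy mistake when hand-editing these blocks, which is why the sketch raises an error rather than guessing.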

3. ffG53a6bon.itp

There are a few subtle tweaks to the force constants for a few bonds that I perform here right within the file and that proper MD people likely would scream at. I note that, when you do this, you are making changes to numbers that will affect the results if you somehow start doing heme MD simulations.

Change the gb_NN values to those provided below.

#define gb_34        0.198  0.6400e+06
; NR  -   FE    120
#define gb_4         0.1142  3.7000e+07
; C - O (CO in heme)  2220
#define gb_14       0.1340  1.1000e+07
; C  -  NR (heme)       1000
#define gb_30       0.1880  2.7200e+06
; FE  -  C (Heme)
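
For those who prefer not to hand-edit the file, here is a minimal Python sketch that swaps in the four values listed above. It is an illustration under assumptions, not a tested tool: the names set_bond_params and B12_BOND_PARAMS are hypothetical, and it assumes each definition sits on one line as “#define gb_NN” followed by two numbers, as in the listing.

```python
import re

# Values copied from the listing above, kept as strings to preserve formatting.
B12_BOND_PARAMS = {
    "gb_34": ("0.198", "0.6400e+06"),
    "gb_4": ("0.1142", "3.7000e+07"),
    "gb_14": ("0.1340", "1.1000e+07"),
    "gb_30": ("0.1880", "2.7200e+06"),
}

# Matches "#define gb_NN <number> <number>" on a single line.
_DEFINE = re.compile(r"^#define\s+(gb_\d+)\s+\S+\s+\S+\s*$")


def set_bond_params(itp_text, params=B12_BOND_PARAMS):
    """Return itp_text with the listed gb_NN definitions replaced."""
    seen = set()
    out = []
    for line in itp_text.splitlines():
        match = _DEFINE.match(line)
        if match and match.group(1) in params:
            name = match.group(1)
            first, second = params[name]  # the two numbers, in listing order
            line = f"#define {name:<12}{first}  {second}"
            seen.add(name)
        out.append(line)  # comment lines and other definitions pass through
    missing = sorted(set(params) - seen)
    if missing:
        raise KeyError(f"not found in file: {missing}")
    return "\n".join(out) + "\n"
```

Comment lines and all other gb_NN definitions are left untouched, and a definition that cannot be found raises an error instead of being silently skipped.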

You will note that I have not done anything to make cobalt appear in the topology or force field files. For the sake of running a simulation, Fe and Co are close enough that simply replacing CO with FE in the PDB file is sufficient. Alternatively, you can do the completely proper job of adding cobalt to the force field to get the mass right.
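
The CO-to-FE swap is a one-line search-and-replace in a text editor, but a slightly more careful version is sketched below in Python. The function name cobalt_to_iron is hypothetical, and the sketch assumes the cobalt atom is named CO in standard fixed-column ATOM/HETATM records (atom name in columns 13 to 16, element symbol in columns 77 and 78). If your PDB file names the atom differently, adjust accordingly.

```python
def cobalt_to_iron(pdb_text):
    """Rename cobalt (atom name CO) to FE in ATOM/HETATM records.

    Only the atom-name field (columns 13-16) and, when present, the
    element field (columns 77-78) are touched; all other text is kept.
    """
    out = []
    for line in pdb_text.splitlines():
        is_atom = line.startswith(("ATOM", "HETATM"))
        if is_atom and line[12:16].strip() == "CO":
            # Keep the field's original padding so the columns stay aligned.
            name = line[12:16].replace("CO", "FE")
            line = line[:12] + name + line[16:]
            if len(line) >= 78 and line[76:78].strip().upper() == "CO":
                line = line[:76] + "FE" + line[78:]
        out.append(line)
    return "\n".join(out) + "\n"
```

Matching on the atom-name field rather than on the bare text “CO” avoids touching anything else in the file that happens to contain those two letters.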

And that is the bare basics for getting a run to happen. A proper tutorial on how to generate force field parameters and topologies may be forthcoming, depending largely on interest and my ability to find time to do it.

Article citation: Damian G. Allis, Mol. BioSyst., 2010, DOI: 10.1039/c003476b

Damian G. Allis (1), Timothy J. Fairchild (2) and Robert P. Doyle (1)

1. Department of Chemistry, Syracuse University, Syracuse, NY 13244, USA
2. School of Chiropractic and Sports Science, Murdoch University, Murdoch, WA 6150, Australia

As part of ongoing research into the use of vitamin B12 (B12; cobalamin; Cbl)-based bioconjugate approaches for the oral delivery of peptides/proteins, a molecular dynamics (MD) study of the binding of a cyanocobalamin–insulin (CN–Cbl–insulin) conjugate to human transcobalamin(II) (TCII) was recently reported that provides a qualitative picture of how the human insulin protein in its open T-state geometry affects CN–Cbl binding to TCII. This initial analysis revealed that the B22–B30 segment of the insulin B-chain acts as a long tether that connects the larger combined insulin A/B region to CN–Cbl when this conjugation is performed at the CN–Cbl ribose 5-hydroxy position. The experimental support for this model of the binding interaction is provided by the consequences of the successful delivery of the CN–Cbl–insulin conjugate in the production of significantly decreased blood glucose levels in diabetic STZ-rat models. In efforts to provide a more detailed description of the (CN–Cbl)–TCII complex for modeling Cbl-based bioconjugate designs, the (CN–Cbl)–TCII system and a CN–Cbl conjugate incorporating a flexible tether composed of only the B22–B30 segment of human insulin have been examined by MD simulations. The implications of these simulations are discussed in terms of successful conjugate positioning on Cbl, especially when such sites are not apparent from the diffraction studies alone, and the possibilities, as yet not reported, for dual-tethered Cbl bioconjugates for multi-component drug delivery applications.