Abstract
In recent times, the discovery of several ncRNA genes (non-coding RNA, which does not code
for proteins but directly performs structural and catalytic functions) has drawn
progressively greater scientific attention to RNA studies and renewed interest in
the “RNA world” hypothesis, in which RNA may have served the dual role of
genetic material and principal biocatalyst.
The non-covalent interactions responsible for folding the highly flexible and
negatively charged sugar-phosphate backbone of RNA molecules are far less well understood
than those involved in protein folding. These interactions are also more
complex than those in regular DNA double helices, which are held together by the canonical A:T
and G:C Watson-Crick base pairs suited to the structure and function of DNA
molecules. In contrast, single-stranded RNA molecules display a richer variety of base
pairing patterns, which govern how they fold back on themselves into complex, dynamic
structures. Non-canonical base pairs, which constitute ~40% of base pairing interactions
in RNA (Leontis et al., 2001), are recognized for their role in the molecular recognition
of proteins, ligands and metal ions, and in mediating numerous tertiary interactions
(Westhof et al., 2000).
A high concentration of charges has long been identified as one of the major features
distinguishing nucleic acids from proteins in terms of their structure, activity and folding
dynamics. Neutralization of the negatively charged backbone phosphate groups
by metal counterions has, for example, been shown to be one of the primary events in RNA
folding in vitro (Misra et al., 2003). However, there is now a growing acceptance of the
fact that, apart from phosphates and metal cations, the nucleobases themselves can
acquire charges through protonation. This development marks a major change in
paradigm, particularly with respect to the role of metal cations in nucleic acid folding
and catalysis. It may be noted in this context that the role of nucleobase protonation in
nucleic acids goes well beyond general-purpose neutralization, which
metal cations perform ubiquitously. Several studies have pointed towards more specific
roles, including the mediation of global conformational changes in DNA as well as in
RNA structures, whereby the protonation of nucleobases in specific macromolecular
contexts may be important for defining their catalytic roles and folding pathways.
In general, nucleobases are not protonated at physiological pH, since the pKa values
of the imino nitrogens are ~3.5 and ~4.2 for adenine and cytosine, respectively, and ~9.2 for
guanine and uracil (Saenger, 1984). It is thus important to understand the basis of the
context-dependent pKa shifts that lead to protonation of nucleobases in different cases. Based
on their occurrence contexts and functional features, protonated nucleobases identified in
known structures have been divided into two classes (Bevilacqua et al., 2005): Class
I sites, for which the loaded proton is sequestered in hydrogen bonding between paired bases,
and Class II sites, where the proton is not directly involved in base pairing. Of these, we
have investigated Class I interactions. For the detection of these base pair types,
we have extended available base pair detection methods (Das et al., 2006) to identify
putative protonated base pairs and to understand their contexts. We have also used advanced
quantum chemical methods, both to confirm their occurrence and to study their
geometries and stabilities. We have used a non-redundant dataset of selected high-resolution
RNA crystal structures (PDB IDs given in the Appendix) from the Protein Data Bank
(PDB) to analyze the geometries and occurrence contexts of
protonated base pairs. Out of the twelve RNA base pair geometric families
described by the Leontis-Westhof classification scheme (Leontis and Westhof, 2001), we
have identified eighteen possible protonated base pairs spanning eight base pair families.
In addition, we report a putative base pair on the basis of modeling. Of the nineteen
distinct possibilities studied in this work, as many as six required hypothesis-driven
manual intervention to address ambiguities, which arose from a variety of
factors and were resolved to different degrees using quantum chemical methods.
Of the nineteen distinct geometries studied in this work, nine base pairing geometries are
new and have not previously been detected in nucleic acid structures. These base pairs have
the potential to nucleate higher-order structures and to participate in tertiary interactions
joining distant regions in RNA structures. The interaction patterns of protonated base pairs are
highly diverse, with gas-phase optimized interaction energies ranging from -24 to -49
kcal/mol. The high interaction energy is generally observed for