- The RRM domain of poly(A)-specific ribonuclease has a noncanonical binding site for mRNA cap analog recognition (2008)
- The degradation of the poly(A) tail is crucial for posttranscriptional gene regulation and for quality control of mRNA. Poly(A)-specific ribonuclease (PARN) is one of the major mammalian 3′-specific exoribonucleases involved in the degradation of the mRNA poly(A) tail, and it is also involved in the regulation of translation in early embryonic development. The interaction between PARN and the m7GpppG cap of mRNA plays a key role in stimulating the rate of deadenylation. Here we report the solution structures of the cap-binding domain of mouse PARN with and without the m7GpppG cap analog. The structure of the cap-binding domain adopts the RNA recognition motif (RRM) with a characteristic α-helical extension at its C-terminus, which covers the β-sheet surface (hereafter referred to as PARN RRM). In the complex structure of PARN RRM with the cap analog, the base of the N7-methyl guanosine (m7G) of the cap analog stacks with the solvent-exposed aromatic side chain of the distinctive tryptophan residue 468, located at the C-terminal end of the second β-strand. These unique structural features in PARN RRM reveal a novel cap-binding mode, which is distinct from the nucleotide recognition mode of the canonical RRM domains.
- Structural basis for the sequence-specific RNA-recognition mechanism of human CUG-BP1 RRM3 (2009)
- The CUG-binding protein 1 (CUG-BP1) is a member of the CUG-BP1 and ETR-like factors (CELF) family or the Bruno-like family and is involved in the control of splicing, translation and mRNA degradation. Several target RNA sequences of CUG-BP1 have been predicted, such as the CUG triplet repeat, the GU-rich sequences and the AU-rich element of nuclear pre-mRNAs and/or cytoplasmic mRNA. CUG-BP1 has three RNA-recognition motifs (RRMs), among which the third RRM (RRM3) can bind to the target RNAs on its own. In this study, we solved the solution structure of the CUG-BP1 RRM3 by hetero-nuclear NMR spectroscopy. The CUG-BP1 RRM3 exhibited a noncanonical RRM fold, with the four-stranded β-sheet surface tightly associated with the N-terminal extension. Furthermore, we determined the solution structure of the CUG-BP1 RRM3 in the complex with (UG)3 RNA, and discovered that the UGU trinucleotide is specifically recognized through extensive stacking interactions and hydrogen bonds within the pocket formed by the β-sheet surface and the N-terminal extension. This study revealed the unique mechanism that enables the CUG-BP1 RRM3 to discriminate the short RNA segment from other sequences, thus providing the molecular basis for the comprehension of the role of the RRM3s in the CELF/Bruno-like family.
- Structural basis for the dual RNA-recognition modes of human Tra2-beta RRM (2010)
- Human Transformer2-beta (hTra2-beta) is an important member of the serine/arginine-rich protein family, and contains one RNA recognition motif (RRM). It controls the alternative splicing of several pre-mRNAs, including those of the calcitonin/calcitonin gene-related peptide (CGRP), the survival motor neuron 1 (SMN1) protein and the tau protein. Accordingly, the RRM of hTra2-beta specifically binds to two types of RNA sequences [the CAA and (GAA)2 sequences]. We determined the solution structure of the hTra2-beta RRM (spanning residues Asn110–Thr201), which not only has a canonical RRM fold, but also an unusual alignment of the aromatic amino acids on the beta-sheet surface. We then solved the complex structure of the hTra2-beta RRM with the (GAA)2 sequence, and found that the AGAA tetra-nucleotide was specifically recognized through hydrogen-bond formation with several amino acids on the N- and C-terminal extensions, as well as stacking interactions mediated by the unusually aligned aromatic rings on the beta-sheet surface. Further NMR experiments revealed that the hTra2-beta RRM recognizes the CAA sequence when it is integrated in the stem-loop structure. This study indicates that the hTra2-beta RRM recognizes two types of RNA sequences in different RNA binding modes.
- Effects of NMR spectral resolution on protein structure calculation (2013)
- Adequate digital resolution and signal sensitivity are two critical factors for protein structure determinations by solution NMR spectroscopy. The prime objective for obtaining high digital resolution is to resolve peak overlap, especially in NOESY spectra with thousands of signals where the signal analysis needs to be performed on a large scale. Achieving maximum digital resolution is usually limited by the practically available measurement time. We developed a method utilizing non-uniform sampling for balancing digital resolution and signal sensitivity, and performed a large-scale analysis of the effect of the digital resolution on the accuracy of the resulting protein structures. Structure calculations were performed as a function of digital resolution for about 400 proteins with molecular sizes ranging between 5 and 33 kDa. The structural accuracy was assessed by atomic coordinate RMSD values from the reference structures of the proteins. In addition, we also monitored the number of assigned NOESY cross peaks, the average signal sensitivity, and the chemical shift spectral overlap. We show that high resolution is equally important for proteins of every molecular size. The chemical shift spectral overlap depends strongly on the corresponding spectral digital resolution. Thus, knowing the extent of overlap can be a predictor of the resulting structural accuracy. Our results show that for every molecular size a minimal digital resolution, corresponding to the natural linewidth, needs to be achieved for obtaining the highest accuracy possible for the given protein size using state-of-the-art automated NOESY assignment and structure calculation methods.
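The relation between digital resolution and natural linewidth mentioned in this abstract can be illustrated with a small calculation. This is a minimal sketch and not the authors' method: it assumes that digital resolution is expressed as spectral width divided by the number of points in that dimension, and that the natural linewidth is the full width at half maximum of a Lorentzian line, 1/(πT2). All function names and the numbers in the usage note are hypothetical.

```python
import math


def digital_resolution_hz(spectral_width_hz, n_points):
    """Digital resolution in Hz per point (assumed definition: SW / N)."""
    if n_points <= 0:
        raise ValueError("n_points must be positive")
    return spectral_width_hz / n_points


def natural_linewidth_hz(t2_seconds):
    """Full width at half maximum of a Lorentzian line with relaxation time T2."""
    if t2_seconds <= 0:
        raise ValueError("t2_seconds must be positive")
    return 1.0 / (math.pi * t2_seconds)


def min_points_for_linewidth(spectral_width_hz, linewidth_hz):
    """Smallest number of points whose digital resolution reaches the linewidth."""
    return math.ceil(spectral_width_hz / linewidth_hz)
```

For example, with an illustrative spectral width of 8000 Hz and a linewidth of 20 Hz, `min_points_for_linewidth(8000.0, 20.0)` gives the number of points at which the digital resolution matches the linewidth.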
- NMR solution structure of a chymotrypsin inhibitor from the Taiwan cobra Naja naja atra (2013)
- The Taiwan cobra (Naja naja atra) chymotrypsin inhibitor (NACI) consists of 57 amino acids and is related to other Kunitz-type inhibitors such as bovine pancreatic trypsin inhibitor (BPTI) and Bungarus fasciatus fraction IX (BF9), another chymotrypsin inhibitor. Here we present the solution structure of NACI. We determined the NMR structure of NACI with a root-mean-square deviation of 0.37 Å for the backbone atoms and 0.73 Å for the heavy atoms on the basis of 1,075 upper distance limits derived from NOE peaks measured in its NOESY spectra. To investigate the structural characteristics of NACI, we compared the three-dimensional structure of NACI with BPTI and BF9. The structure of the NACI protein comprises one 3₁₀-helix, one α-helix and one double-stranded antiparallel β-sheet, which is comparable with the secondary structures in BPTI and BF9. The RMSD value between the mean structures is 1.09 Å between NACI and BPTI and 1.27 Å between NACI and BF9. In addition to similar secondary and tertiary structure, NACI might possess similar types of protein conformational fluctuations as reported in BPTI, such as Cys14–Cys38 disulfide bond isomerization, based on line broadening of resonances from residues which are mainly confined to a region around the Cys14–Cys38 disulfide bond.
- Peak picking NMR spectral data using non-negative matrix factorization (2014)
- Background: Simple peak-picking algorithms, such as those based on lineshape fitting, perform well when peaks are completely resolved in multidimensional NMR spectra, but often produce wrong intensities and frequencies for overlapping peak clusters. For example, NOESY-type spectra have considerable overlaps leading to significant peak-picking intensity errors, which can result in erroneous structural restraints. Precise frequencies are critical for unambiguous resonance assignments. Results: To alleviate this problem, a more sophisticated peak decomposition algorithm, based on non-negative matrix factorization (NMF), was developed. We produce peak shapes from Fourier-transformed NMR spectra. Apart from its main goal of deriving components from spectra and producing peak lists automatically, the NMF approach can also be applied if the positions of some peaks are known a priori, e.g. from consistently referenced spectral dimensions of other experiments. Conclusions: Application of the NMF algorithm to a three-dimensional peak list of the 23 kDa bi-domain section of the RcsD protein (RcsD-ABL-HPt, residues 688-890) as well as to synthetic HSQC data shows that peaks can also be picked accurately in spectral regions with strong overlap.
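The idea of decomposing an overlapped spectral region into non-negative components can be sketched with a generic NMF. This is not the authors' algorithm: it is a plain factorization of a 2D non-negative matrix V into W·H using the standard Lee-Seung multiplicative updates, written in pure Python. It assumes that each peak is separable, so that columns of W act as lineshapes along one dimension and rows of H as lineshapes along the other. The function names are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def nmf(v, rank, iterations=500, seed=0, eps=1e-12):
    """Approximate non-negative v as w*h (w, h >= 0) with multiplicative updates."""
    rng = random.Random(seed)
    rows, cols = len(v), len(v[0])
    # Strictly positive random start, so entries never get stuck at zero initially.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(rows)]
    h = [[rng.random() + 0.1 for _ in range(cols)] for _ in range(rank)]
    for _ in range(iterations):
        # Update h: h <- h * (w^T v) / (w^T w h)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(cols)]
             for k in range(rank)]
        # Update w: w <- w * (v h^T) / (w h h^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(rows)]
    return w, h


def frobenius_error(v, w, h):
    """Frobenius norm of the residual v - w*h."""
    approx = matmul(w, h)
    return sum((a - b) ** 2 for ra, rb in zip(v, approx)
               for a, b in zip(ra, rb)) ** 0.5
```

In this sketch the number of components `rank` must be chosen by the caller. Known peak positions, as mentioned in the abstract, could in principle be used to initialize `w` or `h`, but that is not implemented here.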
- Identification of residues required for stalled-ribosome rescue in the codon-independent release factor YaeJ (2013)
- The YaeJ protein is a codon-independent release factor with peptidyl-tRNA hydrolysis (PTH) activity, and functions as a stalled-ribosome rescue factor in Escherichia coli. To identify residues required for YaeJ function, we performed mutational analysis for in vitro PTH activity towards rescue of ribosomes stalled on a non-stop mRNA, and for ribosome-binding efficiency. We focused on residues conserved among bacterial YaeJ proteins. Additionally, we determined the solution structure of the GGQ domain of YaeJ from E. coli using nuclear magnetic resonance spectroscopy. YaeJ and a human homolog, ICT1, had similar levels of PTH activity, despite various differences in sequence and structure. While no YaeJ-specific residues important for PTH activity occur in the structured GGQ domain, Arg118, Leu119, Lys122, Lys129 and Arg132 in the following C-terminal extension were required for PTH activity. All of these residues are completely conserved among bacteria. The equivalent residues were also found in the C-terminal extension of ICT1, allowing an appropriate sequence alignment between YaeJ and ICT1 proteins from various species. Single amino acid substitutions for each of these residues significantly decreased ribosome-binding efficiency. These biochemical findings provide clues to understanding how YaeJ enters the A-site of stalled ribosomes.
- Objective identification of residue ranges for the superposition of protein structures (2011)
- Background: The automation of objectively selecting amino acid residue ranges for structure superpositions is important for meaningful and consistent protein structure analyses. So far there is no widely used standard for choosing these residue ranges for experimentally determined protein structures, where the manual selection of residue ranges or the use of suboptimal criteria remains commonplace. Results: We present an automated and objective method for finding amino acid residue ranges for the superposition and analysis of protein structures, in particular for structure bundles resulting from NMR structure calculations. The method is implemented in an algorithm, CYRANGE, that yields, without protein-specific parameter adjustment, appropriate residue ranges in most commonly occurring situations, including low-precision structure bundles, multi-domain proteins, symmetric multimers, and protein complexes. Residue ranges are chosen to comprise as many residues of a protein domain as possible, up to the point where adding further residues would lead to a steep rise in the RMSD value. Residue ranges are determined by first clustering residues into domains based on the distance variance matrix, and then refining for each domain the initial choice of residues by excluding residues one by one until the relative decrease of the RMSD value becomes insignificant. A penalty for the opening of gaps favours contiguous residue ranges in order to obtain a result that is as simple as possible, but not simpler. Results are given for a set of 37 proteins and compared with those of commonly used protein structure validation packages. We also provide residue ranges for 6351 NMR structures in the Protein Data Bank. Conclusions: The CYRANGE method is capable of automatically determining residue ranges for the superposition of protein structure bundles for a large variety of protein structures. The method correctly identifies ordered regions. 
Global structure superpositions based on the CYRANGE residue ranges allow a clear presentation of the structure, and unnecessary small gaps within the selected ranges are absent. In the majority of cases, the residue ranges from CYRANGE contain fewer gaps and cover considerably larger parts of the sequence than those from other methods without significantly increasing the RMSD values. CYRANGE thus provides an objective and automatic method for standardizing the choice of residue ranges for the superposition of protein structures.
  - Additional file 1: Dependence of Q on the order parameter rank. The quantity Qi is plotted against the order parameter rank i for 9 different protein structure bundles.
  - Additional file 2: Dependence of P on the clustering stage. The quantity Pi is plotted against the clustering stage i for 9 different protein structure bundles.
  - Additional file 3: Dependence of CYRANGE results on the minimal cluster size parameter my. The sequence coverage (red) and RMSD (blue) of the residue ranges determined by CYRANGE were plotted as a function of my for 9 different protein structure bundles. The dotted vertical line indicates the default value, my = 8. Where CYRANGE found two domains, the RMSD values of the individual domains are shown in light and dark blue.
  - Additional file 4: Dependence of CYRANGE results on the domain boundary extension parameter m. See Additional file 3 for details.
  - Additional file 5: Dependence of CYRANGE results on the minimal gap width g. See Additional file 3 for details.
  - Additional file 6: Dependence of CYRANGE results on the relative RMSD decrease parameter delta. See Additional file 3 for details.
  - Additional file 7: Dependence of CYRANGE results on the absolute RMSD decrease parameter delta abs. See Additional file 3 for details.
  - Additional file 8: Dependence of CYRANGE results on the gap penalty parameter gamma. See Additional file 3 for details.
  - Additional file 9: Correlation between the sequence coverage from CYRANGE, FindCore and PSVS, and the GDT total score, GDT_TS. Each data point represents a protein shown in Figures 3 and 4. The coverage is the percentage of amino acid residues included in the residue ranges found by the different methods. The GDT_TS value is defined by GDT_TS = (P1 + P2 + P4 + P8)/4, where Pd is the fraction of residues that can be superimposed under a distance cutoff of d Å.
  - Additional file 10: Correlation between the RMSD value for the residue ranges from CYRANGE, FindCore and PSVS, and the GDT total score, GDT_TS. Each data point represents one protein domain. See Additional file 9 for details.
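The GDT_TS definition quoted above, GDT_TS = (P1 + P2 + P4 + P8)/4, can be written out as a short function. This is a simplified sketch under a stated assumption: the per-residue distances between the two structures are taken as already computed from a single superposition, whereas a full GDT calculation searches for a separate optimal superposition for each cutoff. The function name and the input list are hypothetical, and the result is returned as a fraction between 0 and 1 rather than a percentage.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Average fraction of residues within each distance cutoff (in Å).

    distances: per-residue deviations in Å after superposition (assumed given).
    """
    if not distances:
        raise ValueError("distances must not be empty")
    n = len(distances)
    # P_d: fraction of residues whose deviation does not exceed the cutoff d.
    fractions = [sum(1 for d in distances if d <= c) / n for c in cutoffs]
    return sum(fractions) / len(cutoffs)
```

For instance, `gdt_ts([0.5, 1.5, 3.0, 6.0, 10.0])` averages the fractions 1/5, 2/5, 3/5 and 4/5 for the cutoffs 1, 2, 4 and 8 Å.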