Functional impact of mutations

Besides structural consequences, variants can disrupt molecular functional sites, such as catalytic residues and DNA- or protein-binding sites, which are usually position-specific or share consensus motifs. Such disruptions do not necessarily involve a disruption of structure. A prominent class of sites that variants can affect consists of diverse PTM sites; the most frequent PTM types include phosphorylation, glycosylation, acetylation, methylation, and ubiquitination. PTMs play an important role in cellular signal transduction and regulation, and the activation and inactivation of certain key proteins rely on precise modulation of PTMs. For instance, in the absence of environmental stress, p53 is suppressed through ubiquitination catalyzed by E3 ubiquitin ligases, whereas in the presence of stress, such as DNA damage, p53 is activated by a variety of PTMs, including acetylation and phosphorylation on its flexible DNA-binding domain [29]. PTM sites and their flanking residues generally form consensus sequences that vary widely, and therefore variants within these enzyme-specific motifs could abolish known functions or create new ones. This section starts by detailing two concrete examples of functional changes due to variants, then describes DisPhos (Disorder-enhanced Phosphorylation sites predictor), an established phosphorylation predictor, and finally explains how the concepts of gain and loss of phosphorylation can be used to analyze a cancer data set.

FGFR2 (fibroblast growth factor receptor 2), one of four members of the FGFR family of receptor tyrosine kinases, plays an important role in transmembrane signal transduction. Recent research identified one missense mutation, A628T, as being involved in LADD syndrome through severe impairment of the kinase activity of FGFR2 [30]. Residue A628 lies in the center of the catalytic pocket in the tyrosine kinase domain of FGFR2. A mutant structure, A628T-FGFR2 [31], reveals that replacing the small amino acid alanine at position 628 with the larger, polar threonine pushes one of the key residues, R630, out of the catalytic pocket; that movement disrupts the hydrogen bond between D626 and R630 that exists in the wild-type structure (Figure 3, left). Although the position of D626 remains almost unchanged, R630 is too far from the catalytic pocket to stabilize the interaction with substrates, which greatly compromises the catalytic ability of FGFR2. Compared with wild-type FGFR2, the A628T-FGFR2 mutant has roughly the same structure but greatly reduced kinase activity.

It has been observed that amino acid substitutions at non-PTM sites can extend their influence to neighboring PTM sites on the same protein. One such example is human PTPS (6-pyruvoyl-tetrahydropterin synthase), which catalyzes triphosphate elimination. PTPS participates in the biosynthetic pathway for tetrahydrobiopterin (BH4). Lack of PTPS catalytic activity causes a deficiency of BH4, which in turn leads to hyperphenylalaninemia (HPA), an autosomal recessive disorder. The missense mutation R16C is associated with HPA and results in reduced PTPS activity [32]. Moreover, phosphorylation of S19 on PTPS is required for maximal enzyme activity [33]. So how does R16C affect phosphorylation of S19? There are multiple potential explanations. One is that, as the structure of PTPS shows, both R16 and S19 are exposed on the surface of the protein (Figure 3, right; [34]), where they form the consensus sequence R16XXS19 recognized by cGMP-dependent protein kinase II. The C16 substitution disrupts this kinase-recognition motif and thus hinders phosphorylation, which ultimately leads to the inactivation of PTPS. Another explanation is that the removal of R16 prevents a salt bridge from forming between the arginine and the phosphate group once it is attached, which in turn reduces the stability of the modified protein.

Figure 3. The crystal structure of the catalytic pocket of the A628T-FGFR2 mutant (left, PDB ID: 3B2T) and ribbon view of human PTPS structure (right, PDB ID: 3I2B). In both cases, the N-terminus is colored in blue and the C-terminus in red. Residues of interest are depicted as ball and stick models.

As with the stability prediction tool MUpro, described in the previous section, experimental difficulties have promoted the development of computational approaches to estimating many common PTM sites based on protein sequence. For the prediction of phosphorylation, DisPhos differs from other available methods like NetPhos [35] and ScanSite [36], since its model explicitly includes a range of characteristic features from the predicted disorder region around the phosphorylation site [37].

In some cases, researchers have found phosphorylation sites located in intrinsically disordered regions or have observed disorder-to-order or order-to-disorder conformational changes upon phosphorylation [38]. DisPhos exploits such observations by integrating predicted disorder information with the motif profile to improve its predictive performance.

Because phosphorylation occurs on serine, threonine, and tyrosine residues (S/T/Y), DisPhos assembled three pairs of positive and negative data sets, with each pair corresponding to one residue-specific predictor. First, proteins with phosphorylation annotations were extracted from UniProt (Universal Protein Resource) and combined with data from Phospho.ELM [39]. A 25-residue segment centered on each annotated S/T/Y was placed into a positive set, and a segment of the same length around every non-annotated S/T/Y on the same proteins was placed into a negative set. To reduce the sequence bias caused by homologs or duplications, DisPhos kept only segments with a pairwise sequence similarity of less than 30 percent, which means that two 25-residue segments could share at most seven matching positions in an ungapped alignment. Because the number of experimentally verified phosphorylation sites is small, the filtered data sets were highly unbalanced (Table 2).
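The construction of the positive and negative sets can be sketched as follows. This is a minimal illustration, not the DisPhos implementation: the function names, the `-` padding used for sites near the termini, and the greedy order of the redundancy filter are all assumptions introduced here.

```python
from typing import Dict, List, Set

WINDOW = 25          # segment length stated in the text
HALF = WINDOW // 2   # 12 residues on each side of the central S/T/Y
PAD = "-"            # assumed padding symbol for sites close to a terminus


def extract_windows(sequence: str, annotated: Set[int]) -> Dict[str, Dict[str, List[str]]]:
    """Collect a 25-residue segment around every S, T, and Y in `sequence`.

    `annotated` holds the 0-based positions of known phosphorylation sites.
    Annotated sites go to the "pos" list, all other S/T/Y to the "neg" list.
    """
    padded = PAD * HALF + sequence + PAD * HALF
    sets: Dict[str, Dict[str, List[str]]] = {r: {"pos": [], "neg": []} for r in "STY"}
    for i, residue in enumerate(sequence):
        if residue not in sets:
            continue
        segment = padded[i:i + WINDOW]  # residue i sits at index HALF of the segment
        label = "pos" if i in annotated else "neg"
        sets[residue][label].append(segment)
    return sets


def ungapped_identity(a: str, b: str) -> float:
    """Fraction of identical, non-padding positions in two equal-length segments."""
    matches = sum(1 for x, y in zip(a, b) if x == y and x != PAD)
    return matches / len(a)


def filter_redundant(segments: List[str], threshold: float = 0.30) -> List[str]:
    """Greedy filter (an assumption): keep a segment only if it is below the
    identity threshold against every segment kept so far."""
    kept: List[str] = []
    for seg in segments:
        if all(ungapped_identity(seg, k) < threshold for k in kept):
            kept.append(seg)
    return kept
```

With 25-residue segments, seven matching positions give an identity of 0.28 and pass the 30 percent cutoff, while eight matches give 0.32 and do not, which agrees with the limit of seven matches stated above.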

Residue    Positive Sites (P)    Negative Sites (N)    N/P Ratio

Table 2. Data sets used in DisPhos (adapted from Table 1 in [37])

DisPhos used a broad range of features to discriminate positive from negative sites (Table 3).

Type                      Features                                          Dimension
Amino acid composition    Binary coding                                     480
Amino acid frequency      Binary coding                                     20
Disorder                  VLXT, VL2, VLV, VLC, VLS                          5
Secondary structure       Helix, loop and sheet                             7
Sequence property         Complexity and flexibility                        2
Residue property          Net charge, aromatic content, hydrophobic         5
                          moment, hydrophobicity, exposed/buried

Table 3. Descriptive and predicted features used in DisPhos training.

To cope with the highly dimensional, yet sparse feature space, DisPhos performed feature selection by applying a permutation test to binary features and applying principal component analysis (PCA) to continuous features and then fitted logistic regression models to the transformed data sets.
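A permutation test on a single binary feature can be sketched as below. The test statistic (absolute difference in feature frequency between positive and negative sites), the number of permutations, and the function name are assumptions made for illustration; the source does not specify these details.

```python
import random
from typing import Sequence


def permutation_p_value(feature: Sequence[int], labels: Sequence[int],
                        n_permutations: int = 2000, seed: int = 0) -> float:
    """Permutation p-value for the association of a binary feature with binary labels.

    Statistic: |mean(feature | label = 1) - mean(feature | label = 0)|.
    Both classes must be present in `labels`.
    """
    def statistic(lab: Sequence[int]) -> float:
        pos = [f for f, l in zip(feature, lab) if l == 1]
        neg = [f for f, l in zip(feature, lab) if l == 0]
        return abs(sum(pos) / len(pos) - sum(neg) / len(neg))

    rng = random.Random(seed)
    observed = statistic(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any link between feature and label
        if statistic(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)
```

Features whose p-value falls below a chosen cutoff would be retained; a feature that is constant across all sites always yields a p-value of 1 and is discarded.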

Binary classifiers generally achieve their best accuracy, sensitivity, and specificity on balanced or nearly balanced data sets. When the class boundary is chosen to maximize accuracy, which is the default configuration for many popular classifiers, training on highly unbalanced data sets typically results in extreme values for sensitivity or specificity and ultimately leads to poor generalization. DisPhos adopted an ensemble strategy to correct this issue in the S/T/Y data sets.
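The effect of class imbalance on accuracy-driven classifiers can be illustrated with a short calculation. The class sizes below (50 positive and 950 negative sites) are hypothetical and are not taken from Table 2.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Hypothetical unbalanced data set with an N/P ratio of 19.
y_true = [1] * 50 + [0] * 950
always_negative = [0] * len(y_true)  # a classifier that never predicts a site

metrics = binary_metrics(y_true, always_negative)
print(metrics)
```

A classifier that labels every residue as non-phosphorylated reaches an accuracy of 0.95 here while its sensitivity is 0, which is the kind of extreme value that motivates training on balanced data.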

The combination of data filtering, feature selection, and sophisticated training and test configurations enabled DisPhos to achieve accuracies between 70 and 80 percent, an improvement over other similar predictors. Moreover, the features derived from disorder predictions improved the accuracy by two percent on average, which demonstrates the usefulness of disorder features in the prediction of phosphorylation sites.

DisPhos represents outcomes as probabilities, which quantitatively measure the likelihood that the underlying residues are phosphorylation sites. This characteristic facilitated the definition of gain and loss of phosphorylation for a specific site [40], and since these concepts can be interpreted readily, they may help provide insight into the underlying molecular mechanisms of mutations associated with diseases. Actually, the definitions of gain and loss are not limited to phosphorylation sites and can apply just as well to many other functional and structural properties.

Applying bioinformatics tools that predict functional and structural attributes to both the wild-type and the mutant protein sequence provides us with two probabilistic estimates for a property $p$ at site $s_i$: $P(p = 1 \text{ at } s_i^w)$ and $P(p = 1 \text{ at } s_i^m)$, with $s_i^w$ denoting the wild-type site and $s_i^m$ denoting the mutant site. Then, conceptually, we have

\[ P(\text{loss of property } p \text{ at site } s_i) = P(p = 1 \text{ at } s_i^w \text{ AND } p = 0 \text{ at } s_i^m). \tag{1} \]

Given that $s_i^w$ and $s_i^m$ are actually different molecules, we treat the events $p = 1$ at $s_i^w$ and $p = 0$ at $s_i^m$ as independent, since no underlying process links them. Therefore, we can expand the right-hand side of equation (1) as a product:

\[ P(p = 1 \text{ at } s_i^w \text{ AND } p = 0 \text{ at } s_i^m) = P(p = 1 \text{ at } s_i^w) \cdot P(p = 0 \text{ at } s_i^m). \tag{2} \]

Substituting equation (2) into equation (1) and using $P(p = 0 \text{ at } s_i^m) = 1 - P(p = 1 \text{ at } s_i^m)$, we get

\[ P(\text{loss of property } p \text{ at site } s_i) = P(p = 1 \text{ at } s_i^w) \cdot \left[ 1 - P(p = 1 \text{ at } s_i^m) \right]. \tag{3} \]

Likewise, we can define gain of a property as

\[ P(\text{gain of property } p \text{ at site } s_i) = \left[ 1 - P(p = 1 \text{ at } s_i^w) \right] \cdot P(p = 1 \text{ at } s_i^m). \tag{4} \]

Figure 4 shows the contour of gain of a property. Note that gain and loss remain defined, and are generally nonzero, even when the predictions for the wild-type and mutant sequences are identical: if both predictions equal $q$, then gain and loss both equal $q(1 - q)$, which increases from 0 to 0.25 as $q$ goes from 0 to 0.5.
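The gain and loss definitions translate directly into code. The sketch below assumes that the two probabilities come from any predictor that outputs values in [0, 1] for the same site on the wild-type and mutant sequences; the function names are illustrative.

```python
def _check_probability(value: float) -> None:
    """Reject inputs outside [0, 1]."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"probability out of range: {value}")


def loss_probability(p_wild: float, p_mutant: float) -> float:
    """P(loss) = P(p = 1 at wild-type site) * [1 - P(p = 1 at mutant site)]."""
    _check_probability(p_wild)
    _check_probability(p_mutant)
    return p_wild * (1.0 - p_mutant)


def gain_probability(p_wild: float, p_mutant: float) -> float:
    """P(gain) = [1 - P(p = 1 at wild-type site)] * P(p = 1 at mutant site)."""
    _check_probability(p_wild)
    _check_probability(p_mutant)
    return (1.0 - p_wild) * p_mutant
```

For example, a site predicted at 0.9 on the wild-type sequence and 0.2 on the mutant sequence has a loss probability of 0.9 × 0.8 = 0.72 and a gain probability of 0.1 × 0.2 = 0.02. When both predictions equal 0.5, gain and loss both reach their shared maximum on the diagonal, 0.25.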

Figure 4. The contour of gain of a property with respect to the predicted probability on the mutant sequence (x-axis, P(mutant)) and on the wild-type sequence (y-axis, P(wild)). The dashed line denotes sites with equal probabilities for the two types of sequences.

Radivojac et al. [40] showed one application of gain and loss of phosphorylation. One experiment in their study collected 1,099 breast and colorectal cancer nsSNPs occurring in 847 proteins from a large-scale cancer-tumor-sequencing project [41]. The authors then built a matched control data set by introducing random mutations at the codon level into the same set of 847 wild-type proteins. They calculated gain and loss of phosphorylation for each mutation in both data sets and found that disease-associated nsSNPs were significantly more likely to create new phosphorylation sites (Table 4).

Phosphorylation change    Disease nsSNPs    Control nsSNPs    P-value
Gain                      1.91              0.86              0.014
Loss                      1.70              1.50              0.59

Table 4. Percentage of mutations predicted to have undergone gain or loss of phosphorylation. P-values were computed by t-test.

This survey showed how the concepts of gain and loss of phosphorylation could distinguish cancer-associated from neutral somatic mutations; it also suggested that they could serve as useful features for discriminating between general disease-related nsSNPs and neutral ones.
