Water

protein    protein    protein    protein      sequestered   not available
protein    protein    protein    available    structural    locate a donor
available  protein    protein    protein      structural    locate an acceptor
available  protein    available  protein      ligand-like   replace by e.g. -CO?
protein    available  protein    available    ligand-like   replace by e.g. -NH2
protein    protein    available  available    ligand-like   replace by e.g. -OH

Where only one interaction with the protein occurs, replacement should be possible unless the water molecule under consideration forms a link in a chain of interacting hydrogen-bonded groups from the protein or other water molecules. In this case, preservation of the chain may be an important consideration. For two or fewer interactions with the protein or surrounding system, the bound water may be described as ligand-like, and it should be possible to displace it with a favourable energetic outcome provided that there is no degradation in the quality of the replacement interactions.
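The classification of bound water by the number of its interactions with the protein can be written as a small rule. The following is a minimal sketch only: the function name and the string labels "protein" and "available" are illustrative choices, and the rule simply counts protein contacts among the four interaction sites of the water molecule.

```python
def classify_bound_water(sites):
    """Classify a bound water from the state of its four interaction sites.

    `sites` holds four strings, each "protein" (site occupied by a protein
    group) or "available" (site free for a ligand group).
    """
    if len(sites) != 4 or any(s not in ("protein", "available") for s in sites):
        raise ValueError("expected four entries, each 'protein' or 'available'")
    n_protein = sum(s == "protein" for s in sites)
    if n_protein == 4:
        return "sequestered"   # no site is available to a ligand
    if n_protein == 3:
        return "structural"    # keep the water and use its one free site
    return "ligand-like"       # two or fewer protein contacts: candidate for displacement


print(classify_bound_water(["protein", "protein", "available", "available"]))
```

Whether a ligand-like water is best replaced by a carbonyl, amino or hydroxyl group depends on which of its sites are occupied, and that choice is not attempted here.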

3.5.3.2 Linking the probes

The construction of potential intramolecular links between two probe groups is a straightforward if tedious problem of determining the possible spans by constructing a series of bonds with standard lengths, angles and torsions and elucidating those links which do not clash with the protein. A number of methods have been developed to address this problem. For up to six bonds, given the bond lengths and angles, the required torsion angles can be solved analytically. Beyond this, connection can be achieved by constrained optimisation of the torsion angles, introducing the constraints by the method of Lagrange multipliers. If torsion angles are sampled at appropriate minima, combinations of bond geometries (tetrahedral and trigonal) can be assembled into a growing network which terminates when a connection between fragments is made. Generally, several linker chains of varying length and composition will connect the probes, and often these can be combined to give cyclic structures, eliminating unwanted conformational freedom and associated entropic effects. Similarly, when the linkers show certain patterns of torsion angles, for example a series of planar torsions, they may be reinforced by constructing rings incorporating those torsions.
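The growth of a linker by sampling torsion angles at their minima can be illustrated with a short sketch. It assumes a purely tetrahedral carbon chain (bond length 1.54 Angstrom, bond angle 109.5 degrees, staggered torsions), a distance tolerance for closing the link and a simple distance test for clashes with protein atoms; all of these values and the function names are assumptions of the example, not parameters of any published method.

```python
import math

BOND = 1.54                      # C-C bond length in Angstrom (assumed)
ANGLE = 109.5                    # tetrahedral bond angle in degrees (assumed)
TORSIONS = (60.0, 180.0, 300.0)  # staggered torsion minima in degrees (assumed)


def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def unit(u):
    norm = math.sqrt(dot(u, u))
    return tuple(a / norm for a in u)


def place_atom(a, b, c, bond, angle_deg, torsion_deg):
    """Return d with |cd| = bond, angle b-c-d = angle_deg and torsion
    a-b-c-d = torsion_deg (180 degrees places d trans to a)."""
    theta, phi = math.radians(angle_deg), math.radians(torsion_deg)
    bc = unit(sub(c, b))
    n = unit(cross(sub(b, a), bc))   # normal to the a-b-c plane
    m = cross(n, bc)                 # completes the local orthonormal frame
    local = (-bond * math.cos(theta),
             bond * math.sin(theta) * math.cos(phi),
             bond * math.sin(theta) * math.sin(phi))
    return tuple(c[i] + local[0] * bc[i] + local[1] * m[i] + local[2] * n[i]
                 for i in range(3))


def find_links(frame, target, max_bonds, tol=0.3, protein=(), clash=2.5):
    """Grow chains from the three atoms in `frame`; return the torsion
    sequences whose last atom lies within `tol` of `target`."""
    solutions = []

    def grow(chain, torsions):
        remaining = max_bonds - len(torsions)
        if remaining == 0:
            return
        for phi in TORSIONS:
            d = place_atom(chain[-3], chain[-2], chain[-1], BOND, ANGLE, phi)
            if any(math.dist(d, p) < clash for p in protein):
                continue                       # reject links that clash with the protein
            gap = math.dist(d, target)
            if gap <= tol:
                solutions.append(torsions + (phi,))
            elif gap <= (remaining - 1) * BOND + tol:
                grow(chain + [d], torsions + (phi,))  # target still within reach

    grow(list(frame), ())
    return solutions


frame = [(-0.5, 1.4, 0.0), (0.0, 0.0, 0.0), (1.54, 0.0, 0.0)]
first = place_atom(*frame, BOND, ANGLE, 180.0)
target = place_atom(frame[1], frame[2], first, BOND, ANGLE, 60.0)
print(find_links(frame, target, max_bonds=4))
```

The exhaustive search grows as three to the power of the number of bonds, which is why the distance test is used to abandon chains that can no longer reach the target.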

Alternatively, the spatial arrangement of functional groups within the binding site can be used as a geometric query to interrogate a data base of small-molecule three-dimensional structures such as the Cambridge Structural Database. In pharmaceutical companies, such data bases contain up to 1 million molecules. Molecules matching the required criteria can be tested for their ability to bind to the protein. The fastest searches assume a single conformation for each small molecule, but multiple conformations can be sampled if pre-stored in the data base. A method requiring less data storage, at the expense of the description of strain within the molecule and of computer time, attempts to fit the spatial constraints of the search query using distance geometry methods (Blumenthal (1970)).
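A single-conformation search of this kind reduces to comparing interatomic distances, which are unchanged by rotation and translation. The sketch below assumes that each stored molecule carries one labelled point per functional group type; the group labels, the two-entry data base and the tolerance of 0.5 Angstrom are invented for illustration.

```python
import math
from itertools import combinations


def pair_distances(points):
    """Map each pair of group labels to the distance between the groups."""
    return {(a, b): math.dist(points[a], points[b])
            for a, b in combinations(sorted(points), 2)}


def search(query_points, database, tol=0.5):
    """Return names of stored molecules whose group-group distances all
    agree with the query within `tol` (single conformation per molecule)."""
    target = pair_distances(query_points)
    hits = []
    for name, groups in database.items():
        if set(groups) != set(query_points):
            continue                       # molecule lacks a required group
        found = pair_distances(groups)
        if all(abs(found[k] - target[k]) <= tol for k in target):
            hits.append(name)
    return hits


# Hypothetical query derived from probe positions in a binding site.
query = {"donor": (0.0, 0.0, 0.0), "acceptor": (5.0, 0.0, 0.0), "hydrophobe": (0.0, 4.0, 0.0)}
database = {
    "mol_a": {"donor": (1.0, 1.0, 1.0), "acceptor": (6.0, 1.0, 1.0), "hydrophobe": (1.0, 5.0, 1.0)},
    "mol_b": {"donor": (0.0, 0.0, 0.0), "acceptor": (8.0, 0.0, 0.0), "hydrophobe": (0.0, 4.0, 0.0)},
}
print(search(query, database))
```

Because only distances are compared, a molecule and its mirror image are not distinguished; a real search would need an additional chirality test.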

Further data base methods utilise the vector nature of the probe's potential link with the putative ligand. The geometric relationship between the vectors can be defined in terms of distances, angles and torsions. A searchable vector data base can be generated from any set of molecules by identifying templates with a number of connector bond vectors and then tabulating the geometric relationships between them. These templates are often rigid ring systems and the connectors C-H bonds. Starting from commercially available compounds, a data base of more than 30,000 templates can be derived. Searching a vector data base for templates capable of connecting the localised functional groups simply involves matching the appropriate distances, angles and torsions within given tolerances. The ability to synthesise the appropriate template is usually the overriding criterion in choosing amongst the matching templates.
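The geometric relationship between two connector vectors can be reduced to four numbers: the distance between their base atoms, the angle each vector makes with the line joining the bases, and the torsion about that line. The sketch below tabulates this descriptor and compares it with a query; the representation of a vector as a (base, tip) pair and the tolerance values are assumptions of the example.

```python
import math


def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def angle(a, b, c):
    """Angle a-b-c at b, in degrees."""
    u, v = sub(a, b), sub(c, b)
    cos_value = dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_value))))


def torsion(p0, p1, p2, p3):
    """Torsion p0-p1-p2-p3 in degrees, in the range -180 to 180."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    return math.degrees(math.atan2(y, dot(n1, n2)))


def descriptor(v1, v2):
    """Distance, two angles and torsion relating vectors given as (base, tip)."""
    (p1, q1), (p2, q2) = v1, v2
    return (math.dist(p1, p2), angle(q1, p1, p2), angle(q2, p2, p1),
            torsion(q1, p1, p2, q2))


def matches(found, query, tol_dist=0.3, tol_angle=15.0):
    """Compare two descriptors within tolerances; torsions wrap at 360 degrees."""
    wrapped = abs((found[3] - query[3] + 180.0) % 360.0 - 180.0)
    return (abs(found[0] - query[0]) <= tol_dist
            and abs(found[1] - query[1]) <= tol_angle
            and abs(found[2] - query[2]) <= tol_angle
            and wrapped <= tol_angle)


v1 = ((0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
v2 = ((3.0, 0.0, 0.0), (3.0, 0.0, 1.0))
print(descriptor(v1, v2))
```

A template with several connector bonds would contribute one such descriptor for every pair of connectors, so that the search itself is a simple table look-up.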

Finally, it may be noted that the whole process of matching a ligand to its site can be machine based without recourse to an experimental data base. If a decision is taken on the basis of the synthetic chemistry to be exploited, for example that of substituted benzodiazepines, then the most promising substitution patterns can be identified. Since the chemical reactions are specified, the reagents that are available commercially can be used as input to the computation, and the output can be exploited using robotic methods in multiple parallel syntheses to generate libraries of candidate compounds.

3.5.3.3 Single fragment probes and ligand evolution

As the name implies, an initial target binding site is selected and an initial fragment probe developed from which the ligand is allowed to grow. This growth can be done by successive addition of atoms using a correlated acceptance or rejection procedure on each addition, the choice being dependent on their fitness to the protein environment. At a geometric level, the quality of the ground rules and the range of atom types considered are critical to the validity of the method. A variant of the method allows atoms to be 'mutated' and segments of one molecule to be exchanged for those of a second. These evolutionary steps of addition, mutation and crossover form the basis of a 'genetic' algorithm. The number of structures evolved is controlled by assessing their fitness to the protein environment. Further criteria are required for realistic segments to be identified and to ensure that the consequences of mutating atoms on surrounding atoms are transmitted into the next generation.
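The evolutionary steps of addition, mutation and crossover can be sketched with a deliberately artificial model. Here a ligand is a string of atom types, the 'site' is a string giving the preferred atom type at each position, and fitness is the number of positions that agree. This scoring function, the atom alphabet and the population settings are all invented for the illustration and stand in for a genuine evaluation of fit to the protein.

```python
import random

ATOM_TYPES = "CNOS"  # toy atom alphabet (assumed)


def fitness(ligand, site):
    """Toy score: one point for each ligand atom matching the site preference."""
    return sum(1 for atom, wanted in zip(ligand, site) if atom == wanted)


def add_atom(ligand, rng, max_len):
    """Addition: grow the ligand by one atom unless the site is already full."""
    return ligand + rng.choice(ATOM_TYPES) if len(ligand) < max_len else ligand


def mutate(ligand, rng):
    """Mutation: change the type of one randomly chosen atom."""
    i = rng.randrange(len(ligand))
    return ligand[:i] + rng.choice(ATOM_TYPES) + ligand[i + 1:]


def crossover(first, second, rng):
    """Crossover: head segment of one ligand joined to the tail of another."""
    cut = rng.randrange(1, min(len(first), len(second)) + 1)
    return first[:cut] + second[cut:]


def evolve(site, generations=50, pop_size=20, seed=0):
    rng = random.Random(seed)
    population = [rng.choice(ATOM_TYPES) for _ in range(pop_size)]  # single-atom seeds
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            parent = rng.choice(population)
            step = rng.choice(("add", "mutate", "crossover"))
            if step == "add":
                children.append(add_atom(parent, rng, len(site)))
            elif step == "mutate":
                children.append(mutate(parent, rng))
            else:
                children.append(crossover(parent, rng.choice(population), rng))
        # Parents compete with children, so the best score never decreases.
        ranked = sorted(population + children,
                        key=lambda ligand: fitness(ligand, site), reverse=True)
        population = ranked[:pop_size]
    return max(population, key=lambda ligand: fitness(ligand, site))


site = "CCNOCS"
best = evolve(site)
print(best, fitness(best, site))
```

Fixing the population size is one way of controlling the number of structures carried forward; a realistic scheme would also have to check that each child is a chemically sensible molecule.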

Again a fragment database with connector bonds can be attached to candidate 'hooks' within the seed. Selection procedures based on some 'protein binding' score catering for the interaction and the degree of distortion involved with the new link are used in such procedures. The quality of the criteria and the potentially regressive effect of the enlargement on previous substitutions where a net favourable gross interaction may occur, highlight difficulties with these automated procedures. As further steps become involved, the enlargement may lead to combinatorial explosion in the number of candidates.

3.5.3.4 Filling the target site

The final approach is to fill the target site with nominal atoms and then choose viable subsets and determine the chemical nature of the atoms constituting the candidate ligand. Again, many of these automated procedures are based on simple logical procedures. A regular lattice such as the diamond, tetrahedral or planar hexagonal is positioned in the binding site using interactive graphics or by calculating minimal steric clashes with the protein surface. Complementary ligand-receptor interatomic interactions may be assigned and viable subsets of atoms selected. It is, however, difficult to mix different regular lattices to form realistic molecules.

Alternatively, one may place a set of small acyclic and cyclic fragments to fill the site with all possible combination frameworks. It is then necessary to select candidate subgraphs and assign atom types via the protein environment. This method allows different geometries to be used together and overcomes the combinatorial explosion by using a small set of fragments involving only carbon atoms. Here, the main difficulties lie in the selection of viable subgraphs and the assignment of atom types. A much simpler approach is to characterise the shape of the protein binding site as a defined ellipsoid and search a data base of small molecules to identify suitable ligands which fill the site approximately. The structure may then be substituted to adapt and complement the target site.

The last approach in this category is to fill the site with atoms whose individual nature is randomly assigned. The system is equilibrated using molecular dynamics with a force field that allows for 'soft' repulsion between the atoms. A 'mother' atom is randomly selected and attempts are made to form bonds with neighbouring atoms using probabilistic rules. If accepted, the system is then relaxed using molecular dynamics and a new 'mother' atom selected. The process is repeated for a specified number of selections, resulting in the emergence of a candidate ligand from the initial aggregate of atoms. The process is thus stochastic and may take many repeats to arrive at a synthetically useful ligand. The rules for bond formation and the associated acceptance criteria are crucial to this approach.

3.5.4 Accommodation of the protein to ligand binding. Estimating interaction free energies

In the previous Section, the structure of the protein was taken to be fixed at the average determined by X-ray crystallography or NMR spectroscopy. By comparing native protein structures with those of complexes, it is apparent that some degree of accommodation to the ligand always occurs on binding. Indeed, in many cases, significant conformational changes accompany the ligand binding. Given the choice of a native protein structure or the structure of the protein partner from a ligand complex, experience has indicated that the latter is the better starting point for ligand design. The problem here, as with basing new design on an active ligand conformer when the structure of the protein binding site is unknown, is the inherent bias of the bound ligand conformation. Clearly one should design much better if the accommodation of the protein to a novel structure were taken into account.

The local fluctuations in protein and ligand structures can be introduced in a given mode of binding to yield a free energy. Using Monte Carlo or molecular dynamics methods (see for example, Beveridge and DiCapua (1989), Allen and Tildesley (1987), Valleau and Whittington (1977)), an ensemble of local fluctuations within the ligand and the protein is calculated to yield the thermodynamic functions of the binding. In the former method, the sample space is efficiently explored using an algorithm based on Boltzmann weighting, while in the latter, the dynamics of the interactions are explored over a period of nanoseconds. Both methods thus allow for an ensemble of protein structures to be explored and replace the single rigid structure used hitherto. As indicated earlier in the context of building fragments, one problem is the expansion in the number of protein 'structures' which are associated with individual designed ligands, and the limits on computing time. Again, restriction of the variables undergoing change in the fluctuations to torsional angle subsets may alleviate the problems to some extent. If large conformational changes occur on binding, then the changes are difficult to simulate in any predictive way.
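Boltzmann-weighted sampling is most commonly carried out with the Metropolis acceptance rule, in which a trial move that raises the energy by dE is accepted with probability exp(-dE/RT). The sketch below applies this rule to a single torsion angle with a threefold cosine potential. The potential, its barrier of 12 kJ/mol, the step size and the temperature are assumptions chosen only to make the example self-contained.

```python
import math
import random

R = 8.314e-3  # gas constant in kJ/(mol K)


def torsion_energy(phi, barrier=12.0):
    """Threefold torsional potential in kJ/mol; phi in radians (illustrative)."""
    return 0.5 * barrier * (1.0 + math.cos(3.0 * phi))


def metropolis(energy, x0, n_steps, temperature=300.0, step=0.3, seed=0):
    """Sample one coordinate with the Metropolis rule; return (x, energy) pairs."""
    rng = random.Random(seed)
    beta = 1.0 / (R * temperature)
    x, e = x0, energy(x0)
    samples = []
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)   # symmetric trial move
        e_new = energy(x_new)
        # Downhill moves are always accepted; uphill moves with Boltzmann probability.
        if e_new <= e or rng.random() < math.exp(-beta * (e_new - e)):
            x, e = x_new, e_new
        samples.append((x, e))                 # a rejected move recounts the old state
    return samples


samples = metropolis(torsion_energy, math.pi, 20000)
mean_energy = sum(e for _, e in samples) / len(samples)
print(round(mean_energy, 2), "kJ/mol")
```

Averages over the samples approximate Boltzmann-weighted ensemble averages. In a real application the single coordinate would be replaced by the chosen subset of ligand and protein torsion angles and the energy by the full interaction potential.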

Some experimental information on restricting the scale of the structure to be relaxed can be given by the X-ray or NMR structure. NMR determined structures are defined by an ensemble of structures that meet the NMR structural criteria. This ensemble can be used instead of a single structure. The mobility of atoms in structures determined by X-ray crystallography is often represented by an associated temperature factor, and these data could be incorporated into the design process. There is a structural hierarchy in relation to the protein's accommodation to the ligand, from side-chain reorientation, then local main-chain adjustments and finally large hinge-bending movements of whole regions of the protein structure.

Although many of these decision-taking processes may be introduced into automated regimes, the introduction of specific constraints removes some of the objective character of the procedures involved, and all methods are limited by the adequacy of the physical descriptions of the interactions defined in Sections 3.2-3.4. Specific polarizing effects of strong charge interactions inducing changes in the charge distribution both in ligand and in protein are not introduced into standard fast potential routines unless potentials are specifically developed over the sets of ligand and protein atoms for the particular interaction concerned using more fundamental quantum mechanical calculations. There is a case for doing this in any area of detailed study. The difficulties of estimating accurate free energies of binding should not be underestimated. It would, of course, be desirable to calculate all interactions by fundamental quantum mechanical methods, but the physical constraint on machine time quickly becomes rate limiting. The scale of the problem with current machine capabilities is summarised in Section 3.7.

3.6 PROTEINS

The theoretical determination of protein structure from first principles based on the intramolecular interactions of the individual amino acids, as we remarked earlier, would have high significance in the design of inhibitory or stimulatory ligands in many areas of drug therapy. This is a large subject and we refer to more specialised treatments. The possible number of sequences in an average sized protein of some 400 amino acids is 20^400 based on the 20 amino acids, and the question as to why only a very small fraction occurs in nature may resolve to structures that have unique and stable native states. A recent paper (Li, Helling, Tang and Wingreen (1996)), which avoids most details of the chemistry of the amino acid interactions, examines a polymer of 27 amino acids occupying all sites of a 3x3x3 cube, employing simple interactions on a lattice (hydrogen bonding or otherwise). The great majority of sequences have multiple ground states and hence may fold into different structures, assuming no inherent large kinetic barrier. Thus 'foldability' focusses on the sequence, selecting potentially functional ones, while 'designability' is based on the structure of the resulting protein and is quantified by the number of sequences that uniquely fold into a particular structure. In evaluating the 2^27 sequences in the simple amino acid scheme, the distribution gives a number of patterns. At the tail of the distribution, there are structures that are highly designable; they are also more stable. The number of sequences (NS) associated with a given structure (S) differs from structure to structure, but preferred structures emerge with NS values much larger than the average. Analysis of the mutation patterns of the homologous sequences for highly designable structures revealed phenomena similar to those observed in real proteins, some sites being highly mutable while others are highly conserved. Although the initial categorisation is elementary, such an approach may offer a pathway to introducing constraints on the multiple minima problem in addition to the already established methods.
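The idea of designability can be made concrete on a much smaller lattice than the 3x3x3 cube. The sketch below is a toy analogue and not the published calculation: it uses a chain of 9 residues of two types, H and P, filling a 3x3 square, identifies each compact structure with its set of non-bonded contacts, and counts for every structure the sequences that have it as their unique lowest-energy state. The contact energies are illustrative values assumed for the example.

```python
from itertools import product

SIZE = 3
# Illustrative contact energies for a two-letter (H, P) alphabet (assumed values).
ENERGY = {("H", "H"): -2.3, ("H", "P"): -1.0, ("P", "H"): -1.0, ("P", "P"): 0.0}


def compact_walks():
    """All directed self-avoiding walks visiting every site of the square once."""
    walks = []

    def extend(path, visited):
        if len(path) == SIZE * SIZE:
            walks.append(tuple(path))
            return
        x, y = path[-1]
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nxt[0] < SIZE and 0 <= nxt[1] < SIZE and nxt not in visited:
                extend(path + [nxt], visited | {nxt})

    for start in product(range(SIZE), repeat=2):
        extend([start], {start})
    return walks


def contacts(walk):
    """Pairs of chain positions that are lattice neighbours but not bonded."""
    index = {site: i for i, site in enumerate(walk)}
    pairs = set()
    for (x, y), i in index.items():
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index.get(neighbour)
            if j is not None and abs(i - j) > 1:
                pairs.add((min(i, j), max(i, j)))
    return frozenset(pairs)


def designability():
    """Number of sequences with each contact map as unique ground state."""
    # Walks related by rotation or reflection share a contact map and merge here.
    structures = {contacts(walk) for walk in compact_walks()}
    counts = {s: 0 for s in structures}
    for seq in product("HP", repeat=SIZE * SIZE):
        scored = [(sum(ENERGY[seq[i], seq[j]] for i, j in s), s) for s in structures]
        lowest = min(e for e, _ in scored)
        ground = [s for e, s in scored if abs(e - lowest) < 1e-9]
        if len(ground) == 1:               # a unique ground state: the sequence 'designs' s
            counts[ground[0]] += 1
    return counts


counts = designability()
for n in sorted(counts.values(), reverse=True):
    print(n)
```

Sequences with degenerate ground states are counted for no structure, so the total is in general smaller than the 2^9 possible sequences.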

3.7 ACCURATE CALCULATION OF INTERMOLECULAR INTERACTIONS

In view of the problems of determining molecular interactions accurately, particularly for stronger interactions where the involvement of charge transfer and polarisation are significant, the question may be asked as to why one does not work at a more significant level of accuracy. Here, to provide some perspective to this problem, we simply outline the scale of calculations currently feasible with existing parallel computers. Undoubtedly, the best compromise is to achieve the greatest accuracy possible dependent on the scale of the problem. Given time and effort, if the scale demands the use of fast methods involving empiric potentials, it would undoubtedly be best to develop the best empiric potentials for each particular interaction, dependent on its environment and degree of local interaction. Considerable effort has been made to provide more flexibility in this direction by the categorisation of potentials for particular interactions.

In fundamental or 'ab initio' quantum mechanical calculations, each electron's interaction with the nuclei is not strictly independent of the position of other electrons in the system. To simplify the problem, the initial approximation is made that each electron interacts with the average field of the other electrons, i.e. the motions of the electrons are uncorrelated (Hartree-Fock approximation). An electron will thus have kinetic energy while its potential energy will consist of its interaction with other nuclei and with the average field of the other electrons, so that the problem is reduced to a set of one electron equations. Molecular geometries, dipole moments and electrostatic effects may be calculated to good accuracy with this approximation. The neglect of electron correlation, however, means that dispersive or van der Waals interactions are not present, while in situations where electron correlation is important, for example in transition states with molecules near dissociation limits, the approximation is completely invalid.

The electrons on each atom are characterised by atomic orbitals, and a molecular orbital is constructed from a linear combination of the atomic orbitals. The set of vector functions defining the atomic orbitals is known as a basis set. If one function is used to characterise each atomic orbital, the set is known as a minimal basis. Broadly viewed, a minimal basis has insufficient flexibility to enable the valence electrons to spread themselves out satisfactorily, and this can have different consequences dependent on the occupied and unoccupied orbitals. Providing two functions to characterise each orbital (doubling the basis set) allows much more flexibility in the wave functions but can produce exaggerated properties. As interactions become stronger, the introduction of d orbitals on atoms in the first row of the periodic table becomes significant, leading to some 15 basis functions for a first row atom, and the size of the basis set rapidly expands even with relatively small molecules. In Hartree-Fock theory, the number of two-electron integrals rises as n^4, where n is the number of basis functions. An alternative approach which has gained ground in recent years is based on the Kohn-Sham theory that an exact solution to the Schrödinger equation exists which leads to self-consistent equations as in Hartree-Fock theory. The many-electron problem can be replaced by an exactly equivalent set of one-electron equations with an effective one-particle potential. This effective potential will reproduce the exact density and the exact total energy if the potential can be defined. The advantage of this density functional approach, which has required considerable development, is that its computational cost rises as n^2. Thus for larger problems of chemical interest, the potential of the method becomes high.
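The growth in the number of two-electron integrals can be made concrete with a short count. For real basis functions, an integral over four functions is unchanged by eight index permutations, so the number of distinct integrals is close to one eighth of the fourth power of the number of basis functions. The sketch below evaluates both numbers; the function name is illustrative and the basis sizes are chosen only as examples.

```python
def unique_two_electron_integrals(n):
    """Distinct two-electron integrals (ij|kl) for n real basis functions,
    allowing for the eightfold permutational symmetry of the indices."""
    pairs = n * (n + 1) // 2          # distinct index pairs (ij)
    return pairs * (pairs + 1) // 2   # distinct pairs of pairs


for n in (15, 150, 1500):
    print(n, n ** 4, unique_two_electron_integrals(n))
```

For the 15 basis functions of a single first row atom this gives 7260 distinct integrals out of 50625 index combinations; each tenfold increase in the basis multiplies the count by roughly ten thousand.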

A decade ago, the cutting edge of computation was a machine with a speed of some 100 Mflops, but now speeds approach 100-1000 Gflops. The benefits of parallelisation apply only to certain calculations where the problem can be dismembered satisfactorily to run time-limiting sections in parallel, as with the calculation of two-electron integrals. Using 64 node machines, the practical limit on basis functions using ab initio methods is approximately 4000. This allows, dependent on accuracy, the treatment of an interaction involving some 250-1000 atoms. The basis set limit using density functional theory is perhaps 5000. Semi-empirical quantum mechanical methods cannot utilise the benefit of parallelisation beyond about 8 nodes, and the practical limit of scale is again of this order. For free energy calculations using empiric potentials, molecular dynamics methods can handle up to 500,000 atoms for a 1 nanosec time scale. Vibrations and rotations, with time scales of 10^-15 and 10^-12 sec. respectively, can be handled by such calculations. However, docking in molecular recognition (of the order of 10- sec.) and translational motion in liquids are on longer time scales.

REFERENCES

Abraham, M.H. (1982) Free energies, enthalpies and entropies of solution of gaseous nonpolar nonelectrolytes in water and nonaqueous solvents. The hydrophobic effect. Journal of the American Chemical Society 104, 2085-94.

Abrahams, J.P., Buchanan, S.K., van Raaij, M.J., Fearnley, I.M., Leslie, A.G.W. and Walker, J.E. (1996) The structure of bovine F1-ATPase complexed with the peptide antibiotic efrapeptin. Proceedings of the National Academy of Sciences 93, 9420-4.

Allen, M.P. and Tildesley, D.J. (1987) Computer Simulation of Liquids. Oxford: Clarendon.

Beveridge, D.L. and DiCapua, F.M. (1989) Free energy via molecular simulation: A primer. In Computer Simulation of Biomolecular Systems, edited by W.F. van Gunsteren and P.K. Weiner, pp. 1-26. Leiden: ESCOM.

Blumenthal, L.M. (1970) Theory and Applications of Distance Geometry, 2nd edn. Bronx, New York: Chelsea.

Brunck, T.K. and Weinhold, F. (1979) Quantum Mechanical Studies on the origin of barriers to internal rotation about single bonds. Journal of the American Chemical Society 101, 1700-9.

Cariati, F., Cauletti, C., Ganadu, M.L., Piancastelli, M.N. and Sgamellotti, A. (1980) Spectroscopic investigations on phthalazino(2,3-b)phthalazine-5,12-dione and some of its mono and di-substituted derivatives. Spectrochimica Acta 36A, 1037-43.

Csizmadia, I.G. (ed.) (1982) Molecular Structure and Conformation. Amsterdam: Elsevier.

Davies, R.H. (1987) Drug and Receptors in Molecular Biology. International Journal of Quantum Chemistry: Quantum Biology Symposium 14, 221-43.

Davies, R.H., Sheard, B. and Taylor, P.J. (1981) Conformation, partition and drug design. Journal of Pharmaceutical Sciences 68, 396-97.

Deslongchamps, P. (1983) Stereoelectronic Effects in Organic Chemistry. Oxford: Pergamon.

Feng, J.-A., Johnson, R.C. and Dickerson, R.E. (1994) Hin recombinase bound to DNA: The Origin of specificity in major and minor groove interactions. Science 263, 348-55.

Jorgensen, W.L. and Salem, L. (1973) The Organic Chemist's Book of Orbitals. New York and London: Academic Press.

Kuboniwa, H., Tjandra, N., Grzesiek, S., Ren, H., Klee, C.B. and Bax, A. (1995) Solution structure of calcium-free calmodulin. Nature Structural Biology 2, 768-76.

Li, H., Helling, R., Tang, C. and Wingreen, N. (1996) Emergence of preferred structures in a simple model of protein folding. Science 273, 666-9.

Marquart, M., Walter, J., Deisenhofer, J., Bode, W. and Huber, R. (1983) The geometry of the active site and of the peptide groups in trypsin, trypsinogen and its complexes with inhibitors. Acta Crystallographica B39, 480-90.

Nederkoorn, P.H.J., Timmerman, H., Timms, D., Wilkinson, A.J., Kelly, D.R., Broadley, K.J. and Davies, R.H. (1997) Stepwise phosphorylation mechanisms and signal transmission within a ligand-receptor-Gαβγ-protein complex. Recent submission.

Page, M.I. and Jencks, W.P. (1971) Entropic contributions to rate accelerations in enzymic and intramolecular reactions and the chelate effect. Proceedings of the National Academy of Sciences 68, 1678-83.

Radom, L. (1982) Structural consequences of hyperconjugation. In Molecular Structure and Conformation: Recent Advances, edited by I.G. Csizmadia, pp. 1-64. Amsterdam, Oxford, New York: Elsevier.

Reed, A.E., Weinhold, F., Curtiss, L.A. and Pochatko, D.J. (1986) Natural bond orbital analysis of molecular interactions: The theoretical studies of binary complexes of HF, H2O, NH3, N2, O2, F2, CO and CO2 with HF, H2O and NH3. Journal of Chemical Physics 84, 5687-705.

Sielecki, A.R., Fedorov, A.A., Boodhoo, A., Andreeva, N.S. and James, M.N.G. (1990) Molecular and crystal structures of monoclinic porcine pepsin refined at 1.8 Å resolution. Journal of Molecular Biology 214, 143-70.

South, T.L., Blake, P.R., Hare, D.R. and Summers, M.F. (1991) C-Terminal retroviral-type zinc finger domain from the HIV-1 nucleocapsid protein is structurally similar to the N-terminal zinc finger domain. Biochemistry 30, 6342-9.

Taylor, D.A., Sack, J.S., Maune, J.F., Beckingham, K. and Quiocho, F.A. (1991) Structure of a recombinant calmodulin from Drosophila melanogaster refined at 2.2 Å resolution. Journal of Biological Chemistry 266, 21375-80.

Valleau, J.P. and Whittington, S.G. (1977) A Guide to Monte Carlo for Statistical Mechanics: 1. Highways. In Statistical Mechanics Part A: Equilibrium Techniques edited by B.J.Berne, pp. 137-168. New York and London: Plenum.

Vitali, J., Martin, P.D., Malkowski, M.G., Robertson, W.D., Lazar, J.B., Winant, R.C., Johnson, P.H. and Edwards, B.F.P. (1992) The structure of a complex of bovine α-thrombin and recombinant hirudin at 2.8 Å resolution. Journal of Biological Chemistry 267, 17670-8.

Vos, A.M.de, Ultsch, M. and Kossiakoff, A.A. (1992) Human growth hormone and extracellular domain of its receptor: crystal structure of the complex. Science 255, 306-12.

Plate 3.1 Serine proteases. Proton movement and enzymatic cleavage of the peptide bond. The serine proteases are characterised by an Asp102 - His57 - Ser195 catalytic triad. Experimental (NMR) and theoretical results have indicated that the histidine residue remains neutral throughout the course of the reaction. The initiating attack of Ser195 on the peptide carbonyl carbon atom is facilitated by the abstraction of the hydroxyl proton by His57. The proton originally residing on His57 is transferred to Asp102, and the incipient negative charge developing on the peptide carbonyl oxygen is stabilised by hydrogen bonding from the main chain -NH groups of residues 193 and 195. The tetrahedral intermediate collapses to an acylated enzyme with the delivery of a proton to the leaving amino group. This proton originates from His57 but delivery may be mediated by a water molecule. Concomitantly, the histidine accepts the proton from Asp102 to regenerate the initial protonation state. Deacylation follows an analogous cycle of proton transfers with a water molecule replacing Ser195 as the nucleophile and with the serine becoming the leaving group.

The figure (Marquart, Walter, Deisenhofer et al. (1983)) shows the catalytic site of trypsin in the presence of the bovine pancreatic trypsin inhibitor (BPTI). The Cα trace of the enzyme (pink) shows the catalytic triad to the right. The scissile carbonyl carbon atom is shown in green. The primary recognition of the peptide bond to be cleaved results from a binding pocket for the substrate side chain in the vicinity of residue 189. The nature of the residues in this pocket predicates the particular specificity of the protease. In the case of trypsin, this residue is an aspartate and specificity is for basic side chains. In the left of the figure, a lysine side chain is shown interacting with Asp189 and Thr190 via two water molecules.

Plate 3.2 Inhibition of a serine protease and protein-protein recognition. The natural ligand inhibitor, Hirudin, binding to the catalytic Asp-His-Ser triad within the serine protease α-thrombin (Vitali, Martin, Malkowski et al. (1992)). α-Thrombin has a high specificity for peptide bonds associated with arginine residues and plays a central role in thrombosis and haemostasis. It is the product of prothrombin cleavage by factor Xa in the final step of the blood clotting cascade, and consists of two polypeptide chains, A and B, connected through a single disulphide bond.

During clotting, α-thrombin converts fibrinogen into fibrin by removing fibrinopeptide A from the Aα-chain and fibrinopeptide B from the Bβ-chains of fibrinogen. Hirudin is a small protein of 65 residues and 3 disulphide bonds that is isolated from the glandular secretions of the leech Hirudo medicinalis and is a potent natural inhibitor of thrombin. The figure shows the large surface area of contact of the Hirudin inhibitor (blue) with the serine protease, bovine α-thrombin (brown). The Asp102 - His57 - Ser195 catalytic triad of the enzyme (elemental colouring) is blocked by the first three residues of the N-terminal chain of Hirudin (refer to the Hirudin N-terminus in green). In human thrombin, two hydrogen bonds from the amino terminal group exist. In the crystal developed at pH 4.7, one is to the carbonyl group of Ser214 and the second is to the catalytic serine residue 195. For the crystal developed at pH 7, this second bond is to His57. Neither bond is formed in this bovine complex at pH 4.7, indicating that a second bond may not be essential for Hirudin binding. Specific binding to the associated binding site for arginine residues does not occur (compare the bovine pancreatic trypsin inhibitor in Plate 3.1), but a number of exo-sites on the surface of the thrombin can interact with the inhibitor. The last sixteen residues of Hirudin are in an open conformation and bind between the two loops of the enzyme surface formed by Phe34 to Leu41 and by Lys70 to Glu80. This region of the enzyme is marked by positively charged side chains and interaction with Hirudin's anionic residues Asp53, Asp55, Glu56 and Glu57. The latter three residues are shown in red. Salt bridges are formed by Asp55 and Glu57 interacting with the enzyme residues Arg73 and Arg75 respectively.

Plate 3.3 Aspartate proteases. As for the serine proteases, electron reorganisation coupled to proton movement is critical to the cleavage of the peptide bond. In this case, the catalytic site consists of two adjacent aspartate residues, Asp32 and Asp215 (pepsin numbering) which localise a solvent water molecule between their carboxyl groups. This highly polarised water molecule is the initiator of the peptide bond hydrolysis. Studies of the pH dependence of catalysis by porcine pepsin leads to estimates of two pKa values of 1.2 and 4.7 and, hence, one of the apartate residues is thought to be protonated in the resting state. High refinement (1.8 A (Sielecki, Fedorov, Boodhoo et al. (1990)) of the pepsin structure shows that the oxygen atom of the catalytic water molecule lies in the plane of Asp215 whereas the carboxylate group of Asp32 is twisted by some 22° with repect to this common plane. In the resting state, the carboxylate oxygen atoms are arranged such that one from each residue is within hydrogen bonding distance of the water molecule and the two adjacent 'inner' carboxylate oxygen atoms from each residue are also hydrogen bonded together. From the interatomic distances, the 'inner oxygen atom of Asp32 and the 'outer' oxygen of Asp215 are hydrogen bonded to the water. Hence the probable location of the proton that forms the hydrogen bond between the carboxylate groups is on the 'inner' oxygen atom of Asp215. The shorter contact distance from the water molecule is also to Asp32 (2.6 A) rather than to Asp215 (2.9 A.) suggesting that in the resting state, it is Asp215 that is protonated. The proposed hydrogen bond length between Asp32 and Asp215 is 2.8 A. The precise mechanism of catalysis is not known and the following description represents one possible hypothesis. Nuceophilic attack on the peptide carbonyl carbon atom is probably facilitated by movement of a proton from the water molecule to Asp215 along with proton transfer from Asp215 to Asp32. 
The 'inner' oxygen atoms are located via hydrogen bonds from the main chain -NH groups of Gly34 and Gly217 and two residues, Ser35 and Thr218 may hydrogen bond to the 'outer' carboxylate oxygen atoms. Ser35 may help to stabilise the incipient oxyanion of the tetrahedral intermediate and Thr218 may position a second water molecule in order to mediate the transfer of the proton from breakdown of the tetrahedral intermediate. As the proton is transferred from the 'outer' oxygen atom of Asp215, the proton on Asp32 is transferred to the 'inner' carboxylate of Asp215 so restoring the initial state.

The hypothetical proton and electron reorganisations are shown in the scheme below.

Inhibitors of aspartate proteases, such as pepstatin, displace the catalytic water molecule with an appropriately orientated hydroxyl group. The figure shows the pepsin catalytic site with the resident water molecule superimposed on a second structure determined in the presence of pepstatin (green). In the centre of the figure are the hydroxyl group of pepstatin and the catalytic water molecule, with Asp32 located to the left.

[Scheme: cleavage of the peptide bond by aspartate proteases, with the 'inner' carboxylate oxygen atoms labelled.]

Plate 3.4 Protein-Protein recognition. The influence of a hormone on protein dimerisation. Human growth hormone (hGH) binding to the extracellular domain of its receptor (de Vos, Ultsch and Kossiakoff (1992)). The binding of hGH to its receptor is required for regulation of normal human growth and development. The extracellular domain of the receptor (hGHbp) complex, here shown as a ribbon structure, consists of one molecule of growth hormone per two molecules of receptor (orange and blue respectively). The hormone (lilac) is a four helix bundle. The binding protein consists of two distinct domains which have some similarity to immunoglobulin domains. In the complex, both receptors donate essentially the same residues to interact with the hormone even though the two binding sites on hGH have no structural similarity. In addition to the hormone-receptor interfaces, there is also substantial contact between the carboxyl-terminal domains of the receptors.

The core of the helix bundle is made up of primarily hydrophobic residues. The extracellular part of the receptor consists of the two domains linked by a four residue segment of polypeptide chain. Each domain contains seven β-strands that together form a sandwich of two antiparallel β-sheets, one with four strands and one with three, with the same topology in each domain. The thirty residues of the receptor's amino terminal domain show conformational flexibility and are not given in the crystal structure. The carboxy-terminal domains are closely parallel, the termini pointing away from the hormone in the expected direction of the membrane. Intact receptors would have an additional eight residues at the end of the seventh strand (bottom right) which form the putative membrane spanning helix.

Plate 3.5 A potential ligand-activated proton pathway for signalling in a guanine-nucleotide-coupled receptor ternary complex acting as a guanosine triphosphate synthase (Nederkoorn, Timmerman, Timms et al. (1997)). The ternary complex consists of a seven α-helical trans-membrane receptor (yellow), a heterotrimeric Gαβγ-protein (α-pink, β-brown, γ-white) and an activating ligand. The nucleotide guanosine di-phosphate (GDP) resides within the Gα-subunit (elemental colouring to the left of the figure). On ligand activation, a series of events is set in train and the existing interpretation is that on signal stimulation GDP is exchanged for the triphosphate, GTP, causing separation of the Gα- and Gβγ-subunits from the receptor on the cytoplasmic side of the cell. The high energy Gα..GTP and the Gβγ-subunits can then both activate second messengers within the cell. An alternative interpretation of this mechanism is that a direct phosphorylation of the GDP initially occurs. A proton signalling pathway can exist to a histidine residue holding a metaphosphate group as an acid-labile phosphoramidate on the Gβ-subunit. Transfer of the metaphosphate group to an arginine residue (blue) at the base of the Gα- α2-helix (green) at the interface between the Gα- and Gβγ-proteins, to form a high energy phosphonoarginine intermediate, allows transport of the phosphate group to the phosphorylation site. The histidine phosphoramidate is shown at the base of the figure in elemental colouring. The primary mechanism for delivering a proton over the 43 Å through a set of local Tyr-Arg/Lys-Tyr proton shuttles is seen to reside in the balance between Tyr-Arg/Lys ion pair and neutral complexes under multiple hydrogen bonded conditions within a hydrophobic environment. Under such conditions the isolated neutral Tyr-Arg complex is some 12-14 kcal/mol more stable than the ion pair form, but transfer of a proton can occur under the influence of two hydrogen bond proton donor interactions on the phenolic oxygen atom. The figure shows six tyrosine residues (yellow) within the proposed signalling pathway together with their associated bases. A comparable mechanism is likely also to occur with cysteine/base residues under appropriate conditions.
The partially stimulating analogue prenalterol, containing the 4-hydroxy-phenoxy moiety, is shown at the top of the receptor figure interacting with Asp138 and initiating proton transfer from Tyr377 in the β1-adrenoceptor. At the top of the α2-helix two retaining bonds are broken within the full ternary complex, allowing movement of the α2-helix and carrying the metaphosphate group over the last 30 Å to the phosphorylation site.
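To give a feel for the size of the 12-14 kcal/mol gap between the neutral Tyr-Arg complex and its ion pair form, a Boltzmann factor can be evaluated. This is a rough sketch under stated assumptions: the quoted energy difference is treated as a free energy difference, the temperature of 298.15 K is chosen here for illustration, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)

def ion_pair_to_neutral_ratio(delta_e_kcal, temperature_k=298.15):
    """Boltzmann population ratio (ion pair : neutral complex) when the
    ion pair lies delta_e_kcal above the neutral complex.
    Assumption: delta_e_kcal is used as if it were a free energy difference."""
    return math.exp(-delta_e_kcal / (R_KCAL * temperature_k))

# Lower and upper ends of the range quoted in the caption.
for delta_e in (12.0, 14.0):
    ratio = ion_pair_to_neutral_ratio(delta_e)
    print(f"{delta_e:.0f} kcal/mol -> ion pair : neutral = {ratio:.1e}")
```

On these assumptions the isolated ion pair would be populated only to a vanishingly small extent, which is why the additional hydrogen bond donor interactions on the phenolic oxygen atom matter for proton transfer.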

Plate 3.6(a) Adenosine triphosphate synthase (ATP synthase, F1 F0 synthase) is the central enzyme in energy conversion in mitochondria, chloroplasts and bacteria and uses a proton gradient across the membrane to synthesise ATP from the diphosphate, ADP, and inorganic phosphate. The multi-subunit assembly consists of a globular domain, F1, and an intrinsic membrane domain, F0, linked by a slender stalk about 45 Å long. The F1 domain is an approximate sphere 90-100 Å in diameter and contains the catalytic binding sites for the substrates ADP and inorganic phosphate. About three protons flow through the membrane per ATP synthesised but the mechanism of synthesis is not known. The F1 structure is a complex of five different proteins with the stoichiometry 3α:3β:1γ:1δ:1ε. The sequences of the α- and β-subunits are homologous (~20% identical), including the P-loop nucleotide-binding motif. The catalytic sites are in the β-subunits while the function of the α-subunits is obscure. It has been suggested that the structures of the three catalytic sites are always different, but each passes through a cycle of 'open', 'loose' and 'tight' states. In this respect crystals developed with AMP-PNP (where the nitrogen atom defines the analogue of ATP) show occupancy of the nucleotide sites in different states of phosphorylation. The α- and β-subunits are arranged alternately like the segments of an orange around a central α-helical domain containing both the N- and C-terminals of the γ-subunit. As the three β-subunits vary in nucleotide occupancy (ADP, AMP-PNP, and empty) and have different conformations, the structure as found in the crystal (2.8 Å resolution) is compatible with one of the states to be expected in the cyclical binding change mechanism (Abrahams, Buchanan, van Raaij et al. (1996)). The figure shows the arrangement of the three α- (A, pink; B, blue; C, green) and β- (D, purple; E, yellow; F, white) subunits around the central F0 stalk (orange). The positions of nucleotides are given in elemental colouring.

Plates 3.6(b) and (c) show the similarity of the binding sites of the nucleotides in the α- and β-subunits. (b) The nucleotide AMP-PNP is between the A (α-) and D (β-) subunits. All the nucleotide binding sites are in the α- (A, pink) except for those indicated (β-, D (purple)). The magnesium ion assisting the phosphorylation is shown in red between the two terminal phosphate groups. (c) The ADP is bound predominantly to the β- (D, purple) subunit. The relations to the α- (C, green) subunit are indicated.

Plate 3.7 The influence of strong charge on conformation. The structure of calmodulin with and without the interaction of 4 calcium ions. Calmodulin is the principal calcium-dependent regulator of a variety of intracellular processes. The 148 residue protein has four Ca++ sites and a number of acidic residues. It is a ubiquitous protein in eukaryotes and plays a critical role in coupling transient Ca++ influx, caused by a stimulation at the cell surface, to events in the cytosol. The Ca++ binding sites have the 'EF hand' configuration also identified in other Ca++ binding proteins such as intestinal calcium binding protein and troponin C. The 'EF hand' comprises a helix-loop-helix structure which can be identified from the sequence homology alone. The basic structural unit of the globular domain consists of a pair of EF-hands rather than a single binding site.

Plate 3.7(a) Left. Calcium-bound calmodulin from Drosophila melanogaster (2.2 Å resolution; Taylor, Sack, Maune et al. (1991)) has a seven turn α-helix connecting the two calcium-binding domains. The dumbbell shaped molecule contains seven α-helices and four 'EF' calcium-binding sites and closely resembles the mammalian structure. The six-coordination octahedral form of a binding site is shown in Plate 3.7(b) where the Ca++ ion is held by four acidic residues. In each site, the coordination (one shared) comes from five side-chain oxygen atoms, a carboxyl oxygen (not shown) and one water molecule. Plate 3.7(a) Right. The NMR determined calcium-free structure of calmodulin (Kuboniwa, Tjandra, Grzesiek et al. (1995)). Each calmodulin domain consists of a strongly twisted but tightly packed bundle of four helices. Upon binding of Ca++ most of the change occurs within each of the 'EF hands' with inter-helix angle changes. The structural rearrangement on binding Ca++ ion results in a pronounced hydrophobic pocket on the surface of each domain. These pockets appear to be of importance from structure studies on Ca++ bound complexes with different synthetic target peptides. The accuracy of NMR determined structures is highest at the centre of the protein and decreases as one moves towards the surface. The accuracy in the determination of the Ca++ binding loops requires, in principle, further refinement. The conformation of the long central helix in the crystal structure was not previously consistent with extensive biochemical data on these proteins. The Ca++ free structure shows increased flexibility and this 'connecting spacer' can be viewed as a flexible tether between the two domains. This is confirmed by X-ray structures of calmodulin complexed with peptide fragments of its intracellular receptors, e.g. myosin light-chain kinase, where the two domains of calmodulin swing round and envelop the target peptide.

Plate 3.8 Protein-Single strand DNA recognition. A zinc finger domain binding to a single stranded DNA sequence. Interaction of an NMR-determined zinc finger domain in the HIV-1 nucleocapsid protein (South, Blake, Hare and Summers (1991)). A common feature of proteins containing the 'retroviral-type' (r.t.) zinc finger domain (Cys - X2 - Cys - X4 - His - X4 - Cys) is that they appear to be involved at some stage in sequence-specific single-stranded nucleic-acid binding analogous to the zinc finger motif found widely in duplex-DNA-binding proteins. Zinc finger r.t. domains are found both in the N-terminal and C-terminal chains of the intact HIV-1 nucleocapsid protein isolated from virus particles. The sequences have been shown to bind zinc stoichiometrically and with high affinity. The figure shows an eighteen amino acid HIV1-F1 peptide Cα sequence (Val-Lys-Cys-Phe-Asn-Cys-Gly-Lys-Glu-Gly-His-Ile-Ala-Arg-Asn-Cys-Arg-Ala in pink) bound to a single strand DNA sequence (A-C-G-C-C). The tetrahedral coordination of the Zn ion with the three cysteine residues and His11 is shown bonded schematically on the right of the figure. The hydrophobic interactions of the peptide residues (Phe4, Ile12, Ala13) are shown in green while the strong polar interaction of Arg14 with DNA backbone phosphate groups is seen at the end of the finger.
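The retroviral-type motif Cys - X2 - Cys - X4 - His - X4 - Cys can be checked against the eighteen residue peptide quoted in the caption with a simple pattern match. This is an illustrative sketch only: the lookup table covers just the residues that occur in this peptide, and the variable names are chosen here rather than taken from any published tool.

```python
import re

# Three-letter to one-letter codes for the residues present in the peptide.
THREE_TO_ONE = {
    "Val": "V", "Lys": "K", "Cys": "C", "Phe": "F", "Asn": "N", "Gly": "G",
    "Glu": "E", "His": "H", "Ile": "I", "Ala": "A", "Arg": "R",
}

peptide = ("Val-Lys-Cys-Phe-Asn-Cys-Gly-Lys-Glu-Gly-"
           "His-Ile-Ala-Arg-Asn-Cys-Arg-Ala")
sequence = "".join(THREE_TO_ONE[res] for res in peptide.split("-"))

# Cys - X2 - Cys - X4 - His - X4 - Cys, where X is any residue.
RT_ZINC_FINGER = re.compile(r"C.{2}C.{4}H.{4}C")

match = RT_ZINC_FINGER.search(sequence)
if match:
    # Offsets of the four zinc ligands within the motif, reported 1-based.
    ligand_positions = [match.start() + offset + 1 for offset in (0, 3, 8, 13)]
    for pos in ligand_positions:
        print(f"{sequence[pos - 1]}{pos}")
```

The match places the single histidine ligand at position 11 of the peptide, in agreement with the His11 named in the caption, with the three cysteine ligands at positions 3, 6 and 16.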

Plate 3.9 Protein-Double strand DNA recognition.

The selectivity of protein binding in the major and minor grooves of the DNA. The binding of the prokaryotic enzyme Hin recombinase to DNA in the Salmonella chromosome (Feng, Johnson and Dickerson (1994)). This site-specific recombination reaction controls the alternate expression of two flagellin genes by reversibly switching the action of a promoter. During the process of inverting the extended segment of DNA, two Hin proteins in the form of a dimer bind to the left and right recombination sites located at the boundaries of the invertible DNA segment. Through interaction with a third interacting site (held by an additional protein) the overall complex aligns the two recombination sites correctly and the Hin protein is activated to initiate the exchange of DNA strands leading to inversion of the intervening DNA. The recombination half-site of the double helical sugar-phosphate backbone of the DNA (elemental colouring) linked by the heterocyclic base pairs (blue) is shown occupied by the helix-loop-helix-loop-helix of the Hin protein. The third Hin helix (green) sits in the major groove of the DNA where the residues Arg 178, Thr 175 and Tyr 179 are shown on the lower side of this helix. Helices 1 and 2 (purple) are approximately orthogonal to helix 3. The amino terminal loop (white) at the bottom right of the picture attached to helix 1 lies in the minor groove with two arginine residues (140 and 142) interacting with the helical backbone of the DNA. The carboxyl terminal chain extending from helix 3 (white) leads again into the minor groove at the upper left of the figure where the portion of the chain interacting with the DNA is shown in pink. The short loops joining helices 1 and 2 (top right) and helices 2 and 3 (middle right) are also indicated in white. Water molecules within the X-ray crystal structure (determination at 2.3 Å resolution) are shown with a white cross.
