Literature review

Fragment Hit Binding by Molecular Simulations

By Franck Chevalier



During the last 20 years, a great interest has been shown for fragment in order to answer one of the major issue faced by traditional drug discovery program: by its properties, it allows to reduce the number of molecules tested whatever the method used, series are directly related to fragment and optimization of hit is more intuitive. Fragment based drug discovery (FBDD) by molecular dynamics offers now an awesome tool for fragment hit binding.

How has FBDD evolved?

A brief analysis of articles published within last two decades shows that main techniques used include bioassay (inhibition assay, radiolabeled binding assay…) and biophysical methods (Crystallography, NMR, SPR, MS, ITC…).

Low affinity showed by most fragment for their target limits the range of detection for most methods and are not necessarily the high affinity binder that are detected. Most articles in FBDD present the use of a single method, mainly in vitro bioassay. The rate of new articles is about 150/year. Expertise needed, throughput and cost of main biophysics methods still limit their access.

Which alternative?

Within the same period, we observed an increase of publications on virtual methods. Molecular dynamics simulations have been published in more than tens of thousands articles. Interestingly less than 5% of these publications deal about discovery at a yield of 300 publications/year within the last few years.

When FBDD is performed, about 1 to 2 techniques are used. If more than 2 techniques are used, 50% of publications mention the use of virtual methods. About 30% only used virtual methods and tendency has been increased. The large dataset of structures generated by crystallography empowered the use of docking. The important role played by protein flexibility has been answered with molecular dynamics.

Discovery or design?

At this stage, most work focused on discovery, meaning the detection of molecules able to bind, inhibit or activate the target. Few articles yet deals about linking, merging, growing fragment.

Targeting what?

Favorite targets are kinase, enzymes but more complex system involving ion channel, GPCR are now reported.

Reliability of molecular dynamics

In silico FBDD’s impact on lead development has been limited to date. Inability to describe kinetics and conformational changes upon binding are some of the reasons. In the following reference, a comprehensive computational assay for fragment hit characterization has been performed: 2.1 ms of unbiased all-atom high-throughput molecular dynamics (HTMD) data were obtained and analyzed to show the potential of this technique for FBDD.


Reference: Noelia Ferruz,Matthew J. Harvey,  Jordi Mestres, and Gianni De Fabritiis J. Chem. Inf. Model. 2015, 55, 2200−2205

Franck ChevalierFragment Hit Binding by Molecular Simulations
read more

Chlorobenzene as a new halogen bond probe in molecular dynamics simulations

By Zoe Greenburg

Non-covalent bonds play a crucial role in chemistry, biochemistry, and biology. H-bonds, and hydrophobic bonds at the core of protein folding, and govern reversible interactions between biomolecules and their ligands. In drug discovery, H-bonding and hydrophobic interactions define the affinity and selectivity of small molecules for their biological targets.

With the intent to better understand these interactions and their relevance in the design of small molecule modulators of biological function, several computational solvent-mapping methods have been developed over the past few years. Schrodinger introduced WaterMap, a method that allows understanding the locations and thermodynamics of water molecules that solvate binding sites, and guiding the optimization of small molecule hits; conserved, strongly bound water molecules are costly to displace but can be useful bridges. An alternative method for understanding the role of water in ligand binding, which uses all-atom molecular dynamics (MD), and that can be accessed through this company, was developed by De Fabritiis et al. in 2011. Mapping of other solvents like ACN, propane, benzene, and IPA has also been demonstrated with MD simulations of binary mixtures, and been put in the context of target druggability assessment.

A new category of substrate to study: organohalogens

Halogens are common substituents in drugs as halogen bonding interactions can offer key boosts in selectivity and potency. Initially attributed to van der Waals forces, this is now explained in terms of the halogen bond, another type of non-covalent interaction that in recent years has been exploited for many applications in synthetic, supramolecular and medicinal chemistry.

Anisotropy in halogen electron charge distribution allows halogens to interact with donors such as Lewis base centers and π-surfaces, something that, if one is to consider electron density alone, can be rather counter intuitive. Halogen-donor interactions increase with the polarizability of the halogen’s surrounding electron cloud, and with the withdrawing power of adjacent atoms. These observations are now explained using the sigma hole concept, which defines a region of positive charge at the surface of the halogen, along the covalent bond axis.

Within the complex structure of a protein, there are many atoms to which a halogen-containing substrate can form a halogen bond. Halogen – usually chlorine – interactions with backbone and side chain carbonyls, as well as with side chain alcohols, thiols and thioethers, amines and the surfaces of aromatic rings can affect the conformation of the protein as well as the affinity that other binding sites on the protein have for other substrates, such as water. This is however difficult to predict even though doing so could be highly beneficial in guiding structure-based drug design. Experimental techniques that allow solvent mapping such as the multiple solvent crystal structure (MCSC), can be effective but are time intensive and expensive to run, and lack of an ideal strategy and/or representative force fields have limited computational approaches.

Computational halogen bond mapping with molecular dynamics simulations

Recently, Verma and collaborators proposed a molecular dynamics based method for halogen bond mapping. This study aimed to observe the effect of a chosen probe, in this case chlorobenzene, on the binding sites of four different proteins using molecular dynamics. Chlorobenzene is potentially a bifunctional substrate; the sp2 bonded chlorine can probe for halogen bonds while the aromatic ring can probe for π-stacking, or hydrophobic sites. For proof-of-concept, MDM2(Pdb:1YCR), MCL-1(Pdb:3kJ0), interleukin-2(Pdb:1Z92), and Bcl-xL(Pdb:1BXL) were selected because of their involvement in protein-protein interactions, the availability of structural data, and the well reported structural changes they endure upon substrate binding, which can present unexposed halogen bonding sites. All-atom MD simulations were selected for the development of protein-small molecule interaction maps because other computational methods considered could not capture the protein’s dynamics sufficiently, and Lexa and Carlson showed this to be crucial for effective solvent interaction mapping.

For chlorobenzene preparation the authors used GAFF. Atomic charges were derived using the R.E.D. Server by fitting restrained electrostatic potential charges to a molecular electrostatic potential computed with Gaussian. One peculiarity of chlorobenzene when compared to benzene, a molecule the authors also used for mapping in a previous study, was found to be the propensity of the halogenated compound to aggregate at lower concentrations, which forced to reduce the substrate concentration by 25% to 0.15M, and to increase the sampling time during the simulation experiments to 10ns per trajectory.

Using molecular dynamics simulations to construct halogen affinity maps

For MDM2, all of the six halogen bonding sites found in the Pdb were recovered by the proposed method. Three of these, known to be occupied by a halogenated benzodiazepine inhibitor, and two others, occupied by Nutlin-2, were recovered on first instance by the mapping protocol. The sixth was recovered after modification of the Pdb structure, as the initial structure lacked an important sequence of residues at the N- terminus. In addition, four cryptic hydrophobic binding sites were detected that were not present in the initial input conformation of the protein. The existence of these sites could also be validated by matching with results available in the literature. For the other three targets the authors report similar results.

All-atom MD simulation with chlorobenzene can be used to map halogen and hydrophobic binding sites

Using molecular dynamics simulations, and the chlorobenzene parameters shown above, the authors could map halogen and hydrophobic binding sites on all four tested proteins. The results were consistent with available experimental data, and show the potential of this method in structure based drug design. As we have come to see in our own work at Acellera probing libraries of fragments using all-atom MD simulations, the choice of force field is difficult for halogenated molecules. The ability of the force fields used in this paper to reproduce the binding sites was remarkable, and could potentially be harnessed for a broader use in computer aided drug design.

gianniChlorobenzene as a new halogen bond probe in molecular dynamics simulations
read more

Kinetic modulation of a disordered protein domain by phosphorylation

by Santi Esteban

Protein phosphorylation plays key roles in many signal transduction processes, preferentially targeting intrinsically disordered protein domains (IDPs). IDPs remain largely unstructured under native conditions, resembling random-coil polymers akin to the unfolded states of proteins. Present in more than 50% of eukaryotic proteins, IDPs perform a plethora of biological functions and are commonly associated with a variety of human diseases.

Technical limitations hamper intrinsically disordered protein characterization

The mechanism by which phosphorylation modulates disordered domains remains unknown. This stems from the extraordinary structural heterogeneity of IDPs, which poses a challenge for their characterization using common structural determination experimental techniques. For instance, the study of IDPs is inviable with X-ray crystallography because of the fast mobility of the domains, and NMR does not allow singling out the structural details of the multiple, underlying conformational states.

Can intrinsically disordered proteins be characterized using molecular dynamics simulations?

Recent advances in computer simulation technology offer a unique opportunity to study IDPs with atomic resolution on the millisecond timescale, the regime necessary to study these systems. The introduction of accelerator processors, and their application to MD, as well as the development of HTMD, and adaptive sampling methods, have allowed an increase of sampling efficiency of several orders of magnitude. Indeed, a single workstation can now obtain microsecond long simulations with atomic resolution in one day, and milliseconds can be obtained rapidly on a supercomputer, or through distributed computing projects. Nevertheless, this bourgeoning field is still limited in its ability to analyse millions of protein conformers, and compute their population and kinetics.

All-atom MD for intrinsically disordered protein characterization

In this work (Nat. Comm. 2014, DOI: 10.1038/ncomms6272), we successfully reconstructed the conformational ensemble of an IDP domain involved in gene transcription. A new analysis tool capable of resolving the disordered state of disordered domains was used in combination with large scale all-atom, unbiased, molecular simulations. 1.7 milliseconds of aggregated time were sampled. This methodology was used to study a well-characterized disordered fragment of the protein kinase-inducible domain (KID) in the cellular transcription factor of the cAMP response element-binding protein (CREB).

Identification of slow molecular order parameters for Markov model construction

The kinetic characterization of macromolecular systems requires the description of slow relaxation processes. This depends on the identification of the structural changes involved in these processes (selection of structural descriptors), the discretization of the high-dimensional coordinate space (clustering of conformers according to the descriptors) and estimation of the rates or timescales at which these slow processes occur. Approaches to this end include Markov models, master-equation models, and kinetic network models.

In this work we analyzed the simulated data by constructing a Markov state model of the entire ensemble of trajectories using inter-residue Cα-Cα distances and φ/ψ backbone dihedral angles as general metrics to build the kinetic model. The conformational space was discretized into one thousand clusters and then projected on a 5-dimensional space using time-lagged independent component analysis (TICA), which identifies the slow coordinates in the dynamics without relying on subjective guesses.

Phosphorylation reduces the conformational kinetics of intrinsically disordered proteins

The results presented in Figure 1 show the first microscopic view of the equilibrium states and conformational kinetics of a disordered protein domain at atomic resolution. The conformational exchange between state D (disordered) and state O (ordered) is shown together with forward (τ1) and backward (τ-1) transition times. Conformations were aligned using residues with the lowest Cα-Cα distance variance. N- to C-terminal is color coded red and blue.

We find that phosphorylation induces a 60-fold slowdown in conformational kinetics, compared to a non-phosphorylated domain, which involves a low populated and partially structured excited state known to participate in an early binding intermediate (Fig. 2). Figure 2a shows residue specific relaxation times of helix folding/unfolding process (filled bars) derived from autocorrelation function. Empty bars show chemical shifts changes in pKID that map residues participating in an early binding intermediate as detected by NMR. Figure 2b is an example of a kinetically locked conformation in phosphorylated KID. The peptide backbone is shown in cartoon representation, and the heavy atoms of phosphorylated serine 133 and several charged residues are shown as sticks.

Direct comparison against NMR data supports the validity of these findings and show that, contrary to what is typically assumed and in line with experiments, mutation of the phosphoserine to glutamate cannot recover the slowdown in conformational exchange (Fig. 1c).

Is the regulation of phosphorylation via the kinetic modulation of intrinsically disordered proteins conserved?

We propose a new mechanism of phosphorylation regulation by kinetic modulation which could be general for disordered domains which lack a well-defined equilibrium structure. This mechanism emerges from a theoretically derived kinetic model that shows that it is possible to modulate the binding affinity of protein-protein complexes by inducing a slowdown in conformational kinetics affecting both on- and off-conformational transition times by the same amount and simultaneously. This model agrees with kinetic and affinity values obtained through experiment (See paper).

With disordered domains taking part in more than 50% of proteins responsible for signaling in the cell, this kinetic mechanism of modulation highlights a further way by which post-translation modifications may affect disordered domains and their interactions with binding partners.

This work was performed in the De Fabritiis lab. Gianni De Fabritiis is the founder of Acellera. Should you be interested in performing similar studies in-house let Acellera know. We will be happy to inform you.

gianniKinetic modulation of a disordered protein domain by phosphorylation
read more

Automatic Force Field Generation

by Zoe Greenburg

The accuracy of molecular dynamics (MD) simulations depends upon the quality of the force field used to parameterize the model. The development of these force fields is challenging to automate, due to the highly individual nature of each molecule and its environment that require uniquely programmed starting conditions.

Wang and co-workers recently addressed this issue using ForceBalance, in a JPCL paper titled ”Building Force Fields: An Automatic, Systematic, and Reproducible Approach”. Dr. Vijay Pande, the corresponding author of the paper and faculty at Stanford University explains, “Accurate force fields are critical for the success of molecular dynamics simulations, but yet due to the great challenges in building force fields, they have typically been made ‘by hand’, requiring many years of effort, and are not easily reproducible or produced in a systematic manner.” ForceBalance aims to create systematic and reproducible force fields by using a combination of experimental and theoretical data.

ForceBalance provides a hybrid approach that combines ab initio and experimental data

In order to find the parameters that allow a force field to accurately simulate a system, these parameters must be optimized to represent the physical properties of a system that are experimentally known, as well as what is known about the system from the theoretical calculations. Finally, the algorithm used to combine these two factors differs across force fields.

ForceBalance, an application developed in 2012 by Wang, which is available through, allows the researcher to improve the accuracy of the force field by basing the parameters on the data available, whether it be experimental, theoretical, or a combination of both.

Dr. David Case, from Rutgers University says “As I see it, the key new idea here is to use gradients of liquid state average properties to help drive the optimization procedure. This allows one to significantly expand that set of target properties that can be fit in an automated fashion. It doesn’t relieve one from having to choose and weight the targets that one hopes to fit.”

This approach greatly streamlines the process of force field creation, although the method is completely automatic. It still leaves the researchers room to manually alter aspects as they see fit.

ForceBalance maximizes the efficiency of force field development

Thermodynamic fluctuation formulas are employed to reduce the necessity of many long simulations. The simulations are only run for long periods of time when high precision is required, such as at the end of the optimization.

ForceBalance also combines several open source technologies that are available on the internet. OpenMM 6.0, the Work Queue Library, MBAR all contribute to a fast and accurate generation of force field parameters.

The TIP3P and TIP4P models were significantly improved with Force Balance

The TIP3P and TIP4P models are the most commonly used water molecules for the Amber and CHARMM force fields. TIP3P is used for many simulations that deal with biologically relevant phenomena. Thus, their further improvement is an exciting development for the field.

The original parameters developed for TIP4P were unable to reproduce the dielectric constant for water in simulations. However, once ForceBalance calculations were applied to the models, with starting points despite varying starting points, all converged to the same point. The data set included a wide range of temperatures and pressures for six different liquid properties. The newly optimized parameter set, called TIP4P-FB reproduces the dielectric constant of water reliably across wide ranges of temperature and pressure.

The same optimizations were made to the TIP3P model, as 3-point models of water are more widely used in research. The optimized parameters(TIP3P-FB) were similarly accurate to TIP4P-FB, except for slight deviations from the experimental data for the density of water at low temperatures, outside the range that is common in biomedically relevant simulations.

ForceBalance showcases that efficient, reproducible, and automatic force field generation is possible. The research team hopes to expand this method to more biologically relevant models.

Looking towards the future

This type of technology has many applications. Dr. Pande predicts that “By completely automating the parameterization of force fields, force field design can now be reproducible … and built systematically (we can build many force fields in a consistent manner, such that they would easily work with each other). It is our vision that there is a coming Renaissance in force field design, where there will be a considerable increase in force field accuracy due to this new approach”

gianniAutomatic Force Field Generation
read more

Atomistic MD Simulations and Experiment Provide Basis for Protein Motility in Prestin

Post by Mattia Sturlese, PhD.

Recently, we derived the structure of the prestin transmemebrane domain by combining bioinformatics, homology modeling, and MD simulations with extensive experimental work. Prestin (SLC26A5), a member of the SLC26/SulP anion transporter family localized in the outer plasma membrane of the outer hair cells (OHCs), is a motor protein essential for auditory processing. Motor proteins are a fascinating class of proteins able to transform molecular events into movements.

Prestin, a highly responsive piezoelectric transducer of unclear mechanism of action

Classical molecular motors are driven by ATP hydrolysis while prestin acts through a chlorine dependent intrinsic voltage sensor (IVS). In prestin, intracellular binding of a chloride ion induces a very large structural rearrangement – determined by the membrane potential – that endows this motor protein with a motility-related charge movement or nonlinear capacitance (NLC). One of the most attractive peculiarities of this mechanism relies on time scale in which this molecular event occurs, as it is remarkably faster than that of other cellular motor proteins. In a nutshell, prestin is a piezoelectric transducer able to convert membrane potential into macromolecular movement extremely efficiently. In fact, the piezoelectric coefficient in OHCs (20 fC/nN) is four orders of magnitude greater than found in man-made crystals used in electronic devices or manipulators. Despite its established physiological role as a key element in the high acuity of mammal hearing through cochlear amplification, it is essentially unknown how prestin can generate mechanical force. To date there are no known structures for the SLC26/SulP superfamily of anion transporters.

Computational modeling towards a predicted 3D prestin structure

Sequence-based bioinformatic analyses indicated the NCS2 and SulP transported families as possible candidates for homology modeling. Using hidden Markov models (HMMs) and their profile analysis we identified the uracil transporter UraA (PDB: 3QE7) as the best and most meaningful template for homology modeling. In order to validate the biophysics of this model we collected 150ns of atomistic molecular dynamics simulations using ACEMD. The MD simulation data showed that although there were defined local differences between the two models the two conserved the global fold and topology of UraA at equilibrium, thus corroborating the validity of our model. Thus, the prestin transmembrane domain appeared to be composed of 14 transmembrane segments organized into a 7+7 inverted repeat fold. Appropriate 3D structures to be used in further analysis were obtained by cluster selection.

Experimental validation of the model and understanding of the molecular basis for prestin’s activity

We continued validating the topology of the homology model using a substituted Cys accessibility method (SCAM).The SCAM results showed a notable agreement with water-protein contacts as derived from MD simulations. Importantly, the pattern of accessibility is in full agreement with the predicted topology (Fig 1), corroborating the model and UraA as an appropriate template for modelling SLC26. Solute accessible pathways were identified by analyzing the internal solvated areas and calculating the residence probability of explicit water molecules during the last 50 ns of the MD simulation. The central cavity is readily accessible from the intracellular space but occluded from the extracellular space by a constriction that precludes permeation from the exterior to the central cavity. A SCAM scan along the entire TM10 provided experimental confirmation of the model results; amino-acid positions predicted to contribute to the central cavity and its intracellular entrance were accessible to intracellular, but not extracellular MTS reagents. The accuracy of the model and the information from the MD simulations has also allowed identification of a candidate central binding site validated experimentally though mutagenesis.

Mattia Sturlese

Reference and further reading:

*Gorbunov, D., *Sturlese, M., Nies, F., Kluge, M., Bellanda, M., Battistutta, R., and ^Oliver, D. (2014). Molecular architecture and the structural basis for anion interaction in prestin and SLC26 transporters. Nat Commun 5.

Dallos, P., and Fakler, B. (2002). Prestin, a new type of motor protein. Nat Rev Mol Cell Biol 3, 104–111.

Oliver, D., He, D.Z.Z., Klöcker, N., Ludwig, J., Schulte, U., Waldegger, S., Ruppersberg, J.P., Dallos, P., and Fakler, B. (2001). Intracellular Anions as the Voltage Sensor of Prestin, the Outer Hair Cell Motor Protein. Science 292, 2340 –2343.

Zheng, J., Shen, W., He, D.Z.Z., Long, K.B., Madison, L.D., and Dallos, P. (2000). Prestin is the motor protein of cochlear outer hair cells. Nature 405, 149–155.

gianniAtomistic MD Simulations and Experiment Provide Basis for Protein Motility in Prestin
read more