A few weeks ago we were invited to present at the GTCBio Drug Design and Medicinal Chemistry 2014 Conference. The conference was very interesting and very well organized, and we will make sure to keep it on our agenda for future years. The meeting also happened to be in Berlin, which instantly became one of my favorite cities. Our talk was titled “Machine Learning in FBLD” and discussed our new adaptive sampling method for probing protein-ligand interactions using atomistic molecular dynamics simulations, developed in collaboration with the De Fabritiis lab. So how did we get there?
High-throughput molecular dynamics can be used for profiling protein-ligand interactions
In 2011 we reported our seminal work towards a practical approach for profiling protein-ligand interactions in silico with all-atom resolution. In this proof-of-principle study we showed that with enough sampling we could use atomistic high-throughput molecular dynamics simulations to reconstruct the binding of benzamidine to trypsin, and not only obtain accurate energies, kinetics, and poses, but also resolve a ligand binding pathway. We were also able to predict several metastable states later observed experimentally.
Application of molecular dynamics simulations in fragment based lead discovery (FBLD)
We next focused on expanding the library size, and on adapting the methodology to fragment based ligand design. Benzamidine is a small molecule of micromolar affinity for trypsin, and therefore extending this method to FBLD was a natural progression. For the fragment study we used a 2003 STD NMR study that targeted Factor Xa, a target related to trypsin, and a set of forty fragments. Once more, our MD experiments were able to recapitulate the binding of the library to the target, and also give accurate representations of binding energies, kinetics, poses and binding pathways. Importantly, we were also able to accurately rank the compounds by binding affinity. As exciting as the results were, we needed to collect two milliseconds of aggregate trajectory data, a challenge we could only meet because of our access to GPUGrid through our collaboration with the De Fabritiis group.
Development of an adaptive sampling protocol for MD
One limitation of our HTMD based in silico binding method at that time was the amount of sampling needed for system convergence, which was on the order of 50 μs per compound for Factor Xa. Clearly, in order to make this high-throughput molecular dynamics methodology a practical complement to current experimental FBLD screens, we needed to improve the efficiency of our simulation methods. To this end, we developed a fully automated adaptive sampling method that can deliver system convergence about an order of magnitude faster than the brute-force HTMD approach.
As opposed to HTMD, where we run hundreds of simulations at once, our adaptive sampling protocol begins with only ten trajectories, each starting from a random initial state and each a few tens of nanoseconds long. We then analyze these trajectories using Markov state modelling, and use residence time information obtained from clustering to select the starting conformations of future runs. This run-analyze-respawn cycle is defined as an epoch, and each adaptive experiment is composed of ten epochs. Note that all of this is automated and that we only need to set up the initial trajectories before working up the results after the last epoch. For each protein-ligand system we run ten replicate adaptive experiments.
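To make the run-analyze-respawn cycle concrete, here is a minimal, purely illustrative Python sketch of the loop structure. It is not our production code: the "simulation" is a 1-D random walk standing in for an MD trajectory, the "clustering" is simple coordinate binning rather than a Markov state model, and the respawn rule (restart from the least-visited clusters) is a crude stand-in for selecting under-sampled states from residence time analysis. All names and parameters are invented for the example.

```python
import random
from collections import Counter

TRAJS_PER_EPOCH = 10   # the protocol launches ten trajectories per epoch
EPOCHS = 10            # each adaptive experiment is composed of ten epochs
STEPS = 200            # stand-in for "a few tens of nanoseconds" of MD

def run_trajectory(start, rng):
    """Propagate a toy 1-D walk and return the sequence of visited clusters."""
    x, visited = start, []
    for _ in range(STEPS):
        x += rng.choice((-1, 1))
        visited.append(x // 10)  # crude "clustering": bin the coordinate
    return visited

def adaptive_experiment(seed=0):
    rng = random.Random(seed)
    counts = Counter()
    # Epoch 1: random initial states, as in the protocol.
    starts = [rng.randint(-50, 50) for _ in range(TRAJS_PER_EPOCH)]
    for _ in range(EPOCHS):
        # Run: propagate this epoch's trajectories and pool the statistics.
        for s in starts:
            counts.update(run_trajectory(s, rng))
        # Analyze + respawn: restart the next epoch from the least-visited
        # clusters, a simple proxy for picking poorly sampled states.
        rare = [c for c, _ in counts.most_common()][-TRAJS_PER_EPOCH:]
        starts = [rare[i % len(rare)] * 10 for i in range(TRAJS_PER_EPOCH)]
    return counts

cluster_counts = adaptive_experiment()
```

The point of the sketch is the shape of the loop: sampling effort is redirected each epoch toward states the analysis flags as under-explored, instead of being spread uniformly as in brute-force HTMD.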
For our proof-of-concept we tested this adaptive sampling method on trypsin-benzamidine, a system for which we have a very large amount of data and that over the years has become our benchmark. As opposed to regular HTMD, where we needed to collect about 50 μs of data for convergence, only 5 μs were needed in our adaptive experiments. Learning was evident by the third epoch, and in about 80% of the experiments the binding energy and pose matched those of our control HTMD experiment by the fifth epoch.
We are very excited about these results, as in principle — and assuming there were no other bottlenecks — we should be able to screen 400 compounds in the same amount of time previously spent on 40. This would make the technique fast enough for medicinal chemists to start looking at it as a practical and reliable complement to current FBLD screens. We hope to be there soon, but we still need to test the universality of the method thoroughly, as well as work out other kinks such as the issue of ligand parametrization.