Acellera's Blog

Acellera HTMD: A complete software workspace for simulation-guided drug design

Gianni De Fabritiis’ talk at Boston ACS on 18th August, at 8.30AM:

HTMD: A complete software workspace for simulation-guided drug design

Abstract: Performing computational experiments using molecular dynamics simulations is still too difficult, and the recent capability of running thousands of simulations has further exacerbated the problem. HTMD is a Matlab-like programmable workspace that provides system preparation, molecular dynamics, Markov state model analysis and visualization in a single environment. As a result, a single short script can lead from a PDB structure to useful quantities such as relaxation timescales, equilibrium populations, conformations and kinetic rates. This enables scientists with minimal background to easily integrate MD into their discovery workflow, drastically reducing errors and improving reproducibility.

COMP: Division of Computers in Chemistry
Room 156A – Boston Convention & Exhibition Center
Publication Number: 153


The Road towards GPU Molecular Dynamics: Part 2

by Matt Harvey

In Part 1 of this series, I recounted how we came to be developing molecular dynamics simulation code for the IBM Cell processor. Back in 2006, the Cell seemed poised to make a profound impact on the high performance field – even becoming the processor around which the first petascale system, LANL’s “Roadrunner”, was built. It is now barely remembered. Only those well-versed in IBM Kremlinology can say why it was effectively killed off, but quite likely its unpopularity amongst programmers forced to deal with its sheer complexity played no small part in the decision.

Lessons learned while working with IBM’s Cell Processor

The Cell was indeed complex. But its crime was not the complexity per se, but rather that so much of it was exposed directly to the programmer. The absence of any tool-kit providing a clean high-level abstraction of the device meant programmers had to deal with the hardware head-on. Nevertheless, our experience of developing for the Cell this way led us to several important conclusions:

1) Future highly parallel processors would have several levels of parallelism (instruction, vector, thread and program), all of which matter (and interact) when aiming for optimum performance.
2) Partitioned address spaces and different classes of memory would make explicit data movement essential, and with it the need to hide latency by overlapping data movement with compute.
3) Heterogeneous designs that combine different processor types each with complementary capabilities in a single device would be common.

These reinforced our view that developing high performance programs for these future processors would require much more than recompilation of existing applications with a special compiler. Their architectures would be so far away from the dominant single-core CPU + serial/threaded program model that major algorithmic changes would be required, necessitating substantial redesign/refactoring of code. Put in the context of scientific computing, where codes may live for decades, it’s clear that if you force change with every new hardware revision, programmers won’t target your hardware.

The implication of this was that for the programmer to have any hope of developing new code that is both optimised and portable to other/future hardware, programming with the right high-level abstraction of the hardware would be critical. In other words, having a high quality set of software development tools would be more important than ever.

So, what happened next?

In 2007, NVIDIA, a company known for its PC graphics cards, released its new model, called the G80. Few people in the HPC space knew much about NVIDIA then – its products were mostly sold to gamers and architects – and so the implications of developments in the 3D graphics field had gone unremarked by most of the scientific computing world. The arrival of the G80 was, in retrospect, a rare moment of revolutionary change when an outsider enters a new field and really shakes it up. So what was so revolutionary about the G80? To understand, we need to take a trip down memory lane.

GeForce 8600GT: the first GPU used for MD

A Potted History of PC Graphics

In the early ’90s, a PC graphics adapter was dumb hardware – all it really did was squirt out the contents of its memory (the “frame buffer”) down a cable to a monitor. The host CPU was completely responsible for writing the data representing the color of each display pixel (or character) into the frame buffer.

Having the CPU do all the work was very inefficient, not least because of the slow bus connection, so it became increasingly common for graphics adapters to have some degree of acceleration for simple, common operations: for example, to move a block of pixels from one location to another (“blitting”, useful for moving or scrolling windows), or to fill a whole area with a color.

When games with 3D graphics first started to become popular (if you worked in a computer lab in the 00s you surely played Quake death matches!), the graphics scene, typically expressed as a triangulated surface, was constructed and rendered by the CPU. Because the demand for better games graphics outstripped what improvements in CPUs and buses could provide (and because the dollar size of the games market was growing dramatically), many of these geometric primitive operations came to be implemented in hardware. The GPU was born.

The first GPUs bore all the hallmarks of a device engineered for a specific task – the hardware was a set of relatively simple fixed-function blocks, all pipelined together, with each unit offloading one fixed aspect of the rendering and display process. Two of the more important functions of this pipeline, texturing and lighting (the processes of making a surface look more realistic by covering it with an appropriate image and illuminating it), are quite demanding of hardware, so GPUs rapidly began to acquire their own large, very high bandwidth memories as well as limited floating point capabilities.

By the early 2000’s, GPUs were starting to boast performance figures high enough to cause some people in scientific computing, myself included, to start to wonder if they might be useful for other things beyond making pretty pictures. Superficially the GPUs of the day looked highly appealing – they had lots of memory bandwidth, reasonable floating point capability and low cost. Unfortunately, that’s where the good news stopped!

GPUs were still very much fixed-function devices and, furthermore, were only programmable through graphics shading languages (NVIDIA’s Cg and OpenGL’s GLSL). To re-purpose them you had to have an algorithm that could map directly onto the structure imposed by the programming abstraction, in effect making the computation look like a special case of shading, with the input as a texture image and the output as the screen frame buffer. Not much fun!

To compound matters, the bus that connected the GPU to the host computer (called AGP) was horrendously slow – any time you saved by doing the compute on the GPU would be lost in the ponderous copy of the results back to the host.

When we were looking for novel processors to develop for, we considered, but ultimately dismissed, the contemporary GPUs because of these shortcomings, although these are – in essence – more extreme versions of the problems of parallelism, data locality and architectural abstraction that we encountered with the Cell.

NVIDIA brings GPUs to HPC

Returning to the G80, NVIDIA had clearly seen the shortcomings of its fixed-function hardware, and realised that a more general-purpose architecture would let it steal a march on the competition by giving the programmer greater flexibility. The G80 was designed around a fully programmable core and, although fixed-function hardware remained, much of the functionality was now performed in software.

Interesting though this was, what made it really significant in the world beyond 3D was that NVIDIA coupled the G80 release with a new programming language, called CUDA, which provided an expressive programming model that embodied an elegant hardware abstraction.
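
To make that abstraction concrete, here is a minimal sketch of the grid/block/thread model that CUDA introduced, written here in Python using Numba's CUDA bindings rather than the original C-based CUDA toolkit; the kernel and array names are illustrative only. Each thread computes one array element, and the hardware schedules blocks of threads across the GPU's multiprocessors.

from numba import cuda
import numpy as np

@cuda.jit
def vector_add(a, b, out):
    # Each thread handles one element; cuda.grid(1) is its global index.
    i = cuda.grid(1)
    if i < out.size:
        out[i] = a[i] + b[i]

n = 1 << 20
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)
out = np.zeros_like(a)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
# Launch a grid of blocks of threads; Numba copies the arrays to and from the GPU.
vector_add[blocks, threads_per_block](a, b, out)

The point is that the programmer reasons about threads and blocks, not about shaders, textures or frame buffers.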

NVIDIA had clearly set its sights on expanding into the HPC space, and it has done so very successfully – from having no presence at all in 2007, by November 2010 it had acquired the number one spot on the Top 500 list of supercomputers, with the 2.5 petaflop Tianhe-1A system.

In the next post, we’ll dive into the architecture of the modern Nvidia GPU.


AceCloud – Simplified, Limitless Molecular Dynamics Simulations on the Cloud

by Matt Harvey, CTO

Physics-based computer simulation offers a tremendously powerful way to gain insight into the behavior of a wide variety of complex systems. At its most successful, simulation has become a ‘third way’ for scientists and engineers, complementing analytical and experimental methods. In engineering in particular, computational fluid dynamics simulation and finite element analysis are now an integral part of any design effort.

Similarly, we believe that physics-based simulation, in the form of molecular dynamics simulation, has great potential to become one of the basic tools in biochemical and biophysical R&D, and to establish itself as a robust tool in the drug discovery pipeline. Recent work, for example, has demonstrated the ability of MD simulation to produce quantitative estimates of small molecule binding kinetics (PNAS) and of conformational selection in proteins (Nat. Comm., and blog).

Why perform molecular dynamics simulations on the cloud?

Currently, the amount of MD simulation required to undertake these types of studies is often beyond the reach of anyone without access to dedicated high-performance computing resources. To help overcome this critical limiting factor and bring MD simulation to a wider audience, we are pleased to introduce our new product AceCloud.

AceCloud is designed to free you from the constraints of your workstation and – through the use of cloud computing technology – allow you to run hundreds of simulations with the same ease as running one, without the need for any additional setup.


Performing cloud molecular dynamics simulations with AceCloud

Accessing AceCloud is a simple matter of using three new commands built into the ACEMD molecular dynamics software package: for running simulations, retrieving results, and monitoring progress. No additional knowledge is required – our software takes care of all of the interaction with the Cloud. Here’s a video of AceCloud in action:


No changes are required to your existing ACEMD simulations, and all features are supported, including extensions and plugins such as PLUMED Metadynamics. As a bonus, users who are already familiar with Gromacs may run their existing simulation inputs on AceCloud, without the need to make any conversions.

The compute resources behind AceCloud are GPU-based and allow simulation rates over 100ns/day for the DHFR benchmark, making them around 40% as fast as the very latest GTX980s in our Metrocubo workstation, but still offering a compelling, cost-effective level of performance.

AceCloud Costs

AceCloud uses Amazon Web Services (AWS) technology to dynamically provision compute hardware as and when you require it. The only charges for AceCloud are for the time used, from less than US$0.40 per hour. There are no minimum or ongoing fees. There is no setup required, and you can start using it at your convenience in just a few steps. Billing is hassle-free and is done through your own existing Amazon account.

You can calculate representative costs using the AceCloud Cost Estimator on the AceCloud page (the estimate already includes data transfer costs).
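
As a rough illustration of the arithmetic behind such an estimate (the numbers below are placeholders, not AceCloud's actual pricing or performance, and data transfer is ignored), the cost of a project is simply the required sampling divided by the per-instance simulation rate, converted to hours and multiplied by the hourly price:

# Back-of-envelope cost estimate for cloud MD (illustrative numbers only).
def estimate_cost(total_ns, ns_per_day_per_gpu=100.0, usd_per_hour=0.40):
    """Return (gpu_hours, usd) needed to sample `total_ns` of trajectory."""
    gpu_days = total_ns / ns_per_day_per_gpu
    gpu_hours = gpu_days * 24.0
    return gpu_hours, gpu_hours * usd_per_hour

# e.g. 100 independent runs of 50 ns each on a DHFR-sized system
hours, usd = estimate_cost(total_ns=100 * 50)
print(f"~{hours:.0f} GPU-hours, ~${usd:.0f} (excluding data transfer)")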

Start using AceCloud Today!

AceCloud is available right now. Visit our website for details on getting started.


Chlorobenzene as a new halogen bond probe in molecular dynamics simulations

By Zoe Greenburg

Non-covalent interactions play a crucial role in chemistry, biochemistry, and biology. H-bonds and hydrophobic interactions are at the core of protein folding, and govern reversible interactions between biomolecules and their ligands. In drug discovery, H-bonding and hydrophobic interactions define the affinity and selectivity of small molecules for their biological targets.

With the intent to better understand these interactions and their relevance to the design of small molecule modulators of biological function, several computational solvent-mapping methods have been developed over the past few years. Schrodinger introduced WaterMap, a method for understanding the locations and thermodynamics of the water molecules that solvate binding sites, and for guiding the optimization of small molecule hits; conserved, strongly bound water molecules are costly to displace but can be useful bridges. An alternative method for understanding the role of water in ligand binding, which uses all-atom molecular dynamics (MD) and is available through Acellera, was developed by De Fabritiis et al. in 2011. Mapping of other solvents like ACN, propane, benzene, and IPA has also been demonstrated with MD simulations of binary mixtures, and has been put in the context of target druggability assessment.

A new category of substrate to study: organohalogens

Halogens are common substituents in drugs, as halogen bonding interactions can offer key boosts in selectivity and potency. Initially attributed to van der Waals forces, these interactions are now explained in terms of the halogen bond, another type of non-covalent interaction that in recent years has been exploited for many applications in synthetic, supramolecular and medicinal chemistry.

Anisotropy in the halogen’s electron charge distribution allows halogens to interact with donors such as Lewis base centers and π-surfaces, something that, if one considers electron density alone, can be rather counterintuitive. Halogen-donor interactions strengthen with the polarizability of the halogen’s electron cloud and with the withdrawing power of adjacent atoms. These observations are now explained using the sigma-hole concept, which defines a region of positive electrostatic potential at the surface of the halogen, along the axis of the covalent bond.

Within the complex structure of a protein there are many atoms with which a halogen-containing substrate can form a halogen bond. Halogen – usually chlorine – interactions with backbone and side chain carbonyls, as well as with side chain alcohols, thiols and thioethers, amines and the surfaces of aromatic rings, can affect the conformation of the protein as well as the affinity that other binding sites on the protein have for other substrates, such as water. This is, however, difficult to predict, even though doing so could be highly beneficial in guiding structure-based drug design. Experimental techniques that allow solvent mapping, such as multiple solvent crystal structures (MSCS), can be effective but are time-intensive and expensive to run, and the lack of an ideal strategy and/or representative force fields has limited computational approaches.

Computational halogen bond mapping with molecular dynamics simulations

Recently, Verma and collaborators proposed a molecular dynamics based method for halogen bond mapping. The study aimed to observe the effect of a chosen probe, in this case chlorobenzene, on the binding sites of four different proteins using molecular dynamics. Chlorobenzene is a potentially bifunctional probe: the sp2-bonded chlorine can probe for halogen bonds, while the aromatic ring can probe for π-stacking or hydrophobic sites. For proof of concept, MDM2 (PDB: 1YCR), MCL-1 (PDB: 3KJ0), interleukin-2 (PDB: 1Z92) and Bcl-xL (PDB: 1BXL) were selected because of their involvement in protein-protein interactions, the availability of structural data, and the well reported structural changes they undergo upon substrate binding, which can expose otherwise hidden halogen bonding sites. All-atom MD simulations were chosen for building the protein-small molecule interaction maps because the other computational methods considered could not capture the protein’s dynamics sufficiently, and Lexa and Carlson have shown this to be crucial for effective solvent interaction mapping.

For chlorobenzene parameterization the authors used GAFF. Atomic charges were derived using the R.E.D. Server by fitting restrained electrostatic potential (RESP) charges to a molecular electrostatic potential computed with Gaussian. One peculiarity of chlorobenzene compared to benzene, a molecule the authors had also used for mapping in a previous study, was the propensity of the halogenated compound to aggregate at lower concentrations, which forced them to reduce the probe concentration by 25% to 0.15 M and to increase the sampling time to 10 ns per trajectory.
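
For readers who want a feel for how such maps are built, the sketch below bins the positions of the probe's chlorine atoms from an aligned trajectory into a 3-D occupancy grid; high-occupancy voxels around the protein then highlight putative halogen-interaction hot spots. This is a generic illustration using MDTraj and NumPy, not the authors' exact workflow, and the file names, probe residue name (CLB) and grid settings are assumptions.

import mdtraj as md
import numpy as np

# Load a (hypothetical) protein + chlorobenzene/water trajectory.
traj = md.load("clbz_mixture.xtc", top="system.pdb")

# Align all frames on the protein C-alpha atoms so the grid is protein-fixed.
ca = traj.topology.select("protein and name CA")
traj.superpose(traj, frame=0, atom_indices=ca)

# Pool the chlorine positions of the probe across all frames (units are nm).
cl = traj.topology.select("resname CLB and element Cl")
coords = traj.xyz[:, cl, :].reshape(-1, 3)

# Histogram into a 0.1 nm (1 A) occupancy grid over the bounding box.
edges = [np.arange(coords[:, d].min(), coords[:, d].max() + 0.1, 0.1) for d in range(3)]
grid, _ = np.histogramdd(coords, bins=edges)
occupancy = grid / traj.n_frames        # average chlorine count per voxel per frame

print("highest voxel occupancy:", occupancy.max())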

Using molecular dynamics simulations to construct halogen affinity maps

For MDM2, all six halogen bonding sites found in the PDB were recovered by the proposed method. Three of these, known to be occupied by a halogenated benzodiazepine inhibitor, and two others, occupied by Nutlin-2, were recovered at the first pass of the mapping protocol. The sixth was recovered after modification of the PDB structure, as the initial structure lacked an important stretch of residues at the N-terminus. In addition, four cryptic hydrophobic binding sites were detected that were not present in the initial input conformation of the protein. The existence of these sites could also be validated against results available in the literature. For the other three targets the authors report similar results.

All-atom MD simulation with chlorobenzene can be used to map halogen and hydrophobic binding sites

Using molecular dynamics simulations and the chlorobenzene parameters described above, the authors could map halogen and hydrophobic binding sites on all four tested proteins. The results were consistent with the available experimental data and show the potential of this method in structure-based drug design. As we have come to see in our own work at Acellera probing libraries of fragments with all-atom MD simulations, the choice of force field is difficult for halogenated molecules. The ability of the force field used in this paper to reproduce the binding sites was remarkable, and could potentially be harnessed for broader use in computer-aided drug design.


Kinetic modulation of a disordered protein domain by phosphorylation

by Santi Esteban

Protein phosphorylation plays key roles in many signal transduction processes, preferentially targeting intrinsically disordered protein domains (IDPs). IDPs remain largely unstructured under native conditions, resembling random-coil polymers akin to the unfolded states of proteins. Present in more than 50% of eukaryotic proteins, IDPs perform a plethora of biological functions and are commonly associated with a variety of human diseases.

Technical limitations hamper intrinsically disordered protein characterization

The mechanism by which phosphorylation modulates disordered domains remains unknown. This stems from the extraordinary structural heterogeneity of IDPs, which poses a challenge for their characterization using common structure determination techniques. For instance, the study of IDPs is not viable with X-ray crystallography because of the fast mobility of the domains, and NMR does not allow singling out the structural details of the multiple underlying conformational states.

Can intrinsically disordered proteins be characterized using molecular dynamics simulations?

Recent advances in computer simulation technology offer a unique opportunity to study IDPs with atomic resolution on the millisecond timescale, the regime necessary to study these systems. The introduction of accelerator processors and their application to MD, together with the development of HTMD and adaptive sampling methods, have increased sampling efficiency by several orders of magnitude. Indeed, a single workstation can now produce microsecond-long simulations with atomic resolution in one day, and milliseconds can be obtained rapidly on a supercomputer, or through distributed computing projects. Nevertheless, this burgeoning field is still limited in its ability to analyse millions of protein conformers and compute their populations and kinetics.

All-atom MD for intrinsically disordered protein characterization

In this work (Nat. Comm. 2014, DOI: 10.1038/ncomms6272), we successfully reconstructed the conformational ensemble of an IDP domain involved in gene transcription. A new analysis tool capable of resolving the disordered state of such domains was used in combination with large-scale, all-atom, unbiased molecular simulations; 1.7 milliseconds of aggregate simulation time were sampled. This methodology was used to study a well-characterized disordered fragment, the kinase-inducible domain (KID) of the cellular transcription factor CREB (cAMP response element-binding protein).

Identification of slow molecular order parameters for Markov model construction

The kinetic characterization of macromolecular systems requires the description of slow relaxation processes. This depends on the identification of the structural changes involved in these processes (selection of structural descriptors), the discretization of the high-dimensional coordinate space (clustering of conformers according to the descriptors) and estimation of the rates or timescales at which these slow processes occur. Approaches to this end include Markov models, master-equation models, and kinetic network models.

In this work we analyzed the simulated data by constructing a Markov state model of the entire ensemble of trajectories using inter-residue Cα-Cα distances and φ/ψ backbone dihedral angles as general metrics to build the kinetic model. The conformational space was discretized into one thousand clusters and then projected on a 5-dimensional space using time-lagged independent component analysis (TICA), which identifies the slow coordinates in the dynamics without relying on subjective guesses.
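
For orientation, a minimal sketch of this kind of pipeline using the open-source PyEMMA library is shown below. It follows the common featurize, TICA, cluster, MSM order and uses hypothetical file names and lag times; it is not the exact protocol or software used in the paper.

import pyemma

# Featurize: inter-residue Ca-Ca distances plus backbone dihedrals.
feat = pyemma.coordinates.featurizer("kid.pdb")
feat.add_distances_ca()
feat.add_backbone_torsions(cossin=True)

trajs = ["traj-%03d.xtc" % i for i in range(100)]        # hypothetical trajectory files
source = pyemma.coordinates.source(trajs, features=feat)

# TICA: project onto the slowest linear combinations of the input features.
tica = pyemma.coordinates.tica(source, lag=100, dim=5)   # lag in frames (assumed)
tica_output = tica.get_output()

# Discretize into microstates and estimate the Markov state model.
clusters = pyemma.coordinates.cluster_kmeans(tica_output, k=1000, max_iter=50)
msm = pyemma.msm.estimate_markov_model(clusters.dtrajs, lag=100)

print("implied timescales:", msm.timescales(5))
print("stationary distribution (first states):", msm.stationary_distribution[:5])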

Phosphorylation reduces the conformational kinetics of intrinsically disordered proteins

The results presented in Figure 1 show the first microscopic view of the equilibrium states and conformational kinetics of a disordered protein domain at atomic resolution. The conformational exchange between state D (disordered) and state O (ordered) is shown together with the forward (τ1) and backward (τ-1) transition times. Conformations were aligned using the residues with the lowest Cα-Cα distance variance; the chain is color-coded from red (N-terminus) to blue (C-terminus).

We find that phosphorylation induces a 60-fold slowdown in conformational kinetics compared to the non-phosphorylated domain, involving a lowly populated and partially structured excited state known to participate in an early binding intermediate (Fig. 2). Figure 2a shows residue-specific relaxation times of the helix folding/unfolding process (filled bars), derived from autocorrelation functions. Empty bars show the chemical shift changes in pKID that map the residues participating in an early binding intermediate, as detected by NMR. Figure 2b is an example of a kinetically locked conformation in phosphorylated KID. The peptide backbone is shown in cartoon representation, and the heavy atoms of phosphorylated serine 133 and several charged residues are shown as sticks.

Direct comparison against NMR data supports the validity of these findings and shows that, contrary to what is typically assumed and in line with experiments, mutation of the phosphoserine to glutamate cannot recover the slowdown in conformational exchange (Fig. 1c).

Is the regulation of phosphorylation via the kinetic modulation of intrinsically disordered proteins conserved?

We propose a new mechanism of regulation by phosphorylation, through kinetic modulation, which could be general for disordered domains that lack a well-defined equilibrium structure. This mechanism emerges from a theoretically derived kinetic model showing that it is possible to modulate the binding affinity of protein-protein complexes by inducing a slowdown in conformational kinetics that affects both on- and off-conformational transition times simultaneously and by the same amount. This model agrees with the kinetic and affinity values obtained experimentally (see paper).

With disordered domains taking part in more than 50% of the proteins responsible for signaling in the cell, this kinetic mechanism of modulation highlights a further way by which post-translational modifications may affect disordered domains and their interactions with binding partners.

This work was performed in the De Fabritiis lab. Gianni De Fabritiis is the founder of Acellera. Should you be interested in performing similar studies in-house, let Acellera know. We will be happy to help.


Maxwell GPU review: MD simulations with GeForce GTX980

by Matt Harvey

After a long wait, the greatly-anticipated Maxwell GPU from NVIDIA has finally arrived, in the form of the Geforce GTX980, to rave reviews from the gaming world where it has been acclaimed as the new king of the performance hill.

At Acellera, we’re always tracking the cutting edge of technology to deliver the best systems for molecular dynamics simulation, so we’ve been hard at work putting these new cards through their paces. Before we see how they perform, let’s have a quick look at what’s changed.

What’s new? Maxwell GPUs: the new state of the art for molecular dynamics simulations

NVIDIA’s previous generation GPU, named Kepler, has been our workhorse for almost three years now, first in the form of the GK104 silicon and then its big brother, the GK110. These two devices share a similar design, based on a 192-processing-element block called a “Streaming Multiprocessor”, or SM for short.
Manufactured on a 28nm process, the GK110 has 15 SMs, although it is only with the very latest Tesla K40 and Geforce GTX780Ti that we have seen products with all of them enabled – the GTX780 systems we have been shipping this past year have had only 12 SMs activated.

Maxwell GPU vs Kepler: Main differences

Maxwell’s SM is a refinement of Kepler’s, reducing the number of processing cores per SM by a third, to 128, while incorporating additional design improvements; NVIDIA claims that real-world per-SM performance is reduced by only 10% relative to Kepler. The new Maxwell processor, GM204, is still manufactured on the same 28nm process as GK110, rather than the anticipated 20nm process. Nevertheless, the smaller SM and the intervening refinements to the manufacturing process mean that GM204 can run at higher clock frequencies and contain more SMs than Kepler (16 versus 15) on a die about 40% smaller.

Maxwell GPU Performance in MD Simulation: Faster, stable and more efficient

So how does the Maxwell-based GTX980 fare when running ACEMD, our flagship molecular dynamics code?

Over the last year, we have been selling GTX780-based systems. On the popular dihydrofolate reductase benchmark system of 23,500 atoms, we saw single-GPU performance of around 210ns/day.

Performance

Running the same test on a GTX980, with no other performance optimisations, yields an impressive rate of 280ns/day, around 30% faster. On a Metrocubo equipped with four GTX980s, that’s over 1 microsecond per day of aggregate MD sampling. If you prefer maximum single-simulation performance over throughput, a two-GPU run can achieve 380ns/day – a new performance record!

Benchmarking conditions: ECC off, X79 chipset, CUDA 4.2 and ACEMD ver 2500 or greater. Periodic boundary conditions, 9 Å cutoff, PME long-range electrostatics with a 1 Å grid spacing, hydrogen mass repartitioning, rigid bonds, Langevin thermostat, 4 fs time step. Note: other codes run benchmarks with a smaller cutoff or fewer atoms. Performance as of October 8th, 2014. See the ACEMD page for the latest results, and to run a benchmark simulation with your own system.

Efficiency

And that’s not all: compared to Kepler, the new GPU is much more power-efficient – measured at the wall, a 4-GPU E3-based Metrocubo system running at full tilt draws almost 200W less, making it even quieter and cooler than before.

It’s quite remarkable that such an improvement has been made without moving to a newer manufacturing node, and makes the future 20nm parts even more tantalising.

GPU hardware for MD simulations available now with Maxwell GPUs

All in all, the new Maxwell has passed its tests with flying colours and we’re very pleased to announce that we are shipping them to customers now.

As usual feel free to request a test drive. We will be more than happy to make some time available in one of our machines. Maxwell is already available for testing.

Also, feel free to request a quote.


The Road towards GPU Molecular Dynamics: Part 1

by Matt Harvey

In 2006, when we started to develop codes for molecular dynamics (MD) simulations, we did so in anticipation of riding a new wave of technological innovation in processors. Up until then, MD was firmly in the realm of high performance computing. Running a simulation on a desktop or workstation was an exercise in futility — it simply wasn’t possible to run simulations for long enough to reach the timescales of relevance to bio-molecular processes — you had to have access to a supercomputer.

Supercomputers are great things (I’ve spent most of my career building, running and using them) but they are usually far too expensive for any one researcher to own exclusively, meaning that they are often owned by consortia and time on them is rationed parsimoniously. Suddenly, a researcher thinking of using molecular simulation had not only to learn about MD, but also to bid for computer time and then find their way around a peculiar new operating environment, all for the privilege of having their simulations sit inexplicably stuck in a batch system queue.

Building a better environment for MD based computer assisted drug discovery

All of these issues combined to make MD quite a niche activity, despite having – in our eyes – direct applicability to a wide range of bio-molecular problems. What, we wondered, if we could build a better workstation? A personal machine able to do useful MD simulations wouldn’t just bring a quantitative change to the field but a qualitative one: by becoming something that one could run almost on a whim, MD could become a standard tool in the toolbox of computational drug design.

The question for us to answer was, would it be possible?

A case for consumer hardware repurposing

If you are familiar with the history of high performance computing, you’ll know that it is littered with the corpses of companies that tried – and failed – to build specialised processors optimised for niche applications. In the best cases, the fate of these businesses was to be rendered irrelevant by the relentless, Moore’s Law-paced improvements in mass-market technology. As if we needed any more discouragement from following that path, it had just been trodden by D. E. Shaw Research with their special-purpose MD machine, Anton [1]. From the published costs, we knew we had no hope of raising financing for a similar effort.

What we needed, then, was to find some already existing processor that:
1) had characteristics making it markedly more suitable for MD than normal CPUs,
and 2) that was also available on the mass market, making it both cheap and likely to benefit from Moore’s Law performance improvements.

At around this time the processor industry, in response to ever-greater manufacturing challenges, was beginning its transition from single-core to multicore designs. It was pretty clear even then that this would be a major technology trend and, although it was still an open question what future multicore processors would look like in detail, it was evident that this was the type of processor we should be designing for.

Looking around for examples that fitted these criteria, it didn’t take us long to whittle the list down to one: the Cell processor.

The Cell processor for molecular dynamics simulations

The Cell processor was jointly developed by IBM, Toshiba and Sony as a high performance processor for media applications. It had a heterogeneous architecture – a single Cell processor contained a conventional Power CPU (called the “PPE”) connected to a set of “Synergistic Processing Elements” (SPEs) that were, in effect, high performance vector processors. The aggregate performance of these SPEs made a Cell processor over 10x faster at arithmetic than its CPU contemporaries. Unlike normal CPUs, the SPEs could not run general purpose programs directly; they operated under the direct control of the PPE. This made their programming substantially more complex than usual.

Nevertheless, the Cell had one decisive factor in its favor – it could be bought for next to nothing, as it was the processor at the heart of the Sony PlayStation 3 games console. (Granted, it could also be bought for a lot if you asked IBM for their version!). Serendipitously, the PS3 could also run Linux, giving us a sensible, familiar, development environment.

CellMD – the first MD code for running MD simulations on graphic processors

The programming of the Cell that ultimately led to CellMD proved quite challenging – to get good performance, we had to consider:

*) Explicit data management. The SPEs could not access main system memory directly; all input and output data for their programs had to be staged in explicitly using code written for the PPE. Furthermore, the SPEs had only very limited memory (256KB), making it necessary to carefully pack data structures.
*) Multilevel parallelism. In mapping the MD work onto the SPEs we had to consider division of work not only across the array of SPEs but also across the 128bit SIMD vector word on which each SPE operated. The PPE also had its own incompatible set of vector SIMD operations.
*) Flow control complexity. The SPEs, being very simple vector processors optimised for floating point arithmetic, were very poor at handling flow control operations, so code had to be carefully modified to unroll loops and use predication in place of conditionals.
*) Algorithm mapping. Since both the SPEs and PPE could be used simultaneously, getting best performance meant mapping different aspects of the computation onto the different processing elements and carefully overlapping work.

When we finally had our first working MD code, which we called CellMD, it ran over an order of magnitude faster than could be achieved on a single CPU workstation[2], equivalent to about 5 years of CPU development, and was a vindication of our approach.

Inconvenient revisions of the Cell and migration into GPUs

With CellMD working, we anticipated being able to turn our attention back to more scientific pursuits, expecting the Cell processor — which was garnering substantial interest in the HPC world by this point — to be further developed, so giving us “free” performance improvements. Naturally, things seldom turn out so well!

Although IBM would talk about the general roadmap, details and timescales were very vague and never included anything nearly as cost-effective as the PS3. (In fact, there was only ever one major revision, the PowerXCell 8i in 2008, which found its way into LANL’s Roadrunner, the first petaflop supercomputer [3].) To compound matters, the revisions of the Cell made for later updates to the PS3 were aimed at cost optimisation: newer production techniques were used to make the same design smaller and cheaper, rather than faster.

Fortunately, the trend towards accelerator processors was occurring in other parts of the industry, and we didn’t have to look far for our next platform: NVIDIA’s G80 GPU.

For more, check out part 2 of this series, where we dive into GPUs.

References

[1] D. E. Shaw et al., “Anton, a Special-Purpose Machine for Molecular Dynamics Simulation”, Communications of the ACM, vol. 51, no. 7, 2008, pp. 91-97.
[2] G. De Fabritiis, “Performance of the Cell processor for biomolecular simulations”, Comput. Phys. Commun. 176, 670 (2007).
[3] http://www.lanl.gov/roadrunner/


Automatic Force Field Generation

by Zoe Greenburg

The accuracy of molecular dynamics (MD) simulations depends upon the quality of the force field used to parameterize the model. The development of these force fields is challenging to automate, because the highly individual nature of each molecule and its environment requires uniquely tailored starting conditions.

Wang and co-workers recently addressed this issue using ForceBalance, in a JPCL paper titled “Building Force Fields: An Automatic, Systematic, and Reproducible Approach”. Dr. Vijay Pande, the corresponding author of the paper and faculty at Stanford University, explains: “Accurate force fields are critical for the success of molecular dynamics simulations, but yet due to the great challenges in building force fields, they have typically been made ‘by hand’, requiring many years of effort, and are not easily reproducible or produced in a systematic manner.” ForceBalance aims to create systematic and reproducible force fields by using a combination of experimental and theoretical data.

ForceBalance provides a hybrid approach that combines ab initio and experimental data

In order for a force field to accurately simulate a system, its parameters must be optimized to reproduce the physical properties of the system that are known experimentally, as well as what is known about the system from theoretical calculations. The algorithm used to combine these two sources of data differs across force fields.
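
The core idea can be pictured as a single weighted least-squares objective mixing both kinds of reference data. The toy sketch below (with made-up observables, response surfaces and weights; not ForceBalance's actual code or force-field math) shows how experimental and ab initio residuals can be combined and minimized over the parameters:

# A toy illustration of a hybrid experimental + ab initio objective
# (all numbers are placeholders, for illustration only).
import numpy as np
from scipy.optimize import minimize

exp_targets = {"density": 0.997, "hvap": 10.52}      # e.g. g/cm^3, kcal/mol
qm_energies = np.array([1.2, 0.4, 2.8])              # ab initio reference energies

def predict(params):
    """Stand-in for running simulations / energy evaluations with `params`."""
    sigma, epsilon = params
    density = 1.05 - 0.02 * sigma + 0.01 * epsilon    # toy response surfaces
    hvap = 8.0 + 1.5 * epsilon
    energies = np.array([1.0, 0.5, 3.0]) * epsilon / sigma
    return density, hvap, energies

def objective(params, w_exp=1.0, w_qm=0.1):
    density, hvap, energies = predict(params)
    exp_res = [(density - exp_targets["density"]) / 0.001,   # scaled residuals
               (hvap - exp_targets["hvap"]) / 0.1]
    qm_res = (energies - qm_energies) / 0.5
    return w_exp * np.sum(np.square(exp_res)) + w_qm * np.sum(np.square(qm_res))

result = minimize(objective, x0=[3.15, 0.65], method="Nelder-Mead")
print("optimized parameters:", result.x)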

ForceBalance, an application developed in 2012 by Wang, which is available through Simtk.org, allows the researcher to improve the accuracy of the force field by basing the parameters on the data available, whether it be experimental, theoretical, or a combination of both.

Dr. David Case, from Rutgers University says “As I see it, the key new idea here is to use gradients of liquid state average properties to help drive the optimization procedure. This allows one to significantly expand that set of target properties that can be fit in an automated fashion. It doesn’t relieve one from having to choose and weight the targets that one hopes to fit.”

This approach greatly streamlines the process of force field creation. Although the method is completely automatic, it still leaves researchers room to manually alter aspects as they see fit.

ForceBalance maximizes the efficiency of force field development

Thermodynamic fluctuation formulas are employed to reduce the need to run many long simulations. Simulations are only run for long periods of time when high precision is required, such as at the end of the optimization.
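
For context, this presumably refers to fluctuation expressions of the following general kind, which give the derivative of an ensemble average with respect to a force-field parameter from a single equilibrium simulation (written here in canonical-ensemble form; the liquid-property targets in the paper require the analogous isothermal-isobaric expressions, so treat this as a sketch of the idea rather than the paper's working equation):

$$\frac{\partial \langle A\rangle}{\partial \lambda} \;=\; \Big\langle \frac{\partial A}{\partial \lambda}\Big\rangle \;-\; \beta\left(\Big\langle A\,\frac{\partial U}{\partial \lambda}\Big\rangle - \langle A\rangle\Big\langle \frac{\partial U}{\partial \lambda}\Big\rangle\right),$$

where $U$ is the potential energy and $\beta = 1/k_B T$. Because the right-hand side is an average over an existing equilibrium simulation, parameter gradients of bulk properties can be estimated without re-running long simulations at perturbed parameter values.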

ForceBalance also builds on several open source technologies that are available on the internet: OpenMM 6.0, the Work Queue library, and MBAR all contribute to fast and accurate generation of force field parameters.

The TIP3P and TIP4P models were significantly improved with Force Balance

The TIP3P and TIP4P models are the most commonly used water models for the Amber and CHARMM force fields. TIP3P is used in many simulations that deal with biologically relevant phenomena, so their further improvement is an exciting development for the field.

The original parameters developed for TIP4P were unable to reproduce the dielectric constant of water in simulations. However, once ForceBalance optimization was applied to the model, the parameters converged to the same point despite varying starting points. The data set included a wide range of temperatures and pressures for six different liquid properties. The newly optimized parameter set, called TIP4P-FB, reproduces the dielectric constant of water reliably across wide ranges of temperature and pressure.

The same optimization was applied to the TIP3P model, as 3-point water models are more widely used in research. The optimized parameters (TIP3P-FB) were similarly accurate to TIP4P-FB, except for slight deviations from the experimental density of water at low temperatures, outside the range common in biomedically relevant simulations.

ForceBalance demonstrates that efficient, reproducible, and automatic force field generation is possible. The research team hopes to expand this method to more biologically relevant models.

Looking towards the future

This type of technology has many applications. Dr. Pande predicts that “By completely automating the parameterization of force fields, force field design can now be reproducible … and built systematically (we can build many force fields in a consistent manner, such that they would easily work with each other). It is our vision that there is a coming Renaissance in force field design, where there will be a considerable increase in force field accuracy due to this new approach”


Why we began optimizing GPU hardware for MD simulations

by D. Soriano and Gianni De Fabritiis, PhD.

We started using GPU computing for molecular dynamics simulations in late 2007, shortly after NVIDIA introduced their first fully-programmable graphics card. GPU adoption into our molecular dynamics program was a natural transition, as we had spent the previous two years working with Sony’s Cell processor, the first widely-available heterogeneous processor, and had obtained very promising results (see the 2007 papers cited). While working towards ACEMD, our molecular dynamics software, we were unable to find a stable GPU cluster from any of the major vendors that matched high GPU density with a cost-effective specification. We were thus forced to build our own.

Development of the first GPU servers for MD simulations by Acellera

We started the hardware development work that ultimately led to Acellera’s Metrocubo by cherry-picking the appropriate components from a scarce pool. We knew we wanted to build cost-effective GPU computing systems of high density, so we focused on researching and developing 4-GPU nodes built around consumer hardware. After evaluating the possibilities, we found a single suitable 1500W power supply and picked one of the very few motherboards that at the time supported four double-width GPUs. We then complemented this with CUDA GPUs from NVIDIA and, after deciding on the remaining components, obtained a system that allowed us to build a small GPU cluster, to which we tailored what would ultimately become ACEMD.

Development of a computer chassis specifically designed for GPU accelerated MD

Over the next two years, the range of consumer hardware choices available for GPU computing improved significantly. Nevertheless, we could not consistently get our hands on a commercial GPU server with a suitably designed chassis, and consequently some of our systems required frequent support. For the most part, available chassis did not have the right combination of GPU cooling, low noise and GPU density, and brand-name options came with custom power supply and mainboard designs, which complicated updates. To address these issues, and to standardize hardware configurations so that we could confidently ship robust and efficient machines optimized for GPU-accelerated molecular dynamics simulations, we decided to design a new GPU chassis. To date this chassis remains the only one optimized for use in MD simulation work and computer-aided drug design.

First generation GPU chassis: design and prototyping

The first prototype of the GPU computer chassis, which was made of stainless steel, already included some of the key signature features of the final product. First, we placed the power supply at the front and fixed the dimensions of the box to exclusively support single-socket, 4-GPU motherboards, as multiple sockets increased the cost and decreased performance by about 10% due to PCIe switch latency. The result was a versatile and compact design that permitted facile repurposing of workstations into rackmount solutions, and offered the possibility of high GPU density. While troubleshooting or testing new configurations we frequently moved machines from the server room to the office and vice versa, so we preferred to design a machine that fitted in both environments. Second, we placed two large-radius fans with high quality bearings at the front, instead of the center or the back. Given the small volume of the box, we felt this would be enough to ensure proper cooling and curb noise to manageable levels. As one might expect, the first prototype required revision, but minor modifications, primarily to the front panel of our second generation design, produced a GPU chassis that met all of our requirements.

Second generation GPU chassis: results and final touch ups

The results obtained with GPU nodes built around the second generation chassis were most satisfactory. The cooling was much better than anything we had tested before, and the temperature of the GPUs never exceeded 78C, even after running the machine for extended periods of time with both actively and passively cooled GPUs. Furthermore, the amount of noise produced by the machines was very acceptable and did not exceed the background noise produced by the lab’s AC. Some of Acellera’s customers now have six of these systems running at full blast in their offices. Moreover, the compact size of the chassis – each 8U tall and less than ⅓ of 19in wide – allows three units to fit per rack shelf, on a tray, thus facilitating GPU cluster assembly.

Next, we shifted our attention to the unit’s cosmetics. GPU clusters and workstations are ubiquitously characterized by dull colors and unimaginative designs, and we wanted something more attractive to see in the lab every day. We ordered ten prototypes in various colors; we liked the green, orange and blue versions, but settled on the last as it matched Acellera’s logo color. The feet for the workstations we made ourselves, printed in orange on our own 3D printer.

GPU clusters and workstations built with Acellera’s chassis are marketed in the USA, Canada, and Europe

Since then, Acellera has shipped over a hundred Metrocubo units built on this chassis design. In order to have the flexibility to adapt to hardware changes, only small batches are made and modifications are introduced as needed. Developing standardized hardware configurations allows Acellera to minimize hardware problems. Acellera sells the same standard configurations in Europe and the US thanks to partnerships with companies such as Silicon Mechanics or Azken. None of our Metrocubo machines ship until they have passed a battery of taxing stress tests, including running extensive ACEMD simulations. Our Metrocubo GPU workstations are plug-and-play devices that come with the OS (CentOS or Ubuntu) and molecular dynamics simulation software fully installed and ready for use. For GPU cluster configurations this is at the customer’s discretion, but Acellera offers assistance should they prefer the installation to be done on-site. Should you be interested in more photos of our GPU systems or details on a current sample configuration, visit our Google+ profile, see more information on Metrocubo, or contact us directly.

Finally, if you prefer to build your own, here is a sample spec that we currently use:

  • Acellera Metrocubo Chassis
  • Intel Xeon™ E3-1245v3 Quad Core 3.4Ghz, 22nm, 8MB, 84W + P4600 GPU
  • MB ASUS Z87 WS
  • 32 GB ECC RAM
  • Silverstone 1500W power supply
  • Hard disk WD 4TB RED
  • 4 GPUs of choice (Tesla K40, GTX 780 TI)

Note that E3 processors have a built in GPU that can be used while all 4 Nvidia GPUs are computing. Enjoy.


Adaptive Sampling for MD Simulations Talk at GTCBio

by David Soriano

A few weeks ago we were invited to present at the GTCBio Drug Design and Medicinal Chemistry 2014 conference. The conference was very interesting and very well organized, and we will make sure to keep it on our agenda in future years. The meeting also happened to be in Berlin, which instantly became one of my favorite cities. Our talk was titled “Machine Learning in FBLD” and discussed our new adaptive sampling method for probing protein-ligand interactions using atomistic molecular dynamics simulations, developed in collaboration with the De Fabritiis lab. So how did we get there?

High-throughput molecular dynamics can be used for profiling protein-ligand interactions

In 2011 we reported our seminal work towards a practical approach for profiling protein-ligand interactions in silico with all-atom resolution. In this proof-of-principle study we showed that, with enough sampling, we could use atomistic high-throughput molecular dynamics simulations to reconstruct the binding of benzamidine to trypsin, and not only obtain accurate energies, kinetics and poses, but also resolve a ligand binding pathway. We were also able to predict several metastable states later observed experimentally.

Application of molecular dynamics simulations in fragment based lead discovery (FBLD)

We next focused on expanding the library size and on adapting the methodology to fragment-based ligand design. Benzamidine is a small molecule with micromolar affinity for trypsin, so extending this method to FBLD was a natural progression. For the fragment study we used a 2003 STD NMR study that targeted Factor Xa, a target related to trypsin, and a set of forty fragments. Once more, our MD experiments were able to recapitulate the binding of the library to the target and give accurate representations of binding energies, kinetics, poses and binding pathways. Importantly, we were also able to accurately rank the compounds in order of increasing binding affinity. As exciting as these results were, we needed to collect 2 milliseconds of aggregate trajectory data, a challenge we could only meet because of our access to GPUGrid through our collaboration with the De Fabritiis group.

Development of an adaptive sampling protocol for MD

One limitation of our HTMD-based in silico binding method at that time was the amount of sampling needed for system convergence, which was on the order of 50 μs per compound for Factor Xa. Clearly, in order to make this high-throughput molecular dynamics methodology a practical complement to current experimental FBLD screens, we needed to improve the efficiency of our simulation methods. To this end, we developed a fully automated adaptive sampling method that can deliver system convergence about an order of magnitude faster than the brute-force HTMD approach.

As opposed to HTMD, where we run hundreds of simulations at once, our adaptive sampling protocol begins with only ten trajectories, each starting from a random initial state and each a few tens of nanoseconds long. We then analyze these trajectories using Markov state modelling, and use residence time information obtained from clustering to select the starting conformations for future runs. This run-analyze-respawn cycle is defined as an epoch, and each adaptive experiment is composed of ten epochs. Note that all of this is automated and that we only need to set up the initial trajectories before working up the results after the last epoch. For each protein-ligand system we run ten replicate adaptive experiments.
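
The self-contained toy below captures the run-analyze-respawn logic of such a loop, substituting a one-dimensional double-well random walk for real MD and simple cluster counts for the full Markov-state-model analysis; all names and numbers are illustrative rather than Acellera's production protocol.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

def short_trajectory(x0, n_steps=500, dt=1e-3, kT=1.0):
    """Overdamped Langevin dynamics in the double well U(x) = (x^2 - 1)^2."""
    x = np.empty(n_steps)
    x[0] = x0
    for i in range(1, n_steps):
        force = -4.0 * x[i - 1] * (x[i - 1] ** 2 - 1.0)
        x[i] = x[i - 1] + force * dt + np.sqrt(2.0 * kT * dt) * rng.normal()
    return x

n_epochs, trajs_per_epoch, n_clusters = 10, 10, 20
starts = list(rng.uniform(-1.5, 1.5, trajs_per_epoch))   # random initial states
all_frames = []

for epoch in range(n_epochs):
    # 1) "run" a batch of short simulations
    trajs = [short_trajectory(x0) for x0 in starts]
    all_frames.append(np.concatenate(trajs))
    data = np.concatenate(all_frames).reshape(-1, 1)

    # 2) analyze: discretize all data collected so far
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(data)
    counts = np.bincount(kmeans.labels_, minlength=n_clusters)

    # 3) respawn the next epoch from the least-visited clusters
    rare = np.argsort(counts)[:trajs_per_epoch]
    starts = [float(kmeans.cluster_centers_[c, 0]) for c in rare]
    print(f"epoch {epoch}: least-visited cluster counts {counts[rare]}")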

For our proof of concept we tested this adaptive sampling method on trypsin-benzamidine, a system for which we have a very large amount of data and which over the years has become our benchmark. As opposed to regular HTMD, where we needed to collect about 50 μs of data for convergence, only 5 μs were needed in the adaptive experiments. In each experiment, learning was evident after the 3rd epoch; the energy and pose matched those of our control HTMD experiment after the 5th epoch in about 80% of the experiments.

What next?

We are very excited about these results, as in principle — and assuming there are no other bottlenecks — we should be able to screen 400 compounds in the same amount of time previously spent on 40. This would make the technique fast enough for medicinal chemists to start looking at it as a practical and reliable complement to current FBLD screens. We hope to be there soon, but we still need to test the universality of the method thoroughly, as well as work out other kinks such as the issue of ligand parametrization.
