By Alejandro Varela
Our current understanding of drug design is fundamentally structure-based. The process works as follows: once the structure of the target protein is known and some interesting pockets have been identified on it, medicinal chemists study these spaces and suggest small molecules that can form strong interactions with that protein environment, hopefully leading to a conformational change in the protein that modifies its behavior.
LigVoxel aims to provide the critical analysis that medicinal chemists currently perform. Given a pocket of interest, we want a rough idea of where the protein pocket expects the chemical moieties of a drug to be, a description usually referred to as a pharmacophore. For example, we would expect a hydrogen bond donor in the pharmacophore to sit in front of a hydrogen bond acceptor group in the protein pocket.
To achieve this behaviour, we trained a deep convolutional neural network on thousands of co-crystallized protein-ligand complexes available in the scPDB database. We surrounded the protein pocket and its co-crystallized ligand with a cubic box of side 16 Å, discretized into voxels of side 1 Å. Each voxel is then given a value depending on how close it is to an atom of a given type. Voxels in empty space have values close to zero, while voxels that enclose an atom or lie very close to one have values near 1. The result is essentially a 3D picture of the protein-ligand interaction.
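The voxelization described above can be sketched in a few lines of NumPy. This is only an illustration: the function name `voxelize`, its parameters, and the specific distance-to-value formula (a smooth function of the ratio between an atom's van der Waals radius and its distance to the voxel center) are assumptions made here, since the post only states that values are near 1 close to an atom and near zero in empty space.

```python
import numpy as np

def voxelize(coords, radii, center, box_size=16.0, resolution=1.0):
    """Return an (n, n, n) occupancy grid for one atom type (hypothetical helper)."""
    n = int(round(box_size / resolution))
    # Voxel centers along one axis, measured from the center of the box.
    axis = (np.arange(n) + 0.5) * resolution - box_size / 2.0
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    centers = np.stack([gx, gy, gz], axis=-1) + np.asarray(center, dtype=float)

    grid = np.zeros((n, n, n))
    for xyz, r_vdw in zip(np.asarray(coords, dtype=float), radii):
        dist = np.linalg.norm(centers - xyz, axis=-1)
        dist = np.maximum(dist, 1e-6)  # avoid division by zero at the atom center
        # Assumed smooth occupancy: ~1 inside the atom, decaying quickly to 0.
        contribution = 1.0 - np.exp(-((r_vdw / dist) ** 12))
        # Each voxel keeps the strongest contribution from any atom.
        grid = np.maximum(grid, contribution)
    return grid

# One carbon-like atom (radius 1.7 Å, an assumed value) near the box center.
grid = voxelize([[0.5, 0.5, 0.5]], [1.7], center=(0.0, 0.0, 0.0))
print(grid.shape)
```

Taking the maximum over atoms, rather than the sum, keeps every voxel value between 0 and 1 regardless of how crowded the pocket is.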
The input to the network is the voxelized protein surface, with no ligand information, and the output is the pharmacophore prediction. Because we did not have information on the actual pharmacophore of each pocket, we used the voxelized ligand as a proxy for it. The output of the network should therefore mimic the co-crystallized ligand of the pocket provided as input. This is enforced by the loss function, which compares the predicted output with the ground truth (the voxelized ligand).
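To make the role of the loss concrete, here is a minimal sketch of a voxel-wise comparison between a prediction and a voxelized ligand. The post does not say which loss was used, so the binary cross-entropy below is an assumed stand-in, and the name `voxelwise_bce` is hypothetical.

```python
import numpy as np

def voxelwise_bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy over all voxels (assumed loss, for illustration)."""
    pred = np.clip(np.asarray(pred, dtype=float), eps, 1.0 - eps)
    target = np.asarray(target, dtype=float)
    per_voxel = -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))
    return float(per_voxel.mean())

# Toy ground truth: a small block of "ligand" voxels inside a 16x16x16 box.
target = np.zeros((16, 16, 16))
target[6:10, 6:10, 6:10] = 1.0

good = voxelwise_bce(target, target)        # prediction matches the ligand
bad = voxelwise_bce(1.0 - target, target)   # prediction is the exact opposite
print(good < bad)
```

Whatever the exact form, the loss is small when the predicted grid resembles the voxelized ligand and large when it does not, which is what pushes the network towards ligand-like outputs.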
In more detail, the voxelization is done by splitting the atoms into different types: hydrogen bond donor, hydrogen bond acceptor, aromatic and “any atom”, which accounts for occupancy/excluded volume. After training, we can select any protein pocket, run LigVoxel on it, and obtain a distribution of these properties across the pocket space (the 16 × 16 × 16 Å box), in what could be called a pharmacophore field.
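The split into atom types can be pictured as a boolean table with one row per atom and one column per property channel. The sketch below is an assumption about how such a table might be built; the channel names and the `channel_mask` helper are hypothetical, and the per-atom property sets would in practice come from a cheminformatics toolkit.

```python
import numpy as np

# One channel per property named in the text; "occupancy" is the "any atom" channel.
CHANNELS = ("hbond_donor", "hbond_acceptor", "aromatic", "occupancy")

def channel_mask(atom_properties):
    """Return an (n_atoms, n_channels) boolean array (hypothetical helper)."""
    mask = np.zeros((len(atom_properties), len(CHANNELS)), dtype=bool)
    for i, props in enumerate(atom_properties):
        for j, name in enumerate(CHANNELS):
            # Every atom contributes to occupancy; other channels need the property.
            mask[i, j] = name == "occupancy" or name in props
    return mask

# Three toy atoms: a donor, an aromatic atom, and an atom with no special property.
mask = channel_mask([{"hbond_donor"}, {"aromatic"}, set()])
print(mask.astype(int))
```

Voxelizing the atoms selected by each column separately yields one 16 × 16 × 16 grid per channel, that is, 4096 voxels for each of the four properties.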
We tested the accuracy of these predictions by measuring their overlap with actual ligand-protein complexes that we kept in the test set, so that the network could not have memorized them. Occupancy is the best predicted property, followed by aromaticity. We also tested whether the predicted pharmacophore was able to discriminate good binding poses from bad ones. To do this, we applied several rotations to the co-crystallized ligand and then selected the orientations that best fit LigVoxel’s predictions. We were able to recover good poses (RMSD smaller than 2 Å from the crystal conformation) in more than 80% of the targets that we studied. In the picture at the top, we display LigVoxel’s prediction for HIV protease (PDB code: 1HVR) superposed with the crystal ligand, where the color code is the following:
- Aromatic: yellow
- Hydrogen bond donor: violet
- Hydrogen bond acceptor: red
- Occupancy: black wireframe
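The pose-recovery test described earlier relies on two simple geometric operations: rotating the ligand and measuring its RMSD to the crystal conformation. The sketch below illustrates both under stated assumptions: the helper names are hypothetical, atoms are assumed to be listed in the same order in both poses, and rotations are taken about the ligand centroid, which the post does not specify.

```python
import numpy as np

def rmsd(a, b):
    """Root-mean-square deviation between two poses with matching atom order."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

def rotate_about_centroid(coords, axis, angle):
    """Rotate coordinates by `angle` radians around `axis` through their centroid."""
    coords = np.asarray(coords, dtype=float)
    axis = np.array(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    # Rodrigues' rotation formula, built from the cross-product matrix K.
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    R = np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)
    centroid = coords.mean(axis=0)
    return (coords - centroid) @ R.T + centroid

# Toy three-atom "ligand"; a pose counts as good if RMSD < 2 Å, as in the text.
crystal = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
pose = rotate_about_centroid(crystal, axis=(0.0, 0.0, 1.0), angle=np.pi / 6)
print(rmsd(pose, crystal) < 2.0)
```

In the actual test, each rotated pose would additionally be scored against the predicted pharmacophore field, and the RMSD of the best-scoring poses would be checked against the 2 Å threshold.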
LigVoxel is available for free at playmolecule.org, and an article has been published with further details on the network architecture and on how it was trained and tested. To use LigVoxel on playmolecule.com, you just need to upload a protein structure (in PDB file format) and indicate the location of the pocket that you are interested in. We invite you to check our video demo:
Do not hesitate to contact us for more information!
Miha Skalic, Alejandro Varela-Rial, José Jiménez, Gerard Martínez-Rosell, Gianni De Fabritiis; LigVoxel: Inpainting binding pockets using 3D-convolutional neural networks, Bioinformatics (2018) https://doi.org/10.1093/bioinformatics/bty583