End-to-end learning of protein-protein interactions

Project Information

bash, bioinformatics, computational-chemistry, debugging, machine-learning, programming, programming-best-practices, python, scripting, slurm, software-installation, tensorflow, tuning
Project Status: Halted
Project Region: CAREERS
Submitted By: Galen Collier
Project Email: guillaume.lamoureux@rutgers.edu
Project Institution: Rutgers University–Camden
Anchor Institution: CR-Rutgers
Project Address: 303 Cooper St
Camden, New Jersey 08102

Mentors: Galen Collier

Project Description

Protein-protein interactions (PPIs) are involved in numerous fundamental biological processes. A model that can reliably predict whether two proteins interact, and how protein variation affects an existing interaction, would open up new avenues for systems biology and for protein design. Current state-of-the-art PPI prediction models rely on sequence similarity with proteins known to interact and have intrinsically limited accuracy for the protein variants of interest in cancer or viral/bacterial infection.

The goal of the project is to train deep learning models for PPI prediction in the absence of structural information about the protein complex. We have recently developed models to predict the structure of any complex formed by two proteins A and B of known structure (see our preprint “Protein-protein docking using learned three-dimensional representations”, https://www.biorxiv.org/content/10.1101/738690v2). We now aim to develop models that generate the structure of the AB complex in a single step, without explicitly searching for the optimal relative orientation of the two proteins, and that predict the binding affinity of proteins A and B directly from their structures. Such models have two main advantages: (1) they are much more computationally efficient, since they avoid a costly grid search over the space of translations and rotations, and (2) they are differentiable, which means they can be used as building blocks for larger neural architectures that, for instance, also predict the structures of the individual proteins A and B themselves.
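To make the cost of an explicit search concrete, here is a minimal sketch of an exhaustive translational scan on a voxel grid. It assumes a simple overlap score, computed for all translations at once by FFT cross-correlation with periodic boundaries. This is a generic illustration, not the scoring function or code from the preprint; the function names `translation_scores` and `best_translation` and the toy random grids are hypothetical.

```python
import numpy as np


def translation_scores(receptor, ligand):
    """Overlap score for every cyclic translation t of the ligand grid.

    score[t] = sum_x receptor[x] * ligand[x - t], evaluated for all t at once
    with the correlation theorem (periodic boundaries are assumed).
    """
    rec_hat = np.fft.fftn(receptor)
    lig_hat = np.fft.fftn(ligand)
    # Cross-correlation in real space is a product with a conjugate in Fourier space.
    return np.fft.ifftn(rec_hat * np.conj(lig_hat)).real


def best_translation(receptor, ligand):
    """Return the grid translation (as a tuple of ints) with the highest score."""
    scores = translation_scores(receptor, ligand)
    index = np.unravel_index(np.argmax(scores), scores.shape)
    return tuple(int(i) for i in index)


# Toy data (assumption): the "receptor" is the ligand grid shifted by a known offset.
rng = np.random.default_rng(0)
ligand_grid = rng.random((8, 8, 8))
receptor_grid = np.roll(ligand_grid, shift=(2, 3, 1), axis=(0, 1, 2))

print(best_translation(receptor_grid, ligand_grid))  # -> (2, 3, 1)
```

A full rigid-body search would repeat this scan for every sampled rotation of the ligand grid, so the total cost grows with the number of rotations. That repeated scan is the step a single-step model avoids.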

This project is enabled by the development of TorchProteinLibrary, a computationally efficient library of differentiable primitives for deep neural network models of protein structure (see our preprint “TorchProteinLibrary: A computationally efficient, differentiable representation of protein structure”, https://arxiv.org/abs/1812.01108). The library implements the functionality needed to perform end-to-end learning of protein structure prediction.