paper
- Very simple idea thoroughly examined.
- Protein structures often offer richer information than protein sequences; however, the former are much harder to obtain and thus less widely available.
- Language models have been shown to yield rich embeddings of protein residues within their local context (the model is trained to predict a residue given its neighboring residues).
- When a structure is known, residue embeddings from a language model pretrained on a large sequence corpus are used as node features, and a graph convolutional network (GCN) yields an encoding of the contact map.
- The model is trained to predict Gene Ontology (GO) term annotations for proteins and achieves state-of-the-art performance.
- Class Activation Maps (CAMs) are applied to trace each prediction back to the protein regions that contributed most to it.
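The structure branch described in these notes (language-model residue embeddings as node features, a GCN over the contact map) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the paper's architecture: the 10 Å Cα cutoff, the plain symmetric-normalized GCN layer of Kipf & Welling, mean pooling, and the helper names are all assumed, and random coordinates and embeddings stand in for a real structure and real language-model outputs.

```python
import numpy as np

def contact_map(ca_coords, cutoff=10.0):
    """Binary residue contact map (L, L) from C-alpha coordinates (L, 3).

    Residues i != j are in contact if their C-alpha atoms are closer than
    `cutoff` angstroms (the 10 A default is an assumption of this sketch).
    """
    diff = ca_coords[:, None, :] - ca_coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adj = (dist < cutoff).astype(float)
    np.fill_diagonal(adj, 0.0)  # self-loops are added during normalization
    return adj

def normalize_adjacency(adj):
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2 (Kipf & Welling)."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_norm, h, w):
    """One graph-conv layer: average over contacts, project, ReLU."""
    return np.maximum(a_norm @ h @ w, 0.0)

def encode_protein(ca_coords, residue_emb, weights, cutoff=10.0):
    """Residue embeddings (L, D) plus structure -> one protein vector."""
    a_norm = normalize_adjacency(contact_map(ca_coords, cutoff))
    h = residue_emb
    for w in weights:
        h = gcn_layer(a_norm, h, w)
    return h.mean(axis=0)  # mean-pool over residues (pooling choice assumed)

# Toy usage: random stand-ins for real coordinates and LM embeddings.
rng = np.random.default_rng(0)
n_res, emb_dim, hidden = 50, 32, 64
ca = np.cumsum(rng.normal(scale=2.0, size=(n_res, 3)), axis=0)  # random-walk "chain"
emb = rng.normal(size=(n_res, emb_dim))  # hypothetical per-residue LM embeddings
weights = [rng.normal(scale=0.1, size=(emb_dim, hidden)),
           rng.normal(scale=0.1, size=(hidden, hidden))]
z = encode_protein(ca, emb, weights)
print(z.shape)  # (64,)
```

If the contact map had no off-diagonal contacts, the normalized adjacency would reduce to the identity and each layer to a per-residue projection of the sequence embeddings, which makes explicit what the structural graph adds on top of the language model.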
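GO prediction, as in these notes, is a multi-label task: one protein can carry many GO terms at once, so a natural output layer is one independent sigmoid per term trained with binary cross-entropy, rather than a single softmax. The sketch below assumes that standard setup (it is not confirmed from the paper), with random vectors standing in for the graph encoder's protein encodings, 10 hypothetical GO terms, and an assumed 0.5 decision threshold.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def go_probs(z, w, b):
    """Independent probability for every GO term (multi-label, not softmax)."""
    return sigmoid(z @ w + b)

def bce(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over all (protein, term) pairs."""
    p = np.clip(probs, eps, 1.0 - eps)
    return float(-np.mean(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p)))

def train_head(z, labels, lr=1.0, steps=300, seed=0):
    """Fit a linear GO-term head with plain gradient descent on the BCE."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=(z.shape[1], labels.shape[1]))
    b = np.zeros(labels.shape[1])
    for _ in range(steps):
        # Gradient of the mean BCE with respect to the logits.
        g = (go_probs(z, w, b) - labels) / labels.size
        w -= lr * (z.T @ g)
        b -= lr * g.sum(axis=0)
    return w, b

# Toy usage: 8 protein vectors, 10 hypothetical GO terms (multi-hot labels).
rng = np.random.default_rng(42)
z = rng.normal(size=(8, 64))
labels = (rng.random(size=(8, 10)) < 0.3).astype(float)
loss_before = bce(go_probs(z, np.zeros((64, 10)), np.zeros(10)), labels)  # ln 2
w, b = train_head(z, labels)
loss_after = bce(go_probs(z, w, b), labels)
predicted = go_probs(z, w, b) >= 0.5  # terms assigned to each protein
print(loss_after < loss_before)  # True
```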
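For the CAM step in these notes, one common formulation is gradient-weighted CAM (Grad-CAM): each feature channel of the last graph-conv layer is weighted by the average gradient of a GO-term score with respect to it, and the ReLU of the weighted channel sum at each residue gives that residue's importance. The sketch assumes a mean-pooled linear scoring head so the gradient can be written analytically (in that setting Grad-CAM coincides with the original CAM); whether this matches the paper's exact formulation is an assumption, and the head weights, the planted "site", and the function name are hypothetical.

```python
import numpy as np

def grad_cam_residues(h, w_term):
    """Grad-CAM style importance of each residue for one GO term.

    h:      (L, K) residue features from the last graph-conv layer
    w_term: (K,)   weights of an assumed score y = mean_i(h[i]) @ w_term
    """
    n_res = h.shape[0]
    # For this head, dy/dh[i, k] = w_term[k] / L at every residue i.
    grads = np.tile(w_term / n_res, (n_res, 1))  # (L, K)
    alpha = grads.mean(axis=0)                   # per-channel weights
    cam = np.maximum(h @ alpha, 0.0)             # keep positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()                    # rescale to [0, 1]
    return cam

# Toy usage: 30 residues, 16 channels; channel 3 supports the term and
# channel 7 argues against it (hypothetical weights).
rng = np.random.default_rng(1)
h = np.maximum(rng.normal(size=(30, 16)), 0.0)
h[10:15, 3] += 10.0  # plant a "functional site" at residues 10-14
w_term = np.zeros(16)
w_term[3], w_term[7] = 1.0, -1.0
cam = grad_cam_residues(h, w_term)
top5 = sorted(np.argsort(cam)[-5:].tolist())
print(top5)  # [10, 11, 12, 13, 14]
```

With a deeper, nonlinear head the gradients would differ per residue and would come from automatic differentiation instead of the closed form used here.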