paper

  • Very simple idea thoroughly examined.
  • Protein structures often carry richer information than protein sequences, but they are much harder to obtain and therefore far less available.
  • Language models have been shown to yield rich embeddings of protein residues in their local sequence context (the model is trained to predict a residue from its neighboring residues).
  • When a structure is known, residue embeddings from a language model trained on a large sequence corpus serve as node features, and a graph convolutional network over the contact map encodes the structure.
  • The model is trained to predict Gene Ontology classifications for proteins and shows state-of-the-art performance.
  • Class Activation Maps are applied to trace each prediction back to the protein regions that contribute most to it.
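
The pipeline in these notes (embeddings as node features, graph convolution over the contact map, multi-label prediction, class activation map) can be sketched roughly as below. This is a minimal NumPy illustration and not the paper's implementation. Everything specific here is an assumption: a single graph-convolution layer with symmetric normalization, sum pooling over residues, a linear output layer, the classic CAM form, and all function and parameter names (`normalize_adjacency`, `forward`, `w_gcn`, `w_out`, and so on). The weights would be learned in practice; here they are just inputs.

```python
import numpy as np


def normalize_adjacency(contact_map):
    """Return D^-1/2 (A + I) D^-1/2 for a symmetric binary contact map."""
    a = np.array(contact_map, dtype=float)  # copy, so the input is not modified
    np.fill_diagonal(a, 1.0)                # self-loops: each residue sees itself
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def forward(contact_map, embeddings, w_gcn, w_out):
    """One graph convolution, sum pooling, then a linear layer.

    contact_map: (L, L) residue contacts
    embeddings:  (L, d) per-residue language-model embeddings (assumed given)
    w_gcn:       (d, hidden) graph-convolution weights (hypothetical)
    w_out:       (hidden, n_terms) output weights, one column per GO term
    Returns per-residue features (L, hidden) and per-term logits (n_terms,).
    """
    a_norm = normalize_adjacency(contact_map)
    h = np.maximum(a_norm @ embeddings @ w_gcn, 0.0)  # ReLU(A_norm H W)
    logits = h.sum(axis=0) @ w_out                    # pool residues, then classify
    return h, logits


def sigmoid(x):
    """Independent per-term probabilities (multi-label setting)."""
    return 1.0 / (1.0 + np.exp(-x))


def class_activation_map(h, w_out, term):
    """Per-residue contribution to the logit of one GO term (classic CAM form)."""
    return h @ w_out[:, term]
```

Under these assumptions the per-residue scores returned by `class_activation_map` sum exactly to the logit of the chosen term, which is what makes them readable as residue-level importance. A deeper network or a different pooling scheme would need a gradient-weighted variant instead.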