Poster Presentation 51st Lorne Proteins Conference 2026

Structome-DeepRoots: On phylogenetic inference using high-dimensional embeddings (#215)

Ashar Malik 1,2,3, David Ascher 1,2,3
  1. Australian Centre for Ecogenomics, The University of Queensland, Brisbane, QLD, Australia
  2. School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, Australia
  3. Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia

It is well established that protein structure is more conserved than sequence, a consequence of the constraints imposed by 3D geometry. This is why structural comparisons often recover deep evolutionary signals where sequence alignments fail. This raises a question: if moving from 1D sequence to 3D geometry reveals so much more evolutionary history, what would increasing the dimensionality further reveal? Could analyzing proteins in hundreds of dimensions uncover deeper signals that are invisible even in 3D? Protein language models, which encode the complex interplay of sequence, structure, and function into rich, high-dimensional embeddings, offer a way to explore this.

Our method, Structome-DeepRoots, uses these representations in a novel phylogenetic framework. Rather than pooling residue embeddings, which discards information, DeepRoots computes pairwise distances from the average cosine similarity between individually paired residue embeddings, with the residue pairs identified by structural superposition. This alignment-aware approach provides a residue-level comparison in a 1280-dimensional latent space. To complete the framework, we also introduce an embedding perturbation model for rapid statistical bootstrapping. In this talk, I will demonstrate how this high-dimensional signal resolves complex relationships in the Globin superfamily and quantitatively outperforms our established TM-score baseline (Structome-TM) on the PhyloBench benchmark.
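
The distance calculation described above can be illustrated with a minimal sketch. It is not the Structome-DeepRoots implementation: the function names, the conversion of similarity to distance as 1 minus the mean cosine similarity, and the use of Gaussian noise as the perturbation are all assumptions made for illustration, since the abstract does not specify them. Residue pairs are assumed to come from a separate structural superposition step.

```python
import numpy as np


def paired_cosine_distance(emb_a, emb_b, pairs):
    """Distance between two proteins from structurally paired residue embeddings.

    emb_a, emb_b: arrays of shape (n_residues, dim), one embedding per residue.
    pairs: list of (i, j) tuples; residue i of A is superposed on residue j of B.
    Returns 1 - mean cosine similarity (this transform is an assumption).
    """
    idx_a, idx_b = np.asarray(pairs).T
    a = emb_a[idx_a]
    b = emb_b[idx_b]
    # Cosine similarity for each paired residue, computed row by row.
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    )
    return 1.0 - float(cos.mean())


def perturbed_replicates(emb, n_replicates, sigma, rng):
    """Yield noisy copies of an embedding matrix.

    Hypothetical stand-in for the perturbation model: additive Gaussian
    noise with standard deviation sigma, drawn from the generator rng.
    """
    for _ in range(n_replicates):
        yield emb + rng.normal(0.0, sigma, size=emb.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random stand-ins for 1280-dimensional per-residue embeddings.
    emb_a = rng.normal(size=(5, 1280))
    emb_b = rng.normal(size=(6, 1280))
    pairs = [(0, 1), (1, 2), (3, 4)]  # illustrative superposition output
    print(paired_cosine_distance(emb_a, emb_b, pairs))
```

Under these assumptions, recomputing the full distance matrix for each set of perturbed embeddings and rebuilding the tree would give one bootstrap replicate, with no need to repeat the structural superposition.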

Resources in the Structome suite are accessible here: https://biosig.lab.uq.edu.au/structome/