Extrachromosomal elements (ECEs) are known to have played a key evolutionary role across all three domains of life, yet the nature and functional potential of ECEs in the domain Archaea remain understudied, partly because many archaea are difficult to cultivate in the laboratory. Metagenomics-based discovery of archaeal ECEs, coupled with in silico analyses of their proteins based on structure prediction, is a powerful approach to filling these knowledge gaps.
We curated 17 complete genomes of a novel type of ECE that shares protein families with Borg ECEs. Like Borgs, these “Klingons” are inferred to associate with Candidatus Methanoperedens archaea. The Klingon genomes are circular, range in size from 67 to 174 kb, and were found in eight different environment types across four continents. Because their proteomes are enriched in hypothetical proteins, we predicted the structures of the 2682 Klingon proteins with ColabFold. Functional annotations indicated an abundance of methyltransferases and glycosyltransferases, as well as capsid-like and other proteins suggestive of a viral lifestyle. Compared with sequence-based clustering, structural clustering of the protein models grouped the methyltransferases of the 17 Klingon genomes into broader subfamilies and enabled functional assignments.
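Why structural clustering can yield broader subfamilies than sequence clustering can be illustrated with a minimal single-linkage sketch. It assumes pairwise scores have already been computed: percent identity for the sequence-based run and a TM-score-like structural similarity for the structure-based run. The protein names, scores, and cutoffs (30% identity, TM-score 0.5) are hypothetical and are not the tools or thresholds used for the Klingon analysis. Because protein structure is generally more conserved than sequence, distant homologs can still score above a structural cutoff.

```python
from typing import Dict, List, Tuple

# Hypothetical pairwise scores for four methyltransferase-like proteins.
PROTEINS = ["MT1", "MT2", "MT3", "MT4"]
SEQ_IDENTITY = {("MT1", "MT2"): 0.45, ("MT3", "MT4"): 0.40,
                ("MT1", "MT3"): 0.15, ("MT2", "MT4"): 0.12}
TM_SCORE = {("MT1", "MT2"): 0.85, ("MT3", "MT4"): 0.80,
            ("MT1", "MT3"): 0.62, ("MT2", "MT4"): 0.58}


def cluster(items: List[str], scores: Dict[Tuple[str, str], float],
            threshold: float) -> List[List[str]]:
    """Single-linkage clustering: link every pair whose score meets the threshold."""
    parent = {x: x for x in items}  # union-find forest

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), score in scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)  # merge the two components

    groups: Dict[str, List[str]] = {}
    for x in items:
        groups.setdefault(find(x), []).append(x)
    return sorted(sorted(g) for g in groups.values())


print(cluster(PROTEINS, SEQ_IDENTITY, 0.30))  # [['MT1', 'MT2'], ['MT3', 'MT4']]
print(cluster(PROTEINS, TM_SCORE, 0.50))      # [['MT1', 'MT2', 'MT3', 'MT4']]
```

Single linkage keeps the sketch short; with real data, a linkage rule or coverage filter that resists chaining through a single spurious edge is usually preferable.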
Orthologous genes were positioned non-randomly, and genes shared with Borgs were encoded primarily in one half of each Klingon genome, often bordered by CRISPR-like arrays. These results suggest that Klingon genomes expanded by selectively inserting genes from coexisting ECEs into recombination hotspots.
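A concentration of shared genes in one half of a circular genome could be tested with a permutation test like the minimal sketch below. It assumes each genome is reduced to an ordered list of genes flagged 1 (shared with Borgs) or 0, that "half" means any run of n/2 consecutive genes around the circle (gene count, not kb), and that the null model shuffles the flags at random. The abstract does not describe the statistical test used; the function names and the toy genome are hypothetical.

```python
import random
from typing import Sequence, Tuple


def max_window_count(flags: Sequence[int], window: int) -> int:
    """Most flagged genes in any run of `window` consecutive genes on a circular genome."""
    n = len(flags)
    count = sum(flags[:window])  # genes 0 .. window-1
    best = count
    for start in range(1, n):
        # Slide the window one gene forward, wrapping around the circle.
        count += flags[(start + window - 1) % n] - flags[start - 1]
        best = max(best, count)
    return best


def half_genome_test(flags: Sequence[int], n_perm: int = 1000,
                     seed: int = 0) -> Tuple[int, float]:
    """Permutation p-value for shared genes clustering in one half of the genome."""
    rng = random.Random(seed)
    window = len(flags) // 2
    observed = max_window_count(flags, window)
    shuffled = list(flags)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # null model: shared genes placed at random
        if max_window_count(shuffled, window) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical 40-gene circular genome: 10 Borg-shared genes, all within genes 0-19.
toy_genome = [1 if i < 20 and i % 2 == 0 else 0 for i in range(40)]
observed, p_value = half_genome_test(toy_genome)
print(observed, p_value)
```

Taking the best window for both the observed and the shuffled genomes keeps the comparison fair, because in both cases the "half" is chosen after looking at the data.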
Comparing single-protein phylogenies across Klingons and Borgs revealed considerable gene tree discordance, suggesting that the apparent relatedness of these ECEs reflects extensive horizontal gene transfer (HGT) rather than vertical descent. Nevertheless, hierarchical clustering of the predicted proteomes could resolve Klingon lineages despite the extensive HGT.
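How proteome-level clustering can still separate lineages when some genes are shared across them can be illustrated with a minimal agglomerative clustering sketch. It assumes each genome is summarized as the set of protein families it encodes, measures Jaccard distance between those sets, and merges clusters by average linkage (the UPGMA criterion). None of these choices is stated in the abstract, and the genomes K1 to K4 and their family contents are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Cluster = FrozenSet[str]

# Hypothetical protein-family content of four Klingon-like genomes.
PROFILES: Dict[str, Set[str]] = {
    "K1": {"A", "B", "C", "D"},
    "K2": {"A", "B", "C", "E"},
    "K3": {"A", "F", "G", "H", "I"},  # family A stands in for a transferred gene
    "K4": {"F", "G", "H", "I"},
}


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 minus (shared families / all families); 0 for two empty sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def average_linkage(profiles: Dict[str, Set[str]]) -> List[Tuple[Cluster, Cluster, float]]:
    """Agglomerative clustering; returns (cluster, cluster, distance) merges in order."""

    def dist(c1: Cluster, c2: Cluster) -> float:
        # Average linkage: mean distance over all cross-cluster genome pairs.
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(jaccard_distance(profiles[x], profiles[y]) for x, y in pairs) / len(pairs)

    clusters: List[Cluster] = [frozenset([name]) for name in sorted(profiles)]
    merges: List[Tuple[Cluster, Cluster, float]] = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merges.append((c1, c2, dist(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges


for left, right, d in average_linkage(PROFILES):
    print(sorted(left), sorted(right), round(d, 3))
```

In this toy example, K3 and K4 merge at distance 0.2 and K1 and K2 at 0.4, while the two groups join only at about 0.94, so the shared family A does not blur the two lineages.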