Poster Presentation 51st Lorne Proteins Conference 2026

Knowledge-based machine learning enables accurate chemical reaction prediction (#208)

Qisheng Pan 1, David Ascher 1
  1. University of Queensland and Baker Institute, Melbourne, VIC, Australia

Organic chemical synthesis planning is often time-consuming and labour-intensive, requiring substantial exploratory experiments and expert knowledge to identify reaction templates and appropriate reagents. To facilitate this process, many artificial intelligence models have been developed for chemical reaction prediction. Despite their performance, these approaches usually take only the text format of the compounds as input and process the molecules with language modelling, overlooking the underlying chemical mechanisms. In essence, chemical reactions are driven by interactions between atoms, which result in the breaking and formation of bonds. Based on this insight, we developed an atom-based transformer model that could serve as a baseline for reaction prediction. Given a list of compounds, each atom was labelled with its element type, charge, hybridization, valence, chirality, and other chemical and spatial properties. We then fed these atom representations into a transformer-based network to study how different atoms interact with each other during the reaction. To integrate the two views, we applied contrastive learning so that the model learns each compound from both its text format and its atom-based representation. We leveraged the diverse reactions curated in the Open Reaction Database to develop our models. In our preliminary study, the model achieves reasonable performance on both forward reaction prediction (Top-10 accuracy = 84.80%) and single-step retrosynthesis prediction (Top-10 accuracy = 67.27%). The reaction representation embedded in the model can be used to accurately find the reaction center (accuracy = 99.36%). In future work, we will further optimise the model architecture and training strategy, aiming to use the reaction embedding to recommend appropriate reagents and solvents.
We believe our atom-based model will help researchers better understand chemical reactions, facilitating natural product synthesis and small-molecule drug development.
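
For readers unfamiliar with the Top-10 accuracy figures quoted above, the metric can be illustrated with a minimal sketch. The abstract does not state how predictions were matched to references, so exact string matching on SMILES is an assumption here, and the function name `top_k_accuracy` and the toy data are hypothetical.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    references: Sequence[str],
    k: int = 10,
) -> float:
    """Fraction of reactions whose reference appears among the first k candidates.

    Assumption: a prediction counts as correct on an exact string match,
    e.g. between canonicalised SMILES. The abstract does not specify this.
    """
    if len(ranked_predictions) != len(references):
        raise ValueError("one ranked candidate list is needed per reference")
    if not references:
        raise ValueError("at least one reaction is required")
    hits = sum(
        1
        for candidates, reference in zip(ranked_predictions, references)
        if reference in candidates[:k]
    )
    return hits / len(references)


# Toy data (hypothetical): three reactions, candidates ranked best-first.
predictions = [
    ["CCO", "CC=O", "CC(=O)O"],
    ["c1ccccc1", "C1CCCCC1"],
    ["CN", "CO"],
]
references = ["CC=O", "C1CCCCC1", "CCN"]

top1 = top_k_accuracy(predictions, references, k=1)
top2 = top_k_accuracy(predictions, references, k=2)
print(f"Top-1 = {top1:.2%}, Top-2 = {top2:.2%}")
```

In this toy case no reference is ranked first, while two of the three references appear within the first two candidates, which shows why Top-k accuracy grows with k.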