To a first approximation, proteins evolve via the accumulation of substitutions and indels. With an appropriate statistical model for this evolutionary process, sequence alignment can be framed as a straightforward (albeit expensive) exercise in Bayesian inference: correctly placing gaps corresponds to identifying where the indels occurred. Similarly, phylogenetic tree construction and ancestral sequence reconstruction can be framed as inference tasks, as can more refined bioinformatics exercises like using a multiple alignment to estimate the type of selection occurring at different places in a sequence (purifying vs diversifying selection), identifying evolutionarily accelerated or conserved sites, and detecting various signals of three-dimensional structure.
All of this is contingent on having a realistic continuous-time Markov process whose generator specifies the rates at which substitutions and indels occur, and on being able to "solve" (exponentiate) that model. Specifically, we need a time-dependent alignment likelihood of the form P(alignment, descendant | ancestor, time) that obeys the Chapman-Kolmogorov equations for a Markov chain. The pioneering model in this regard was the Thorne, Kishino, and Felsenstein 1991 model (TKF91), the first to derive the gap and substitution scoring schemes for dynamic programming sequence alignment directly from an underlying model of the instantaneous rates of point substitutions and indels. One year later, the TKF92 model extended TKF91 to allow multi-residue indels, at the cost of introducing latent variables.
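To make the Chapman-Kolmogorov requirement concrete, the minimal sketch below handles only the substitution component: it builds a toy rate matrix and checks that exponentiating it over a branch of length t + s gives the same transition matrix as composing branches of length t and s. The rate values, the four-state alphabet, and the helper name transition_matrix are illustrative assumptions, not taken from any of the models discussed here.

```python
import numpy as np
from scipy.linalg import expm

# Toy instantaneous rate matrix Q for a 4-state substitution process
# (illustrative values only; each row sums to zero, as required of a CTMC generator).
Q = np.array([
    [-0.9,  0.3,  0.3,  0.3],
    [ 0.2, -0.8,  0.3,  0.3],
    [ 0.2,  0.3, -0.8,  0.3],
    [ 0.2,  0.3,  0.3, -0.8],
])

def transition_matrix(Q, t):
    """P(t) = exp(Qt): substitution probabilities over a branch of length t."""
    return expm(Q * t)

t, s = 0.4, 0.7
P_t_plus_s = transition_matrix(Q, t + s)
P_composed = transition_matrix(Q, t) @ transition_matrix(Q, s)

# Chapman-Kolmogorov: evolving for time t+s is the same as evolving
# for time t and then for time s.
assert np.allclose(P_t_plus_s, P_composed)
print(P_t_plus_s)
```

The hard part, and the achievement of TKF91, is getting the full alignment likelihood (indels as well as substitutions) to satisfy this same composition property.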
Recently, several improvements on this model have been proposed. The first class of improvements seeks to solve the simple process described by TKF92 more cleanly: De Maio (Systematic Biology, 2020) and Holmes (Genetics, 2020) used a renormalization approach to bypass the latent variables introduced by TKF92. The second class aims directly at realism, comprising several attempts to model the alignment likelihood with neural networks, forsaking the clean simplicity of TKF92 (and friends) for a more predictive model of evolution.
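To give a flavor of what this second class looks like, the sketch below replaces the closed-form transition probabilities of a TKF92-style pair HMM with the output of a small neural network that takes divergence time as input. Everything here (the three-state match/insert/delete space, the MLP architecture, and the function name transition_probs) is our own illustrative assumption, not a description of any published neural model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pair-HMM state space (a hypothetical choice for this sketch).
STATES = ["match", "insert", "delete"]

# Tiny MLP mapping divergence time -> unnormalized transition scores (3x3).
# The weights are random here; in practice they would be trained on alignments.
W1 = rng.normal(size=(1, 16))
b1 = np.zeros(16)
W2 = rng.normal(size=(16, 9))
b2 = np.zeros(9)

def transition_probs(t):
    """Map divergence time t to a row-stochastic transition matrix over STATES."""
    h = np.tanh(np.array([[t]]) @ W1 + b1)    # (1, 16) hidden layer
    logits = (h @ W2 + b2).reshape(3, 3)      # one logit per (source, destination) pair
    z = np.exp(logits - logits.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)   # softmax over destination states

P = transition_probs(0.5)
assert np.allclose(P.sum(axis=1), 1.0)  # each row is a valid probability distribution
print(P)
```

Note that a free-form parameterization like this does not automatically enforce the Chapman-Kolmogorov property discussed above; that is part of what is given up in exchange for flexibility.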
We have tested several such models, including mixtures and hierarchically nested versions of TKF92 (as in Holmes, 2004), as well as various neural models. Here we report the results of these comparisons, with recommendations for future evolutionary analyses of proteins.