3D pride without 2D prejudice: Bias-controlled multi-level generative models for structure-based ligand design
Lucian Chan, Rajendra Kumar, Marcel Verdonk, Carl Poelking
Generative models for structure-based molecular design hold significant
promise for drug discovery, with the potential to speed up the hit-to-lead
development cycle, while improving the quality of drug candidates and reducing
costs. Data sparsity and bias are, however, two major roadblocks to the
development of 3D-aware models. Here we propose a first-of-its-kind training
protocol based on multi-level contrastive learning for improved bias control
and data efficiency. The framework combines the large data resources available
for 2D generative modelling with datasets of ligand-protein complexes. The
resulting hierarchical generative models are topologically unbiased,
explainable and customizable. We show how, by deconvolving the generative
posterior into chemical, topological and structural context factors, we not
only avoid common pitfalls in the design and evaluation of generative models,
but also gain detailed insight into the generative process itself. Besides
allowing fine-grained control over novelty versus familiarity, this improved
transparency significantly aids method development.