AI drug discovery

SMALL DATA AND ACTIVE LEARNING

AI and machine learning are commonly applied to big data problems. But in drug discovery, exciting new targets tend to have little known about them, and very little existing data to leverage. At Exscientia we recognised several years ago that progressing new targets would require a blend of innovative approaches, with a particular focus on what we call the small data problem, where predictive models cannot initially be built. In these situations we have to start a project efficiently with limited seed data, whether from literature and patents or, more likely, from a small internal screening campaign.

Our Active Learning algorithms are ideally suited to the small data challenge, where the objective is to achieve the largest knowledge increment in the absence of usable machine learning models. A report in a Nature journal covered the merits of Active Learning, demonstrating how an algorithm can learn its way into a drug discovery dataset with limited data points more effectively than most humans.
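To make the idea of "largest knowledge increment" concrete, here is a minimal sketch of one generic active learning strategy, query-by-committee. It is not Exscientia's algorithm: the one-dimensional descriptor, the `toy_assay` function, the nearest-neighbour models and all parameter names are assumptions chosen purely for illustration. In each round, the candidate on which a small committee of bootstrap-trained models disagrees most is sent for "assay".

```python
import random
import statistics


def nearest_neighbour_predict(train, x):
    """Predict the value at x as the value of the closest training point."""
    closest = min(train, key=lambda pair: abs(pair[0] - x))
    return closest[1]


def committee_disagreement(labelled, x, n_members, rng):
    """Variance of predictions across bootstrap-resampled 1-NN models."""
    predictions = []
    for _ in range(n_members):
        # Each committee member sees a bootstrap resample of the labelled data.
        sample = [rng.choice(labelled) for _ in labelled]
        predictions.append(nearest_neighbour_predict(sample, x))
    return statistics.pvariance(predictions)


def active_learning(pool, oracle, seed_points, n_rounds, n_members=10, seed=0):
    """Grow a labelled set by repeatedly querying the most contested candidate."""
    rng = random.Random(seed)
    labelled = [(x, oracle(x)) for x in seed_points]
    unlabelled = [x for x in pool if x not in seed_points]
    for _ in range(n_rounds):
        if not unlabelled:
            break
        pick = max(
            unlabelled,
            key=lambda x: committee_disagreement(labelled, x, n_members, rng),
        )
        unlabelled.remove(pick)
        labelled.append((pick, oracle(pick)))  # the hypothetical "assay" step
    return labelled


# Hypothetical toy problem: one descriptor per compound, a made-up assay.
pool = [i / 20 for i in range(21)]


def toy_assay(x):
    return (x - 0.3) ** 2


results = active_learning(pool, toy_assay, seed_points=[0.0, 1.0], n_rounds=5)
```

Starting from two seed measurements, `results` holds the seeds plus five further points chosen where the committee was least certain. A real project would use richer molecular descriptors and models, but the loop has the same shape.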

BIG DATA AND GENERATIVE DESIGN

Global models, built from a wealth of literature and patent data, ensure comprehensive coverage of both protein target activities and general small molecule DMPK requirements. Routine application allows any compound to be assessed for acceptable selectivity and pharmacology, prior to synthesis and assay. As each project progresses, experimental data from each design-make-test cycle allows project-specific local models to be built and improved.

Generative algorithms originate new compounds, and models prioritise them under evolutionary selection pressure, so that those most likely to have characteristics relevant to the project can be selected for synthesis and assay.
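The generate-then-prioritise loop can be sketched as a simple evolutionary search. This is a toy illustration under stated assumptions, not the production method: candidates are strings over a hypothetical fragment alphabet `FRAGMENTS`, `mutate` stands in for a generative algorithm, and `toy_score` stands in for a predictive model.

```python
import random

FRAGMENTS = "ABCDEFGH"  # hypothetical building blocks, for illustration only


def mutate(candidate, rng):
    """Generate a new candidate by swapping one position for a random fragment."""
    pos = rng.randrange(len(candidate))
    return candidate[:pos] + rng.choice(FRAGMENTS) + candidate[pos + 1:]


def toy_score(candidate, target="CAFEBABE"):
    """Stand-in for a predictive model: count positions matching a made-up ideal."""
    return sum(a == b for a, b in zip(candidate, target))


def evolve(population, score, n_generations, n_children, n_keep, seed=0):
    """Alternate generation and selection, keeping the top-scoring candidates."""
    rng = random.Random(seed)
    for _ in range(n_generations):
        children = [mutate(rng.choice(population), rng) for _ in range(n_children)]
        # Parents compete with children, so the best score never decreases.
        ranked = sorted(set(population + children), key=lambda c: (-score(c), c))
        population = ranked[:n_keep]
    return population


start = ["AAAAAAAA", "HHHHHHHH"]
final = evolve(start, toy_score, n_generations=30, n_children=20, n_keep=5)
```

`final` is the shortlist that, in this analogy, would go forward to synthesis and assay. In practice the scoring function would combine several predicted properties, and the measured results would feed back into the models before the next cycle.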

Through integrated cycles of evolutionary design, experiment and feedback, the trajectory from Hit to Candidate can be actively monitored and assessed.