AI and machine learning are commonly applied to big data problems. In drug discovery, however, little is usually known about exciting new targets, and there is very little existing data to leverage. At Exscientia we recognised several years ago that progressing new targets would require a blend of innovative approaches, with a particular focus on what we call the small data problem, where predictive models cannot initially be built. In these situations we have to start a project efficiently with limited seed data, whether from literature and patents or, more likely, from a small internal screening campaign.
Our Active Learning algorithms are ideally suited to this small data challenge, where the objective is to achieve the largest knowledge increment in the absence of usable machine learning models. The merits of Active Learning were covered in a Nature journal report, which demonstrated that an algorithm can learn its way into a drug discovery dataset with limited data points more effectively than most humans.
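To make the idea of choosing experiments without a usable predictive model more concrete, here is a minimal sketch of one generic, model-free selection heuristic: greedy max-min (farthest-point) selection. It is not Exscientia's algorithm, which is not described here. It assumes each candidate compound is represented by a numeric descriptor vector, and it treats "knowledge increment" as coverage of unexplored descriptor space. The names `select_batch`, `pool` and `labelled` are hypothetical.

```python
import math
from typing import Sequence

# Hypothetical representation: each compound is a numeric descriptor vector.
Vector = Sequence[float]


def select_batch(pool: list[Vector], labelled: list[Vector], batch_size: int) -> list[int]:
    """Greedy max-min selection (a generic heuristic, assumed for illustration).

    Repeatedly picks the pool candidate whose nearest already-covered point
    (a tested compound, or one already chosen for this batch) is farthest away,
    so each pick probes the least-explored region. No predictive model is needed.
    """
    chosen: list[int] = []
    covered = list(labelled)  # points whose region we already "know" about
    for _ in range(min(batch_size, len(pool))):
        best_idx, best_gap = -1, -1.0
        for i, candidate in enumerate(pool):
            if i in chosen:
                continue
            # Distance to the nearest covered point; infinite if nothing is covered yet.
            gap = min((math.dist(candidate, c) for c in covered), default=math.inf)
            if gap > best_gap:
                best_idx, best_gap = i, gap
        chosen.append(best_idx)
        covered.append(pool[best_idx])
    return chosen


if __name__ == "__main__":
    seed = [(0.0, 0.0)]  # a single tested compound (hypothetical seed data)
    pool = [(0.1, 0.0), (5.0, 5.0), (0.0, 4.0), (5.0, 4.8)]
    print(select_batch(pool, seed, 2))
```

With one seed point at the origin, the sketch first picks `(5.0, 5.0)`, the farthest candidate, and then `(0.0, 4.0)` rather than the near-duplicate `(5.0, 4.8)`, so the batch spreads across unexplored space. Once enough data accumulates, practical active learning systems typically switch to or combine this kind of exploration with model-based uncertainty; that transition is outside the scope of this sketch.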