Crowd-sourced machine learning for NMR

The rise of machine learning (ML) has produced an explosion of strategies for learning from data in order to make scientific predictions. For physical scientists who want to apply ML to a particular domain, the result can be bewildering: the space of possibilities is vast, and it is difficult to judge a priori which strategy to adopt.

To address this search problem, we recently teamed up with Kaggle to run a crowd-sourced community competition that searched and analysed the space of possible ML strategies for predicting pairwise NMR properties. Over 3 months, we received 47,800 ML model submissions from 2,700 teams in 84 countries, far more than we could have achieved on our own. Analysis of the results shows that it is possible to construct ensemble-based ML models as linear combinations of the top 50 submissions; these ensembles have better prediction accuracy than any individual model, and are nearly 3 orders of magnitude better than our previous approaches.
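To make the idea of a linear-combination ensemble concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the procedure used in the competition analysis: the data are synthetic, the five "models" are simply the true values plus independent noise, and the weights are fitted by ordinary least squares, which is one common choice among many. The names `fit_ensemble_weights` and `mse` are hypothetical helpers introduced for this example.

```python
import numpy as np


def fit_ensemble_weights(preds, target):
    """Least-squares weights for a linear combination of model predictions.

    preds:  array of shape (n_samples, n_models), one column per model
    target: array of shape (n_samples,), the reference values
    """
    weights, *_ = np.linalg.lstsq(preds, target, rcond=None)
    return weights


def mse(pred, target):
    """Mean squared error between a prediction vector and the target."""
    return float(np.mean((pred - target) ** 2))


# Synthetic stand-in data (assumption): each "model" is the true value
# plus independent Gaussian noise of a different magnitude.
rng = np.random.default_rng(0)
n_samples, n_models = 500, 5
target = rng.normal(size=n_samples)
noise_scales = np.array([0.3, 0.4, 0.5, 0.6, 0.7])
preds = target[:, None] + rng.normal(size=(n_samples, n_models)) * noise_scales

# Fit the combination weights and form the ensemble prediction.
weights = fit_ensemble_weights(preds, target)
ensemble_pred = preds @ weights

individual_mse = [mse(preds[:, j], target) for j in range(n_models)]
ensemble_mse = mse(ensemble_pred, target)

print("individual MSE:", np.round(individual_mse, 4))
print("ensemble MSE:  ", round(ensemble_mse, 4))
```

On the data used to fit the weights, the least-squares ensemble can never have a higher mean squared error than any single model, because each single model is itself one possible linear combination. In practice the weights would be fitted on held-out data, and the gain depends on how uncorrelated the individual models' errors are.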

If you’d like to learn more, Dr. Lars Bratholm has organised a symposium inviting the competition’s top-scorers to Bristol where they will describe their ML approaches.
