Screening ultra-large chemical spaces such as Enamine REAL Space (70B compounds) or Chemspace Freedom Space 4.0 (142B compounds) with traditional enumeration and scoring can take weeks and massive computing power. That's why we use, and continuously improve, Thompson Sampling: a Bayesian algorithm that balances exploration and exploitation to uncover the most promising molecules quickly.
Instead of exhaustively screening billions of compounds, this method scores only a fraction of the space while shifting the score distribution of the sampled molecules toward desirable outcomes (e.g., maximizing a QSAR class probability). The result? Substantial time savings and a significantly higher number of retrieved top-scoring compounds.
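To make the idea concrete, here is a minimal toy sketch of Thompson Sampling over a two-component combinatorial library. It is an illustration only, not our production implementation: the function name `thompson_sample`, the two reagent slots, the Gaussian beliefs with a fixed noise variance, and the additive stand-in score are all assumptions made for the example.

```python
import random
from statistics import mean


def thompson_sample(n_a, n_b, score_fn, n_iter, noise_var=1.0, seed=0):
    """Toy Thompson Sampling over a two-slot combinatorial library.

    Each reagent carries a Gaussian belief [mean, variance] about the score
    of products that contain it. Returns {(i, j): score} for every product
    that was actually scored.
    """
    rng = random.Random(seed)
    beliefs = [[[0.0, 1.0] for _ in range(n)] for n in (n_a, n_b)]
    scored = {}
    for _ in range(n_iter):
        # Draw one value from every reagent's belief; keep the best per slot.
        pick = tuple(
            max(range(len(slot)),
                key=lambda k: rng.gauss(slot[k][0], slot[k][1] ** 0.5))
            for slot in beliefs
        )
        if pick not in scored:
            scored[pick] = score_fn(*pick)  # stands in for the expensive model call
        s = scored[pick]
        # Conjugate normal update for the two reagents that were just used.
        for slot, k in zip(beliefs, pick):
            m, v = slot[k]
            new_v = 1.0 / (1.0 / v + 1.0 / noise_var)
            slot[k] = [new_v * (m / v + s / noise_var), new_v]
    return scored


# Hypothetical stand-in for an ML score: additive reagent contributions.
_rng = random.Random(42)
a = [_rng.gauss(0, 1) for _ in range(50)]
b = [_rng.gauss(0, 1) for _ in range(50)]


def score(i, j):
    return a[i] + b[j]


sampled = thompson_sample(50, 50, score, n_iter=400)
all_scores = [score(i, j) for i in range(50) for j in range(50)]
print(f"scored {len(sampled)} of {len(all_scores)} products")
print(f"mean sampled score {mean(sampled.values()):.2f} "
      f"vs library mean {mean(all_scores):.2f}")
```

The loop never enumerates the full 50 × 50 library. It scores at most 400 products, and because reagents that produced good scores are drawn more often, the scored subset skews toward the high end of the distribution.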
Real-world performance:
We benchmarked our approach on the 70-billion-compound Enamine REAL Space using an ensemble of 5 Chemprop models trained on DNA-encoded library (DEL) selection data.
- Runtime: Just 6 hours on 250 CPUs
- Sampled: Only 214 million compounds
- Outcome: Recovered 6× more top scorers than brute-force inference on 5 billion molecules (which took 120 hours!)
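For a sense of scale, the benchmark figures above can be turned into a few ratios. This is a back-of-the-envelope sketch using only the numbers quoted here; the variable names are ours, and because the hardware used for the brute-force run is not stated, the wall-clock ratio is not a like-for-like cost comparison.

```python
space = 70_000_000_000        # Enamine REAL Space size used in the benchmark
sampled = 214_000_000         # compounds scored by Thompson Sampling
brute_force = 5_000_000_000   # compounds scored by brute-force inference

fraction_sampled = sampled / space          # share of the space actually scored
fraction_brute = brute_force / space        # share covered by the brute-force run
fewer_scored = brute_force / sampled        # how many times fewer compounds were scored
cpu_hours = 6 * 250                         # Thompson Sampling run: 6 h on 250 CPUs
wall_clock_ratio = 120 / 6                  # brute-force hours vs. sampling hours

print(f"sampled {fraction_sampled:.2%} of the space")
print(f"brute force covered {fraction_brute:.2%} of the space")
print(f"{fewer_scored:.1f}x fewer compounds scored")
print(f"{cpu_hours} CPU-hours, {wall_clock_ratio:.0f}x shorter wall-clock time")
```

In other words, the run scored roughly 0.3% of the space, about 23 times fewer compounds than the brute-force baseline.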
Ready to accelerate your discovery campaign?
Let us run your machine learning models on REAL Space or Freedom Space; no infrastructure is needed on your side.
Contact us at [email protected] to learn how Thompson Sampling as a Service can transform your next virtual screen.