Over the past few decades, computer-aided drug design has fundamentally changed the drug discovery landscape, and virtual screening has become one of its most powerful strategies. Instead of testing millions of compounds in the lab, researchers now use computational algorithms to quickly identify promising molecules, which dramatically reduces the time, cost, and experimental effort required for a drug discovery campaign.

What is Virtual Screening in Drug Discovery?
Virtual screening is a set of computational techniques that researchers use to explore vast compound libraries to identify biologically active molecules. The idea is to mimic in silico what occurs in a biochemical assay.
Let's revisit the history of virtual screening to better understand the place this technique holds in the drug discovery process. In the 1980s, the first docking methods were tested. These early methods were limited, but they showed that, at least in principle, it was possible to predict how a drug fits into a protein pocket. By the 1990s, faster computers and the boom in combinatorial chemistry made people dream of screening thousands – even millions – of compounds virtually. In 1997, the term "virtual screening" appeared in the literature, giving the field its own name. Throughout the 2000s, improved algorithms, expanding chemical databases, and the rapid growth in the number of high-quality protein structures transformed virtual screening into a practical and effective tool. These days, thanks in part to AI algorithms, virtual screening has become a central component of drug discovery workflows. It's not just about ranking molecules anymore; it's about exploring chemical space in new, often unexpected ways.
As a result, a search for promising drug candidates that once took months of high-throughput screening can now be completed in days on a decent computing cluster. Costs drop, hit rates go up, and scientists get to focus on the compounds that actually matter.
Core Approaches to Virtual Screening
Over time, three main virtual screening techniques in drug discovery have emerged, and each of them has its own strengths, limitations, and applications.
- Structure-based virtual screening (SBVS). This approach is used when a 3D structure of the target is available, whether experimentally determined, built by homology or de novo modeling, or predicted with AI/ML-based methods. After preparing the chosen structure, researchers dock compounds into the target's binding site and score the strength of each predicted interaction.
One of the advantages of SBVS is that it gives scientists insight into ligand-target complexes, which facilitates the design and optimization of more effective compounds. But there are limitations: the scoring algorithms aren't perfect, and targets aren't as rigid as docking assumes. More accurate methods, such as molecular dynamics simulation, can help, but they require significantly more computing power and time. A minimal docking sketch appears after this list.
- Ligand-based virtual screening (LBVS). If no structure of the target is available, researchers instead analyze molecules with known biological activity along with their structure-activity relationships. LBVS is based on the principle that structurally similar compounds tend to exhibit similar biological effects. To quantify this similarity, computational methods compare 2D fingerprints, 3D shapes, or pharmacophore patterns using similarity metrics such as the Tanimoto coefficient. Machine learning models (e.g., QSAR built with Random Forest or neural networks) predict activity from molecular descriptors.
LBVS is fast – millions of molecules can be screened in minutes – and works well for early filtering before docking (see the similarity-search sketch after this list). However, it tends to propose compounds from the same chemotypes rather than breakthrough scaffolds. It is also necessary to consider that even minor changes in a molecule's structure can lead to activity cliffs, undermining the fundamental assumption of the approach. Finally, researchers don't obtain any information about the binding mode.
- Fragment-based virtual screening (FBVS). In this approach, researchers first virtually screen small fragments – often composed of just a couple of functional groups, with a molecular weight under 300 Da. Such fragments typically bind weakly, but if they engage the right spot on the target, they can serve as blueprints for growing a full-size drug candidate.
The advantage of FBVS is that the small size of fragments allows them to explore the target's binding pocket more efficiently than full-sized molecules. They often reach subpockets that bigger compounds miss, revealing novel starting points. On the other hand, because fragments bind weakly, they are difficult to detect both computationally and experimentally, so researchers need sensitive and expensive techniques, such as NMR or X-ray crystallography, to confirm fragment binding. Additionally, growing these fragments into actual drug candidates requires significant design work and time; however, modern AI-assisted approaches, such as V-SYNTHES, now help automate fragment elaboration and explore synthetically feasible growth pathways more efficiently.
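To make the SBVS step more tangible, here is a minimal sketch of docking a single pre-prepared ligand with the AutoDock Vina Python bindings. The file names, box center, and box size are placeholders invented for illustration; a real campaign would loop this over an entire prepared library.

```python
# Minimal SBVS sketch using the AutoDock Vina Python bindings (pip install vina).
# File names, box center, and box size are illustrative placeholders.
from vina import Vina

v = Vina(sf_name="vina")                      # default Vina scoring function
v.set_receptor("receptor_prepared.pdbqt")     # prepared target structure (PDBQT format)
v.set_ligand_from_file("ligand.pdbqt")        # one prepared ligand

# Define the search box around the binding site (placeholder coordinates in Angstroms).
v.compute_vina_maps(center=[15.0, 12.5, -3.0], box_size=[22.0, 22.0, 22.0])

v.dock(exhaustiveness=8, n_poses=10)          # sample and score binding poses
print(v.energies(n_poses=5))                  # predicted binding energies, kcal/mol
v.write_poses("ligand_docked.pdbqt", n_poses=5, overwrite=True)
```

The per-pose energies produced here are exactly the kind of relative scores that later feed the scoring and re-ranking stage of the workflow.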
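For LBVS, the similarity search described above can be reduced to a few lines of RDKit. The sketch below compares a known active (aspirin) against a tiny toy library using Morgan fingerprints and the Tanimoto coefficient; the molecules are illustrative only.

```python
# LBVS sketch: rank a toy library by Tanimoto similarity to a known active (RDKit).
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)

query = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")   # aspirin as the known active
library = {
    "salicylic acid": "O=C(O)c1ccccc1O",
    "ibuprofen": "CC(C)Cc1ccc(cc1)C(C)C(=O)O",
    "caffeine": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
}

query_fp = gen.GetFingerprint(query)
for name, smi in library.items():
    fp = gen.GetFingerprint(Chem.MolFromSmiles(smi))
    sim = DataStructs.TanimotoSimilarity(query_fp, fp)  # 0 = no shared bits, 1 = identical
    print(f"{name}: Tanimoto = {sim:.2f}")
```

The same pattern scales to millions of compounds, which is why fingerprint similarity is usually the first and cheapest filter in the funnel described below.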
In practice, no team relies on just one method. Each approach is a different lens on the same problem, and researchers often switch between them or combine them.
Virtual Screening Workflow & Algorithmic Pipeline
Virtual screening is a funnel-shaped process. Researchers begin with millions of molecules and progressively narrow the field to a few hundred, which are then synthesized and tested in the lab. Every stage of this funnel has its own tools, strategies, and frustrations.
- Preparation of the target. SBVS and FBVS start with refining the biological target structure, because its accuracy determines the effectiveness of the entire workflow. The structure is typically obtained from the Protein Data Bank, predicted with AlphaFold, or built as a homology model; however, it is not yet ready for docking. Preparing the raw structure is a multi-step process: removing crystallographic waters, ligands, and artifacts; adding hydrogens (typically absent from crystal structures); assigning protonation states at physiological pH; defining the binding site; and performing energy minimization to relieve steric clashes (a minimal preparation sketch appears after this list). Sometimes, ensemble or flexible docking is used to account for protein flexibility that rigid docking approaches overlook.
- Library selection. One of the most important questions in any virtual screening project is what to screen: commercial libraries, in-house compounds, or one of the make-on-demand spaces with billions of virtual molecules? A larger and more diverse input space increases the likelihood of discovering novel, potent hits.
- Pre-filtering. Lipinski's Rule of Five, PAINS, and toxicophore filters remove compounds with poor drug-likeness or known assay-interference and safety liabilities, while ADMET prediction models flag molecules with poor solubility, permeability, or metabolic stability (a filtering sketch appears after this list). This step saves a significant amount of computational time; however, if the filters are too strict, real hits can be eliminated.
- Ligand standardization. Each compound in the chemical library first needs to be prepared in a standard form, because its protonation state, tautomer, 3D conformation, and stereochemistry can significantly affect docking results. Molecules are then converted into the representation each algorithm expects: SMILES strings for chemical language models, fingerprints for similarity searches, and 3D coordinate sets for docking (see the standardization sketch after this list). Skipping this stage undermines both chemical validity and computational reproducibility.
- Computational pre-screening is applied to quickly reduce the library size and minimize computational cost during docking. A variety of filtering techniques can be used; the most common are similarity searches based on 2D fingerprints or 3D pharmacophore overlays. QSAR models built with machine learning (Support Vector Machines, Random Forest) can predict the activity or toxicity of compounds (a small Random Forest sketch appears after this list). Although these filters can dramatically shrink the library, overly strict criteria may eliminate potentially active compounds.
- Screening methods. Three core methods are used in virtual screening services: SBVS, LBVS, and FBVS (covered earlier). Each has distinct advantages: LBVS is faster and more data-efficient; SBVS provides structural insight into ligand-target interactions; FBVS identifies fragments that can be grown into a real drug. Hybrid workflows combine the strengths of all three approaches.
- Scoring and re-ranking. After docking, researchers may be left with hundreds or thousands of ligands carrying high docking scores. However, these scores do not equate to real biological activity in a test tube; they are only relative indicators. To reduce the error, teams apply consensus scoring – combining results from several scoring functions – or machine learning–based rerankers to identify molecules that consistently appear among the top candidates (a toy consensus example follows this list).
- Deep validation. Once the list of hits is short enough, researchers can apply more resource-intensive methods. Molecular dynamics simulations show the stability of the ligand-target complex over time. Free energy calculation methods, such as FEP or MM-GBSA, refine estimates of binding affinity. Deep learning rescoring networks provide independent affinity predictions based on experimental training data.
- Closing the loop. Finally, the selected hits are synthesized or sourced and experimentally validated. Some candidates fall short of expectations, some produce unexpected results, and only a few compounds show enough activity to justify the entire screening campaign. With the advent of AI, experimental results feed back into the models, forming modern closed-loop platforms: compute → synthesize → test → update the model → compute again. This turns drug discovery into a learning cycle rather than a linear pipeline.
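As a rough illustration of the target-preparation step above, here is a sketch using PDBFixer with OpenMM to strip heteroatoms, rebuild missing atoms, and add hydrogens at roughly physiological pH. The file names are placeholders, and binding-site definition plus energy minimization would still be handled by your docking or MD tooling.

```python
# Target preparation sketch with PDBFixer (pip install pdbfixer openmm).
# The input/output file names are placeholders.
from pdbfixer import PDBFixer
from openmm.app import PDBFile

fixer = PDBFixer(filename="receptor_raw.pdb")
fixer.removeHeterogens(keepWater=False)   # drop crystallographic waters, ligands, ions
fixer.findMissingResidues()               # detect gaps in the crystal structure
fixer.findMissingAtoms()
fixer.addMissingAtoms()                   # rebuild missing side-chain/backbone atoms
fixer.addMissingHydrogens(pH=7.4)         # add hydrogens at ~physiological pH

with open("receptor_prepared.pdb", "w") as out:
    PDBFile.writeFile(fixer.topology, fixer.positions, out)
```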
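The pre-filtering step can be sketched with RDKit's built-in descriptors and PAINS catalog. This is a simplified illustration: the thresholds follow Lipinski's Rule of Five, the SMILES are toy examples, and a production filter would layer toxicophore and ADMET models on top.

```python
# Pre-filtering sketch: Lipinski's Rule of Five plus the PAINS catalog (RDKit).
from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski
from rdkit.Chem.FilterCatalog import FilterCatalog, FilterCatalogParams

params = FilterCatalogParams()
params.AddCatalog(FilterCatalogParams.FilterCatalogs.PAINS)
pains = FilterCatalog(params)

def passes_prefilter(smiles: str) -> bool:
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return False                       # unparsable structure
    ro5_ok = (
        Descriptors.MolWt(mol) <= 500
        and Descriptors.MolLogP(mol) <= 5
        and Lipinski.NumHDonors(mol) <= 5
        and Lipinski.NumHAcceptors(mol) <= 10
    )
    return ro5_ok and not pains.HasMatch(mol)

library = ["CC(=O)Oc1ccccc1C(=O)O", "CCN(CC)CC", "O=C(O)c1ccccc1O"]  # toy SMILES
print([smi for smi in library if passes_prefilter(smi)])
```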
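For the ligand-standardization step, the sketch below uses RDKit's rdMolStandardize to clean each molecule, pick a canonical tautomer, and embed a single 3D conformer. Protonation-state enumeration and stereoisomer expansion are deliberately left out here; dedicated tools usually handle those.

```python
# Ligand standardization sketch: clean, canonicalize the tautomer, embed in 3D (RDKit).
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem.MolStandardize import rdMolStandardize

def standardize_and_embed(smiles: str):
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    mol = rdMolStandardize.Cleanup(mol)                             # sanitize, normalize groups, reionize
    mol = rdMolStandardize.TautomerEnumerator().Canonicalize(mol)   # pick one canonical tautomer
    mol = Chem.AddHs(mol)                                           # explicit hydrogens for 3D geometry
    AllChem.EmbedMolecule(mol, AllChem.ETKDGv3())                   # generate a 3D conformer
    AllChem.MMFFOptimizeMolecule(mol)                               # quick force-field relaxation
    return mol

mol3d = standardize_and_embed("CC(=O)Oc1ccccc1C(=O)O")   # toy input: aspirin
print(Chem.MolToSmiles(Chem.RemoveHs(mol3d)))
```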
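The machine learning pre-screening mentioned above can be prototyped with Morgan fingerprints and a scikit-learn Random Forest. The activity labels below are invented purely to show the mechanics; a real model would be trained on measured assay data.

```python
# QSAR-style pre-screening sketch: Random Forest on Morgan fingerprints (toy data).
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator
from sklearn.ensemble import RandomForestClassifier

gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)

def featurize(smiles_list):
    rows = []
    for smi in smiles_list:
        fp = gen.GetFingerprint(Chem.MolFromSmiles(smi))
        arr = np.zeros((2048,), dtype=np.int8)
        DataStructs.ConvertToNumpyArray(fp, arr)   # bit vector -> numpy row
        rows.append(arr)
    return np.vstack(rows)

# Toy training set: SMILES with invented activity labels (1 = active, 0 = inactive).
train_smiles = ["c1ccccc1O", "c1ccc2ccccc2c1", "CCO", "CCCC", "CC(=O)O", "c1ccccc1N"]
train_labels = [1, 1, 0, 0, 0, 1]

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(featurize(train_smiles), train_labels)

# Score an unlabeled screening set and list the most promising compounds first.
screen = ["c1ccc(O)cc1C", "CCCCO", "c1ccc2[nH]ccc2c1"]
probs = model.predict_proba(featurize(screen))[:, 1]
for smi, p in sorted(zip(screen, probs), key=lambda x: -x[1]):
    print(f"{smi}: predicted activity probability = {p:.2f}")
```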
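Finally, the consensus re-ranking idea can be illustrated with a simple rank-averaging scheme in NumPy. The scores below are invented numbers standing in for three different methods: two docking-style scores where lower is better and one ML rescoring probability where higher is better.

```python
# Consensus scoring sketch: average per-method ranks for five hypothetical ligands.
import numpy as np

ligands = ["L1", "L2", "L3", "L4", "L5"]

# Invented scores: two docking-style scores (more negative is better) and one
# ML rescoring probability (higher is better).
dock_a  = np.array([-9.1, -7.4, -8.8, -6.9, -8.2])
dock_b  = np.array([-8.5, -8.0, -9.0, -6.5, -7.1])
ml_prob = np.array([0.91, 0.55, 0.87, 0.32, 0.60])

def to_ranks(scores, higher_is_better):
    """Return 1-based ranks, where rank 1 is the best-scoring ligand."""
    order = np.argsort(-scores) if higher_is_better else np.argsort(scores)
    ranks = np.empty(len(scores), dtype=int)
    ranks[order] = np.arange(1, len(scores) + 1)
    return ranks

consensus = (to_ranks(dock_a, False) + to_ranks(dock_b, False) + to_ranks(ml_prob, True)) / 3.0

for idx in np.argsort(consensus):          # best average consensus rank first
    print(f"{ligands[idx]}: mean rank = {consensus[idx]:.2f}")
```

Ligands that sit near the top of every individual list end up with the best consensus rank, which is exactly the behavior the re-ranking step relies on.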
Modern virtual screening strategies in drug discovery employ a diverse algorithmic ecosystem, including similarity metrics (Tanimoto, Dice), QSAR models, machine learning classifiers (Support Vector Machine, Random Forest, neural networks), and meta-heuristic optimization algorithms (genetic algorithms, particle swarm optimization, simulated annealing, and Monte Carlo methods), among others. In practice, however, a handful of approaches handle the majority of computational work: similarity searches and QSAR models for rapid pre-filtering, heuristic docking engines for core screening, machine learning/deep learning models for hit classification and rescoring, and molecular dynamics simulations for final validation. The key to successful virtual screening in drug design is not selecting the best algorithm, but rather understanding when to use which one and how to combine them.
Future Directions & Emerging Trends
Virtual screening has evolved from a specialized technique into a standard drug discovery tool, and it isn't standing still. New technologies continue to reshape the approach.
AI and generative chemistry. Machine learning models already predict compound activity and ADMET properties and rescore docking hits; the frontier, however, lies in generative models that design new molecules de novo. Neural networks trained on millions of structures can propose novel scaffolds optimized simultaneously for potency, selectivity, and drug-like properties. The challenge is ensuring that these generated compounds can actually be synthesized.
Cloud-scale screening. Cloud-based platforms enable research teams to dock billions of molecules in parallel, scale computational resources up or down, and share them across multiple labs. The bottleneck is no longer computing power but algorithmic efficiency: screening trillion-scale libraries without drowning in false positives.
Closing the loop with automation. The most exciting trend isn't purely computational: it is the integration of robotic synthesis, high-throughput biology, and virtual screening into closed loops. Some drug discovery programs already work this way: compute → synthesize → test → update → repeat. In this model, the line between "virtual" and "real" screening starts to blur.
Beyond docking. Traditional virtual screening focuses mainly on binding at a single protein site, but drug activity is more complex. Emerging algorithms now predict polypharmacology (one drug binding multiple targets), ADMET profiles, and even phenotypic outcomes, turning virtual screening in drug discovery into a broader approach that considers how drugs behave in whole biological systems.
Smarter scoring and fewer false negatives. Scoring functions remain one of the biggest problems: current scores are crude proxies for binding affinity. Hybrid scoring systems – docking energies combined with physics-based free energy methods and accelerated by ML models – are expected to deliver more realistic predictions. Another hot topic is reducing false negatives: it is better to test a few extra compounds than to miss the breakthrough one.
The coming decades will be about making virtual screening of compound libraries faster and smarter. The result will be more efficient experiments and, hopefully, faster routes to the medicines that matter.