A venom classifier picked 28 new potassium channel candidates from 5,165 sequences.

A team trained a classifier on 5,165 cysteine-rich venom peptide sequences and used it to flag 28 new candidates as potassium-channel binders. The peptides come from sea anemones, snakes, scorpions, spiders, cone snails, and terebrid sea snails. The tool, called the Molecular Arms Race Classifier (MARC), is described in a paper published online May 13 in Digital Discovery. The 28 hits are predictions, not assay results. The news in the paper is what those predictions rest on: the structural shapes that block ion channels keep recurring across radically different animal lineages, and a classifier that looks past raw sequence similarity can pick them out.

Animal venoms have already given the clinic eptifibatide, an antiplatelet drug derived from pygmy rattlesnake venom and used during heart procedures, along with other peptide drugs, including treatments for severe chronic pain. Most novel venom peptides, however, come out of transcriptome sequencing with no assigned target, and most never get one. The bottleneck in turning more of them into drugs is working out which channel a given peptide actually hits before committing to slow lab assays.

The premise behind MARC is the arms race in its name. Venoms in distantly related lineages evolved independently but kept converging on the same handful of receptors and channels, because those are the targets that matter for incapacitating prey or warding off predators. Under that pressure, sequences diverge quickly, but the receptor-facing surface geometry has to stay close to optimal or the toxin stops working. If that picture is right, a classifier built on structure-aware embeddings should recover the target assignment even when two peptides look nothing alike at the sequence level.

MARC works in two steps. First, each peptide sequence is run through Evolutionary Scale Modeling (ESM), a protein language model that converts it into a vector encoding the structural and evolutionary context in which its residues typically occur. Then a random forest classifier sorts those vectors into one of four buckets: sodium channels, potassium channels, calcium channels, or none of the above. The choice of ESM matters because protein language models tend to place sequences with similar structure near each other in feature space even when the sequences themselves have diverged, which is exactly the kind of convergence MARC is trying to detect. The random forest on top is a deliberately simple classifier head, which keeps the pipeline cheap to retrain and easy to inspect.
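The two-step pipeline described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not MARC's code: `toy_embed` is a hypothetical stand-in for ESM (it only computes amino-acid composition so the example runs without a language model), the random forest is a small hand-rolled version (bootstrap-sampled CART trees with a random feature subset at each split, majority vote) rather than whatever library the authors used, and the label strings `Nav`, `Kv`, `Cav`, `none` plus the hyperparameters `n_trees`, `max_depth`, and `max_features` are illustrative choices.

```python
import random
from collections import Counter

# Illustrative names for the four buckets described in the article.
LABELS = ("Nav", "Kv", "Cav", "none")
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_embed(seq: str) -> list[float]:
    """Hypothetical stand-in for an ESM embedding: amino-acid composition.

    A real pipeline would pool ESM's per-residue vectors instead; composition
    is used only so this sketch runs without a protein language model.
    """
    seq = seq.upper()
    n = max(len(seq), 1)
    return [seq.count(aa) / n for aa in AMINO_ACIDS]


def gini(labels) -> float:
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, features):
    """Return the (feature, threshold) that most lowers weighted Gini, or None."""
    best, best_score = None, gini(y)
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between adjacent distinct values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, depth, max_depth, n_feats, rng):
    """Grow a CART-style tree; leaves are labels, internal nodes are tuples."""
    split = None
    if depth < max_depth and len(set(y)) > 1:
        features = rng.sample(range(len(X[0])), n_feats)  # random feature subset
        split = best_split(X, y, features)
    if split is None:
        return Counter(y).most_common(1)[0][0]  # leaf: majority label
    f, t = split
    left_idx = [i for i, row in enumerate(X) if row[f] <= t]
    right_idx = [i for i, row in enumerate(X) if row[f] > t]

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, n_feats, rng)

    return (f, t, grow(left_idx), grow(right_idx))


def predict_tree(node, x):
    """Walk from the root to a leaf and return its label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


class RandomForest:
    """Bagged decision trees with a random feature subset at each split."""

    def __init__(self, n_trees=25, max_depth=6, max_features=None, seed=0):
        self.n_trees, self.max_depth = n_trees, max_depth
        self.max_features = max_features  # None -> sqrt(number of features)
        self.rng = random.Random(seed)
        self.trees = []

    def fit(self, X, y):
        d = len(X[0])
        n_feats = min(d, self.max_features or max(1, int(d ** 0.5)))
        self.trees = []
        for _ in range(self.n_trees):
            idx = [self.rng.randrange(len(X)) for _ in X]  # bootstrap sample
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         0, self.max_depth, n_feats, self.rng))
        return self

    def predict(self, x):
        """Majority vote across trees."""
        return Counter(predict_tree(t, x) for t in self.trees).most_common(1)[0][0]

    def vote_fraction(self, x, label):
        """Share of trees voting for `label`; a crude score for ranking hits."""
        return sum(predict_tree(t, x) == label for t in self.trees) / len(self.trees)
```

Usage would look like `RandomForest().fit([toy_embed(s) for s in train_seqs], train_labels)` followed by `predict(toy_embed(new_seq))`, where `train_seqs` and `train_labels` are hypothetical inputs. Sorting unlabeled peptides by `vote_fraction(x, "Kv")` turns the output into a ranked shortlist. Swapping `toy_embed` for real ESM vectors changes only the first step, which is the appeal of keeping the classifier head this simple.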

One of the 28 newly flagged candidates, a terebrid sea snail teretoxin called Cje1.9, was carried through to computational validation. The authors docked Cje1.9 against two model potassium channels (KcsA from a soil bacterium and MthK from an archaeal thermophile) and ran molecular dynamics on the best docked poses to test whether the binding mode survived a nanosecond or so of simulated thermal motion. It did. That is not the same as a patch-clamp recording showing actual current block in a mammalian neuron, but it is consistent with the prediction holding up under a physics-based check.
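One common way to judge whether a docked pose "survives" a short simulation is to track the root-mean-square deviation (RMSD) of the peptide's atoms from the starting pose across trajectory frames. The article does not say which criterion the authors used, so treat this as an illustrative assumption. The sketch assumes frames are already superposed on the channel, and the 2.0 Å cutoff in the hypothetical `pose_holds` helper is an arbitrary threshold, not a value from the paper.

```python
import math


def rmsd(ref, frame):
    """RMSD (same units as the input, e.g. angstroms) between two conformations.

    Both arguments are equal-length lists of (x, y, z) tuples for the same
    atoms in the same order. No superposition is done here: the sketch assumes
    frames were already aligned on the channel.
    """
    if not ref or len(ref) != len(frame):
        raise ValueError("need two non-empty coordinate lists of equal length")
    sq = sum((a - b) ** 2 for p, q in zip(ref, frame) for a, b in zip(p, q))
    return math.sqrt(sq / len(ref))


def pose_holds(ref, trajectory, cutoff=2.0):
    """Hypothetical stability test: every frame stays within `cutoff` of the docked pose."""
    return all(rmsd(ref, frame) <= cutoff for frame in trajectory)


# Usage with made-up coordinates:
docked = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
jiggle = [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)]  # every atom moved 0.5
drift = [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)]   # every atom moved 3.0
print(rmsd(docked, jiggle))                   # 0.5
print(pose_holds(docked, [jiggle, drift]))    # False
```

Real analyses usually fit each frame onto the receptor first and compute RMSD over a chosen atom subset (backbone or heavy atoms); this sketch leaves both choices to the caller.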

The caveats are real. None of the 28 candidates has been expressed, folded, or tested for actual channel block in cells. The validation pair (KcsA, MthK) was chosen because their structures are exceptionally well characterized, not because either is itself a therapeutic target. Mammalian potassium channels are more diverse, and more selectively druggable, than the prokaryotic pair used for docking. And the whole pipeline is scoped to cysteine-rich peptides with disulfide-stabilized folds, which is the dominant venom architecture but not the only one. Linear or post-translationally modified venom peptides would need their own model.

Even with those limits, the result is a structural argument for why venom libraries are still worth mining. Across radically different animal lineages, the residue patterns that block ion channels keep falling into recognizable shapes. A classifier that picks them out of transcriptome-derived sequences compresses what is usually months of phenotypic screening into a sorted shortlist. The 28 teretoxin candidates are now downstream work for whoever wants to express them. The structural news is that the arms-race premise survived a real test on a dataset spanning six animal taxa.

For peptide chemists building scaffolds against ion channels, the upstream implication is that high-throughput venom-gland sequencing plus a classifier like MARC can stand in for low-throughput pharmacology in the candidate-discovery phase. Potassium channels remain comparatively underexplored as a peptide-drug target family, partly because hand-screening venoms for K+ activity has been slow. If MARC's 28 leads, or others like them, survive the experimental gauntlet, the supply of viable starting points for ion-channel programs grows without anyone needing to milk another scorpion.
