Cyclic peptide structure prediction now handles non-canonical amino acids

A new J Chem Inf Model paper introduces HighFold-MeD2, a structure-prediction model for cyclic peptides containing backbone N-methylated and d-amino acids, the modifications that drive the metabolic stability and oral bioavailability of the cyclic-peptide drug class. The model builds on Boltz-2, the open-source structure-prediction model that has emerged over the past year as an alternative to AlphaFold2 and AlphaFold3, and extends it to handle the non-canonical amino acids that current open structure tools have struggled with.

Why cyclic peptides matter. Cyclic peptides occupy what the medicinal-chemistry field calls the "middle space" between traditional small-molecule drugs and large biologics. They are large enough to engage protein-protein interfaces that small molecules cannot, and small enough to retain oral bioavailability when properly engineered, which biologics typically lack. Marketed examples include cyclosporin (immunosuppressant), vancomycin (antibiotic), and several oncology drugs. The barrier to broader cyclic-peptide drug discovery has been engineering: getting the metabolic stability, membrane permeability, and oral bioavailability right requires non-canonical amino acid modifications (especially backbone N-methylation and d-amino acid substitution) whose structural effects are difficult to predict computationally.

What HighFold-MeD2 does. The previous generation, HighFold-MeD, was built on AlphaFold2 and used Rosetta's simple_cycpep_predict (SCP) to generate non-canonical-amino-acid training data, then distilled that data into the AlphaFold2 framework for accelerated prediction. HighFold-MeD2 ports the same approach to Boltz-2's Pairformer-Diffusion architecture. The training input is the peptide sequence, the modification information, and the cyclic constraints; the output is a set of all-atom 3D coordinates, generated from a random initialization and trained to match a Rosetta target structure. The predicted structures are then refined via Amber force-field energy minimization to resolve local steric clashes.
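
To make the three inputs concrete, here is a minimal Python sketch of how a sequence, its modification information, and a cyclic constraint might be held together. The class name, the field names, the "NMe" and "D" tags, and the assumption of head-to-tail cyclization are all hypothetical illustrations, not the paper's input format or the Boltz-2 API. The one thing it shows is what the cyclic constraint changes: the first and last residues become bonded neighbours.

```python
from dataclasses import dataclass, field


@dataclass
class CyclicPeptideSpec:
    """Hypothetical container for sequence, modifications, and cyclization."""
    sequence: str                                       # one-letter parent residues
    modifications: dict = field(default_factory=dict)   # 0-based position -> tag
    head_to_tail: bool = True                           # assumed cyclization type

    def ring_separation(self, i: int, j: int) -> int:
        """Peptide bonds between residues i and j along the shorter path."""
        n = len(self.sequence)
        linear = abs(i - j)
        if not self.head_to_tail:
            return linear
        # In a ring, the path can also go the other way around.
        return min(linear, n - linear)


# Illustrative 7-residue peptide: N-methylated at position 1, D-residue at position 4.
spec = CyclicPeptideSpec("ALVGFPA", {1: "NMe", 4: "D"})
print(spec.ring_separation(0, 6))  # 1: the termini are adjacent in the ring
```

In a linear peptide the same two residues would be six bonds apart, which is why a predictor that ignores the constraint treats the termini as distant.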

The technical advance. HighFold-MeD2 represents non-canonical amino acids through universal Chemical Component Dictionary (CCD) entries rather than the labor-intensive hard-coding that the AlphaFold2-based predecessor required. The result is a much more extensible framework: adding a new non-canonical amino acid does not require retraining the model from scratch. The paper reports the model outperforms HighFold-MeD and other state-of-the-art baselines on cyclic-peptide structure prediction tasks, while still requiring a fraction of the computational cost of pure Rosetta SCP.
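
The extensibility argument can be illustrated with a small lookup sketch in Python. The table below maps a parent residue plus a modification tag to a three-letter component ID; the IDs shown are ones I believe match the wwPDB Chemical Component Dictionary, but they should be checked against the dictionary before use. The function name, the tag strings, and the table layout are assumptions for illustration and are not taken from the paper or from Boltz-2.

```python
# (parent one-letter code, modification tag) -> CCD three-letter component ID.
# Illustrative subset only; verify every ID against the dictionary itself.
CCD_CODES = {
    ("A", "D"): "DAL",    # D-alanine
    ("L", "D"): "DLE",    # D-leucine
    ("F", "D"): "DPN",    # D-phenylalanine
    ("A", "NMe"): "MAA",  # N-methyl-L-alanine
    ("L", "NMe"): "MLE",  # N-methyl-L-leucine
    ("V", "NMe"): "MVA",  # N-methyl-L-valine
    ("G", "NMe"): "SAR",  # sarcosine (N-methylglycine)
}

# Canonical residues used in this example.
STANDARD = {"A": "ALA", "F": "PHE", "G": "GLY", "L": "LEU", "P": "PRO", "V": "VAL"}


def to_ccd(sequence, modifications):
    """Return one component ID per residue, using the modified ID where tagged."""
    codes = []
    for pos, aa in enumerate(sequence):
        tag = modifications.get(pos)
        if tag is None:
            codes.append(STANDARD[aa])
        else:
            codes.append(CCD_CODES[(aa, tag)])
    return codes


print(to_ccd("ALVGFPA", {1: "NMe", 4: "D"}))
# ['ALA', 'MLE', 'VAL', 'GLY', 'DPN', 'PRO', 'ALA']

# Supporting another building block is a one-line table entry, not new code.
CCD_CODES[("V", "D")] = "DVA"  # D-valine
```

The point of the sketch is the last line: when residues are addressed by dictionary ID, supporting a new building block is a data change rather than a code change, which is the contrast the article draws with per-residue hard-coding.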

How this fits the broader peptide-AI landscape. The news section has been tracking the protein-language-model and structure-prediction ecosystem across multiple pieces. The RoBERTcr TCR-peptide language model on April 27 demonstrated that specialized peptide-domain models outperform general protein language models in their domain. HighFold-MeD2 extends the same lesson to structure prediction: specialized cyclic-peptide models with non-canonical-amino-acid handling outperform general structure-prediction tools such as AlphaFold3 and the Boltz-2 baseline on cyclic-peptide tasks. The pattern is consistent across both tasks the field cares about (sequence-based prediction and structure-based prediction): family-specialized models beat general ones in their domain.

What this enables. Cyclic-peptide drug discovery has been a slow-moving frontier despite the well-known drug-like properties of the modality, because the design-build-test cycle requires accurate prediction of how a candidate will fold and what conformational ensemble it will sample. With HighFold-MeD2, that prediction is now substantially cheaper and more accessible. The CCD-based representation means a researcher designing a novel cyclic peptide with non-canonical amino acid modifications can, in principle, get a high-quality structural prediction without writing custom Rosetta hooks. The cyclic-peptide drug discovery field is one of the better candidates to benefit from accessible AI tooling, since the design problem is well-defined and the modality has clear pharmacological advantages over both small molecules and biologics.

The platform read. Peptidemodel hosts a substantial corpus of peptide cards, including a number of cyclic-peptide candidates. Tools like HighFold-MeD2 are exactly the kind of computational infrastructure that lets researchers iterate on a card's design with confidence in the resulting structural prediction. As the field continues to consolidate around open-source structure prediction (Boltz-2 specifically, with its accessible licensing) and family-specialized models (HighFold-MeD2 for cyclic peptides, RoBERTcr for TCR-peptide interactions, and so on), the platform's role as a candidate-curation resource benefits from the surrounding tooling becoming more reliable and more extensible.

What still has to scale. The current paper validates HighFold-MeD2 on a benchmark of cyclic peptides with N-methyl and d-amino-acid modifications. Real drug-discovery applications will involve larger and more chemically diverse modification sets, including stapled peptides, bicyclic peptides, peptide-conjugates, and peptide-PROTAC architectures. Whether HighFold-MeD2 generalizes to those classes without further architectural work is an open question. The CCD-based representation is encouraging because it can in principle accommodate any defined chemical building block, but the empirical generalization needs to be tested.
