1,785 new microproteins just expanded the human proteome by 10%

A new Nature paper from the international TransCODE Consortium reports that roughly 25% of 7,264 non-canonical open reading frames (ncORFs) in the human genome give rise to detectable peptides, based on a meta-analysis of 95,520 proteomics experiments. The work identifies 1,785 previously unrecognized microproteins and codifies a new conceptual category for them: "peptideins." The paper expands the human proteome by approximately 10% and sets the reference annotation standard for ncORF-encoded microproteins going forward.

The methodology. TransCODE was launched in 2022 with members from GENCODE (the gene-annotation consortium that defines the canonical human reference), PeptideAtlas (the central proteomics data repository), and HUPO-HPP and HUPO-HIPP (the Human Proteome Project and the Human Immunopeptidome Project). The current paper is a meta-analysis of 95,520 mass-spectrometry-based proteomics experiments deposited across public repositories, scanned systematically for evidence of translation from non-canonical open reading frames, including upstream ORFs, downstream ORFs, internal ORFs, and ORFs from supposedly non-coding RNA. The 25% detection rate (1,785 of 7,264 ncORFs producing detectable peptides) is much higher than individual studies have reported for ncORF translation, and reflects the statistical power of pooled cross-experiment evidence.
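To make the pooling argument concrete, here is a minimal Python sketch. It checks the detection rate from the two counts quoted above, then uses a deliberately simplified toy model to show why evidence pooled across many experiments can surface peptides that any single experiment would usually miss. The function names, the per-experiment probability of 0.001, and the assumption that experiments are independent and identical are all illustrative; none of them comes from the paper, and this is not the consortium's actual pipeline.

```python
# Counts quoted in the article.
NCORFS_SURVEYED = 7_264
NCORFS_WITH_PEPTIDES = 1_785


def detection_rate(detected: int, surveyed: int) -> float:
    """Fraction of surveyed ncORFs with at least one detected peptide."""
    return detected / surveyed


def pooled_detection_probability(p_single: float, n_experiments: int) -> float:
    """Toy model: chance of at least one detection across n experiments.

    Assumes each experiment detects the peptide independently with the same
    probability p_single. Real proteomics datasets violate both assumptions.
    """
    return 1.0 - (1.0 - p_single) ** n_experiments


rate = detection_rate(NCORFS_WITH_PEPTIDES, NCORFS_SURVEYED)
print(f"Detection rate: {rate:.1%}")

# Hypothetical per-experiment detection probability, chosen for illustration.
P_SINGLE = 0.001
for n in (1, 100, 10_000):
    p_any = pooled_detection_probability(P_SINGLE, n)
    print(f"{n:>6} experiments -> P(at least one detection) = {p_any:.4f}")
```

Under these assumptions, a peptide that is almost never seen in a single run becomes very likely to appear at least once in a large pool. The same logic applies to false identifications, which also accumulate as experiments are pooled, so an analysis of this kind depends on stringent error control rather than on raw hit counts.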

The peptideins concept. Most of the newly catalogued microproteins are under 50 amino acids and have no detectable similarity to traditional proteins. They cannot be functionally classified by homology, which is the standard way new proteins enter the literature. The TransCODE team introduces "peptideins" as a working category for microproteins with indeterminate functional potential. The category is explicitly orthogonal to the established categories of "protein" (structurally and functionally well characterized), "peptide" (a short bioactive sequence with a characterized binding partner or mechanism), and "non-coding" (the default assumption for short ORFs without homology). The term recognizes that something is being made, but stops short of asserting that it has a function.

Why this matters for therapeutics. Three implications stand out. First, the human proteome's drug-target landscape just expanded. Each peptidein is a candidate target for a binding partner, a candidate ligand for a known target, or a candidate drug in its own right. Some will turn out to be functional and important; many will not; the work begins. Second, the peptide therapeutics field has a long history of mining the natural peptidome for drug leads (insulin, GLP-1, oxytocin, somatostatin, the amylin family, the calcitonin family, dozens of others). A 10% expansion of the source material is a meaningful expansion of the discovery space. Third, the methodology demonstrates that pooled evidence across thousands of proteomics experiments can rescue signal that individual studies miss, which is a template that can be applied iteratively as more data accumulates.

The platform read. The peptidemodel platform's card corpus is anchored to defined therapeutic peptides with documented receptor binding, mechanistic claims, or candidate-drug status. The TransCODE peptideins are upstream of that classification: they are detected molecular entities that have not yet been functionally characterized. As individual peptideins move from detection to function (through biochemical follow-up, knockout phenotypes, binding assays), they will become candidates for the platform's broader corpus. The current paper is the reference list that defines the discovery pipeline going forward.

The broader scientific context. The 10% proteome expansion is not the only recent rewrite. The past five years have seen recognition that the human transcriptome encodes substantially more polypeptides than the canonical Ensembl/RefSeq annotations document, that many of these polypeptides are tissue-specific or stress-induced, and that some have functional roles in disease (cancer, neurodegeneration, immunology) that conventional gene annotation missed. TransCODE is the consortium-scale formalization of those individual findings into a single reference framework. The paper is the kind of foundational document that downstream studies will cite for years.

What this is not. A drug discovery paper. The 1,785 peptideins are detected molecules with characterized translation evidence, not characterized therapeutics. Translating any individual peptidein into a drug candidate requires the standard sequence of biochemical, cellular, and animal studies that establish function and tractability. The paper is the discovery substrate, not the drug.

What 2026-2027 will reveal. Three categories of follow-up will determine whether the peptideins concept holds. First, functional characterization studies that begin classifying peptideins by mechanism or binding partner. Second, evolutionary and tissue-specificity analyses that distinguish noise (translation artifacts) from signal (biologically constrained sequences). Third, drug-discovery efforts that target individual peptideins or their binding interfaces and produce tractable lead compounds. The TransCODE consortium has set the reference; the field's response to it will determine the impact.

Sources