How Peptide Sequencing Drives Drug Discovery and Biomarker Validation


    Why Peptide Sequencing Matters in Drug Discovery

    Peptide sequence analysis, particularly mass spectrometry (MS)-based peptide sequencing, has grown from a specialist laboratory technique into a core tool of modern drug discovery and biomarker research, owing to its high sensitivity, throughput, and resolution.

    It contributes at each critical stage: rapidly identifying active molecules in synthetic peptide library screens, revealing which peptides the immune system actually recognizes, and confirming structure, guiding optimization, and assuring quality in biopharmaceutical development.

    As ultra-sensitive mass spectrometry, ion mobility separation, AI-driven spectrum analysis, single-cell and spatial technologies, and multi-omics integration continue to mature, peptide sequence analysis is likely to play a growing role in understanding complex diseases and enabling precision medicine.

    To start with the core principles, see the article [Peptide Sequencing: Principles, Techniques, and Research Applications].

    Use Case 1: Target Discovery and Functional Validation

    The core value of peptide sequencing lies in its ability to determine the sequences of the peptides that make up a protein, providing direct and reliable evidence for understanding protein function, interactions, and post-translational modifications (PTMs). This is especially important in the early stages of drug discovery, when targets are identified and validated.

    Resolving Protein-Protein and Protein-Drug Interactions

    Interaction interface resolution

    When potential target proteins or their interaction partners have been identified by techniques such as yeast two-hybrid screening, co-immunoprecipitation (Co-IP), or pull-down assays, peptide sequencing can identify the peptide sequences present in the bound complexes and thereby help pinpoint the direct interface of protein-protein, protein-nucleic acid, or protein-small molecule interactions. This provides a molecular basis for understanding signaling pathway mechanisms and designing intervention strategies.

    Epitope mapping for antibody and vaccine development

    For antibody drug or vaccine development, it is critical to identify the precise region of the antigen that the antibody recognizes, or the presented peptide that T cells recognize. Peptide sequencing is a leading technology for identifying these functional epitopes, especially conformational or modification-dependent ones.

    Post-translational modification mapping

    Phosphorylation, glycosylation, acetylation, ubiquitination, and other PTMs are central switches that regulate protein activity, localization, stability, and interactions. Peptide sequencing combined with enrichment technologies (e.g., phosphopeptide enrichment) can map the PTMs of specific proteins or whole proteomes and reveal disease-associated aberrant modification sites, which may themselves become promising drug targets.
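    In MS data, a PTM shows up as a characteristic mass shift on a peptide or fragment ion. The sketch below illustrates that idea only: it matches an observed mass difference against a small table of well-known monoisotopic shifts. The function name `match_ptm` and the 0.01 Da tolerance are illustrative assumptions, not part of any specific search engine.

```python
# Monoisotopic mass shifts (Da) for a few common modifications.
PTM_SHIFTS = {
    "phosphorylation (S/T/Y)": 79.96633,
    "acetylation (K/N-term)": 42.01057,
    "oxidation (M)": 15.99491,
    "deamidation (N/Q)": 0.98402,
    "GlyGly ubiquitin remnant (K)": 114.04293,
}


def match_ptm(delta_mass, tol=0.01):
    """Return the PTMs whose mass shift lies within `tol` Da of `delta_mass`.

    `delta_mass` is the observed peptide mass minus the mass of the
    unmodified sequence. The tolerance is a hypothetical default.
    """
    return [
        name
        for name, shift in PTM_SHIFTS.items()
        if abs(delta_mass - shift) <= tol
    ]


# Example: a peptide observed about 79.966 Da heavier than expected.
print(match_ptm(79.9663))
```

    A mass shift alone does not localize the modification; site assignment still depends on the fragment ions that carry the shift.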

    Target Validation and Functional Correlation

    Validating gene perturbation outcomes by quantitative proteomics

    After a specific gene has been knocked out or knocked in with CRISPR-Cas9, knocked down by RNAi, or overexpressed, peptide sequencing can quantify changes in the target protein, its interacting proteins, and their PTMs. This directly validates the target's function in the pathway and its impact on the downstream molecular network, confirming its biological relevance.

    Differential proteomics for disease-associated targets

    By comparing protein/peptide expression profiles and modification profiles in tissues, cells, or body fluids between disease and control groups (differential proteomics and differential PTM profiling), peptide sequence analysis can identify proteins or peptides that are significantly differentially expressed or that carry specific modifications. These molecules may be disease drivers or key regulatory nodes and become the focus of subsequent target validation.
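    As a minimal sketch of the comparison described above, the following code computes a log2 fold change and a Welch t statistic per protein and flags those that pass both cut-offs. The thresholds, the function names, and the input layout are hypothetical; real workflows typically work on log-transformed, normalized intensities and apply multiple-testing correction to p-values.

```python
import math
from statistics import mean, variance


def log2_fold_change(disease, control):
    """log2 ratio of mean intensities (disease over control)."""
    return math.log2(mean(disease) / mean(control))


def welch_t(disease, control):
    """Welch's t statistic for two groups with unequal variances."""
    se = math.sqrt(
        variance(disease) / len(disease) + variance(control) / len(control)
    )
    return (mean(disease) - mean(control)) / se


def flag_candidates(table, min_abs_log2fc=1.0, min_abs_t=3.0):
    """Return protein names passing both (hypothetical) thresholds.

    `table` maps a protein name to (disease_intensities, control_intensities).
    """
    flagged = []
    for protein, (disease, control) in table.items():
        fc = log2_fold_change(disease, control)
        t = welch_t(disease, control)
        if abs(fc) >= min_abs_log2fc and abs(t) >= min_abs_t:
            flagged.append(protein)
    return flagged


# Toy intensities, invented for illustration only.
example = {
    "PROT_A": ([400, 420, 380], [100, 110, 90]),
    "PROT_B": ([100, 105, 95], [100, 98, 102]),
}
print(flag_candidates(example))
```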

    Phosphoproteomics for pathway activity monitoring

    Quantitative phosphoproteomics precisely locates modifications at regulatory sites (such as phosphorylation) on key signaling molecules (such as kinases and transcription factors) and directly monitors how a target intervention changes pathway activity (for example, inhibition of ERK phosphorylation), establishing a causal chain of "target-pathway activity-phenotype."
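    One common way to read such data, sketched here under stated assumptions, is to correct the change in a phosphosite signal for the change in total protein abundance, so that a drop in phosphorylation is not confused with a drop in protein level. The function name and the input values are hypothetical.

```python
import math


def normalized_phospho_log2fc(phos_treated, phos_control,
                              prot_treated, prot_control):
    """Phosphosite log2 fold change corrected for total protein change.

    A negative value suggests reduced site phosphorylation beyond any
    change in the amount of protein itself.
    """
    site_change = math.log2(phos_treated / phos_control)
    protein_change = math.log2(prot_treated / prot_control)
    return site_change - protein_change


# Toy example: phosphosite signal falls fourfold, protein level is unchanged.
print(normalized_phospho_log2fc(25.0, 100.0, 100.0, 100.0))
```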

    Molecular basis analysis of target-mediated cellular phenotypes

    Global proteomics and PTM analysis can systematically characterize the differential protein expression, PTM events, and interaction network changes (such as cleavage of apoptosis-related proteins and modification of metabolic enzymes) triggered by perturbing a target. This anchors cellular phenotypes to specific molecular mechanisms, helps rule out off-target effects, and deepens functional validation.

    Use Case 2: Screening Synthetic Peptide Libraries

    Scenario description

    Libraries containing thousands to millions of synthetic peptides of different sequences are often screened in drug discovery (e.g., peptide drugs, peptide vaccines, epitope-based inhibitors) or antibody/receptor ligand identification. The goal is to find peptide sequences with high affinity or specific functional activity for the target molecule (e.g., antibody, receptor, enzyme).

    Role of peptide sequencing:

    • Positive "hit" identification: Screening results in a large number of potentially active peptides (e.g., peptides bound to a target) whose specific sequence is unknown. High-throughput mass spectrometry (e.g., MALDI-TOF/TOF or ESI-MS/MS) can quickly and accurately determine the sequence of these positive peptides and clarify their chemical structures.
    • Preliminary structure-activity relationship analysis: By analyzing the sequence characteristics of the peptide (e.g., conserved amino acids, key residues), the key sites affecting the activity can be preliminarily deduced, which can guide the subsequent peptide optimization.
    • Library quality control: Verify whether the actual composition of the synthesized peptide library conforms to the design.
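    The hit-identification and library QC steps above both start from matching observed masses to designed sequences. The following is a minimal sketch, assuming unmodified linear peptides and a 10 ppm tolerance; `peptide_mass` and `match_hits` are illustrative names.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # added once for the free N- and C-termini


def peptide_mass(seq):
    """Monoisotopic neutral mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def match_hits(observed_masses, library, ppm=10.0):
    """Map each observed mass to library sequences within `ppm`."""
    hits = {}
    for m in observed_masses:
        hits[m] = [
            seq for seq in library
            if abs(peptide_mass(seq) - m) / m * 1e6 <= ppm
        ]
    return hits


# Toy library: two sequences share a composition and therefore a mass.
library = ["GAVL", "LVAG", "ACDE"]
print(match_hits([358.2216], library))
```

    Sequences with the same composition, and Leu/Ile variants, have identical masses, which is why intact mass alone is not enough and MS/MS fragmentation is needed to settle the sequence.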

    Advantages

    Peptide sequencing is fast, high-throughput, and accurate, which makes it indispensable for decoding large-scale screening results.

    Use Case 3: Neoantigen and Immune Epitope Identification

    Scenario description

    Accurate identification of antigenic peptides that can be specifically recognized by the immune system is critical in personalized tumor vaccines, infectious disease vaccine development, and autoimmune disease research. Tumor neoantigens arise from somatic mutations and pathogen antigens from exogenous proteins.

    Role of peptide sequencing

    • HLA-presented peptidomics (immunopeptidomics): Immunoprecipitation (e.g., with antibodies against specific HLA alleles) combined with high-sensitivity mass spectrometry is used to isolate and sequence the peptides presented by MHC class I/II molecules directly from the surface of cells (e.g., tumor cells, pathogen-infected cells). This is widely regarded as the gold standard for discovering T-cell epitopes that are actually "seen" by the immune system.
    • Neoantigen identification: By comparing the HLA peptidomes of tumor cells with those of normal cells, combined with tumor genomic/transcriptomic data, peptides derived from tumor-specific mutations and actually presented are identified for personalized vaccine design.
    • Pathogen antigenic epitope discovery: Identify key immunogenic peptides in pathogen proteins that are presented by host HLA.
    • Epitope validation: Synthesize candidate epitope peptides and confirm their sequence and purity by peptide sequencing for in vitro or in vivo immunogenicity validation.
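    The neoantigen identification step above can be pictured as a filter: keep peptides presented only on tumor cells that map to a mutant protein sequence but not to its wild-type counterpart. This is a simplified sketch; the 8 to 11 residue window is the typical length range for HLA class I ligands and is used here as an assumption, and all names and sequences are invented.

```python
def candidate_neoantigens(tumor_peptides, normal_peptides, mutant_proteins,
                          min_len=8, max_len=11):
    """Return (peptide, gene) pairs that look tumor-specific.

    `mutant_proteins` maps a gene name to (wild_type_seq, mutant_seq),
    as would be derived from tumor genomic/transcriptomic data.
    """
    tumor_only = set(tumor_peptides) - set(normal_peptides)
    candidates = []
    for pep in sorted(tumor_only):
        if not (min_len <= len(pep) <= max_len):
            continue
        for gene, (wild_type, mutant) in mutant_proteins.items():
            # The peptide must span the mutation: present in the mutant
            # sequence and absent from the wild-type sequence.
            if pep in mutant and pep not in wild_type:
                candidates.append((pep, gene))
    return candidates


# Toy data: a single R-to-L substitution in a made-up protein.
proteins = {"GENE1": ("MKTAYIAKQRQISFVK", "MKTAYIAKQLQISFVK")}
tumor = ["AYIAKQLQI", "KTAYIAKQ"]
normal = ["KTAYIAKQ"]
print(candidate_neoantigens(tumor, normal, proteins))
```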

    Key Challenges and Advances

    The main challenges are sensitivity, prediction accuracy (direct identification by mass spectrometry is more reliable than purely algorithmic prediction), and the difficulty of isolating peptides from the highly complex HLA peptidome. High-resolution mass spectrometry and advanced bioinformatics continue to reduce these bottlenecks.

    Use Case 4: Characterizing and Optimizing Peptide Drug Candidates

    Scenario description

    The precise structure (including amino acid sequence, disulfide bonding, and PTMs) of peptide drugs (e.g., GLP-1 analogs, antimicrobial peptides), antibody-drug conjugates, fusion proteins, and other peptide-containing biologics directly determines their activity, stability, safety, and duration of action.

    Role of peptide sequencing:

    • Primary structure confirmation: This is a mandatory requirement for biopharmaceutical regulatory filings. Peptide sequencing (top-down MS or, more commonly, bottom-up MS, i.e., sequencing after enzymatic digestion) is the core technology for confirming the integrity and correctness of the amino acid sequence.
    • Disulfide bond localization: Mass spectrometry combined with peptide map analysis under different digestion conditions (e.g., non-reduced vs. reduced) accurately identifies which cysteines are linked. These linkages are critical for maintaining proper protein folding and function.
    • Post-translational modification characterization: Accurately locate and quantify product-related PTMs (e.g., N-glycosylation sites and glycoforms, C-terminal lysine clipping, deamidation, oxidation). These modifications may affect drug potency, immunogenicity, clearance, and stability.
    • Degradation product analysis: In stability studies, peptide sequencing can identify impurities and fragments produced by chemical degradation (e.g., deamidation, oxidation) or enzymatic degradation, and determine the degradation pathways and sites.
    • Structure optimization guidance: In early drug development, peptide sequencing can be used to analyze the relationship between structure and activity and to guide sequence engineering (e.g., substitution of easily oxidized methionine, introduction of non-natural amino acids, cyclization, PEGylation site selection) that improves stability, activity, and half-life and reduces immunogenicity.
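    The non-reduced vs. reduced comparison used for disulfide bond localization rests on simple arithmetic: forming one disulfide bond removes two hydrogen atoms, so a linked peptide pair is about 2.016 Da lighter than the sum of its reduced parts. The sketch below assumes the reduced peptides carry free thiols (no alkylation, which would change the masses); the names and masses are hypothetical.

```python
from itertools import combinations

H_MASS = 1.007825  # monoisotopic mass of a hydrogen atom (Da)


def disulfide_linked_mass(mass_a, mass_b):
    """Neutral mass of two peptides joined by one disulfide bond."""
    return mass_a + mass_b - 2 * H_MASS


def find_linked_pairs(nonreduced_mass, reduced_peptides, tol=0.01):
    """Find peptide pairs whose linked mass matches a non-reduced species.

    `reduced_peptides` maps a peptide label to its reduced neutral mass.
    """
    pairs = []
    items = sorted(reduced_peptides.items())
    for (name_a, mass_a), (name_b, mass_b) in combinations(items, 2):
        if abs(disulfide_linked_mass(mass_a, mass_b) - nonreduced_mass) <= tol:
            pairs.append((name_a, name_b))
    return pairs


# Toy masses for three cysteine-containing peptides seen after reduction.
reduced = {"P1": 1000.5, "P2": 800.4, "P3": 650.3}
print(find_linked_pairs(1648.7844, reduced))
```

    Intra-peptide disulfides and species with more than one bond need additional cases that this sketch does not cover.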

    Core technology

    LC-MS/MS peptide mapping is a central pillar of biologics characterization, including the assessment of critical quality attributes (CQAs), and of quality control (QC).
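    A routine readout of a peptide map is sequence coverage: the fraction of the expected protein sequence accounted for by identified peptides. The following is a minimal sketch of that calculation, assuming exact string matches and ignoring modifications; the sequence and peptides are toy values.

```python
def sequence_coverage(protein, peptides):
    """Fraction of residues in `protein` covered by at least one peptide."""
    covered = [False] * len(protein)
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            # Keep searching in case the peptide occurs more than once.
            start = protein.find(pep, start + 1)
    return sum(covered) / len(protein)


# Toy example: two identified peptides covering half of a 20-residue sequence.
print(sequence_coverage("ACDEFGHIKLMNPQRSTVWY", ["ACDEF", "KLMNP"]))
```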

    MS target analysis for Volpe-CC in P. tapulus (Figure from Heather G. Marco, 2022)

    Research Ideas and Project Path Analysis

    Research Idea Map

    The Core Role and Process of Peptide Sequencing in Biomedical Research

    Case Study: De Novo Sequencing of SA923 Anticancer Peptide

    Background

    • Peptides involved: The study focuses on cyclic antimicrobial peptides (AMPs) from the traditional medicinal plant Sphaeranthus amaranthoides, particularly orbitides (such as SA923). These peptides do not contain disulfide bonds and are cyclized via C-N terminal peptide bonds.
    • Challenges and Needs: The fight against cancer requires novel, highly effective, low-toxicity drugs. Plant-derived cyclic peptides have attracted significant attention because of their structural stability and strong targeting properties. However, the peptides of S. amaranthoides remain underexplored, and conventional database-driven methods cannot decipher unknown peptide sequences because no genomic database exists for the plant. An efficient strategy is needed to explore its anticancer potential.

    Analytical Methods

    • Mass spectrometry: Liquid chromatography-electrospray ionization mass spectrometry (LC-ESI-MS) combined with manual de novo sequencing.
    • Database-independent sequencing: Because the plant has no sequenced genome, amino acid sequences were resolved manually from CID fragment ions (b/y ions), which identified three peptides, including SA923.
    • Cyclization structure validation: Missing fragment ions in the mass spectra (e.g., the absence of terminal ions) together with computational modeling (PEPstrMOD) were used to confirm that SA923 is a C-N cyclized peptide in which GLU1 and ASP7 are joined by a peptide bond rather than by the disulfide bonds traditionally associated with peptide cyclization.
    • Anticancer activity pre-screening: Combining SVM algorithms (AntiCP server), SA923 was predicted to have the highest anticancer potential, guiding subsequent experimental validation.
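    Manual de novo sequencing from b/y ions relies on the fact that neighboring ions in a series differ by one residue mass. The sketch below computes singly charged b and y ions for the reported SA923 sequence treated as a linear peptide, and the [M+H]+ of its head-to-tail cyclized form, assuming cyclization removes one water. Cyclic peptides fragment through ring opening at several positions, so the linear ladder is only an illustration and not the study's actual procedure.

```python
# Monoisotopic residue masses (Da) for the residues in ELVFYRD.
RESIDUE_MASS = {
    "E": 129.04259, "L": 113.08406, "V": 99.06841, "F": 147.06841,
    "Y": 163.06333, "R": 156.10111, "D": 115.02694,
}
WATER = 18.010565
PROTON = 1.007276


def by_ions(seq):
    """Singly charged b and y ion m/z values for a linear peptide."""
    masses = [RESIDUE_MASS[aa] for aa in seq]
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, len(seq))]
    y_ions = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(seq))]
    return b_ions, y_ions


def cyclic_mh(seq):
    """[M+H]+ of a head-to-tail cyclized peptide (no terminal water)."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + PROTON


b, y = by_ions("ELVFYRD")
print([round(m, 4) for m in b])
print([round(m, 4) for m in y])
print(round(cyclic_mh("ELVFYRD"), 2))
```

    Under these assumptions the cyclic [M+H]+ comes out near 923.46, which is consistent with the "SA 923.4" label used for this peptide.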

    Key Findings

    • A total of 86 novel peptides were identified from the natural herbal plant S. amaranthoides. Among these, three peptides were characterized, and their amino acid sequences were determined using a manual de novo strategy. Based on computational analysis, SA923 was predicted to belong to the orbitides family.
    • Novel cyclic peptide structure: SA923 (sequence ELVFYRD) is reported as the first Asteraceae-derived cyclic peptide. It is cyclized through a GLU1-ASP7 peptide bond rather than a classical disulfide bond and exhibits high thermal stability.
    • Highly effective anticancer activity: In vitro experiments showed that SA923 inhibited 3T3 cells by 89% (at 160 ng/mL), significantly higher than the linear peptide SA626 (37%); zebrafish embryo tests confirmed its low toxicity (no developmental abnormalities).

    Significance

    SA923 provides a novel candidate molecule for anticancer drug development: its cyclized structure enhances stability and targeting, with a wide safety margin. The study also demonstrates that de novo sequencing is effective for discovering peptides in plants without a sequenced genome, advancing the modernization of traditional medicinal plant resources.

    De novo sequencing of (A) linear peptide SA626 and (B) cyclic peptides SA 923.4 and (C) SA 905.4 (Figure from Swarnalatha Yanamadala, 2023)

    Challenges and Emerging Trends in Peptide Sequencing

    Challenges

    Limits of sensitivity

    Detection of very low abundance peptides (e.g., peptides at the level of individual cells, low concentrations of biomarker peptides in the circulation).

    Dynamic range of complex samples

    High-abundance proteins in body fluids (e.g., plasma) interfere significantly with low-abundance target peptides.

    Depth of coverage and throughput

    Technological advances are still needed to achieve deeper and faster proteome/peptidome coverage.

    Complexity of data analysis

    Processing, annotation, quantification, and biological interpretation of large amounts of mass spectrometry data place high demands on bioinformatics.

    Absolute quantification

    Achieving highly accurate absolute quantification without reference standards remains challenging.
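    To see why standards matter, consider the usual standards-based calculation, in which a known amount of a stable-isotope-labeled ("heavy") version of the peptide is spiked into the sample and the endogenous ("light") amount is read from the peak-area ratio. This is a minimal sketch with hypothetical names and values; it assumes the heavy and light forms respond identically in the instrument.

```python
def endogenous_amount(light_area, heavy_area, spiked_amount_fmol):
    """Estimate endogenous peptide amount from a light/heavy area ratio.

    Assumes equal ionization and fragmentation behavior for both forms.
    """
    if heavy_area <= 0:
        raise ValueError("heavy standard signal must be positive")
    return light_area / heavy_area * spiked_amount_fmol


# Toy example: light signal is twice the heavy signal, 50 fmol spiked in.
print(endogenous_amount(2.0e6, 1.0e6, 50.0))
```

    Without such a standard there is no internal reference for converting signal into amount, because peptides differ widely in how efficiently they ionize.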

    New Technologies: Single-Cell, AI, and Multi-Omics Peptidomics

    Ultra-high sensitivity mass spectrometry platforms

    Platforms such as the timsTOF series (PASEF technology) and the Orbitrap Astral significantly improve scanning speed and sensitivity, enabling single-cell proteomics/peptidomics and trace biomarker detection.

    Novel ionization and separation technologies

    Nano-electrospray ionization (nanoESI), capillary electrophoresis-mass spectrometry (CE-MS), and ion mobility (IM) separation (e.g., TIMS, FAIMS) further improve separation efficiency and selectivity and enhance the detection of target peptides in complex matrices.

    Artificial Intelligence and Machine Learning:

    • Spectrum prediction and interpretation: Deep learning models (e.g., DeepNovo, Prosit) can predict peptide MS/MS spectra more accurately or infer sequences directly from spectra, which can markedly improve identification rate, speed, and accuracy, especially for peptides containing PTMs or non-canonical peptides.
    • Optimization of neoantigen prediction: AI integrates multi-omics data and clinical immune response data to significantly improve the accuracy of neoantigen prediction.
    • Data mining and biomarker discovery: Intelligent mining of clinically meaningful patterns from massive proteomic data.
    • Single-cell/spatial peptidomics: Combining single-cell isolation with ultra-high-sensitivity mass spectrometry reveals cell-to-cell differences in peptide expression and presentation; spatial technologies (e.g., advances in MALDI imaging mass spectrometry, antibody-based spatial omics) provide key information on the in situ distribution of peptides and proteins in tissue.
    • Integration of multi-omics analysis: Systematic integration of peptidome/proteome data with genomic, transcriptomic, and metabolomic data to build a more comprehensive map of disease molecular networks, and to discover more reliable combinations of targets and biomarkers.
    • Targeted peptidomics/proteomics: The wide adoption of parallel reaction monitoring (PRM) and data-independent acquisition (DIA/SWATH) enables highly reproducible and accurate quantification of specific target peptides/proteins and accelerates the clinical translation and validation of biomarkers.
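    Predicted spectra, as mentioned in the spectrum prediction item above, are useful because they can be scored against what was measured. The sketch below shows one generic scoring step, a cosine similarity over peaks matched within an m/z tolerance. It is not the internal method of DeepNovo or Prosit; the function name, tolerance, and peak lists are assumptions for illustration.

```python
import math


def spectral_cosine(predicted, observed, tol=0.02):
    """Cosine similarity between two peak lists of (mz, intensity).

    Each predicted peak is paired with the closest unused observed peak
    within `tol` m/z units; unmatched peaks contribute nothing to the
    dot product but still count toward the norms.
    """
    used = set()
    dot = 0.0
    for mz_p, int_p in predicted:
        candidates = [
            (abs(mz_o - mz_p), j)
            for j, (mz_o, _) in enumerate(observed)
            if j not in used and abs(mz_o - mz_p) <= tol
        ]
        if candidates:
            _, j = min(candidates)
            used.add(j)
            dot += int_p * observed[j][1]
    norm_p = math.sqrt(sum(i * i for _, i in predicted))
    norm_o = math.sqrt(sum(i * i for _, i in observed))
    if norm_p == 0 or norm_o == 0:
        return 0.0
    return dot / (norm_p * norm_o)


# Toy peak lists: a small m/z offset still matches within tolerance.
predicted = [(130.05, 1.0), (243.13, 0.5)]
observed = [(130.06, 0.9), (243.14, 0.6), (500.00, 0.1)]
print(round(spectral_cosine(predicted, observed), 3))
```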

    References

    1. Marco, H. G., et al. (2022). Mass Spectrometric Proof of Predicted Peptides: Novel Adipokinetic Hormones in Insects. Molecules.
    2. Yanamadala, S., et al. (2023). Biological Activity of Cyclic Peptide Extracted from Sphaeranthus amaranthoides Using De Novo Sequencing Strategy by Mass Spectrometry for Cancer. Biology.
    3. Gaspar, K., et al. (2018). Development of Novel Free Radical Initiated Peptide Sequencing Reagent: Application to Identification and Characterization of Peptides by Mass Spectrometry. J. Am. Soc. Mass Spectrom.

