Peptide Sequencing: Principles, Techniques, and Research Applications

    Proteins are the primary executors of biological processes, and their precise amino acid sequences determine structure, function, and regulation. However, sequencing intact proteins directly is technically challenging. Peptide sequencing addresses this by digesting proteins into smaller, manageable fragments using specific proteases (e.g., trypsin), followed by sequence determination and analysis.

    This strategy reduces complexity and allows results to be cross-validated across multiple peptides from the same protein, improving robustness and accuracy. Today, peptide sequencing, especially by tandem mass spectrometry (MS/MS), has become a cornerstone of proteomics, supporting applications from protein identification to biomarker discovery.

    With advances in high-resolution mass spectrometry and bioinformatics algorithms, peptide sequencing continues to evolve as a critical tool for modern biology and drug development.

    Overview of Peptide Sequencing

    What is Peptide Sequencing?

    Peptide sequencing refers to the process of analyzing proteins or their peptide fragments to determine the sequence of their amino acids. Protein molecules are composed of multiple amino acid residues connected in a specific order, and peptide sequencing technology analyzes these amino acid sequences to reveal the structure and function of proteins.

    Why Peptide Identification Is Crucial

    Protein Identification

    This is the most fundamental application. Peptide sequences obtained from a sample can be compared against known protein sequence databases to identify which proteins are present. For example, comparing the proteomes of healthy and diseased tissues can reveal differentially expressed proteins.

    Post-translational modification (PTM) localization and characterization

    Protein function is often precisely regulated by PTMs such as phosphorylation, glycosylation, and acetylation. These modifications cause specific shifts in peptide molecular weight (e.g., phosphorylation +80 Da, oxidation +16 Da). Peptide sequencing not only detects the presence of modifications but also precisely locates them to specific amino acid residues, which is crucial for understanding signaling pathways and disease mechanisms (e.g., abnormal phosphorylation in cancer).
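
    To make the mass-shift arithmetic concrete, the sketch below computes a peptide's neutral monoisotopic mass with optional modifications. It assumes standard monoisotopic residue masses and the exact shifts for phosphorylation (+79.96633 Da) and oxidation (+15.99491 Da). The names `peptide_mass` and `MOD_SHIFT` and the peptide "LSSAMK" are illustrative only. Note that the site recorded in `mods` does not change the total mass. The precursor mass therefore reveals only that a modification is present, and fragment ions are needed to localize it.

```python
# Monoisotopic residue masses in Da (Leu and Ile are identical).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # H2O added once for the free N- and C-termini

# Monoisotopic mass shifts for two common PTMs (Da).
MOD_SHIFT = {"Phospho": 79.96633, "Oxidation": 15.99491}


def peptide_mass(sequence, mods=None):
    """Neutral monoisotopic mass of a peptide.

    mods maps a 0-based residue index to a modification name; the index
    records *where* the PTM sits but does not affect the total mass.
    """
    mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    for name in (mods or {}).values():
        mass += MOD_SHIFT[name]
    return mass


unmodified = peptide_mass("LSSAMK")
phospho_s2 = peptide_mass("LSSAMK", {1: "Phospho"})
phospho_s3 = peptide_mass("LSSAMK", {2: "Phospho"})
print(round(phospho_s2 - unmodified, 5))  # mass shift of one phosphate group
print(phospho_s2 == phospho_s3)           # same mass although the site differs
```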

    Mutant and splicing variant detection

    Gene mutations or alternative splicing can produce protein variants that differ from the reference sequence. Peptide sequencing can detect peptides that span specific mutation sites or splice junctions, thereby identifying these variants. In tumor research, for example, it can detect mutations in oncogenic driver genes at the protein level.

    Protein quantification

    Combined with stable isotope labeling (e.g., SILAC, TMT) or label-free techniques, peptide sequencing data can be used to compare changes in peptide abundance under different conditions (e.g., treatment vs. control, disease vs. healthy), thereby inferring differences in the expression levels of corresponding proteins.
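
    As a minimal illustration of label-free comparison, the sketch below computes a log2 fold change from replicate peptide intensities. It then summarizes a protein by the median of its peptides' fold changes. The sketch assumes the intensities are already normalized across runs. The median roll-up is only one simple choice among many. The function names and intensity values are hypothetical.

```python
import math
from statistics import mean, median


def peptide_log2_fc(treatment, control):
    """log2 ratio of mean peptide intensity (treatment vs control replicates)."""
    return math.log2(mean(treatment) / mean(control))


def protein_log2_fc(peptide_fcs):
    """Summarize a protein by the median of its peptides' log2 fold changes."""
    return median(peptide_fcs)


# Hypothetical normalized intensities (treatment, control) for three peptides of one protein.
peptides = {
    "pep1": ([4.0e6, 4.2e6, 3.8e6], [1.0e6, 1.1e6, 0.9e6]),
    "pep2": ([2.0e5, 2.2e5, 1.8e5], [5.0e4, 5.5e4, 4.5e4]),
    "pep3": ([9.0e5, 8.0e5, 1.0e6], [9.0e5, 1.0e6, 8.0e5]),
}
fcs = [peptide_log2_fc(t, c) for t, c in peptides.values()]
print(protein_log2_fc(fcs))
```

    The median makes the protein-level estimate robust to a single peptide that behaves differently, for example one carrying a modification.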

    The Main Challenges in Peptide Identification

    Complexity

    Biological samples typically contain thousands of different proteins, and the peptide mixtures produced after enzymatic digestion are extremely complex. Powerful separation techniques (such as high-performance liquid chromatography, HPLC) and high-resolution analytical instruments are required to address this challenge.

    Wide dynamic range

    The abundance of different proteins in a sample can vary greatly (up to 10 orders of magnitude), and the peptide signals from high-abundance proteins may overwhelm those from low-abundance but biologically significant target peptides.

    Isomers and isobaric amino acids

    Leucine (Leu) and isoleucine (Ile) are isomers with identical mass, making them indistinguishable by conventional mass measurement. Lysine (Lys) and glutamine (Gln) may also be hard to distinguish without sufficient mass accuracy, because their residue masses differ by only 0.036 Da. Resolving these cases requires specific fragmentation techniques or more in-depth analysis.
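
    A quick calculation shows why the Lys/Gln pair needs high resolution, while Leu/Ile cannot be separated by mass at all. This is a simplified estimate that assumes resolving power is defined as m/Δm, and some margin above this value is needed in practice. The precursor m/z of 800 and charge 2+ are illustrative assumptions.

```python
# Monoisotopic residue masses (Da).
LEU = ILE = 113.08406
LYS = 128.09496
GLN = 128.05858


def required_resolving_power(mz, delta_mz):
    """Simplified resolving power m/delta_m needed to separate two peaks delta_mz apart."""
    return mz / delta_mz


delta_kq = LYS - GLN  # about 0.036 Da
print(f"Leu/Ile mass difference: {LEU - ILE} Da")  # zero: no resolving power suffices
print(f"Lys/Gln mass difference: {delta_kq:.4f} Da")

# For a doubly charged precursor at m/z 800 (assumed example), the mass
# difference is divided by the charge, so the m/z difference is halved.
charge = 2
print(round(required_resolving_power(800.0, delta_kq / charge)))
```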

    Diversity and dynamism of PTMs

    PTMs are highly diverse, with varying chemical properties, low abundance, and spatiotemporal dynamics, posing significant challenges for their comprehensive and accurate detection and localization.

    Complexity of data analysis

    The massive volume of raw mass spectrometry data requires powerful bioinformatics algorithms and databases for interpretation, matching, scoring, and validation. This is a computationally intensive process that demands specialized expertise.

    Main Peptide Sequencing Methods: Edman, PMF, and MS/MS

    The core technology of peptide sequencing has evolved from early chemical degradation methods to the current era dominated by mass spectrometry.

    Edman Degradation

    Principle Overview

    Edman degradation is a chemical method that removes amino acids one at a time from the N-terminus of the peptide chain in repeated cycles. In each cycle, the free N-terminal amino group is coupled with phenyl isothiocyanate (PITC) under mildly alkaline conditions. The derivatized N-terminal residue is then selectively cleaved off under acidic conditions. The released residue is converted to a stable phenylthiohydantoin (PTH) amino acid derivative and identified by chromatography (typically HPLC).

    Application Scenarios

    Suitable for sequencing purified short peptides (<50 aa) with a free (unblocked) N-terminus, particularly in quality control settings where the N-terminal sequence must be clearly defined.

    Advantages and disadvantages

    Its advantage is that it provides direct, unambiguous, and quantifiable amino acid sequence information. However, the process is slow, because only one residue is read per cycle. It cannot sequence peptides with a blocked N-terminus (e.g., N-terminal acetylation or pyroglutamate), it is ineffective for long peptides or complex mixtures, and it requires a relatively large amount of purified sample.

    Peptide Mass Fingerprinting (PMF)

    Principle Overview

    Proteins are cleaved with a specific protease (e.g., trypsin) into a mixture of peptides. Mass spectrometry (commonly MALDI-TOF) then precisely measures the masses of these peptides, forming a characteristic "mass fingerprint." This fingerprint is matched against the theoretical proteolytic peptide masses of proteins in a database to identify the protein.
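
    The matching step can be sketched in three parts. First, digest each candidate protein in silico. Second, compute singly protonated [M+H]+ masses, the ion species typically observed in MALDI-TOF. Third, count how many observed peaks fall within a ppm tolerance. This is a minimal sketch with hypothetical function names and toy sequences. It ignores missed cleavages and modifications, and real PMF tools use probabilistic scores (e.g., MOWSE-type scoring) rather than a raw match count.

```python
import re

# Monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.007276


def tryptic_peptides(protein):
    """Cleave after K/R unless followed by P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def mh_mass(peptide):
    """Singly protonated [M+H]+ monoisotopic mass."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def pmf_score(observed, protein, tol_ppm=50.0):
    """Number of observed peaks matching a theoretical tryptic peptide within tol_ppm."""
    theoretical = [mh_mass(p) for p in tryptic_peptides(protein)]
    return sum(
        any(abs(obs - t) / t * 1e6 <= tol_ppm for t in theoretical)
        for obs in observed
    )


# Toy database and a "measured" peak list simulated from protA.
database = {"protA": "MAGKLLSRPEPTIDEK", "protB": "GGGGKAAAAR"}
observed = [mh_mass("MAGK") + 0.005, mh_mass("LLSRPEPTIDEK") - 0.01]
best = max(database, key=lambda name: pmf_score(observed, database[name]))
print(best, {name: pmf_score(observed, seq) for name, seq in database.items()})
```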

    Application Scenarios

    Suitable for rapid identification of single purified proteins (e.g., proteins excised from gel spots) and for microbial identification, provided a comprehensive reference database is available.

    Advantages and Limitations

    Operation is simple and rapid, with modest instrument requirements. However, PMF depends heavily on database completeness and cannot identify novel proteins or proteins with unknown modifications or mutations. It also performs poorly on mixtures and provides no residue-level sequence information, such as modification sites.

    Tandem Mass Spectrometry (MS/MS)

    Principle overview

    LC-MS/MS couples liquid chromatography separation with tandem mass spectrometry. The first mass analyzer selects peptide precursor (parent) ions. These are then fragmented by collision-induced dissociation (CID), higher-energy collisional dissociation (HCD), or electron transfer dissociation (ETD). The resulting fragment ion spectra are interpreted in one of two ways. Database searching matches them to theoretical spectra of known sequences, while de novo sequencing infers the amino acid sequence directly from the spectrum.
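
    To show what matching a fragment spectrum involves, the sketch below computes the singly charged b-ion (N-terminal) and y-ion (C-terminal) ladders that CID/HCD typically produce for an unmodified peptide. It is a simplified sketch that assumes monoisotopic masses, charge 1+, and no neutral losses. It also omits other ion types, such as the c/z ions typical of ETD. The function name is illustrative.

```python
# Monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.007276


def by_ions(peptide):
    """Return lists of singly charged b1..b(n-1) and y1..y(n-1) m/z values."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(masses[:i]) + PROTON)           # first i residues
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)  # last i residues
    return b_ions, y_ions


b, y = by_ions("PEPTIDE")
for i, (b_mz, y_mz) in enumerate(zip(b, y), start=1):
    print(f"b{i} {b_mz:9.4f}   y{i} {y_mz:9.4f}")
```

    Successive b (or y) ions differ by exactly one residue mass, which is what lets both search engines and de novo algorithms read sequence from the spectrum. Complementary ions also sum to a fixed value: b_i + y_(n-i) equals the neutral peptide mass plus two protons.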

    Application Scenarios

    Covers the majority of proteomics research, including protein identification in complex samples, PTM localization, mutation detection, and new peptide discovery.

    Advantages and Limitations

    It combines high sensitivity (femtomole level), high specificity (direct sequence evidence), and the ability to analyze modifications. However, instruments are expensive and data analysis is complex. Database searching also depends on prior knowledge of the sequences present, and de novo sequencing requires high-quality spectra.

    Mass spectrometry-based Edman-like Method

    Principle overview

    Combines Edman chemistry with the advantages of mass spectrometry detection. A reagent such as PITC labels the N-terminal amino acid of the peptide. After cleavage, the characteristic mass loss or the derivatized fragment is detected by mass spectrometry to read the N-terminal sequence.

    Application Scenarios

    Suitable for difficult cases such as N-terminally blocked peptides (which require an additional deblocking step), or as an upgraded alternative to traditional Edman sequencing.

    Advantages and Limitations

    Compared with traditional Edman sequencing, it is faster and more sensitive, and it can analyze N-terminally blocked samples (after deblocking). However, it still requires multiple chemical reactions, has lower throughput than conventional MS/MS, and has a narrower range of applications. It therefore serves mainly as a supplementary technique.

    MS/MS Peptide Sequencing Workflow

    Tandem mass spectrometry is currently the dominant technique for peptide sequencing. Its standard workflow can be summarized as follows:

    Sample Preparation

    Protein extraction

    Extract total protein or target protein from samples such as cells, tissues, or bodily fluids.

    Reduction and alkylation

    Use a reducing agent (e.g., DTT) to break disulfide bonds, then an alkylating reagent (e.g., iodoacetamide, IAM) to block the free sulfhydryl groups. This prevents the disulfide bonds from re-forming and improves proteolytic efficiency.

    Enzymatic Digestion

    A specific protease cleaves the proteins into peptides suitable for mass spectrometry analysis (typically 5–25 amino acids long). The most common choice is trypsin, which cleaves on the C-terminal side of lysine and arginine residues, generally not when the next residue is proline.
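
    An in silico version of this digestion step, as used when building theoretical peptide lists for database searching, can be sketched as below. The sketch assumes the simplified trypsin rule (cleave after K/R, but not before P) and an illustrative missed-cleavage allowance. It also applies the 5–25 residue length window mentioned above. The function name and default values are hypothetical.

```python
import re


def digest_trypsin(protein, missed_cleavages=1, min_len=5, max_len=25):
    """In silico tryptic digest allowing up to `missed_cleavages` skipped sites."""
    # Fully cleaved fragments: split after K/R unless the next residue is P.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", protein) if f]
    peptides = set()
    for start in range(len(fragments)):
        # Join neighbouring fragments to model skipped cleavage sites.
        for end in range(start, min(start + missed_cleavages + 1, len(fragments))):
            peptide = "".join(fragments[start:end + 1])
            if min_len <= len(peptide) <= max_len:
                peptides.add(peptide)
    return sorted(peptides)


print(digest_trypsin("MAGKLLSRPEPTIDEKAAR", missed_cleavages=0))
print(digest_trypsin("MAGKLLSRPEPTIDEKAAR", missed_cleavages=1))
```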

    Desalting/Cleanup

    Remove salts, detergents, and other components from the digestion buffer that interfere with mass spectrometry analysis.

    Liquid Chromatography (LC)

    Inject the complex peptide mixture into a reverse-phase liquid chromatography column (typically C18 packing material).

    Peptides are separated by hydrophobicity. Under a water/acetonitrile gradient containing an acidic modifier such as formic acid, they elute roughly in order of increasing hydrophobicity. This separation reduces the complexity of the peptide mixture entering the mass spectrometer at any moment, limits ion suppression, and improves detection sensitivity and resolution.

    Mass Spectrometry Acquisition

    Ionization

    Converts peptides eluting from the LC column into charged gas-phase ions, typically by electrospray ionization (ESI).

    Full MS scan (MS1)

    Measures the mass-to-charge ratio (m/z) and intensity of all ionized peptides, producing a mass spectrum of the peptide mixture at that time point. The summed intensities of successive scans form the total ion current (TIC) chromatogram.
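
    Because MS1 measures m/z rather than mass, the peptide's charge state must be known to recover its mass. A minimal sketch of the standard conversions is below. It assumes protonated [M+zH]z+ ions and infers the charge from the spacing of the isotope peaks. That spacing is about 1.00336/z, the 13C-12C mass difference divided by the charge. The function names and the example mass are illustrative.

```python
PROTON = 1.007276       # proton mass (Da)
C13_SPACING = 1.003355  # mass difference between 13C and 12C (Da)


def mz_from_mass(neutral_mass, charge):
    """m/z of the [M + zH]z+ ion for a given neutral monoisotopic mass."""
    return (neutral_mass + charge * PROTON) / charge


def mass_from_mz(mz, charge):
    """Neutral mass recovered from an observed m/z and charge state."""
    return mz * charge - charge * PROTON


def charge_from_isotope_spacing(spacing):
    """Estimate the charge from the m/z gap between adjacent isotope peaks."""
    return round(C13_SPACING / spacing)


mass = 1500.0  # hypothetical neutral peptide mass (Da)
for z in (1, 2, 3):
    print(z, round(mz_from_mass(mass, z), 4))
print(charge_from_isotope_spacing(0.5017))  # isotope peaks ~0.5 apart -> 2+
```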

    Precursor ion selection

    Selects one or more peptide ions (precursor ions) from the MS1 spectrum for fragmentation according to a predefined strategy. In data-dependent acquisition (DDA), for example, the N most intense ions are chosen.
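
    One widely used strategy is data-dependent "top-N" selection combined with dynamic exclusion. The sketch below models it under simplifying assumptions. Peaks are (m/z, intensity) pairs. An m/z is excluded if it lies within a ppm tolerance of an ion selected less than `exclusion_s` seconds earlier. The function name and parameter values are illustrative, not instrument defaults.

```python
def select_precursors(ms1_peaks, scan_time, excluded,
                      top_n=3, exclusion_s=30.0, tol_ppm=10.0):
    """Pick up to top_n most intense precursors that are not dynamically excluded.

    ms1_peaks: list of (mz, intensity) tuples from one MS1 scan.
    excluded:  dict mapping a previously selected m/z to the time it was
               selected; updated in place.
    """
    def is_excluded(mz):
        return any(
            abs(mz - ex_mz) / ex_mz * 1e6 <= tol_ppm and scan_time - t < exclusion_s
            for ex_mz, t in excluded.items()
        )

    chosen = []
    for mz, _intensity in sorted(ms1_peaks, key=lambda p: p[1], reverse=True):
        if len(chosen) == top_n:
            break
        if not is_excluded(mz):
            chosen.append(mz)
            excluded[mz] = scan_time  # (re)start this ion's exclusion window
    return chosen


peaks = [(500.25, 1e8), (612.80, 5e6), (733.41, 2e5)]
exclusion = {}
for t in (0.0, 1.0, 2.0, 45.0):
    print(t, select_precursors(peaks, t, exclusion, top_n=1))
```

    With `top_n=1`, the dominant ion is selected only once per exclusion window. This gives the weaker ions a turn, which is why dynamic exclusion helps sample lower-abundance peptides.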

    Fragmentation

    The selected precursor ions are fragmented by energetic collisions or by other methods, such as electron transfer dissociation (ETD). Common collision-based techniques include collision-induced dissociation (CID) and higher-energy collisional dissociation (HCD).

    Tandem mass spectrometry scan (MS2 or MS/MS)

    Measures the m/z and intensity of all fragment ions produced from the precursor ion, generating the fragment ion spectrum (MS/MS spectrum) of the peptide. This spectrum is the direct source of peptide sequence information.

    Data Analysis and Interpretation

    Raw Data Processing

    Convert the raw data files output by the mass spectrometer (.raw, .d, .wiff, etc.) into standard formats (e.g., .mzML).

    Database Search

    Select a protein sequence database (e.g., UniProt, NCBI nr) and set the search parameters. These include enzyme specificity, the maximum number of missed cleavages, fixed and variable modifications, and the precursor (parent) ion mass tolerance (in ppm). Database search software then compares the experimental MS/MS spectra with theoretical fragment spectra of candidate peptides from the database and scores each match. Score thresholds are applied to retain reliable peptide-spectrum matches (PSMs) for protein inference.
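
    The core of a database search can be sketched in two steps. First, keep only the candidate peptides whose theoretical mass falls within the precursor tolerance. Then score each candidate's theoretical fragments against the observed MS/MS peaks. The shared-peak count used here is a deliberately simple stand-in for the scoring functions real search engines use. The names, tolerances, and toy masses are assumptions.

```python
def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def candidates_in_tolerance(precursor_mass, peptide_masses, tol_ppm=10.0):
    """Database peptides whose theoretical mass is within tol_ppm of the precursor."""
    return [pep for pep, mass in peptide_masses.items()
            if abs(ppm_error(precursor_mass, mass)) <= tol_ppm]


def shared_peak_count(observed_mz, theoretical_mz, tol_da=0.02):
    """Number of theoretical fragment ions with an observed peak within tol_da."""
    return sum(any(abs(o - t) <= tol_da for o in observed_mz) for t in theoretical_mz)


# Toy example with hypothetical candidate masses and fragment lists.
peptide_masses = {"cand_A": 1000.5000, "cand_B": 1000.5080, "cand_C": 1002.0000}
theoretical = {"cand_A": [200.10, 300.15, 400.20], "cand_B": [200.10, 350.00, 450.00]}
observed_spectrum = [200.105, 300.140, 400.210, 612.3]

candidates = candidates_in_tolerance(1000.5030, peptide_masses)
scores = {pep: shared_peak_count(observed_spectrum, theoretical[pep]) for pep in candidates}
print(scores, max(scores, key=scores.get))
```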

    De novo Sequencing

    When database searching cannot explain a spectrum (for example, because the peptide is absent from the database), algorithms such as PEAKS or pNovo derive peptide sequences directly from high-quality MS/MS spectra.
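
    The core idea of de novo sequencing can be illustrated with a toy step: mass differences between consecutive peaks of one ion ladder are translated back into residues. This sketch assumes a clean, complete b-ion ladder and a 0.02 Da tolerance. Real tools such as PEAKS or pNovo must cope with noise, missing peaks, and mixed ion types, so this does not describe how those programs are implemented. The peptide "GLSKAR" is a hypothetical example.

```python
# Monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276


def infer_residues(ladder, tol=0.02):
    """Translate gaps between consecutive b-ion m/z values into residue calls."""
    calls = []
    for lower, upper in zip(ladder, ladder[1:]):
        gap = upper - lower
        matches = [aa for aa, m in RESIDUE_MASS.items() if abs(gap - m) <= tol]
        calls.append("/".join(matches) if matches else "?")
    return calls


# Simulated b-ion ladder (b1..b5) for the hypothetical peptide "GLSKAR".
masses = [RESIDUE_MASS[aa] for aa in "GLSKAR"]
ladder = [sum(masses[:i]) + PROTON for i in range(1, 6)]
print(infer_residues(ladder))
```

    The output reports Leu/Ile as "L/I" because their masses are identical. A gap can also equal the sum of two residues (for example, Gly+Gly and Asn differ by only about 0.00001 Da), which is one of the ambiguities real de novo algorithms must resolve.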

    Result validation and reporting

    Key metrics are checked, including the number of identified peptides/proteins, sequence coverage, confidence scores, mass errors, modification site localization probabilities, and quantitative results, and a comprehensive report is generated.

    Typical Applications of Peptide Sequencing

    Protein Identification and Quantification

    Mass spectrometry-based peptide sequencing technology enables high-sensitivity identification and precise quantification of trace proteins in complex biological samples (such as plasma and tissue lysates). The core of this technology lies in converting proteins into detectable peptide fragments through specific enzymatic digestion, distinguishing homologous protein isoforms using the mass accuracy of mass spectrometry, and achieving absolute quantification through isotope labeling technology. This capability is particularly suitable for identifying low-abundance protein biomarkers in the early diagnosis of diseases such as cancer, providing molecular-level evidence for liquid biopsy.

    Sequence Variation Analysis

    Peptide sequencing can identify subtle variations in amino acid sequences, including insertions, deletions, and single amino acid substitutions (such as those caused by single-nucleotide polymorphisms, SNPs). By analyzing fragment ion patterns with high-accuracy tandem mass spectrometry and de novo sequencing algorithms, tumor-specific mutant peptides can be distinguished from their normal counterparts. This capability is crucial for identifying tumor neoantigens and provides targeted peptide sequence information for personalized cancer vaccine design.

    Post-Translational Modification (PTM) Localization

    Precise localization of post-translational modifications relies on characteristic features of peptide fragmentation patterns. Techniques such as electron transfer dissociation (ETD) preserve labile modifications such as phosphorylation and glycosylation during fragmentation. Modification sites can then be determined from the mass shifts of fragment ions. Nanopore single-molecule technology offers a complementary, label-free approach that detects modifications through the characteristic changes they cause in the ionic current. Combining the two approaches can support high-coverage analysis of complex modification networks.

    Discovery of New Peptides and Micropeptides

    Traditional genome annotation often overlooks micropeptides encoded by small open reading frames (sORFs). By integrating ribosome profiling (Ribo-seq) with high-sensitivity mass spectrometry validation, peptide sequencing can systematically identify novel functional micropeptides. These micropeptides are often involved in cellular metabolic regulation and signal transduction. Their discovery expands our understanding of the functions of RNAs previously annotated as non-coding.

    Antibody Sequencing and Drug Development

    The development of antibody drugs relies on precise determination of the amino acid sequences of their variable regions. Mass spectrometry-based peptide sequencing (for example, with complementary HCD and ETD fragmentation) can determine the full antibody sequence, and automated bioinformatics tools can then assemble the light and heavy chains. This approach removes the dependence on gene sequencing of hybridoma cells and can accelerate the development of therapeutic antibodies and engineered peptide drugs.

    Analysis of the preprocessed samples using online coupled liquid chromatography-electrospray ionization mass spectrometry. (Figure from Prathibha R. Gajjala, 2019)

    Peptide Sequencing Deliverables

    Core Identification Results

    The core identification results of peptide sequencing focus on confirming the sequence of the target molecule. Experimental fragment spectra are compared with theoretical spectra derived from the candidate sequence to generate a peptide sequence matching report. For each identified peptide, the report shows the measured mass error, fragment ion coverage, and confidence score.

    Protein inference then assigns the matched peptides to specific proteins and calculates sequence coverage, defined as the proportion of a protein's amino acids covered by identified peptides. Higher coverage indicates more reliable sequence confirmation.
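
    Sequence coverage can be computed by marking every residue that falls inside at least one identified peptide, so overlapping peptides are not double-counted. This is a minimal sketch that assumes exact string matches of peptides to the protein sequence and ignores Leu/Ile ambiguity and modifications. The function name and sequences are illustrative.

```python
def sequence_coverage(protein, peptides):
    """Fraction of residues in `protein` covered by at least one identified peptide."""
    covered = [False] * len(protein)
    for peptide in peptides:
        start = protein.find(peptide)
        while start != -1:  # mark every occurrence of the peptide
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return sum(covered) / len(protein)


protein = "MAGKLLSRPEPTIDEK"
print(sequence_coverage(protein, ["LLSRPEPTIDEK"]))
print(sequence_coverage(protein, ["MAGKLL", "LLSR"]))  # overlap counted once
```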

    Post-translational Modification (PTM) Analysis Report

    The PTM analysis report presents the precise localization and quantitative results for modified sites. Characteristic mass shifts in the fragment ion spectrum, together with mass differences along the b/y ion series, localize each modification to a specific amino acid residue.

    The report must include annotated key spectra that show the chain of fragment ion evidence supporting each localization. In addition, the relative abundance of modified peptides is quantified from precursor ion intensity or by isotope labeling (e.g., TMT), revealing changes in modification levels.

    Finally, site localization probabilities should be calculated with dedicated algorithms (such as PTMProphet) to avoid assigning a modification to the wrong adjacent residue.

    Mass Spectrometry Raw Data Quality Assessment

    Data quality assessment is the cornerstone of reliable results. The primary indicator is the distribution of precursor (parent) ion mass errors. On modern high-resolution mass spectrometers (Orbitrap, Q-TOF), the mean absolute mass error should be below 5 ppm to demonstrate accurate mass measurement.

    The signal-to-noise ratio (S/N) statistics of MS/MS spectra reflect the reliability of fragment ion data, with a target peptide S/N > 30 typically required to ensure accurate fragment matching.

    Chromatographic retention time reproducibility is assessed through repeated injections, with retention time shifts < 0.5 minutes (at a 60-minute gradient) indicating the stability of the liquid chromatography system.

    Bioinformatics Analysis Parameters

    Transparent reporting of analysis parameters ensures that results are reproducible. State the database name and version, since database updates can change matching results.

    The enzyme specificity rules must match the protease actually used, and the modification settings should include both the fixed and the variable modifications specified in the experimental design.

    The false discovery rate (FDR) threshold is typically set to ≤1% at the peptide level to ensure the statistical rigor of the identification results.
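
    Such an FDR threshold is commonly estimated with the target-decoy strategy. Spectra are also searched against reversed or shuffled (decoy) sequences, and the number of decoy hits above a score cutoff estimates the number of false target hits above it. The sketch below finds the lowest cutoff whose estimated FDR (decoys/targets) stays within the limit. It works at the PSM level and ignores score ties. Published estimators also differ in detail, for example in how concatenated target-decoy searches are corrected. The function name and scores are hypothetical.

```python
def score_cutoff_at_fdr(psms, max_fdr=0.01):
    """Lowest score cutoff whose estimated FDR is <= max_fdr.

    psms: iterable of (score, is_decoy) pairs; higher scores are better.
    Estimated FDR at a cutoff = (#decoys >= cutoff) / (#targets >= cutoff).
    Returns None if no cutoff satisfies the threshold.
    """
    cutoff = None
    targets = decoys = 0
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


# Hypothetical scores: 200 strong target hits, then a mix of weaker hits.
psms = [(50.0 + i * 0.1, False) for i in range(200)]
psms += [(30.0, True), (29.0, False), (28.0, True), (27.0, True)]
cutoff = score_cutoff_at_fdr(psms)
accepted = [s for s, decoy in psms if s >= cutoff and not decoy]
print(cutoff, len(accepted))
```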

    Common Issues In Peptide Sequencing

    How to Confirm The Identity of Proteins in Unknown Samples?

    Solution

    Separate an unknown protein complex using SDS-PAGE, cut out the target band, perform in-gel digestion (trypsin), and then conduct LC-MS/MS analysis.

    Role of peptide sequencing

    By searching databases, the obtained peptide sequences are matched to specific proteins. The reliability of identification is confirmed by sequence coverage and the number of unique peptides. This forms the basis for discovering disease-related proteins or new interacting factors.

    How To Precisely Locate The Phosphorylation Modification Sites Of Proteins?

    Solution

    Consider a study of the activation mechanism of Akt, a key kinase in the insulin signaling pathway that is known to be activated by phosphorylation at specific sites (such as Thr308 and Ser473). Anti-Akt antibodies are used to enrich Akt and its interacting proteins from cell lysates, followed by proteolysis and LC-MS/MS analysis. In the search parameters, phosphorylation (S/T/Y) is set as a variable modification.

    Role of Peptide Sequencing

    Detection and localization: Identify peptides containing phosphorylation sites and, by analyzing their MS/MS spectra, determine precisely whether phosphorylation occurs at Thr308 and/or Ser473.

    Quantification: Using isotope-labeled or label-free quantification, compare the abundance of the phosphorylated peptides under different stimulation conditions to quantify the dynamic phosphorylation levels of Akt at Thr308 and Ser473.

    Incomplete Enzymatic Digestion Leading To Insufficient Sequence Coverage

    Root Cause

    Trypsin cleavage efficiency depends on sample purity, buffer composition (e.g., residual SDS), and digestion time. Incomplete digestion leaves some peptides unreleased, so key functional domains may not be covered.

    Solution

    Optimize the digestion conditions. Add denaturants to improve protein unfolding, and use combined Lys-C/trypsin digestion. Extend digestion to 16 hours and monitor the rate of incomplete digestion. Introduce dimethyl labeling to assess digestion efficiency.
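
    One straightforward way to monitor under-digestion is to count missed cleavage sites in the identified peptides. The sketch below assumes the simplified trypsin rule, under which an internal K/R not followed by P counts as missed. It reports the fraction of peptides with at least one missed site. The function names and example peptides are hypothetical.

```python
def missed_cleavages(peptide):
    """Internal K/R residues not followed by P (the C-terminal residue is excluded)."""
    return sum(
        1
        for i, aa in enumerate(peptide[:-1])
        if aa in "KR" and peptide[i + 1] != "P"
    )


def missed_cleavage_rate(peptides):
    """Fraction of identified peptides containing at least one missed cleavage."""
    return sum(missed_cleavages(p) > 0 for p in peptides) / len(peptides)


identified = ["LLSRPEPTIDEK", "MAGKLLSRPEPTIDEK", "VATVSLPR", "AAKGLEKR"]
for pep in identified:
    print(pep, missed_cleavages(pep))
print(missed_cleavage_rate(identified))
```

    Tracking this rate across digestion times or conditions gives a simple, comparable measure of whether an optimization step actually improved cleavage.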

    Low-abundance Peptides Are Overwhelmed By High-abundance Signals

    Root cause

    Peptide ions from high-abundance proteins (e.g., serum albumin) suppress the ionization of low-abundance target peptides, so key biomarkers may be missed.

    Solution

    Implement a tiered strategy. Before injection, pre-separate the peptide mixture by high-pH reversed-phase chromatography into 10–15 fractions, or enrich target peptides with custom antibodies. During mass spectrometry acquisition, enable dynamic exclusion (DE) to avoid repeatedly selecting the same strong signals. In tumor marker detection, pre-fractionation increased the detection rate of low-abundance peptides threefold.

    Ambiguous localization Of PTMs

    Core issue

    Labile modifications such as phosphorylation and glycosylation are prone to loss during CID/HCD fragmentation. In addition, when several candidate residues lie close together (e.g., adjacent Ser and Thr), the fragment ions needed to tell them apart may be missing, making the site assignment ambiguous.

    Solution

    Switch to electron transfer dissociation (ETD) to retain labile modifications, and calculate site probabilities with localization algorithms (such as PTMProphet). Verify assignments using synthetic isotope-labeled peptides as internal standards. For glycosylation, use electron activated dissociation (EAD).

    Model architecture overview: Our model takes MS/MS spectra as input and generates the predicted peptide sequence. (Figure from Xiang Zhang, 2025)

    References

    1. Vasaikar, S., et al. (2018). Proteogenomic Analysis of Human Colon Cancer Reveals New Therapeutic Opportunities. Cell.
    2. Eloff, K., et al. (2025). InstaNovo enables diffusion-powered de novo peptide sequencing in large-scale proteomics experiments. Nat Mach Intell.
    3. Zhang, X., et al. (2025). π-PrimeNovo: an accurate and efficient non-autoregressive deep learning model for de novo peptide sequencing. Nat Commun.

    For research use only, not intended for any clinical use.
