Peptide Sequencing Reports: Structure and Interpretation Guide

    Peptide sequencing technologies such as mass spectrometry are vital in proteomics for revealing protein identity, abundance, modifications, and interactions. However, transforming raw data into useful insights requires complex processing. A clear and accurate peptide sequencing report is therefore essential: it is the core output of proteomics research and, more than just data, the key to biological understanding.

    Researchers need to master the interpretation of these reports. This means understanding their key elements (sample information, quality metrics, sequences, protein IDs, quantitative data, statistical assessment), basic spectrum validation, and database matching results.

    By systematically analyzing critical data points (peptide sequences, protein IDs, quantitative ratios, P-values/FDR, scores), researchers turn data into discovery: confirming targets, revealing changes, optimizing workflows, and advancing biomarker identification. Integrating this interpretation skill strengthens research decisions and illuminates the path to uncovering disease mechanisms.

    First, lay a solid foundation by reading our article [Peptide Sequencing Principles, Techniques, and Research Applications]

    Why Peptide Sequencing Reports Matter

    What Is a Peptide Sequencing Report?

    A peptide sequencing report is the final data-analysis output of LC-MS/MS peptide analysis, a core proteomics technique for identifying and quantifying peptides.

    The report is generated from the raw spectra collected by the mass spectrometer, after peptide sequence identification, protein inference, quantitative analysis, and quality assessment have been performed with specialized bioinformatics software.

    The report transforms complex spectral signals into understandable biological information, serving as a bridge between "data" and "biological insights."

    Key Applications of Sequencing Reports in Research

    The value of a peptide sequencing report goes far beyond a simple list of results; it is the core driving force behind the advancement of scientific research projects.

    Identification of target proteins

    The report clearly identifies which proteins are present in the sample, providing information such as their names, UniProt IDs, and functional annotations. This provides a solid molecular basis and target list for subsequent research (such as functional validation, interaction studies, and pathway analysis).

    Synthetic peptide verification

    For synthetic peptides, the report is the primary basis for confirming that the sequence is correct and for assessing purity (for example, whether impurities such as deletions, insertions, or incorrect modifications are present). It determines whether a batch is accepted and can proceed to the next stage of application.

    Monitoring protein expression levels

    Through quantitative analysis (such as Label-Free Quantitation, TMT/iTRAQ, SILAC, etc.), the report shows changes in the relative or absolute abundance of target proteins under different conditions (such as disease vs. health, treatment vs. control). This directly reveals key regulatory molecules in biological processes (such as disease occurrence, drug response, and developmental stages).

    Project Risk Assessment

    By interpreting key indicators in the report (such as sequence coverage, confidence scores, and impurity identification), teams can assess the stability and batch-to-batch consistency of synthesis or purification processes, identify potential risk points, and optimize resource allocation and timelines.

    Process Development and Optimization

    In biopharmaceutical development, peptide mapping is critical for monitoring the consistency of the primary structure of protein drugs. Peptide sequencing reports can precisely locate modification sites (such as deamidation and oxidation) and degradation sites, guiding formulation and process optimization.

    How to Interpret Key Metrics in a Peptide Sequencing Report

    A standard peptide sequencing report typically contains the following key modules and fields. Understanding their meanings is fundamental to interpreting the report:

    1. Sample Metadata and Biological Context

    Sample unique ID, biological origin (e.g., tissue type, cell line), experimental group (e.g., Control, Treated), sample processing method (e.g., lysis buffer, enzymatic digestion conditions, labeling method), loading volume, preparation date, operator, etc.

    Ensuring the traceability of the experiment and clarifying the biological context of the results are prerequisites for correctly interpreting subsequent differential expression analysis. Results from samples with different processing methods cannot be directly compared.

    2. Expected Peptide Information

    Expected sequence (usually represented by single-letter amino acid codes), expected molecular weight (theoretical mass), expected modifications (such as acetylation, phosphorylation, disulfide bond positions, etc.), and expected charge state.

    This is the benchmark for assessing the accuracy of sequencing results. Be sure to check that the expected information in the report is exactly the same as what you provided. Any discrepancies may lead to bias in subsequent analysis.

    3. Analysis Summary: Mass and Sequence Matching

    This is the essence of the report, which typically includes:

    Experimental molecular weight

    The experimentally measured mass of the precursor (parent) ion. It is typically reported for the singly charged ion or for the most abundant charge state.

    Theoretical molecular weight

    The precise mass calculated based on the expected sequence and modifications.
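    As a worked illustration of how the theoretical value is obtained, the sketch below sums standard monoisotopic residue masses, adds one water for the free termini, and optionally adds a total modification mass shift (for example +15.9949 Da for one oxidized Met). It is a minimal sketch assuming a linear peptide; the function names are hypothetical and not taken from any search engine, and real software may use slightly different constant values.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565   # H2O for the free N-terminal H and C-terminal OH
PROTON = 1.007276   # mass of a proton (Da)


def theoretical_mass(sequence: str, mod_shift: float = 0.0) -> float:
    """Neutral monoisotopic mass of a linear peptide plus an optional total modification shift (Da)."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + mod_shift


def precursor_mz(neutral_mass: float, charge: int) -> float:
    """m/z of the [M+zH]z+ ion for a given neutral mass and charge z."""
    return (neutral_mass + charge * PROTON) / charge


mass = theoretical_mass("PEPTIDE")
print(f"{mass:.4f} Da, [M+2H]2+ m/z {precursor_mz(mass, 2):.4f}")  # 799.3600 Da, [M+2H]2+ m/z 400.6873
```

    An oxidized methionine would be handled by passing `mod_shift=15.994915`; several modifications are handled by passing the sum of their shifts.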

    Mass error

    The difference between the measured mass and the theoretical mass, usually expressed in parts per million (ppm) or daltons (Da). The smaller the error, the more closely the precursor ion mass matches the target peptide, which makes it important preliminary evidence. Modern high-resolution mass spectrometers (such as Orbitrap and Q-TOF instruments) routinely keep the error within a few ppm; <5 ppm or <10 ppm is typically considered excellent.
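    The mass error itself is simple arithmetic. The minimal sketch below, assuming the precursor was observed as an [M+zH]z+ ion, converts the observed m/z to a neutral mass and expresses its difference from the theoretical mass in Da and in ppm. The observed and theoretical values are hypothetical, and the function names are illustrative only.

```python
PROTON = 1.007276  # mass of a proton (Da)


def neutral_mass_from_mz(observed_mz: float, charge: int) -> float:
    """Convert an observed [M+zH]z+ m/z value to a neutral monoisotopic mass."""
    return observed_mz * charge - charge * PROTON


def mass_error(observed: float, theoretical: float) -> tuple[float, float]:
    """Return (error in Da, error in ppm) of the observed mass relative to the theoretical mass."""
    delta = observed - theoretical
    return delta, delta / theoretical * 1e6


# Hypothetical example: theoretical neutral mass 799.3600 Da, precursor observed at m/z 400.6880 (2+).
observed = neutral_mass_from_mz(400.6880, 2)
err_da, err_ppm = mass_error(observed, 799.3600)
print(f"{err_da:.4f} Da, {err_ppm:.2f} ppm")
```

    Here the error is about 1.8 ppm, which would fall within the <5 ppm range mentioned above.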

    Primary Sequence Match

    The peptide sequence that the software reports as the best match to the measured MS/MS spectrum (typically the target sequence).

    Sequence Coverage

    For longer peptides or small proteins (which enzymatic digestion splits into multiple peptides), this metric is the proportion of the target molecule's amino acids that are covered by the measured MS/MS spectra. Key points: the higher the coverage, the more confidently the complete sequence can be confirmed. 100% coverage is ideal but is sometimes difficult to achieve because of factors such as peptide length, fragmentation efficiency, and ionization efficiency. Key functional domains and modification sites must be covered.
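    Sequence coverage can be computed by marking every target residue that falls inside at least one identified peptide. The minimal sketch below assumes exact string matching of peptide sequences against the target; the helper names and the example sequences are hypothetical. Listing the uncovered positions makes it easy to check whether key functional domains or modification sites were missed.

```python
def coverage_mask(target: str, peptides: list[str]) -> list[bool]:
    """True for each target residue covered by at least one identified peptide (all occurrences counted)."""
    covered = [False] * len(target)
    for pep in peptides:
        start = target.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = target.find(pep, start + 1)
    return covered


def sequence_coverage(target: str, peptides: list[str]) -> float:
    """Percentage of target residues covered by the identified peptides."""
    mask = coverage_mask(target, peptides)
    return 100.0 * sum(mask) / len(target)


def uncovered_positions(target: str, peptides: list[str]) -> list[int]:
    """1-based positions of residues not covered by any identified peptide."""
    return [i + 1 for i, hit in enumerate(coverage_mask(target, peptides)) if not hit]


# Hypothetical target sequence and identified peptides.
target = "ACDEFGHIKLMNPQRSTVWY"
peptides = ["ACDEFGHIK", "LMNPQR"]
print(sequence_coverage(target, peptides))    # 75.0
print(uncovered_positions(target, peptides))  # [16, 17, 18, 19, 20]
```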

    Confidence Score

    A numerical value calculated by algorithms (e.g., Mascot Ion Score, Sequest XCorr, PeptideProphet Probability, Byonic Score) based on the number, intensity, continuity, and mass accuracy of fragment ion matches, quantifying the reliability of the sequence match. Key Points: Understand the threshold values of the scoring system used (e.g., Mascot Score > 20/25 is typically considered significant; XCorr varies with charge state, e.g., Charge 2+ >2.0, 3+ >2.5, etc.; probability >95% or 99%). Only matches above the threshold should be considered reliable. This is one of the most important indicators for determining sequence accuracy.
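    Applying such thresholds amounts to a per-match filter. The sketch below uses the illustrative charge-dependent XCorr cutoffs quoted above (2+ > 2.0, 3+ > 2.5) as assumptions rather than universal values; the match list is hypothetical, and charge states without a defined cutoff are deliberately not accepted.

```python
# Illustrative XCorr cutoffs by precursor charge, taken from the example values in the text.
# Real cutoffs depend on the search engine, database size, and the target false discovery rate.
XCORR_CUTOFF = {2: 2.0, 3: 2.5}


def passes_threshold(xcorr: float, charge: int) -> bool:
    """Accept a match only if its XCorr exceeds the cutoff defined for its charge state."""
    cutoff = XCORR_CUTOFF.get(charge)
    if cutoff is None:
        return False  # no cutoff defined for this charge state: do not accept automatically
    return xcorr > cutoff


# Hypothetical peptide-spectrum matches: (sequence, charge, XCorr).
matches = [("PEPTIDEK", 2, 2.8), ("LGEYGFQNALIVR", 3, 2.3), ("AAAGK", 1, 1.9)]
accepted = [seq for seq, charge, xcorr in matches if passes_threshold(xcorr, charge)]
print(accepted)  # ['PEPTIDEK']
```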

    Main Modifications Identified

    Reports the types of modifications detected (e.g., oxidation-M, deamidation-N/Q, phosphorylation-S/T/Y) and their specific sites. Site-localization confidence scores for each modification are typically also provided.
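    Modification assignment starts from the observed mass shift. The minimal sketch below matches a measured delta mass against a small table of standard monoisotopic modification shifts; the table subset, the tolerance, and the function name are assumptions for illustration, whereas real search engines draw on full modification databases and add site-localization scoring.

```python
# Monoisotopic mass shifts (Da) of a few common modifications (illustrative subset).
MOD_SHIFTS = {
    "Oxidation (M)": 15.994915,
    "Deamidation (N/Q)": 0.984016,
    "Acetylation (K, N-term)": 42.010565,
    "Phosphorylation (S/T/Y)": 79.966331,
}


def candidate_modifications(observed_shift: float, tol_da: float = 0.01) -> list[str]:
    """Modifications whose expected mass shift lies within tol_da of the observed shift."""
    return [name for name, shift in MOD_SHIFTS.items() if abs(observed_shift - shift) <= tol_da]


print(candidate_modifications(15.996))  # ['Oxidation (M)']
print(candidate_modifications(1.0034))  # [] : a 13C isotope offset, not deamidation
```

    A tolerance this tight also keeps deamidation (+0.9840 Da) apart from an accidentally selected ¹³C isotope peak (+1.0034 Da), a common source of false deamidation calls.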

    Impurities Identified

    Reports the major impurity sequences detected and their relative abundances (possibly based on precursor ion intensity or peak area). Common impurities include deletion sequences, insertion sequences, truncated peptides, peptides with incomplete deprotection, related-sequence peptides, and oxidation/deamidation products.

    This section supports comprehensive peptide sequencing impurity analysis, which is critical for confirming peptide purity and batch consistency.
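    Relative abundance is often estimated as each species' share of the total peak area. The sketch below shows that calculation with hypothetical peak areas and a hypothetical 2.0% reporting threshold. It assumes all species ionize with similar efficiency, which is only an approximation, so MS-based area percentages should be treated as estimates.

```python
def relative_abundance(peak_areas: dict[str, float]) -> dict[str, float]:
    """Each species' peak area as a percentage of the summed area."""
    total = sum(peak_areas.values())
    return {name: 100.0 * area / total for name, area in peak_areas.items()}


def impurities_above(abundance: dict[str, float], main: str, threshold_pct: float) -> list[str]:
    """Names of non-main species whose relative abundance exceeds threshold_pct."""
    return [name for name, pct in abundance.items() if name != main and pct > threshold_pct]


# Hypothetical peak areas (arbitrary units) for a main peptide and two impurities.
areas = {"main": 945.0, "des-Phe": 40.0, "Met(O)": 15.0}
abundance = relative_abundance(areas)
flagged = impurities_above(abundance, main="main", threshold_pct=2.0)
print(abundance)
print(flagged)  # ['des-Phe']
```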

    Retention Time (RT)

    The time at which the peptide elutes from the liquid chromatography (LC) column. It is used to assess reproducibility and to detect co-eluting impurities.

    4. Fragment Ion Table and Validation Logic

    This table lists or annotates, in detail, all major fragment ions successfully matched in the MS/MS spectrum. It includes:

    Ion type

    b-ions (N-terminal fragments that retain the charge), y-ions (C-terminal fragments that retain the charge), a-ions, and neutral-loss ions (e.g., loss of H₂O or NH₃) are the common fragment ion types. Of these, b- and y-ions are the most important and the most commonly used for sequence interpretation.

    Ion number

    Indicates which peptide bond of the precursor ion was cleaved to produce the ion, expressed as the number of residues the fragment contains (e.g., y5 contains the last 5 amino acids from the C-terminus; b3 contains the first 3 from the N-terminus).
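    The ion numbering can be made concrete by computing a theoretical b/y ladder. The minimal sketch below assumes singly charged ions of an unmodified linear peptide and standard monoisotopic residue masses: b ions are summed N-terminal residues plus a proton, and y ions are summed C-terminal residues plus water and a proton. Function and variable names are illustrative.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565   # H2O carried by the C-terminal (y) fragment
PROTON = 1.007276   # mass of a proton (Da)


def by_ladder(sequence: str) -> tuple[dict[str, float], dict[str, float]]:
    """Singly charged b and y ion m/z values for an unmodified linear peptide.

    b_i holds the first i residues (N-terminal side); y_i holds the last i residues (C-terminal side).
    """
    masses = [RESIDUE_MASS[aa] for aa in sequence]
    n = len(masses)
    b_ions = {f"b{i}": sum(masses[:i]) + PROTON for i in range(1, n)}
    y_ions = {f"y{i}": sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)}
    return b_ions, y_ions


b_ions, y_ions = by_ladder("PEPTIDE")
for (b_label, b_mz), (y_label, y_mz) in zip(b_ions.items(), y_ions.items()):
    print(f"{b_label:>3} {b_mz:9.4f}   {y_label:>3} {y_mz:9.4f}")
```

    A useful consistency check follows from these definitions: for a peptide of n residues, the singly charged b_i and y_(n-i) ions always sum to the neutral peptide mass plus two protons.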

    Observed m/z

    The mass-to-charge ratio of the fragment ion detected in the mass spectrometer.

    Theoretical m/z

    The mass-to-charge ratio of the fragment ion calculated based on the matched sequence.

    Error

    The difference between the observed and theoretical m/z (ppm or Da).

    Intensity/Abundance

    The signal intensity of the fragment ion.

    Interpretation Key Points

    This is direct evidence to verify the reliability of sequence matching. High-confidence matches typically require:

    • High matching degree: Most of the main, high-intensity fragment peaks in the spectrum (especially in the high m/z region) can be explained by b/y ions.
    • Sequence ion continuity: The presence of continuous b ion or y ion series (e.g., y3, y4, y5, y6) provides information about the connection between adjacent amino acids in the sequence. The longer the interval covered by continuous ions, the more reliable the sequence confirmation.
    • Low mass error: The measured m/z of the matched fragment ions has a small error compared to the theoretical m/z (typically within a few ppm).
    • Confirmation of key amino acids: Residues that are hard to assign by mass alone (e.g., isobaric I/L, which require other features to distinguish) or that give characteristic fragments or neutral losses (e.g., W, R, H, P) should be supported by the corresponding characteristic ions.
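    The first two criteria above can be checked programmatically. The sketch below works under stated assumptions (singly charged ions, a hypothetical 10 ppm fragment tolerance, closest-peak matching, and a peak list without intensities); it matches theoretical ions to observed peaks, reports the fraction of observed peaks explained, and finds the longest consecutive ion series. The theoretical values are the y ions of PEPTIDE; the observed peak list is hypothetical.

```python
def match_fragments(theoretical: dict[str, float], observed: list[float], tol_ppm: float = 10.0) -> dict[str, float]:
    """Map each theoretical ion label to the closest observed peak, if it lies within tol_ppm."""
    matches = {}
    for label, theo_mz in theoretical.items():
        closest = min(observed, key=lambda mz: abs(mz - theo_mz), default=None)
        if closest is not None and abs(closest - theo_mz) / theo_mz * 1e6 <= tol_ppm:
            matches[label] = closest
    return matches


def longest_series(labels, ion_type: str) -> int:
    """Longest run of consecutive ion numbers for one ion type (e.g. y3, y4, y5 -> 3)."""
    numbers = sorted(int(label[len(ion_type):]) for label in labels if label.startswith(ion_type))
    longest = run = 0
    previous = None
    for number in numbers:
        run = run + 1 if previous is not None and number == previous + 1 else 1
        longest = max(longest, run)
        previous = number
    return longest


# Theoretical singly charged y ions of PEPTIDE and a hypothetical centroided peak list.
theoretical_y = {"y1": 148.0604, "y2": 263.0874, "y3": 376.1714,
                 "y4": 477.2191, "y5": 574.2719, "y6": 703.3145}
observed_peaks = [263.0876, 376.1717, 477.2195, 574.2722, 650.3000]

matched = match_fragments(theoretical_y, observed_peaks)
explained = len(set(matched.values())) / len(observed_peaks)
print(sorted(matched))               # ['y2', 'y3', 'y4', 'y5']
print(longest_series(matched, "y"))  # 4
print(f"{explained:.0%} of observed peaks explained")
```

    The unexplained peak at m/z 650.3 is the kind of signal the spectrum review below asks you to account for.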

    5. Spectrum Review: Key Peaks and Modifications

    One of the core components of the report, this section displays the MS/MS spectrum (m/z vs. relative intensity) of all fragment ions generated after the precursor ion is selected and fragmented.

    Using the fragment ion table or annotated spectrum as a reference, check the following directly:

    • Signal-to-Noise Ratio (S/N): Whether the target fragment peaks are clearly distinguishable and exceed the background noise. A high S/N ratio is the foundation of high-quality data.
    • Major fragment peaks: Unassigned strong peaks may indicate impurities, unknown modifications, non-target fragments (e.g., a-ions, neutral loss, internal fragments), or signals from higher charge state fragments. It is important to assess whether these peaks influence the interpretation of results.
    • Key ion pairs: The mass difference (delta m) between adjacent b/y ions equals the mass of the corresponding amino acid residue. This is a basic method for manually deriving or verifying sequences (e.g., y5 - y4 = mass of residue at position 5 from the C-terminus). Reports typically include annotations, but understanding the principle aids in interpretation.
    • Modification features: Certain modifications produce characteristic fragments or neutral losses (e.g., phosphorylated Ser/Thr often shows a neutral loss of H₃PO₄ (98 Da) in CID/HCD spectra; oxidized Met often shows a neutral loss of CH₃SOH (methanesulfenic acid, 64 Da)).
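    The key-ion-pair check can be sketched as a lookup of the mass difference between adjacent ions in a residue-mass table. This minimal sketch assumes both ions belong to the same series and are singly charged (for 2+ ions, the difference would first be multiplied by 2); the tolerance and the function name are assumptions. It cannot separate Leu from Ile, because they have identical masses.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}


def residues_for_gap(higher_mz: float, lower_mz: float, tol_da: float = 0.02) -> list[str]:
    """Residues whose mass matches the gap between two adjacent singly charged ions of one series."""
    gap = higher_mz - lower_mz
    return sorted(aa for aa, mass in RESIDUE_MASS.items() if abs(gap - mass) <= tol_da)


print(residues_for_gap(574.2719, 477.2191))  # ['P'] (y5 - y4 of PEPTIDE)
print(residues_for_gap(376.1714, 263.0874))  # ['I', 'L'] (y3 - y2): isobaric, not resolvable by mass
```

    With a 0.02 Da tolerance, the sketch still separates Lys (128.0950 Da) from Gln (128.0586 Da), a distinction that low-resolution data may not support.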

    6. Database Search Parameters and Scoring Thresholds

    This section describes the software used to search and match the MS/MS data (e.g., Mascot, Sequest, Byonic, MaxQuant, PEAKS), the search parameters (database name/version, enzyme and digestion settings, allowed modification types, precursor and fragment ion mass tolerances), the scoring algorithms, and their significance thresholds.

    Understanding whether the search parameter settings are reasonable (e.g., do they include all possible modifications? Are the mass tolerances consistent with the instrument's precision?) is critical for assessing the comprehensiveness and reliability of the results. Database matching is the core of automated interpretation, but its results are highly dependent on parameter settings. Ensure that the database used includes your target sequences.
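    Checking that the search database actually contains the target sequence is easy to automate. The sketch below parses FASTA text and returns the IDs of entries containing a given peptide; the FASTA content and entry IDs are hypothetical, and the optional merging of I and L mirrors the fact that they cannot be distinguished by mass alone.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each entry ID (first word of the header line) to its sequence."""
    entries: dict[str, str] = {}
    entry_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if entry_id is not None:
                entries[entry_id] = "".join(chunks)
            entry_id, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if entry_id is not None:
        entries[entry_id] = "".join(chunks)
    return entries


def entries_containing(fasta_text: str, peptide: str, merge_il: bool = True) -> list[str]:
    """IDs of entries whose sequence contains the peptide; optionally treat I and L as identical."""
    def norm(seq: str) -> str:
        return seq.replace("I", "L") if merge_il else seq

    target = norm(peptide)
    return [eid for eid, seq in parse_fasta(fasta_text).items() if target in norm(seq)]


# Hypothetical two-entry database (the first sequence is wrapped over two lines).
demo_fasta = """>DEMO1 hypothetical entry
MKTAYIAKQR
QISFVKSHFSRQ
>DEMO2 hypothetical entry
GSHMLEDPVAG
"""
print(entries_containing(demo_fasta, "KQRQISF"))                  # ['DEMO1']
print(entries_containing(demo_fasta, "KQRQLSF"))                  # ['DEMO1'] (I/L merged)
print(entries_containing(demo_fasta, "KQRQLSF", merge_il=False))  # []
```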

    Case Study: De Novo Sequencing of Rana dalmatina Peptides

    Research Background and Objectives

    Peptide Types Involved: The study focuses on defensive peptides secreted by the skin of the Slovenian agile frog (Rana dalmatina), including antimicrobial peptide families such as brevinins (types 1/2), temporins, bradykinin-related peptides (BRPs), and melittin-related peptides (MRPs).

    Issues and Requirements: The genome of this species is unknown, and traditional cloning-based sequencing methods are prone to inaccuracies due to primer design errors (e.g., the C-terminal extended forms of BRPs are often misinterpreted). Skin peptides degrade rapidly under stress conditions (due to the absence of protease inhibitors), necessitating a highly sensitive method to capture complete sequences.

    Research Objective: To establish a genome-independent de novo sequencing strategy to resolve the complete structure of peptide segments (including disulfide bond loop sequences and Leu/Ile isomer differentiation), providing new candidate molecules for the development of anti-infective drugs.

    Sample Information

    Target Peptides

    A total of 28 peptides were identified, including 4 brevinins containing disulfide bonds, 10 temporins, 1 MRP (FQ-22), and 13 BRPs.

    Key new sequences

    Brevinin 1Db (novel 24-residue peptide): `FFPAFLKVAAKVVPSILCSITKKC-OH`, which contains the C-terminal disulfide-bonded loop known as the "Rana box". Temporin 1Da (novel linear peptide): `FLPLIAGLLGKL-NH₂`. Brevinin 2D (33-residue peptide): differs from its counterpart in the closely related species R. temporaria at only one position (His¹²→Val¹²).
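    Structural features such as the Rana box and C-terminal amidation change the expected mass. As a minimal sketch assuming standard monoisotopic residue masses, forming one disulfide bond removes two hydrogen atoms (about 2.0157 Da), and C-terminal amidation (-OH replaced by -NH₂) lowers the mass by about 0.9840 Da. The values computed here are illustrative theoretical masses, not figures reported in the study, and the function name is hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565               # H2O for the free termini
DISULFIDE_LOSS = 2 * 1.007825   # two hydrogen atoms lost per disulfide bond
AMIDATION_SHIFT = -0.984016     # C-terminal -OH replaced by -NH2


def peptide_mass(sequence: str, disulfides: int = 0, c_term_amide: bool = False) -> float:
    """Neutral monoisotopic mass with optional disulfide bonds and C-terminal amidation."""
    mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    mass -= disulfides * DISULFIDE_LOSS
    if c_term_amide:
        mass += AMIDATION_SHIFT
    return mass


brevinin_1db = peptide_mass("FFPAFLKVAAKVVPSILCSITKKC", disulfides=1)  # Rana box: Cys18-Cys24
temporin_1da = peptide_mass("FLPLIAGLLGKL", c_term_amide=True)
print(f"Brevinin 1Db: {brevinin_1db:.4f} Da; Temporin 1Da: {temporin_1da:.4f} Da")
```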

    Analytical Methods

    Mass spectrometry

    Liquid chromatography-tandem mass spectrometry (LC-MS/MS) coupled with Orbitrap Elite ETD and Orbitrap Fusion high-resolution mass spectrometers.

    Gradient elution

    120 minutes (5–60% acetonitrile) to enhance the detection sensitivity of low-abundance peptides.  

    Key method highlight

    Top-down de novo sequencing combined with four fragmentation modes: CID/HCD provide backbone cleavage information; ETD preserves disulfide bonds and other modifications; EThcD (MS³, stepwise energy) resolves sequences inside disulfide-bonded rings (e.g., the C-terminal `TKKC` of Brevinin 1Db) and distinguishes Leu/Ile isomers. Native peptides were analyzed directly, without chemical derivatization, to avoid interference from added modifications. Manual verification was supplemented by Novor.Cloud and PEAKS software, but manual spectrum interpretation remains the gold standard.

    Key Findings

    The full-length sequences of all peptides were obtained, including the disulfide-bonded ring regions (e.g., the C-terminal region of Brevinin 1Db) and six Leu/Ile sites.

    Peptide profile characteristics: ranatuerins and ranacyclines were absent, but Brevinin 2Rd (previously found only in green frogs) was present, suggesting cross-species gene exchange. BRPs were extremely low in abundance: only bradykinin fragments and the [Thr⁶] modified form were detected, reflecting rapid degradation in the absence of protease inhibitors.

    Research Significance

    Using an innovative mass spectrometry strategy, this study revealed for the first time the full diversity of the skin peptide profile of Rana dalmatina, overcoming the limitations of traditional sequencing and providing key data for antimicrobial peptide development and for studies of adaptive evolution in amphibians.

    Selected part of the EThcD spectrum of tetraprotonated brevinin 1Db (m/z 653.121) corresponding to Rana box fragmentation. (Figure from Tatiana Yu. Samgina, 2023)

    To learn more, see our article [How Peptide Sequencing Drives Drug Discovery and Biomarker Validation]

    How to Use Reports for Decision-Making

    Main Component Confirmation

    Example: when the mass error is very small, the confidence score is high, sequence coverage is 100%, and the spectrum match is excellent, the main peptide sequence and its expected modifications (here, K4 acetylation and M7 oxidation) are strongly confirmed, and the main component meets quality requirements.
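    A go/no-go check of this kind can be written as a small set of explicit criteria. The sketch below is hypothetical throughout: the field names and the limits (5 ppm, score 25, 100% coverage) are placeholders that each project must define for its own application. Spectrum match quality is not captured here and still needs visual review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MainComponentResult:
    mass_error_ppm: float
    score: float          # a search-engine confidence score
    coverage_pct: float


@dataclass(frozen=True)
class AcceptanceCriteria:
    # Hypothetical placeholder limits; each project sets its own.
    max_abs_error_ppm: float = 5.0
    min_score: float = 25.0
    min_coverage_pct: float = 100.0


def failed_checks(result: MainComponentResult, criteria: AcceptanceCriteria = AcceptanceCriteria()) -> list[str]:
    """Names of the criteria the main component fails; an empty list means all checks pass."""
    failures = []
    if abs(result.mass_error_ppm) > criteria.max_abs_error_ppm:
        failures.append("mass error")
    if result.score < criteria.min_score:
        failures.append("confidence score")
    if result.coverage_pct < criteria.min_coverage_pct:
        failures.append("sequence coverage")
    return failures


print(failed_checks(MainComponentResult(mass_error_ppm=1.8, score=62.0, coverage_pct=100.0)))  # []
print(failed_checks(MainComponentResult(mass_error_ppm=-7.2, score=62.0, coverage_pct=92.0)))
# ['mass error', 'sequence coverage']
```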

    Impurity Assessment

    Missing F impurity

    Confirm the impurity identification results and check the relative abundance of this impurity in the report. Action: if the abundance is high, provide feedback and request an investigation into, and improvement of, the efficiency of the C-terminal coupling step in the synthesis. Assess the potential impact of this impurity on downstream applications.
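    A deletion impurity is expected at the main-component mass minus the mass of the missing residue, so a missing Phe appears about 147.0684 Da below the main peptide. The sketch below lists these expected masses for every residue type in a sequence; it assumes single-residue deletions and standard monoisotopic masses, and the example sequence and mass (PEPTIDE, 799.3600 Da) are hypothetical. Because deleting any copy of the same residue gives the same mass, MS/MS fragment ions are still needed to locate the deletion.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}


def deletion_masses(sequence: str, main_mass: float) -> dict[str, tuple[float, list[int]]]:
    """For each residue type: expected neutral mass of a single-residue deletion impurity
    and the 1-based positions where that residue occurs."""
    result: dict[str, tuple[float, list[int]]] = {}
    for pos, aa in enumerate(sequence, start=1):
        mass, positions = result.get(aa, (main_mass - RESIDUE_MASS[aa], []))
        positions.append(pos)
        result[aa] = (mass, positions)
    return result


for residue, (mass, positions) in deletion_masses("PEPTIDE", 799.3600).items():
    print(f"des-{residue}: {mass:.4f} Da (positions {positions})")
```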

    Peroxide impurity

    Confirm the specific identity of the impurity and check its relative abundance. If control is required, consider optimizing the synthesis, purification, and storage conditions. Closely monitor the growth of this impurity in subsequent stability studies.

    Follow-up studies

    During functional experiments (e.g., binding, activity assays), note whether the results are consistent with expectations.

    Batch-to-batch comparison

    Record the main peak retention time (RT), main component mass error, impurity types, and abundance. Compare these in subsequent batch testing to monitor process stability and product quality consistency.
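    The batch record described above lends itself to a simple automated comparison. The sketch below is illustrative only: the batch IDs, values, the 0.3 min retention-time window, the 5 ppm mass-error limit, and the choice of the first batch as reference are all hypothetical. It flags batches whose main-peak RT drifts beyond the window, whose mass error exceeds the limit, or that show impurity types absent from the reference batch, and it reports the RT coefficient of variation.

```python
from statistics import mean, stdev

# Hypothetical batch records: main-peak RT (min), main-component mass error (ppm),
# and impurity relative abundances (%).
batches = {
    "B001": {"rt_min": 18.42, "mass_err_ppm": 1.2, "impurities": {"des-Phe": 0.8}},
    "B002": {"rt_min": 18.47, "mass_err_ppm": 0.9, "impurities": {"des-Phe": 0.6}},
    "B003": {"rt_min": 18.95, "mass_err_ppm": 2.1, "impurities": {"des-Phe": 0.7, "Met(O)": 1.3}},
}


def rt_cv_percent(records: dict) -> float:
    """Coefficient of variation (%) of the main-peak retention time across batches."""
    rts = [rec["rt_min"] for rec in records.values()]
    return 100.0 * stdev(rts) / mean(rts)


def flag_batches(records: dict, reference: str, max_rt_shift_min: float = 0.3,
                 max_abs_ppm: float = 5.0) -> dict[str, list[str]]:
    """Compare each batch with the reference batch and list any deviations found."""
    ref = records[reference]
    flags: dict[str, list[str]] = {}
    for batch_id, rec in records.items():
        issues = []
        if abs(rec["rt_min"] - ref["rt_min"]) > max_rt_shift_min:
            issues.append("RT shift")
        if abs(rec["mass_err_ppm"]) > max_abs_ppm:
            issues.append("mass error")
        new_types = sorted(set(rec["impurities"]) - set(ref["impurities"]))
        if new_types:
            issues.append("new impurities: " + ", ".join(new_types))
        if issues:
            flags[batch_id] = issues
    return flags


print(flag_batches(batches, "B001"))  # {'B003': ['RT shift', 'new impurities: Met(O)']}
print(f"RT CV: {rt_cv_percent(batches):.2f}%")
```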

    Interpreting Peptide Reports: Final Tips and Best Practices

    A peptide sequencing report is not just a collection of obscure numbers and spectra; it is the bridge connecting the raw signals output by the mass spectrometer to critical project decisions. Mastering the interpretation of peptide sequencing data hinges on:

    Focusing On Core Metrics

    Mass error, confidence score, and coverage form the "iron triangle" of sequence accuracy; impurity identification is key to purity assessment.

    Understanding Spectrum Logic

    Learn to interpret b/y ion series, confirm amino acids based on mass differences, and recognize the significance of key modification feature peaks and unmatched strong peaks.

    Making Decisions in Context

    Data interpretation must never be detached from project objectives. Acceptable impurity levels and requirements for key modification sites vary depending on the application scenario.

    Utilize Professional Support

    When encountering complex spectra, low-scoring matches, challenging impurities, or conflicting modification locations, actively communicate and discuss with the analytical scientists providing sequencing services.

    LC–HRMS and the amino acid composition of the cyclized core peptide sequences of TFLPPLFVPP and AFFPPFFIPP in Amanita subjunquillea. (Figure from Shengwen Zhou, 2021)

    References

    1. Samgina, T. Y., et al. (2023). Tandem Mass Spectrometry de novo Sequencing of the Skin Defense Peptides of the Central Slovenian Agile Frog Rana dalmatina. Molecules.
    2. Qiao, R., et al. Computationally instrument-resolution-independent de novo peptide sequencing for high-resolution devices. Nature Machine Intelligence.
    3. Yu, C.-Y., et al. (2016). Automated Glycan Sequencing from Tandem Mass Spectra of N-Linked Glycopeptides. Analytical Chemistry.
