Peptide Sequencing Reports: Structure and Interpretation Guide
Peptide sequencing technologies such as mass spectrometry are vital in proteomics for revealing protein identity, abundance, modifications, and interactions. However, turning raw data into useful insight requires complex processing. A clear and accurate peptide sequencing report is therefore essential: it is the core output of a proteomics experiment and the starting point for biological interpretation.
Researchers must master interpreting these reports. This means understanding key elements (sample info, quality metrics, sequences, protein IDs, quantitative data, statistical assessment), basic spectrum validation, and database matching results.
By systematically analyzing critical data points (peptide sequences, protein IDs, quantitative ratios, P-values/FDR, scores), researchers turn data into discovery: confirming targets, revealing changes, optimizing workflows, and advancing biomarker identification. Integrating this interpretation skill strengthens research decisions and illuminates the path to uncovering disease mechanisms.
First, lay a solid foundation by reading our article [Peptide Sequencing Principles, Techniques, and Research Applications].
A peptide sequencing report is the final data analysis result of LC-MS/MS peptide analysis, a core technique in proteomics for identifying and quantifying peptides.
It is generated from the raw spectra collected by the mass spectrometer, after specialized bioinformatics software has performed peptide sequence identification, protein inference, quantitative analysis, and quality assessment.
The report transforms complex spectral signals into understandable biological information, serving as a bridge between "data" and "biological insights."
The value of a peptide sequencing report goes far beyond a simple list of results; it is the core driving force behind the advancement of scientific research projects.
The report clearly identifies which proteins are present in the sample, providing information such as their names, UniProt IDs, and functional annotations. This provides a solid molecular basis and target list for subsequent research (such as functional validation, interaction studies, and pathway analysis).
For synthetic peptides, the report is the main basis for confirming that the sequence is correct and for assessing purity (for example, impurities arising from deletions, insertions, or modification errors). It therefore determines whether a batch is accepted and can proceed to the next stage of application.
Through quantitative analysis (such as Label-Free Quantitation, TMT/iTRAQ, SILAC, etc.), the report shows changes in the relative or absolute abundance of target proteins under different conditions (such as disease vs. health, treatment vs. control). This directly reveals key regulatory molecules in biological processes (such as disease occurrence, drug response, and developmental stages).
Key indicators in the report (such as sequence coverage, confidence scores, and impurity identification) make it possible to assess the stability and batch-to-batch consistency of synthesis or purification processes, identify potential risk points, and optimize resource allocation and timelines.
In biopharmaceutical development, peptide mapping is critical for monitoring the consistency of the primary structure of protein drugs. Peptide sequencing reports can accurately locate modification (such as deamidation, oxidation) or degradation sites, guiding formulation and process optimization.
A standard peptide sequencing report typically contains the following key modules and fields. Understanding their meanings is fundamental to interpreting the report:
Sample information: unique sample ID, biological origin (e.g., tissue type, cell line), experimental group (e.g., Control, Treated), sample processing method (e.g., lysis buffer, enzymatic digestion conditions, labeling method), loading volume, preparation date, operator, etc.
Ensuring the traceability of the experiment and clarifying the biological context of the results are prerequisites for correctly interpreting subsequent differential expression analysis. Results from samples with different processing methods cannot be directly compared.
Target peptide information: expected sequence (usually written in single-letter amino acid codes), expected molecular weight (theoretical mass), expected modifications (such as acetylation, phosphorylation, or disulfide bond positions), and expected charge state.
This is the benchmark for assessing the accuracy of sequencing results. Be sure to check that the expected information in the report is exactly the same as what you provided. Any discrepancies may lead to bias in subsequent analysis.
This is the essence of the report, which typically includes:
Measured mass: the experimentally measured mass of the precursor (parent) ion, typically reported for the singly charged or the dominant charge state.
Theoretical mass: the exact mass calculated from the expected sequence and modifications.
Mass error: the difference between the measured mass and the theoretical mass, commonly given in parts per million (ppm) or daltons (Da). The smaller the error, the better the precursor ion mass matches the target peptide, which is important preliminary evidence. Modern high-resolution mass spectrometers (such as Orbitrap or Q-TOF instruments) can keep the error within a few ppm (e.g., <5 ppm or <10 ppm is typically considered excellent).
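The relationship between these three fields can be sketched in a few lines of Python. This is a minimal illustration, not the calculation used by any particular vendor software; the function names and the example values are hypothetical.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da


def neutral_mass(mz: float, charge: int) -> float:
    """Convert an observed precursor m/z at a positive charge state to a neutral mass."""
    return (mz - PROTON_MASS) * charge


def mass_error_da(measured: float, theoretical: float) -> float:
    """Absolute mass error in Da."""
    return measured - theoretical


def mass_error_ppm(measured: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


# Hypothetical example values, not taken from a real report.
observed_mz = 400.6880
charge = 2
theoretical = 799.3600

measured = neutral_mass(observed_mz, charge)
error_ppm = mass_error_ppm(measured, theoretical)

print(f"measured mass: {measured:.4f} Da")
print(f"mass error: {mass_error_da(measured, theoretical):.4f} Da ({error_ppm:.2f} ppm)")
print("within 5 ppm:", abs(error_ppm) < 5.0)
```

Because ppm error is relative to the theoretical mass, the same Da error is a larger ppm error for a small peptide than for a large one, which is why ppm is the usual unit for high-resolution data.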
Matched sequence: the peptide sequence that the software reports as the best match to the measured MS/MS spectrum (typically the target sequence).
Sequence coverage: for longer peptides or small proteins (which are enzymatically digested into multiple peptides), the proportion of amino acids in the full target sequence that is covered by the measured MS/MS spectra. The higher the coverage, the more confidently the complete sequence can be confirmed. 100% coverage is ideal but is sometimes difficult to achieve because of peptide length, fragmentation efficiency, and ionization efficiency. Key functional domains and modification sites must be covered.
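As a rough sketch of how coverage is derived, the snippet below marks every residue of the target that falls inside at least one identified peptide and reports the covered fraction. The sequences and function names are hypothetical, and real software may handle details such as repeated motifs or Leu/Ile ambiguity differently.

```python
def coverage_mask(target: str, peptides: list[str]) -> list[bool]:
    """Return one flag per target residue: True if any identified peptide covers it."""
    covered = [False] * len(target)
    for pep in peptides:
        start = target.find(pep)
        while start != -1:  # mark every occurrence of the peptide
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = target.find(pep, start + 1)
    return covered


def sequence_coverage(target: str, peptides: list[str]) -> float:
    """Percentage of target residues covered by the identified peptides."""
    mask = coverage_mask(target, peptides)
    return 100.0 * sum(mask) / len(mask)


def site_is_covered(target: str, peptides: list[str], position: int) -> bool:
    """Check whether a 1-based residue position (e.g., a modification site) is covered."""
    return coverage_mask(target, peptides)[position - 1]


# Hypothetical target sequence and identified peptides.
target = "ACDEFGHIKLMNPQRSTVWY"
identified = ["ACDEF", "EFGHIK"]

print(f"coverage: {sequence_coverage(target, identified):.1f}%")
print("site 9 covered:", site_is_covered(target, identified, 9))
print("site 10 covered:", site_is_covered(target, identified, 10))
```

Overlapping peptides are counted once per residue, so coverage can never exceed 100%. The per-site check reflects the point above that critical sites need to be covered even when overall coverage is high.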
Confidence score: a numerical value calculated by an algorithm (e.g., Mascot Ion Score, Sequest XCorr, PeptideProphet Probability, Byonic Score) from the number, intensity, continuity, and mass accuracy of the matched fragment ions, quantifying the reliability of the sequence match. Understand the thresholds of the scoring system used (e.g., a Mascot ion score above roughly 20 to 25 is typically considered significant; XCorr cut-offs vary with charge state, e.g., >2.0 for 2+ and >2.5 for 3+ precursors; probabilities above 95% or 99%). Only matches above the threshold should be considered reliable. This is one of the most important indicators of sequence accuracy.
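A minimal sketch of threshold-based filtering follows. The cut-offs simply mirror the illustrative ranges quoted above and are assumptions, not recommended values; the record layout and the `is_reliable` helper are hypothetical and do not correspond to the export format of any search engine.

```python
from typing import Optional

# Illustrative cut-offs based on the ranges quoted in the text; replace them with
# the values validated for your own search engine, database, and instrument.
MASCOT_MIN_ION_SCORE = 25.0
XCORR_MIN_BY_CHARGE = {2: 2.0, 3: 2.5}
MIN_PROBABILITY = 0.95


def is_reliable(engine: str, score: float, charge: Optional[int] = None) -> bool:
    """Return True if a peptide-spectrum match passes the cut-off for its scoring system."""
    engine = engine.lower()
    if engine == "mascot":
        return score > MASCOT_MIN_ION_SCORE
    if engine == "sequest":
        if charge not in XCORR_MIN_BY_CHARGE:
            raise ValueError(f"no XCorr cut-off defined for charge {charge}")
        return score > XCORR_MIN_BY_CHARGE[charge]
    if engine == "peptideprophet":
        return score > MIN_PROBABILITY
    raise ValueError(f"unknown scoring system: {engine}")


# Hypothetical peptide-spectrum matches.
matches = [
    {"sequence": "ACDEFGHIK", "engine": "sequest", "score": 2.3, "charge": 2},
    {"sequence": "LMNPQR", "engine": "sequest", "score": 2.3, "charge": 3},
    {"sequence": "STVWYK", "engine": "mascot", "score": 41.0},
]

reliable = [m for m in matches if is_reliable(m["engine"], m["score"], m.get("charge"))]
for m in reliable:
    print(m["sequence"], m["engine"], m["score"])
```

Note that the same XCorr of 2.3 passes for a 2+ precursor but fails for a 3+ precursor, which is why scores from different engines or charge states should never be compared against a single number.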
Modifications: the types of modifications detected (e.g., oxidation of M, deamidation of N/Q, phosphorylation of S/T/Y) and their specific sites. Site-localization confidence scores are typically also provided.
Impurities: the major impurity sequences detected and their relative abundances (possibly based on precursor ion intensity or peak area). Common impurities include deletion sequences, insertion sequences, truncated peptides, peptides with incomplete deprotection, related-sequence peptides, and oxidation or deamidation products.
This section supports comprehensive peptide sequencing impurity analysis, which is critical for confirming peptide purity and batch consistency.
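To make the impurity fields concrete, the sketch below computes relative abundances from peak areas and suggests a possible explanation for an impurity from its mass shift relative to the target peptide. The shift table is a deliberately small, illustrative subset of standard monoisotopic mass differences, and the peak areas and function names are hypothetical.

```python
# Monoisotopic mass shifts in Da relative to the target peptide (illustrative subset).
CANDIDATE_SHIFTS = {
    "oxidation (+O)": 15.99491,
    "deamidation (N/Q)": 0.98402,
    "Gly deletion": -57.02146,
    "Ala deletion": -71.03711,
}


def relative_abundance(peak_areas: dict[str, float]) -> dict[str, float]:
    """Express each peak area as a percentage of the summed area of all listed species."""
    total = sum(peak_areas.values())
    return {name: 100.0 * area / total for name, area in peak_areas.items()}


def annotate_shift(delta_da: float, tol_da: float = 0.01) -> list[str]:
    """List candidate explanations whose mass shift matches delta_da within tol_da."""
    return [
        name for name, shift in CANDIDATE_SHIFTS.items()
        if abs(delta_da - shift) <= tol_da
    ]


# Hypothetical peak areas (arbitrary units) and observed mass shifts.
areas = {"main peptide": 950.0, "impurity A": 30.0, "impurity B": 20.0}
for name, pct in relative_abundance(areas).items():
    print(f"{name}: {pct:.1f}%")

print(annotate_shift(15.9950))   # shift of impurity A relative to the target
print(annotate_shift(-57.0200))  # shift of impurity B relative to the target
```

A mass shift only suggests a candidate; the MS/MS evidence described in the report is still needed to confirm the identity and site of the change.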
Retention time: the time at which the peptide elutes from the liquid chromatography (LC) column. It is used to assess reproducibility and to detect co-eluting impurities.
The fragment ion table lists or annotates in detail all major fragment ions successfully matched in the MS/MS spectrum. It includes:
Ion type: b-ions (N-terminal fragments that retain the charge), y-ions (C-terminal fragments that retain the charge), a-ions, and neutral losses (e.g., H₂O, NH₃) are common types of fragment ions. Among these, b- and y-ions are the most important and the most commonly used for sequence interpretation.
Ion number: indicates which peptide bond of the precursor ion was cleaved to produce the ion (e.g., y5 denotes the fragment containing the last 5 amino acids at the C-terminus).
Observed m/z: the mass-to-charge ratio of the fragment ion detected in the mass spectrometer.
Theoretical m/z: the mass-to-charge ratio of the fragment ion calculated from the matched sequence.
Mass error: the difference between the observed and theoretical m/z (ppm or Da).
Intensity: the signal intensity of the fragment ion.
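The fields of the fragment ion table can be reproduced with a short calculation. The sketch below computes singly charged b- and y-ion m/z values for an unmodified linear peptide from standard monoisotopic residue masses and matches them to observed peaks within a ppm tolerance. The example peptide, the observed peak list, and the 20 ppm tolerance are hypothetical, and the code ignores modifications, higher charge states, and neutral losses.

```python
# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def b_ions(sequence: str) -> list[float]:
    """Singly charged b1..b(n-1) m/z values (N-terminal fragments)."""
    ions, total = [], 0.0
    for aa in sequence[:-1]:
        total += RESIDUE_MASS[aa]
        ions.append(total + PROTON)
    return ions


def y_ions(sequence: str) -> list[float]:
    """Singly charged y1..y(n-1) m/z values (C-terminal fragments)."""
    ions, total = [], 0.0
    for aa in reversed(sequence[1:]):
        total += RESIDUE_MASS[aa]
        ions.append(total + WATER + PROTON)
    return ions


def match_peaks(theoretical: list[float], observed: list[float], tol_ppm: float = 20.0):
    """Return (ion number, theoretical m/z, observed m/z, error in ppm) for each match."""
    matches = []
    for number, theo in enumerate(theoretical, start=1):
        for obs in observed:
            error_ppm = (obs - theo) / theo * 1e6
            if abs(error_ppm) <= tol_ppm:
                matches.append((number, theo, obs, error_ppm))
    return matches


peptide = "PEPTIDE"                # hypothetical example sequence
observed_peaks = [148.0610, 300.0]  # hypothetical observed m/z values

for number, theo, obs, err in match_peaks(y_ions(peptide), observed_peaks):
    print(f"y{number}: theoretical {theo:.4f}, observed {obs:.4f}, error {err:.1f} ppm")
```

Because Leu and Ile share the same residue mass in this table, b/y ion masses alone cannot distinguish them, which is why dedicated fragmentation methods are needed for that purpose.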
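The fragment ion table is direct evidence for verifying the reliability of the sequence match. High-confidence matches typically require a sufficient number of matched fragment ions, continuous b- or y-ion series, small fragment mass errors, and no intense peaks left unexplained.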
One of the core components of the report, this section displays the MS/MS spectrum (m/z vs. relative intensity) of all fragment ions generated after the precursor ion is selected and fragmented.
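By referencing the fragment ion matching table or diagram, you can directly observe in the spectrum how well the annotated b- and y-ion series explain the measured peaks and whether any strong peaks remain unmatched.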
This section describes the software used for searching and matching the MS/MS data (e.g., Mascot, Sequest, Byonic, MaxQuant, PEAKS), the search parameters (database name and version, enzyme digestion settings, allowed modification types, precursor and fragment ion mass tolerances), the scoring algorithms, and their significance thresholds.
Understanding whether the search parameter settings are reasonable (e.g., do they include all possible modifications? Are the mass tolerances consistent with the instrument's precision?) is critical for assessing the comprehensiveness and reliability of the results. Database matching is the core of automated interpretation, but its results are highly dependent on parameter settings. Ensure that the database used includes your target sequences.
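The questions above can be turned into a simple pre-review checklist. The following is a minimal sketch under stated assumptions: the parameter keys (`fixed_mods`, `variable_mods`, `precursor_tol_ppm`), the modification labels, and the `check_search_setup` helper are hypothetical and do not correspond to the configuration format of any specific search engine.

```python
def check_search_setup(params, database, target_sequence, expected_mods, instrument_ppm):
    """Return a list of potential problems with a database search configuration."""
    issues = []

    # The database must contain the target sequence, or it can never be matched.
    if not any(target_sequence in entry for entry in database.values()):
        issues.append("target sequence not found in database")

    # Every modification expected on the peptide must be allowed in the search.
    allowed = set(params["fixed_mods"]) | set(params["variable_mods"])
    for mod in expected_mods:
        if mod not in allowed:
            issues.append(f"expected modification not searched: {mod}")

    # A tolerance tighter than the instrument accuracy can exclude true matches.
    if params["precursor_tol_ppm"] < instrument_ppm:
        issues.append("precursor tolerance is tighter than the instrument accuracy")

    return issues


# Hypothetical search settings and database.
params = {
    "fixed_mods": ["Carbamidomethyl (C)"],
    "variable_mods": ["Oxidation (M)"],
    "precursor_tol_ppm": 10.0,
}
database = {"target_1": "ACDEFGHIK"}

for issue in check_search_setup(
    params, database, "ACDEFGHIK", ["Oxidation (M)", "Acetyl (K)"], instrument_ppm=5.0
):
    print("WARNING:", issue)
```

An empty result does not prove the search was appropriate; it only rules out the specific problems checked here.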
Peptide Types Involved: The study focuses on defensive peptides secreted by the skin of the agile frog (Rana dalmatina) from Slovenia, including antimicrobial peptide families such as brevinins (types 1/2), temporins, bradykinin-related peptides (BRPs), and melittin-related peptides (MRPs).
Issues and Requirements: The genome of this species is unknown, and traditional cloning-based sequencing methods are prone to inaccuracies due to primer design errors (e.g., the C-terminal extended forms of BRPs are often misinterpreted). Skin peptides degrade rapidly under stress conditions (due to the absence of protease inhibitors), necessitating a highly sensitive method to capture complete sequences.
Research Objective: To establish a genome-independent de novo sequencing strategy to resolve the complete structure of peptide segments (including disulfide bond loop sequences and Leu/Ile isomer differentiation), providing new candidate molecules for the development of anti-infective drugs.
A total of 28 peptides were identified, including 4 brevinins containing disulfide bonds, 10 temporins, 1 MRP (FQ-22), and 13 BRPs.
Brevinin 1Db (novel 24-peptide), sequence: `FFPAFLKVAAKVVPSILCSITKKC-OH` (containing a C-terminal disulfide bond "Rana box"). Temporin 1Da (novel linear peptide), `FLPLIAGLLGKL-NH₂`. Brevinin 2D (33-peptide): differs from the closely related species R. temporaria by only one site (His¹²→Val¹²).
Instrumentation: liquid chromatography-tandem mass spectrometry (LC-MS/MS) on Orbitrap Elite ETD and Orbitrap Fusion high-resolution mass spectrometers.
LC gradient: 120 minutes (5–60% acetonitrile), to enhance the detection sensitivity of low-abundance peptides.
Top-down de novo sequencing combined with four fragmentation modes: CID and HCD provide backbone cleavage information; ETD preserves disulfide bonds and other modifications; EThcD (MS³ stepwise energy) resolves sequences inside disulfide-bonded rings (e.g., the C-terminal `TKKC` of Brevinin 1Db) and distinguishes Leu/Ile isomers. Native peptides were analyzed directly, without chemical derivatization, to avoid modification interference. Manual verification was supplemented by Novor.Cloud and PEAKS software, but manual spectrum interpretation remains the gold standard.
The full-length sequences of all peptides were obtained, including the disulfide-bonded ring regions (e.g., the C-terminal region of Brevinin 1Db) and six Leu/Ile sites.
Peptide profile characteristics: The profile lacks ranatuerins and ranacyclins but includes Brevinin 2Rd (previously found only in green frogs), suggesting cross-species gene exchange. BRPs are extremely low in abundance: only bradykinin fragments and [Thr⁶]-modified forms were detected, and they degrade rapidly (owing to the lack of protease inhibitors).
Through an innovative mass spectrometry strategy, this study revealed for the first time the complete diversity of the skin peptide profile of Rana dalmatina, overcoming the limitations of traditional sequencing and providing key data for the development of antimicrobial peptides and for studies of adaptive evolution in amphibians.
Selected part of the EThcD spectrum of tetraprotonated brevinin 1Db (m/z 653.121) corresponding to Rana box fragmentation. (Figure from Tatiana Yu. Samgina, 2023)
To learn more, click on the article [How Peptide Sequencing Drives Drug Discovery and Biomarker Validation]
The molecular weight error is extremely small, the confidence score is high, the sequence coverage is 100%, and the spectrum matching is excellent. The main peptide sequence and expected modifications (K4 acetylation, M7 oxidation) are strongly confirmed, and the main component quality meets the requirements.
Confirm the impurity identification and check the relative abundance of this impurity in the report. Action: if the abundance is high, provide feedback and request an investigation into, and improvement of, the efficiency of the C-terminal coupling step in the synthesis. Assess the potential impact of this impurity on downstream applications.
Confirm the specific identity of the impurity and check its relative abundance. If control is required, consider optimizing the synthesis, purification, and storage conditions. Closely monitor the growth of this impurity in subsequent stability studies.
During functional experiments (e.g., binding, activity assays), note whether the results are consistent with expectations.
Record the main peak retention time (RT), main component mass error, impurity types, and abundance. Compare these in subsequent batch testing to monitor process stability and product quality consistency.
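One way to make this batch-to-batch comparison systematic is to store the recorded values per batch and flag deviations from a reference batch. The sketch below is only an illustration: the `BatchRecord` fields, the tolerance defaults, and the example values are hypothetical and should be replaced by the acceptance criteria of the actual project.

```python
from dataclasses import dataclass, field


@dataclass
class BatchRecord:
    batch_id: str
    main_rt_min: float            # main peak retention time in minutes
    mass_error_ppm: float         # main component mass error
    impurities: dict = field(default_factory=dict)  # impurity name -> relative abundance (%)


def compare_batches(ref, new, rt_tol_min=0.5, ppm_limit=5.0, impurity_delta_pct=0.5):
    """Flag deviations of a new batch from a reference batch (tolerances are assumptions)."""
    flags = []
    if abs(new.main_rt_min - ref.main_rt_min) > rt_tol_min:
        flags.append("retention time shift")
    if abs(new.mass_error_ppm) > ppm_limit:
        flags.append("mass error above limit")
    for name, pct in new.impurities.items():
        if name not in ref.impurities:
            flags.append(f"new impurity: {name}")
        elif pct - ref.impurities[name] > impurity_delta_pct:
            flags.append(f"impurity increased: {name}")
    return flags


# Hypothetical records for two batches.
reference = BatchRecord("B001", 22.4, 1.2, {"Met oxidation": 0.8})
candidate = BatchRecord("B002", 22.5, 1.9, {"Met oxidation": 1.6, "C-terminal deletion": 0.4})

for flag in compare_batches(reference, candidate):
    print(f"{candidate.batch_id}: {flag}")
```

Keeping the same record structure for every batch also makes it straightforward to track the growth of a given impurity across stability time points.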
A peptide sequencing report is not just a collection of obscure numbers and spectra; it is the bridge connecting the raw signals output by the mass spectrometer to critical project decisions. Mastering the interpretation of peptide sequencing data hinges on:
Mass error, confidence score, and coverage form the "iron triangle" of sequence accuracy; impurity identification is key to purity assessment.
Learn to interpret b/y ion series, confirm amino acids based on mass differences, and recognize the significance of key modification feature peaks and unmatched strong peaks.
Data interpretation must never be detached from project objectives. Acceptable impurity levels and requirements for key modification sites vary depending on the application scenario.
When encountering complex spectra, low-scoring matches, challenging impurities, or conflicting modification locations, actively communicate and discuss with the analytical scientists providing sequencing services.
LC–HRMS and the amino acid composition of the cyclized core peptide sequences of TFLPPLFVPP and AFFPPFFIPP in Amanita subjunquillea. (Figure from Shengwen Zhou, 2021)