What Is Antibody Sequencing? A Comprehensive Overview

Antibodies are key proteins produced by the immune system that are responsible for recognizing and binding to specific foreign invaders. Each antibody possesses a unique amino acid sequence that determines its three-dimensional structure and ability to recognize antigens. Understanding this sequence is the basis for a deeper understanding of its function, specificity and how it interacts with antigens. This is essential for studying immune responses, disease mechanisms and developing antibody-based therapies.

Antibody sequencing relies on two main approaches: mass spectrometry and gene sequencing (typically next-generation sequencing, NGS). Mass spectrometry analyzes the protein sample directly, inferring the amino acid sequence from the measured masses of peptide fragments. Gene sequencing obtains the amino acid sequence indirectly by decoding the genes that encode the antibody (from the B cells or hybridoma cells that produce it), and is a high-throughput and relatively mature method.

Antibody sequencing is extremely versatile and valuable. In basic scientific research, it helps to analyze the molecular details of the immune response, the relationship between antibody structure and function, and the mechanism of antibody diversity. In the field of biopharmaceuticals, it is the core of monoclonal antibody drug development, providing key data for new drug discovery and engineering.

What Is Antibody Sequencing?

Definition of Antibody Sequencing

Antibody sequencing is a technique for determining the amino acid sequence of the antibody molecules in a monoclonal or polyclonal sample. The sequence provides key information for understanding the antibody's specificity, activity, structure, and affinity for its antigen.

Why Sequencing is Crucial

Accurate determination of antibody sequences is a central aspect of the biopharmaceutical field. In therapeutic antibody development, sequence information directly determines the effectiveness, safety and manufacturability of the drug. In the process development stage, it is the key basis for recombinant expression and stable production.

Sequence data provides the basis for antibody functional validation, is indispensable evidence for protecting intellectual property rights in patent applications, and enables the recovery of lost or mutated original sequences through "clone rescue" technology.

Core Targets

Focusing on Functional Region Resolution

The core of antibody sequencing is resolving the variable regions (VH/VL) that determine antigen-binding specificity. Each variable region contains relatively conserved framework regions (FRs) and hypervariable complementarity-determining regions (CDRs). The FRs maintain the overall spatial conformation of the antibody, while the CDRs are directly involved in antigen recognition. Accurate resolution of both types of region is a prerequisite for understanding antibody mechanisms and optimizing drug design.

The Challenge: When Genetic Information Is Not Available

Challenges in Antibody Research

A common challenge in antibody research and application is the lack of traceable sources of genetic information for many important antibodies. These antibodies fall into two main categories:

First, commercially produced antibodies (e.g., polyclonal sera or finished monoclonal antibodies), for which the original hybridoma cell lines or immunized animal sources used in the production process are often not preserved or available.

Second, laboratory "legacy clones", which are valuable antibodies developed in the early days and whose gene sequences may not have been fully resolved due to loss of hybridomas, incomplete documentation, or technical limitations.

Limitations of Conventional Sequencing

Conventional gene-based sequencing (e.g., NGS) relies heavily on high-quality RNA/cDNA extracted from B cells or hybridomas as the template for amplification and sequencing. No nucleic acid can be obtained once the parental cell line is lost, degraded, or cannot be cultured, or when the sample is stored only as purified protein, a situation reported to affect more than 30% of historical antibody samples. In these cases the amino acid sequence of the antibody is difficult to decipher, even if the antibody itself has high activity and application value.

Advantages of Mass Spectrometry Sequencing

Mass spectrometry solves this dilemma by analyzing the protein molecule directly. The core breakthroughs are:

Bypassing the gene level: no cell or nucleic acid template is required; a trace amount of protein is sufficient.
Capturing the true post-translational state: the mature antibody is read directly, including modifications such as glycosylation and oxidation, avoiding discrepancies between the gene-derived sequence and the expressed protein.
Compatibility with complex samples: hybridoma supernatants, sera, and even partially degraded samples can be analyzed.

Mass Spectrometry-Based Antibody Sequencing: How It Works

Mass spectrometry (MS)-based antibody sequencing directly analyzes the primary structure of the protein. Its core advantage is true de novo sequencing: the sequence is reconstructed without relying on gene templates or reference databases, which makes it possible to resolve antibodies for which no genetic information is available.

Standardized Workflow

Enzymatic digestion: After the purified antibody is reduced and alkylated, it is cleaved by specific proteases into peptide fragments of moderate length.
Peptide separation: The digestion products are separated by HPLC. Peptides elute at different times during the gradient according to their hydrophobicity, which significantly reduces sample complexity and limits signal suppression during mass spectrometry detection.
Fragmentation analysis: The separated peptides are analyzed by MS1 (precursor) and MS2 (tandem) mass spectrometry in succession, generating characteristic fragment ions. The fragments carry information about where the peptide backbone was cleaved, from which the order of the amino acids can be deduced.
Sequence assembly: An algorithm interprets the mass differences between neighboring fragment ions in the MS/MS spectra as amino acid residues to reconstruct each peptide sequence, and the overlapping peptides are then assembled into the full-length antibody sequence.
Validation and annotation: Sequence accuracy is confirmed by synthesizing peptides for validation, comparing with known antibody libraries, or reverse translating to DNA for expression validation. Key functional regions are also annotated.
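
As an illustration of the sequence assembly step, the sketch below reads residues from the mass gaps between consecutive fragment ions. It assumes an idealized, complete series of singly charged b-ions with no noise; the function names (b_ions, read_ladder) and the 0.01 Da tolerance are illustrative choices and do not come from any specific software. Real de novo tools must also cope with missing peaks, y-ions, and modified residues.

```python
# Monoisotopic residue masses in Da (standard values).
# Leucine (L) and isoleucine (I) have identical masses.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # mass of a proton in Da


def b_ions(peptide):
    """Return idealized singly charged b-ion m/z values for a peptide."""
    values = []
    total = PROTON
    for aa in peptide:
        total += RESIDUE_MASS[aa]
        values.append(total)
    return values


def read_ladder(mz_values, tol=0.01):
    """Translate each gap between neighboring peaks into candidate residues.

    Returns one entry per gap: a residue letter, several letters joined by
    "/" when masses are indistinguishable, or "?" when nothing matches.
    """
    calls = []
    for low, high in zip(mz_values, mz_values[1:]):
        delta = high - low
        matches = [aa for aa, mass in RESIDUE_MASS.items()
                   if abs(mass - delta) <= tol]
        calls.append("/".join(matches) if matches else "?")
    return calls


# Toy example: start the ladder at the bare proton so the first gap
# corresponds to the first residue.
peaks = [PROTON] + b_ions("SVKLTQ")
print(read_ladder(peaks))
```

In this toy case the gaps spell out S, V, K, L/I, T, Q. Leucine and isoleucine cannot be told apart by mass alone, whereas lysine and glutamine (about 0.036 Da apart) are separated at this tolerance.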

To learn more, click on the article "From Workflow to Data: A Practical Guide to Antibody Sequencing".

Key Technology Platforms

LC-MS/MS: Liquid chromatography coupled with tandem mass spectrometry for separation and fragmentation
Orbitrap: Ultra-high resolution (up to 240,000 FWHM) and ppm mass accuracy for complex peptide characterization
TOF (Time of Flight): Rapid scanning and high mass accuracy for analysis of large molecular weight fragments (e.g., MALDI-TOF)
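
To make the meaning of ppm mass accuracy concrete, the following sketch computes a ppm error and checks whether two near-isobaric residues can be told apart at a given tolerance. The lysine and glutamine residue masses are standard monoisotopic values; the helper names and the simple "twice the tolerance window" criterion are assumptions made for illustration, not an instrument specification.

```python
LYS = 128.09496  # monoisotopic residue mass of lysine, Da
GLN = 128.05858  # monoisotopic residue mass of glutamine, Da


def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def resolvable(mass_a, mass_b, at_mz, ppm_tol):
    """Rough check: can two candidates be separated at this m/z and tolerance?

    Assumes singly charged ions and treats the candidates as separable when
    their mass difference exceeds twice the tolerance window (a simplification).
    """
    window = at_mz * ppm_tol * 1e-6  # tolerance half-width in Da
    return abs(mass_a - mass_b) > 2 * window


print(round(ppm_error(1000.002, 1000.0), 3))   # error of a 0.002 Da offset at m/z 1000
print(resolvable(LYS, GLN, at_mz=1000.0, ppm_tol=2.0))
print(resolvable(LYS, GLN, at_mz=1000.0, ppm_tol=50.0))
```

Under these assumptions, a 2 ppm tolerance at m/z 1000 corresponds to a window of 0.002 Da, well below the 0.036 Da difference between lysine and glutamine, while a 50 ppm tolerance does not separate them.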

Key Advantages of Mass Spectrometry-Based Sequencing

Mass spectrometry-based antibody sequencing overcomes key limitations of gene-based sequencing in antibody characterization. Its advantages fall into the following five areas:

No Cell or Nucleic Acid Templates Required

Mass spectrometry requires only trace amounts (μg level) of purified antibody protein to start the sequencing process, eliminating the need for hybridomas, B cells, or nucleic acid samples.

This feature makes it a key tool for resolving commercially purchased finished antibodies, "legacy antibodies" that are untraceable due to loss of cell line, and rare antibodies isolated from complex samples.

Directly Resolves the Structure of the Actual Protein being Expressed

Compared with traditional gene sequencing, mass spectrometry reads the mature protein as it exists after translation rather than the theoretical sequence encoded by the gene.

This avoids discrepancies that can arise in gene-based sequencing, for example from somatic hypermutation or from errors in assembling variable-region reads, so the result reflects the actual structure of the final protein product. The sequencing result therefore corresponds to the functional antibody actually present in the sample.

Precision Capture of Post-Translational Modifications (PTMs)

Mass spectrometry directly identifies key post-translational modifications while determining the amino acid sequence:
Glycosylation: resolves the type of N-glycan in the Fc region (e.g., high-mannose type, fucosylation), which directly affects ADCC/CDC effector function
Chemical modification: detects process-related variants such as oxidation (methionine), deamidation (asparagine), and C-terminal lysine truncation
Disulfide bond localization: confirms intra- and inter-chain disulfide bond pairing using non-reducing digestion strategies

This modification information is critical for antibody stability, immunogenicity, and efficacy, and gene sequencing cannot provide such data.
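
The chemical modifications listed above show up as characteristic mass shifts between the observed and the expected peptide mass. The sketch below matches a measured shift against three of them. The delta values are standard monoisotopic mass changes; the table name, function name, and 0.005 Da tolerance are hypothetical choices for illustration, and glycans are left out to keep the example short.

```python
# Monoisotopic mass changes in Da for the modifications named in the text.
PTM_DELTAS = {
    "oxidation (Met)": 15.994915,         # addition of one oxygen atom
    "deamidation (Asn)": 0.984016,        # Asn converted to Asp/isoAsp
    "C-terminal Lys loss": -128.094963,   # removal of one lysine residue
}


def annotate_shift(observed_mass, theoretical_mass, tol=0.005):
    """Return the names of modifications whose delta matches the mass shift."""
    delta = observed_mass - theoretical_mass
    return [name for name, expected in PTM_DELTAS.items()
            if abs(delta - expected) <= tol]


# Toy masses: a peptide expected at 1000.5000 Da observed about 16 Da heavier.
print(annotate_shift(1016.4949, 1000.5))
print(annotate_shift(1000.5, 1000.5))   # no shift, so no modification is called
```

A match on mass alone is only a hypothesis: the modified residue still has to be localized from the fragment ions.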

High Accuracy and Reproducibility

Modern high-resolution mass spectrometers (e.g., Orbitrap) combined with advanced algorithms significantly improve sequencing reliability:

Mass accuracy: mass errors below 2 ppm allow precise differentiation of amino acid residues
Sequence coverage: coverage above 95% can be reached by combining digests from multiple enzymes (trypsin + Lys-C + Glu-C)
Algorithm validation: de novo tools such as PEAKS use machine learning to improve fragment ion interpretation, substantially reducing the rate of misassignment
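
The effect of combining enzymes on sequence coverage can be sketched with a simple in silico digestion. The cleavage rules below are simplified (trypsin after K/R but not before P, Lys-C after K, Glu-C after E only), and the assumption that only peptides within a length window are detectable is a stand-in for real instrument behavior. The toy sequence and all function names are made up for illustration.

```python
def digest(seq, cleave_after, block_before=""):
    """Split seq after the given residues; return (start, end) index pairs."""
    peptides = []
    start = 0
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in cleave_after and (nxt == "" or nxt not in block_before):
            peptides.append((start, i + 1))
            start = i + 1
    if start < len(seq):
        peptides.append((start, len(seq)))
    return peptides


def coverage(seq, enzymes, min_len=3, max_len=12):
    """Fraction of residues found in at least one 'detectable' peptide.

    min_len and max_len are assumed detectability limits, not measured ones.
    """
    covered = [False] * len(seq)
    for cleave_after, block_before in enzymes:
        for start, end in digest(seq, cleave_after, block_before):
            if min_len <= end - start <= max_len:
                for i in range(start, end):
                    covered[i] = True
    return sum(covered) / len(seq)


TRYPSIN = ("KR", "P")  # cleaves after K or R unless followed by P
LYS_C = ("K", "")
GLU_C = ("E", "")

toy = "AKGRSKAAAAEAAAAK"  # made-up sequence, not a real antibody
print(coverage(toy, [TRYPSIN]))
print(coverage(toy, [TRYPSIN, LYS_C]))
print(coverage(toy, [TRYPSIN, LYS_C, GLU_C]))
```

In this toy case trypsin alone leaves the short N-terminal peptides uncovered, and each additional enzyme produces differently sized peptides that fill part of the gap, which is the rationale for multi-enzyme digestion.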

Broad Antibody Type Compatibility

The technology is applicable to all types of antibody molecules:

Monoclonal Antibodies: Complete resolution of natural IgG, IgM and other isotypes

Engineered Antibodies: Accurate determination of small-format constructs such as scFv and nanobodies

Bispecific Antibodies: Identification of light and heavy chain pairings in heterodimers

Mass spectrometry-based de novo sequencing of the monoclonal antibody Herceptin (Figure from Weiwei Peng, 2021)

Common Use Cases

Sequence Recovery of Traditional or Commercial Antibodies

For antibodies whose sequences are unknown because of historical technical limitations or missing source material (e.g., research antibodies whose early hybridomas were lost, or discontinued commercial antibodies), mass spectrometry-based sequencing can reconstruct the full-length sequence from trace amounts of retained protein.

Technology Protection and Proof of Originality

Antibody sequences serve as core evidence of technological innovation. They provide the sequence information needed to support the protection and registration of novel antibody molecules, allow sequence originality to be checked against public databases (e.g., <80% identity with known sequences), and generate experimental data that can be independently verified, strengthening the credibility of the work.
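
An originality check of this kind comes down to a percent identity calculation. The sketch below assumes the sequences have already been aligned to equal length (with "-" marking gaps) by a separate alignment tool. The function names, the toy sequences, and the use of the 80% figure as a threshold are illustrative only.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    # Ignore columns where both sequences have a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def closest_known(query, known):
    """Return (name, identity) for the most similar known sequence."""
    scored = ((name, percent_identity(query, seq)) for name, seq in known.items())
    return max(scored, key=lambda item: item[1])


# Toy sequences, made up for illustration.
known = {"ref1": "ACDEFGHIKV", "ref2": "AAAAAGHIKL"}
name, identity = closest_known("ACDEFGHIKL", known)
print(name, identity, "below 80% threshold" if identity < 80.0 else "too similar")
```

In practice the comparison would run against a curated sequence database and use a proper alignment, but the identity figure is computed in the same way.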

Biosimilar Development and Quality Control

Mass spectrometry can be used to reverse-engineer the CDR sequences, Fc glycoforms, and modification sites of the originator drug, establishing the target profile for the biosimilar. Biosimilarity is then demonstrated through sequence identity and overlapping PTM profiles, providing a chain of evidence for regulatory filing.

Sequence Confirmation before Recombinant Expression

Before an antibody is transferred into a production cell line, sequencing confirms the sequence at the protein level and helps catch discrepancies in gene-derived sequences, for example those associated with somatic hypermutation. It also verifies the native pairing of light and heavy chains so that non-functional chains are not carried forward, and provides PTM benchmark data (e.g., glycosylation) to guide cell line screening, helping ensure that the recombinant antibody is functionally equivalent to the parental clone.

To learn more, click on the article "Antibody Sequencing in Special Applications，Monoclonals, Hybridomas, Biosimilars".

Choosing the Right MS-Based Sequencing Provider

Choosing a professional and reliable mass spectrometry antibody sequencing service provider is the key to ensure data accuracy and project success.

Professional Depth of Mass Spectrometry Technology and Peptide Analysis

Priority is given to examining the strength of the service provider's technology platform:

Whether it is equipped with a high-resolution mass spectrometer (e.g., Orbitrap Exploris 480, Q-TOF) and a nanoflow liquid chromatography (nanoLC) system.
Whether it can combine multiple enzymatic digestion strategies (trypsin + Lys-C/Glu-C) to improve coverage of complex regions.
Whether it applies AI-driven de novo algorithms (e.g., PEAKS, Byonic) to analyze difficult spectra.
Whether its team can decipher advanced structural information such as disulfide bond localization and inter-chain pairing.

Experience with Complex Antibody Types

Verify that the service provider has successfully handled a range of antibody formats:

Conventional monoclonal antibodies (IgG1/IgG4) and engineered antibodies (scFv, nanobodies)
Heterodimer resolution of bispecific and multispecific antibodies
Low-abundance samples (e.g., tissue-isolated antibodies, serum polyclonal antibodies) and modification-sensitive antibodies

High Coverage and Rigorous Validation Process

The provider should commit to clear quality indicators:

Sequence coverage: ≥95% (CDR regions must be fully covered), with peptide maps provided as evidence.
Validation methods: mass spectrometry validation with synthesized peptides, functional comparison after recombinant expression, and database cross-checking.
Error rate control: key sites should be confirmed by two independent techniques.

Complete and Transparent Deliverables

Full-length sequence: with signal peptide, light and heavy chain variable/constant region annotations.
Modification mapping: quantitative report of modifications such as glycosylation sites (e.g., Asn297), oxidation, and deamidation.
Confidence scoring: reliability scores for each residue (e.g., based on fragment ion intensity and coverage).
Raw data: Provide files in .raw or .mgf format for third party review.

Efficient Project Management and Technical Support

Cycle time commitment: whether the standard project can be completed within the promised time
Communication mechanism: whether the technical team provides timely feedback on experimental difficulties
Crisis management: whether alternative approaches can be used to address sequencing blind spots (e.g., isomeric residues such as leucine and isoleucine).

To learn more, click on the article "How to Choose the Right Antibody Sequencing Service".

Three approaches in MS-based antibody sequencing (Figure from Sebastiaan C. de Graaf, 2022)

Customer Concerns

What Kind of "Deliverables" Can I Get?

A professional antibody sequencing service should deliver a complete molecular profile that can be used directly in downstream research and development, including the following core contents:

Full sequence and key region annotation: The service provider delivers a primary structure analysis of the complete antibody molecule, including the full-length amino acid sequences of the heavy chain (VH+CH) and light chain (VL+CL), with signal peptide positions and mature protein start sites clearly labeled. The intra- and inter-chain disulfide bond topology is also validated, laying the molecular foundation for antibody engineering.
Data reliability assurance: Reliability is ensured by a residue-level confidence scoring system, with a score assigned to each amino acid position; 99% confidence is required for the CDR3 region (verified by HCD/ETD dual-mode fragmentation). Peptide coverage maps and synthetic peptide validation reports for key sites (CDR/PTM) are also provided.
In-depth annotation and structural analysis: Post-translational modification sites in the Fc region, such as N-glycosylation, deamidation, and oxidation, are reported quantitatively together with the degree of modification. Affinity-related properties are assessed from electrostatic potential and hydrophobicity indices, and the degree of humanization is quantified, providing a basis for optimizing the critical quality attributes of antibody drugs.
Multi-format data delivery: Ready-to-use packages are delivered in multiple formats: FASTA files (supporting primer design and database comparison), Excel tables (with modification site coordinates, confidence scores, and an index to the raw data), and a comprehensive PDF report. Raw mass spectrometry data (.raw/.mgf) are provided alongside to meet the needs of third-party review and regulatory reporting.
Downstream services: Codon-optimized gene synthesis, eukaryotic expression vector construction (pcDNA3.4, pTT5, etc.), and HEK293/CHO cell line development are available, extending to affinity maturation and humanization based on the measured structure, to complete the path from sequence to functional antibody.
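
For readers unfamiliar with the FASTA deliverable mentioned above, the sketch below writes heavy and light chain sequences in that format. The record names and sequences are short placeholder strings, not a real antibody, and the function name and 60-character line width are arbitrary choices.

```python
def to_fasta(records, width=60):
    """Format a {name: sequence} mapping as FASTA text with wrapped lines."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        # Wrap the sequence into fixed-width lines.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Placeholder sequences for illustration only.
records = {
    "mAb_heavy_chain": "EVQLVESGGG" * 7,
    "mAb_light_chain": "DIQMTQSPSS",
}
print(to_fasta(records), end="")
```

Each record starts with a ">" header line followed by the sequence, which is the form most primer design and database search tools accept as input.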

Similarities and Differences Compared to Traditional Methods

Application scenario	NGS (gene level)	MS (protein level)
Cell/mRNA samples available	✅ Preferred	✅ Possible
Purified antibody only	❌ Not feasible	✅ Only option
Actual structure of the expressed product must be confirmed	❌ Predicted sequence only	✅ Direct readout
PTM analysis required	❌ Not detectable	✅ Accurate identification
High-throughput antibody screening (>100 clones)	✅ Efficient parallel processing	❌ Better suited to single-clone analysis

Choose MS over NGS when: (1) genetic material is unavailable (e.g., lost hybridoma or purified antibody only); (2) actual expressed protein structure must be verified (including PTMs, avoiding somatic mutation errors); or (3) critical quality attributes (e.g., glycosylation, deamidation) require analysis for therapeutic antibodies.

What Can I Do With This Sequence?

Recombinant expression (mammalian cell production): Starting from the delivered full-length sequence, a codon-optimized gene can be synthesized, cloned into a eukaryotic expression vector (e.g., pcDNA3.4), and transfected into CHO or HEK293 cells for high-yield recombinant antibody production.

Antibody engineering: Accelerate therapeutic antibody development by reducing immunogenicity through humanization, enhancing the ADCC effect through Fc glycoengineering, and increasing binding affinity 10- to 100-fold through CDR affinity maturation.

Hit-to-lead sequence confirmation in high-throughput screening: Confirm the sequences of functional hit clones from hybridoma or single B-cell pools, avoiding false positives from non-functional rearrangements picked up by NGS.

Quality control and stability monitoring: Compare samples before and after storage to detect degradation hotspots, monitor glycoform shifts caused by process changes, and build sequence-PTM databases to support root cause analysis (RCA) of abnormal batches.
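
For the recombinant expression route described above, the protein sequence first has to be turned back into DNA. The sketch below reverse translates a protein using the standard genetic code and checks the result by translating it again. It simply picks the first codon listed for each amino acid, which is an assumption made for brevity: real codon optimization weights codons by host usage and other constraints. All names are hypothetical.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# One codon per amino acid: the first one encountered in the table.
# This is NOT an optimized choice, only a valid one.
FIRST_CODON = {}
for codon, aa in CODON_TABLE.items():
    FIRST_CODON.setdefault(aa, codon)


def reverse_translate(protein):
    """Return one DNA sequence that encodes the given protein."""
    return "".join(FIRST_CODON[aa] for aa in protein)


def translate(dna):
    """Translate DNA (length divisible by 3) back into protein."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


protein = "EVQLK"  # short placeholder, not a real antibody sequence
dna = reverse_translate(protein)
print(dna, translate(dna) == protein)
```

The round-trip check is a cheap sanity test to run on any synthesized gene design before ordering, whatever codon selection scheme is used.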

References

Peng W, et al. Mass Spectrometry-Based De Novo Sequencing of Monoclonal Antibodies Using Multiple Proteases and a Dual Fragmentation Scheme. J Proteome Res. 2021;20(7):3559-3566. https://doi.org/10.1021/acs.jproteome.1c00169
Vitorino R, et al. De novo sequencing of proteins by mass spectrometry. Expert Rev Proteomics. 2020;17(7-8). https://doi.org/10.1080/14789450.2020.1831387

For research use only, not intended for any clinical use.