From Workflow to Data: A Practical Guide to Antibody Sequencing

Antibody sequencing is central to vaccine development, drug discovery, and basic research. Its main challenges are the complexity of the hypervariable regions, the characterization of post-translational modifications (e.g., glycosylation), the handling of very small samples, and the retention of native heavy- and light-chain pairing information. Two major routes, nucleic acid sequencing and mass spectrometry, together cover the workflow from sample preparation to data analysis.

At the workflow level, nucleic acid sequencing covers primer design, RNA/DNA extraction and quality control, control of amplification bias, and sequencing platform selection. The mass spectrometry workflow focuses on antibody pretreatment (double-enzyme digestion, deglycosylation), optimization of mass spectrometry parameters (e.g., HCD fragmentation with gradient elution), and disulfide bond localization (non-reduced digestion under native conditions combined with ETD fragmentation). In data analysis, the nucleic acid route reconstructs sequences through IgBLAST annotation and chain pairing (e.g., pairSEQ), while the mass spectrometry route relies on de novo sequencing with PEAKS and modification localization with Byonic, complemented by quality control at key steps.

Combining NGS with mass spectrometry allows sequence and modifications (e.g., the glycoform heterogeneity of antibody drugs) to be resolved together, and single-cell sequencing combined with pseudovirus neutralization screening accelerates therapeutic antibody discovery. A sound understanding of the principles and critical points of each workflow step, together with strict adherence to standardized procedures, is the foundation for obtaining high-confidence antibody sequences and advancing antibody research and applications.

Sample Preparation for Antibody Sequencing

Sample preparation is the foundation of antibody sequencing. Appropriate sample selection and high-quality preparation are prerequisites for the success of all subsequent experiments.

Sample type

Single B cells: derived from the peripheral blood, lymph nodes, or bone marrow of immunized individuals or individuals who have recovered from infection. The advantage is that naturally paired heavy- and light-chain sequences can be obtained, which makes this sample type particularly suitable for discovering novel antibodies. Target B cells (e.g., antigen-specific B cells, plasma cells) must first be isolated by flow cytometry or microfluidics.
Hybridoma cells: produced by fusing splenocytes from immunized animals with myeloma cells; the main source of traditional monoclonal antibodies. Samples are hybridoma cell lines or their culture supernatants (which contain the secreted antibody). The sequences represent a single clone, but there is a risk of light/heavy chain mismatches.
Serum/plasma polyclonal antibodies: mixtures of many antibodies against specific antigens. Sequencing usually targets populations enriched for antigen-specific antibodies, and resolving their composition requires combining the data with NGS and bioinformatics. This sample type is suitable for vaccine studies and for monitoring the dynamics of immunity to infection.
Recombinant antibodies or fragments (e.g., Fab, scFv): usually already partially purified and can be used directly for mass spectrometry sequencing or for constructing sequencing libraries.

Sample quality control criteria

Cell samples (single B-cells, hybridomas)

Cell viability: >90% (trypan blue staining or fluorescent dye exclusion) to ensure RNA integrity.
Cell purity: The percentage of target cells (e.g., the fraction positive for antigen-specific labeling) should be clearly defined to avoid contamination by non-target cells.
RNA quality: RIN (RNA Integrity Number) ≥ 7 (Agilent Bioanalyzer/TapeStation) to ensure antibody transcript integrity. The concentration must be adequate; single cells usually require a dedicated amplification step.
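The cell-sample criteria above lend themselves to an automated check before samples enter the workflow. The following is a minimal sketch, not a validated QC procedure: the thresholds are the ones stated above, while the `CellSampleQC` record and the `qc_failures` function are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass

# Thresholds taken from the criteria listed above.
MIN_VIABILITY_PCT = 90.0  # viability must be greater than 90%
MIN_RIN = 7.0             # RNA Integrity Number must be at least 7


@dataclass
class CellSampleQC:
    """Hypothetical record of QC measurements for one cell sample."""
    sample_id: str
    viability_pct: float      # e.g. from trypan blue exclusion
    rin: float                # e.g. from a Bioanalyzer/TapeStation run
    freeze_thaw_cycles: int   # kept for documentation, not scored here


def qc_failures(sample: CellSampleQC) -> list:
    """Return the failed criteria; an empty list means the sample passes."""
    failures = []
    if sample.viability_pct <= MIN_VIABILITY_PCT:
        failures.append("viability not above 90%")
    if sample.rin < MIN_RIN:
        failures.append("RIN below 7")
    return failures


if __name__ == "__main__":
    sample = CellSampleQC("hybridoma_A", viability_pct=95.0, rin=8.2, freeze_thaw_cycles=1)
    print(sample.sample_id, qc_failures(sample) or "pass")
```

Returning the list of failed criteria, rather than a single pass/fail flag, keeps the reason for rejection available for the sample documentation.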

Protein samples (serum, supernatant, purified antibody)

Purity: SDS-PAGE/CGE shows clear main bands with few impurities (serum in particular requires antigen-specific enrichment and purification).
Concentration: Must meet the requirements of downstream experiments (e.g., microgram quantities for mass spectrometry, nanogram quantities for NGS libraries).
Integrity: Avoid severe degradation. Mass spectrometry can tolerate limited protein degradation, whereas nucleic acid-based sequencing (e.g., NGS) requires intact antibody RNA transcripts for accurate cDNA synthesis and sequencing.
General requirements: Specify sample source, processing conditions (e.g., number of freeze-thaw cycles), storage conditions (liquid nitrogen/-80°C for cells or RNA, -20°C/-80°C for proteins), and clear documentation.

Nucleic Acid-Based Antibody Sequencing Process

Nucleic acid-based antibody sequencing reveals the structural diversity and functional mechanisms of antibodies by analyzing the nucleic acid sequences that encode the heavy- and light-chain variable regions in B cells. The core process covers nucleic acid extraction, primer design, amplification and library construction, and sequencing platform selection.

Primer design strategy

Primer design for antibody sequencing must balance breadth of coverage against sequence accuracy.

Universal primers: designed against conserved framework regions of the antibody variable region (e.g., FR1 or FR4). They amplify most V(D)J-rearranged sequences and are suitable for discovering unknown antibodies or surveying diversity (e.g., sequencing antibody libraries). However, they may also amplify non-functional sequences and have limited sensitivity for low-frequency clones.
Specific primers: designed against the constant regions of particular isotypes (e.g., IgG/IgA) or species (e.g., human/mouse). They enrich the target antibody precisely and reduce background noise, and are suitable for in-depth analysis of monoclonal antibodies or specific isotypes.
Separate amplification of heavy and light chains is the key strategy: heavy-chain (IgH) and light-chain (κ/λ) primers are designed separately and used in independent PCRs, which avoids artificial pairing or incorrect recombination of heavy and light chains during amplification.
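Universal primers are often degenerate, that is, written with IUPAC ambiguity codes so that one primer covers several related V-gene sequences. The sketch below shows how a degenerate primer can be scanned against a template in silico. The IUPAC code table is standard; the primer and template strings are made-up examples rather than validated primers, and `find_primer_sites` is a hypothetical helper.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_primer_sites(primer, template, max_mismatches=0):
    """Return 0-based start positions where the degenerate primer anneals.

    A template base counts as a match when it is one of the bases allowed
    by the IUPAC code at that primer position.
    """
    sites = []
    for start in range(len(template) - len(primer) + 1):
        window = template[start:start + len(primer)]
        mismatches = sum(base not in IUPAC[code] for code, base in zip(primer, window))
        if mismatches <= max_mismatches:
            sites.append(start)
    return sites


if __name__ == "__main__":
    primer = "SAGGTRCAG"          # illustrative degenerate primer, not a real design
    template = "ATGCAGGTGCAGCTG"  # illustrative template fragment
    print(find_primer_sites(primer, template))
```

Running the same scan against every germline V gene of a species gives a rough estimate of how much of the repertoire a primer set can cover before any wet-lab work.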

RNA/DNA extraction

High-quality nucleic acids are the basis of antibody sequence accuracy. High-quality RNA (for subsequent acquisition of antibody variable region sequences) or genomic DNA (for acquisition of complete antibody gene sequences) is extracted from B-cells, hybridoma cells, or tissue samples using TRIzol or specialized kits.

RT-PCR and library construction

Reverse transcription (RT) strategy:

One-step RT-PCR performs reverse transcription and PCR in a single tube. It is easy to operate and reduces the risk of contamination, making it suitable for high-throughput screening. The two-step method synthesizes cDNA first and then uses it as the PCR template. This approach offers greater flexibility and is suitable for detecting low-frequency antibodies or handling complex samples that require multiplex PCR.

Tips to avoid amplification bias:

Use template switching (5' RACE) so that a universal primer can be used at the 5' end, which avoids primer preference at the 5' end of the variable region. During reverse transcription, incorporate molecular barcodes (unique molecular identifiers, UMIs) so that errors and duplicates introduced by repeated PCR amplification can be corrected. Additionally, limit the number of PCR cycles to reduce amplification bias.
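To illustrate why molecule-level barcodes help against amplification bias, the sketch below counts both reads and distinct barcodes per sequence. Reads that share a barcode are treated as PCR copies of one original molecule. The input format, a list of `(barcode, sequence)` tuples, and the function name are assumptions made for this example, and barcode sequencing errors are ignored.

```python
from collections import defaultdict


def count_molecules(reads):
    """Summarize reads as {sequence: (read_count, unique_barcode_count)}.

    reads: iterable of (barcode, sequence) tuples. The unique barcode count
    approximates the number of original RNA molecules, independent of how
    strongly each molecule was amplified.
    """
    barcodes = defaultdict(set)
    read_counts = defaultdict(int)
    for barcode, sequence in reads:
        barcodes[sequence].add(barcode)
        read_counts[sequence] += 1
    return {seq: (read_counts[seq], len(barcodes[seq])) for seq in read_counts}


if __name__ == "__main__":
    # Hypothetical reads: the first clone is over-amplified from few molecules.
    reads = [
        ("AAAC", "CARDYW"), ("AAAC", "CARDYW"), ("AAAC", "CARDYW"),
        ("GGTT", "CARDYW"),
        ("TTGA", "CAKGWF"),
    ]
    for seq, (n_reads, n_molecules) in count_molecules(reads).items():
        print(seq, n_reads, n_molecules)
```

In this toy input the first sequence has four reads but only two barcodes, so its abundance relative to the second sequence is 2:1 by molecule count rather than 4:1 by read count.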

Library construction:

Add sequencing adapters with sample indexes (e.g., Illumina TruSeq). Remove primer dimers by size selection with magnetic beads or gel electrophoresis, so that the final library is concentrated within the desired size range.

Sequencing platform selection

Sanger sequencing: used for clonal validation of monoclonal antibodies (e.g., hybridomas) or targeted sequencing of low-complexity samples. It offers long reads (~800 bp), accuracy > 99.9%, and direct access to full-length heavy/light chain variable sequences, but throughput is low and mixed samples cannot be resolved.
Illumina (second-generation sequencing): used for high-throughput antibody repertoire analysis (e.g., 10⁵-10⁷ B cells). It offers high throughput at low cost and suits large-scale diversity studies, but the short reads must be merged or assembled, and full-length and pairing information may be lost.
PacBio HiFi (third-generation sequencing): directly yields full-length antibody heavy- and light-chain sequences (~1.4 kb) without assembly.
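Whether short paired-end reads can be merged into a full variable-region sequence comes down to simple arithmetic: the two reads must overlap in the middle of the amplicon. The sketch below makes that check explicit. The amplicon length, read lengths, and the 20 bp minimum overlap in the example are illustrative assumptions, not platform specifications.

```python
def paired_end_overlap(amplicon_len, read_len):
    """Expected overlap (bp) between the forward and reverse read.

    A negative value means the reads leave a gap in the amplicon.
    """
    return 2 * read_len - amplicon_len


def can_merge(amplicon_len, read_len, min_overlap=20):
    """True if the read pair overlaps enough to be merged into one sequence."""
    return paired_end_overlap(amplicon_len, read_len) >= min_overlap


if __name__ == "__main__":
    amplicon = 450  # assumed amplicon length in bp, for illustration only
    for read_len in (150, 250, 300):
        print(read_len, paired_end_overlap(amplicon, read_len), can_merge(amplicon, read_len))
```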

Mass Spectrometry-Based Sequencing Process

Antibody preparation and purification

Target antibodies are obtained from the appropriate cell culture or animal model. Afterwards, high quality antibodies are purified from complex samples using affinity chromatography or other purification techniques.

Reduction and alkylation

Purpose: To break the disulfide bonds within and between the heavy and light chains and to cap the resulting free cysteine residues, which prevents disulfide bonds from re-forming, avoids side reactions, and allows thorough enzymatic digestion.
Reduction: Add a reducing agent (e.g., DTT or TCEP) and incubate for a defined time with heating (~55-60°C) to reduce -S-S- to -SH.
Alkylation: Add an alkylating reagent (e.g., iodoacetamide or iodoacetic acid) and incubate at room temperature, protected from light, to alkylate the free -SH groups. Iodoacetamide yields stable carbamidomethylcysteine and iodoacetic acid yields carboxymethylcysteine, either of which prevents reoxidation.
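Alkylation changes the mass of every cysteine, and downstream search software has to be told which shift to expect. The sketch below computes the expected mass increase for a peptide, using the monoisotopic shifts for carbamidomethylation (+57.021464 Da, from iodoacetamide) and carboxymethylation (+58.005479 Da, from iodoacetic acid). It assumes complete alkylation of all cysteines, and the function name and example peptides are hypothetical.

```python
# Monoisotopic mass added per alkylated cysteine, in Da.
ALKYLATION_SHIFT = {
    "iodoacetamide": 57.021464,    # carbamidomethyl, C2H3NO
    "iodoacetic acid": 58.005479,  # carboxymethyl, C2H2O2
}


def alkylation_mass_shift(peptide, reagent="iodoacetamide"):
    """Total mass increase (Da), assuming every cysteine is alkylated."""
    return peptide.count("C") * ALKYLATION_SHIFT[reagent]


if __name__ == "__main__":
    for peptide in ("VTCLVK", "ACDCR", "GGSLR"):  # illustrative peptides
        print(peptide, round(alkylation_mass_shift(peptide), 4))
```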

Enzymatic cleavage

Specific proteases are used to cleave the large antibody proteins (heavy and light chains) into small peptides suitable for mass spectrometry analysis.

Trypsin: the most commonly used protease. It cleaves on the carboxyl (C-terminal) side of arginine and lysine residues and produces peptides of moderate length (typically 8-20 amino acids) that carry positive charge and are well suited to LC-MS/MS analysis. However, it may fail to cover regions that lack suitably spaced arginine/lysine residues (e.g., CDR3).
Lys-C: cleaves on the C-terminal side of lysine residues and produces peptides that are usually longer than tryptic peptides. It is sometimes used in combination with trypsin (Lys-C first, then trypsin) for more complete digestion and coverage.
Operation: The alkylated sample is incubated with the enzyme at a defined pH and temperature (usually 37°C) for several hours to overnight. The reaction can then be stopped by adding formic acid or trifluoroacetic acid.
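The cleavage rules above can be applied in silico to predict which peptides a digest should produce and to see why two enzymes give complementary coverage. The following is a simplified sketch: it assumes the common rule of thumb that trypsin does not cleave before proline, treats Lys-C as cleaving after every lysine, and uses a made-up sequence. Real digests also contain non-specific and incompletely cleaved products.

```python
def cleavage_sites(sequence, enzyme):
    """Return the positions after which the enzyme is expected to cut."""
    sites = []
    for i in range(len(sequence) - 1):
        residue, following = sequence[i], sequence[i + 1]
        if enzyme == "trypsin":
            # Simplified rule: cut after K/R unless the next residue is P.
            if residue in "KR" and following != "P":
                sites.append(i + 1)
        elif enzyme == "lys-c":
            if residue == "K":
                sites.append(i + 1)
        else:
            raise ValueError(f"unknown enzyme: {enzyme}")
    return sites


def digest(sequence, enzyme="trypsin", missed_cleavages=0):
    """Return predicted peptides, allowing up to `missed_cleavages` skipped sites."""
    bounds = [0] + cleavage_sites(sequence, enzyme) + [len(sequence)]
    peptides = []
    for i in range(len(bounds) - 1):
        last = min(i + 1 + missed_cleavages, len(bounds) - 1)
        for j in range(i + 1, last + 1):
            peptides.append(sequence[bounds[i]:bounds[j]])
    return peptides


if __name__ == "__main__":
    chain_fragment = "AAKPGGRLLKCC"  # illustrative sequence, not a real antibody
    print(digest(chain_fragment, "trypsin"))
    print(digest(chain_fragment, "lys-c"))
    print(digest(chain_fragment, "trypsin", missed_cleavages=1))
```

Comparing the peptide lists from both enzymes against the regions of interest shows quickly whether a stretch such as CDR3 would end up in a peptide of measurable length.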

Peptide Purification and Mass Spectrometry

After enzymatic digestion, the peptide mixture may require further purification and separation before mass spectrometry. Liquid chromatography (LC) is commonly used for this purpose and separates peptides effectively according to their physicochemical properties (e.g., hydrophobicity).

The separated peptides are then analyzed by mass spectrometry. Commonly used ionization techniques include electrospray ionization (ESI) and matrix-assisted laser desorption/ionization (MALDI). Mass spectrometry analysis provides precise mass and sequence information for the peptides.
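Because the instrument measures mass-to-charge ratio rather than mass, and ESI typically produces multiply charged peptide ions, the conversion between the two is a routine step. The sketch below shows the standard relation m/z = (M + z × 1.007276) / z for protonated ions and a parts-per-million error calculation. The neutral mass used in the example is an arbitrary illustrative value.

```python
PROTON_MASS = 1.007276  # Da


def mz_from_mass(neutral_mass, charge):
    """m/z of a peptide carrying `charge` protons."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def mass_from_mz(mz, charge):
    """Neutral monoisotopic mass recovered from an observed m/z."""
    return mz * charge - charge * PROTON_MASS


def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


if __name__ == "__main__":
    neutral_mass = 1500.7500  # illustrative peptide mass in Da
    for charge in (1, 2, 3):
        print(charge, round(mz_from_mass(neutral_mass, charge), 4))
```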

Schematic overview of the sequencing pipeline (Figure from Adrian Guthals, 2016)

Data Analysis: From Raw Data to Reliable Sequences

Nucleic Acid-Based Antibody Sequencing Data Analysis

Data pre-processing

Raw sequencing data usually contain technical noise and need systematic pre-processing to improve reliability.

First, FastQC is used for quality assessment to identify low-quality bases, abnormal read lengths, and deviations in base composition.

Subsequently, Cutadapt is used to trim sequencing adapter sequences and residual primer sequences so that they do not introduce artifacts into the downstream analysis. This stage is the foundation for data integrity and accuracy.
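The two operations described above, quality assessment and adapter removal, can be illustrated in a few lines of plain Python. This is a teaching sketch rather than a replacement for FastQC or Cutadapt: it assumes Phred+33 quality encoding and matches the adapter exactly, whereas dedicated tools tolerate mismatches. The adapter string is the widely used Illumina adapter prefix; the read is made up.

```python
def mean_phred(quality, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    if not quality:
        raise ValueError("empty quality string")
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def trim_3prime_adapter(sequence, quality, adapter, min_overlap=5):
    """Remove a 3' adapter, including a partial adapter at the read end.

    Exact matching only; returns the trimmed (sequence, quality) pair.
    """
    cut = sequence.find(adapter)
    if cut == -1:
        # Look for the longest adapter prefix that ends the read.
        longest = min(len(adapter) - 1, len(sequence))
        for k in range(longest, min_overlap - 1, -1):
            if sequence.endswith(adapter[:k]):
                cut = len(sequence) - k
                break
    if cut == -1:
        return sequence, quality
    return sequence[:cut], quality[:cut]


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"        # common Illumina adapter prefix
    read = "ACGTACGTAC" + "AGATCGG"  # made-up read ending in a partial adapter
    qual = "I" * len(read)
    print(trim_3prime_adapter(read, qual, adapter), mean_phred(qual))
```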

Sequence assembly and annotation

After pre-processing, the high-quality sequences are analyzed further to resolve the structure of the antibody variable region. Use IgBLAST or the IMGT database for in-depth annotation to identify the complementarity-determining regions (CDRs) and framework regions (FRs) precisely and to assign the V(D)J gene origin of the heavy and light chains.

For single-cell sequencing data, algorithms such as pairSEQ should be used to pair heavy and light chains correctly on the basis of molecular tags (e.g., cell barcodes) or sequence overlap features. This reconstructs the native antibody pairing and provides a structural basis for subsequent functional studies.
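Annotation tools typically report one row per sequence. IgBLAST, for example, can write tab-separated output in the AIRR format (`-outfmt 19`), with columns such as `v_call`, `j_call`, `junction_aa`, and `productive`. The sketch below filters such a table for productive rearrangements and tallies V gene usage. The example rows are invented for illustration, and the helper names are hypothetical.

```python
import csv
import io
from collections import Counter


def read_productive(tsv_text):
    """Parse AIRR-style TSV text and keep only productive rearrangements."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if row["productive"].strip().upper() in {"T", "TRUE"}]


def v_gene_usage(rows):
    """Count V genes, ignoring alleles and keeping the first of tied calls."""
    # "IGHV3-23*01,IGHV3-23D*01" -> "IGHV3-23"
    return Counter(row["v_call"].split(",")[0].split("*")[0] for row in rows)


# Invented example rows; real files contain many more columns.
EXAMPLE_TSV = (
    "sequence_id\tlocus\tv_call\tj_call\tjunction_aa\tproductive\n"
    "read1\tIGH\tIGHV3-23*01\tIGHJ4*02\tCAKDRGYFDYW\tT\n"
    "read2\tIGH\tIGHV3-23*04\tIGHJ4*02\tCAKDLGYFDYW\tT\n"
    "read3\tIGK\tIGKV1-39*01\tIGKJ1*01\tCQQSYSTPWTF\tT\n"
    "read4\tIGH\tIGHV1-69*01\tIGHJ6*02\tCAR*GMDVW\tF\n"
)

if __name__ == "__main__":
    rows = read_productive(EXAMPLE_TSV)
    print(len(rows), dict(v_gene_usage(rows)))
```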
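When each cell carries its own barcode, the simplest form of pairing is to group annotated chains by barcode and keep one heavy and one light chain per cell. The sketch below shows that idea only; it is not the pairSEQ algorithm, and the record fields (`cell_barcode`, `locus`, `umi_count`, `sequence_id`) are assumed names chosen for this example.

```python
from collections import defaultdict

HEAVY_LOCI = {"IGH"}
LIGHT_LOCI = {"IGK", "IGL"}


def pair_chains(contigs):
    """Return {cell_barcode: (heavy_id, light_id)} for cells with both chains.

    When a cell has several candidates for a chain, the one supported by the
    most molecules (highest umi_count) is kept.
    """
    heavy, light = defaultdict(list), defaultdict(list)
    for contig in contigs:
        if contig["locus"] in HEAVY_LOCI:
            heavy[contig["cell_barcode"]].append(contig)
        elif contig["locus"] in LIGHT_LOCI:
            light[contig["cell_barcode"]].append(contig)

    pairs = {}
    for barcode in heavy.keys() & light.keys():
        best_heavy = max(heavy[barcode], key=lambda c: c["umi_count"])
        best_light = max(light[barcode], key=lambda c: c["umi_count"])
        pairs[barcode] = (best_heavy["sequence_id"], best_light["sequence_id"])
    return pairs


if __name__ == "__main__":
    # Hypothetical annotated contigs from three cells.
    contigs = [
        {"cell_barcode": "cell1", "locus": "IGH", "umi_count": 12, "sequence_id": "h1"},
        {"cell_barcode": "cell1", "locus": "IGK", "umi_count": 30, "sequence_id": "k1"},
        {"cell_barcode": "cell1", "locus": "IGL", "umi_count": 2, "sequence_id": "l1"},
        {"cell_barcode": "cell2", "locus": "IGH", "umi_count": 8, "sequence_id": "h2"},
        {"cell_barcode": "cell3", "locus": "IGL", "umi_count": 5, "sequence_id": "l3"},
    ]
    print(pair_chains(contigs))
```

Cells with only one chain type are dropped here. Whether to keep them, and how to treat cells with two plausible light chains, is a design decision that depends on the study.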

Error Correction and Clone Validation

To eliminate the effects of PCR amplification and sequencing errors, the redundant reads of each clone are aligned with one another (multiple sequence alignment) to generate a high-confidence consensus sequence.

This step significantly improves sequence reliability and provides a solid basis for antibody engineering or mechanistic studies.
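Consensus building can be sketched as a per-position majority vote across the redundant reads of one clone. The example below assumes the reads are already aligned and of equal length (no insertions or deletions); reads with indels would first need a proper multiple sequence alignment. The 60% agreement threshold and the function name are illustrative choices.

```python
from collections import Counter


def consensus(reads, min_fraction=0.6):
    """Majority-vote consensus of equal-length, pre-aligned reads.

    Positions where the most common base falls below `min_fraction`
    of the reads are reported as "N".
    """
    if not reads:
        raise ValueError("no reads given")
    if len({len(read) for read in reads}) != 1:
        raise ValueError("reads must be aligned to the same length")
    result = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        result.append(base if count / len(reads) >= min_fraction else "N")
    return "".join(result)


if __name__ == "__main__":
    # Made-up reads of one clone, two of them carrying a single error.
    reads = ["ACGTAC", "ACGTAC", "ACCTAC", "ACGTAC", "ACGTAT"]
    print(consensus(reads))
```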

Mass Spectrometry-Based Analysis of Antibody Sequencing Data

Database search

Raw mass spectrometry data are matched against antibody sequence databases with search tools such as Mascot, MaxQuant, or PEAKS.

To improve identification efficiency, a custom species-specific germline database covering the possible V/D/J gene segment combinations should be constructed.

Variable region sequences are identified by scoring how well the observed fragment ion masses of each peptide match their theoretical values. This approach works best for antibodies derived from known germline genes.
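The core of this scoring can be illustrated by computing the theoretical b- and y-ion series of a candidate peptide and counting how many are found in the observed spectrum. The sketch below uses standard monoisotopic residue masses, singly charged fragments, and unmodified residues. The "fraction of matched ions" score and the 0.02 Da tolerance are simplifying assumptions and do not reproduce the scoring function of any particular search engine.

```python
# Monoisotopic residue masses in Da (unmodified amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as singly charged m/z values."""
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        # b ion: N-terminal fragment; y ion: C-terminal fragment plus water.
        b_ions.append(sum(RESIDUE_MASS[aa] for aa in peptide[:i]) + PROTON)
        y_ions.append(sum(RESIDUE_MASS[aa] for aa in peptide[i:]) + WATER + PROTON)
    return b_ions, y_ions


def matched_fraction(peptide, observed_mz, tolerance=0.02):
    """Fraction of theoretical b/y ions found among the observed peaks."""
    b_ions, y_ions = fragment_ions(peptide)
    theoretical = b_ions + y_ions
    hits = sum(any(abs(t - obs) <= tolerance for obs in observed_mz) for t in theoretical)
    return hits / len(theoretical)


if __name__ == "__main__":
    # Simulate a perfect spectrum of "GASK" and score two candidates against it.
    spectrum = sum(fragment_ions("GASK"), [])
    for candidate in ("GASK", "GSAK"):
        print(candidate, round(matched_fraction(candidate, spectrum), 2))
```

The second candidate has the same composition but a swapped residue pair, so only some of its fragments match. This is how fragment ions, unlike the intact mass alone, discriminate between sequences.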

De novo sequencing

When the target antibody sequence is not in the database, a de novo sequencing strategy is required. De novo algorithms such as PEAKS directly interpret tandem mass (MS/MS) spectra and deduce peptide sequences from the fragmentation patterns.

High-confidence sequence tags are generated segment by segment and are then extended and assembled through their overlapping regions to reconstruct the complete antibody variable region sequence. This method is essential for discovering rare antibodies or antibodies from species that lack reference sequences.
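The assembly of overlapping sequence tags can be sketched as a greedy merge: repeatedly join the two tags with the longest suffix-to-prefix overlap until no sufficient overlap remains. This is a deliberately simple illustration with exact matching and invented tags. Production tools also weigh per-residue confidence and must handle ambiguities such as the isobaric residues leucine and isoleucine.

```python
def overlap_length(left, right, min_overlap):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def assemble(tags, min_overlap=3):
    """Greedily merge peptide tags; returns the remaining contigs."""
    contigs = list(dict.fromkeys(tags))  # drop exact duplicates, keep order
    # Drop tags that are fully contained in another tag.
    contigs = [t for t in contigs if not any(t != other and t in other for other in contigs)]
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    k = overlap_length(left, right, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    # Invented overlapping tags, e.g. from digests with different enzymes.
    tags = ["EVQLVESGG", "ESGGGLVQ", "LVQPGGSLR"]
    print(assemble(tags))
```

Short minimum overlaps increase the risk of joining unrelated tags, which is one reason digests with several proteases are combined: they produce longer, more reliable overlaps.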

Modification localization and quantification

Antibody function is often regulated by post-translational modifications (e.g. glycosylation, oxidation).

With software such as Skyline or Proteome Discoverer, modification sites (e.g., N-glycosylation sites) can be localized accurately from fragment ions and quantified by integrating chromatographic peak areas.

This analysis reveals the dynamic correlation between antibody structure and function and provides key information for optimizing the efficacy of therapeutic antibodies.
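Two small calculations underlie this kind of analysis: locating candidate N-glycosylation sites by the consensus sequon N-X-S/T (X being any residue except proline), and turning chromatographic peak areas into a relative abundance. The sketch below shows both under simple assumptions: trapezoidal integration of an already extracted peak, and equal response of the modified and unmodified peptide. The example peptide is the conserved IgG1 Fc glycopeptide; the intensity values are invented.

```python
import re

# N, then any residue except P, then S or T; lookahead allows overlapping sites.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of asparagines in N-X-S/T sequons."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence)]


def peak_area(times, intensities):
    """Integrate a chromatographic peak with the trapezoidal rule."""
    area = 0.0
    for k in range(len(times) - 1):
        area += 0.5 * (intensities[k] + intensities[k + 1]) * (times[k + 1] - times[k])
    return area


def modified_fraction(area_modified, area_unmodified):
    """Relative abundance of the modified form, assuming equal response."""
    return area_modified / (area_modified + area_unmodified)


if __name__ == "__main__":
    print(find_sequons("EEQYNSTYR"))
    times = [10.0, 10.1, 10.2, 10.3, 10.4]        # minutes, invented
    glyco = peak_area(times, [0, 300, 900, 300, 0])
    bare = peak_area(times, [0, 100, 300, 100, 0])
    print(round(modified_fraction(glyco, bare), 2))
```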

References

Le Bihan, et al. "De novo protein sequencing of antibodies for identification of neutralizing antibodies in human plasma post SARS-CoV-2 vaccination." Nature Communications 15 (2024): 8790. https://doi.org/10.1038/s41467-024-53105-8
Janin-Bussat, Marie-Claire, et al. "Characterization of antibody drug conjugate positional isomers at cysteine residues by peptide mapping LC–MS analysis." Journal of Chromatography B 981 (2015): 9-13. https://doi.org/10.1016/j.jchromb.2014.12.017
Guthals, A., et al. "De Novo MS/MS Sequencing of Native Human Antibodies." Journal of Proteome Research 16.1 (2017): 45-54. https://pubs.acs.org/doi/10.1021/acs.jproteome.6b00608
