From Workflow to Data: A Practical Guide to Antibody Sequencing
Antibody sequencing is central to vaccine development, drug discovery, and basic research. Its main challenges are the complexity of the hypervariable regions, the characterization of post-translational modifications (e.g., glycosylation), the handling of very small sample amounts, and the retention of native heavy- and light-chain pairing information. Two major routes, nucleic acid sequencing and mass spectrometry, together provide an end-to-end workflow from sample preparation to data analysis.
At the workflow level, nucleic acid sequencing covers primer design, RNA/DNA extraction and quality control, control of amplification bias, and sequencing platform selection. The mass spectrometry workflow focuses on antibody pretreatment (double digestion and deglycosylation), optimization of mass spectrometry parameters (HCD fragmentation with gradient elution), and disulfide bond localization (non-reduced digestion under native conditions combined with ETD). In data analysis, the nucleic acid route uses IgBLAST annotation and pairSEQ pairing for sequence reconstruction, while the mass spectrometry route relies on PEAKS de novo sequencing and Byonic modification localization, each complemented by key quality control steps.
Combining NGS with mass spectrometry resolves sequence and modifications at the same time (e.g., glycoform heterogeneity of antibody drugs), and single-cell sequencing combined with pseudovirus neutralization screening accelerates therapeutic antibody discovery. A thorough understanding of the principles and key points of each workflow step, together with strict adherence to standardized procedures, is the foundation for obtaining high-confidence antibody sequences and advancing antibody-related research and applications.
Sample preparation is the foundation of antibody sequencing. Appropriate sample selection and high-quality sample preparation are prerequisites for the success of subsequent experiments.
Cell samples (single B-cells, hybridomas)
Protein samples (serum, supernatant, purified antibody)
Nucleic acid-based antibody sequencing reveals the structural diversity and functional mechanisms of antibodies by analyzing the sequences that encode the heavy- and light-chain variable regions in B cells. The core process covers nucleic acid extraction, primer design, amplification and library construction, and selection of a sequencing platform.
Primer design for antibody sequencing must balance breadth of coverage with sequence accuracy.
High-quality nucleic acids are the basis of antibody sequence accuracy. High-quality RNA (for subsequent acquisition of antibody variable region sequences) or genomic DNA (for acquisition of complete antibody gene sequences) is extracted from B-cells, hybridoma cells, or tissue samples using TRIzol or specialized kits.
Reverse transcription (RT) strategy:
One-step RT-PCR combines reverse transcription and PCR in a single tube. It is easy to operate and reduces the risk of contamination, making it suitable for high-throughput screening. The two-step method synthesizes cDNA first and then uses it as a template for PCR. This approach offers greater flexibility and is suitable for detecting low-frequency antibodies or handling complex samples that require multiplex PCR.
Tips to avoid amplification bias:
Add a universal tail to the template (for example, by template switching) and amplify with universal primers to avoid primer preference at the 5' end of the variable region. During reverse transcription, incorporate molecular barcodes (unique molecular identifiers, UMIs) so that errors introduced by repeated PCR amplification can be corrected later. Additionally, limit the number of PCR cycles to reduce amplification bias.
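The molecular barcoding idea can be sketched as follows. This is a minimal illustration, assuming that each read starts with a fixed-length barcode (12 nt here, an arbitrary choice) followed by the antibody insert; the function names and toy reads are hypothetical and do not correspond to any specific kit or tool.

```python
from collections import defaultdict

UMI_LENGTH = 12  # assumed barcode length; the real value depends on the primer design


def group_reads_by_umi(reads, umi_length=UMI_LENGTH):
    """Split each read into (barcode, insert) and group the inserts by barcode."""
    groups = defaultdict(list)
    for read in reads:
        if len(read) <= umi_length:
            continue  # too short to carry both a barcode and an insert
        umi, insert = read[:umi_length], read[umi_length:]
        groups[umi].append(insert)
    return dict(groups)


def count_supported_molecules(groups, min_reads=2):
    """Count barcodes that are supported by at least `min_reads` reads."""
    return sum(1 for inserts in groups.values() if len(inserts) >= min_reads)


# Toy reads: three PCR copies of one molecule, two of another, one singleton.
reads = [
    "AAAAAAAAAAAA" + "CAGGTGCAGCTG",
    "AAAAAAAAAAAA" + "CAGGTGCAGCTG",
    "AAAAAAAAAAAA" + "CAGGTGCAGCTG",
    "CCCCCCCCCCCC" + "GAGGTGCAGCTG",
    "CCCCCCCCCCCC" + "GAGGTGCAGCTG",
    "GGGGGGGGGGGG" + "CAGGTGCAGCTG",
]
umi_groups = group_reads_by_umi(reads)
print(len(reads), "reads from", len(umi_groups), "barcoded molecules")
```

Counting barcodes instead of reads removes the inflation caused by PCR duplicates, and reads that share a barcode can later be collapsed into a single error-corrected sequence.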
Library construction:
Add sequencing adapters with sample indexes (e.g., Illumina TruSeq). Remove primer dimers by size selection with magnetic beads or gel electrophoresis, so that the final library is concentrated within the target size range.
Target antibodies are obtained from the appropriate cell culture or animal model. Afterwards, high quality antibodies are purified from complex samples using affinity chromatography or other purification techniques.
A specific protease pair is used to cleave the antibody's large heavy and light chains into small peptides suitable for mass spectrometry analysis.
After enzymatic digestion, the peptide mixture may require further purification and separation before mass spectrometry. Liquid chromatography (LC) is commonly used for this purpose because it separates peptides effectively according to their physicochemical properties (e.g., hydrophobicity).
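The benefit of using two proteases can be seen with an in silico digestion. The sketch below assumes trypsin and chymotrypsin as the pair (the text does not name the enzymes) and uses deliberately simplified cleavage rules; the example sequence is an illustrative heavy-chain fragment.

```python
import re

# Simplified cleavage rules (assumptions for illustration only):
# trypsin cuts after K or R, chymotrypsin cuts after F, Y or W,
# and neither enzyme cuts when the next residue is P.
CLEAVAGE_RULES = {
    "trypsin": re.compile(r"(?<=[KR])(?!P)"),
    "chymotrypsin": re.compile(r"(?<=[FYW])(?!P)"),
}


def digest(sequence, protease):
    """Return the peptides produced by a complete digestion with one protease."""
    pieces = CLEAVAGE_RULES[protease].split(sequence)  # zero-width split, Python 3.7+
    return [piece for piece in pieces if piece]


fragment = "EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMSWVRQAPGK"
tryptic = digest(fragment, "trypsin")
chymotryptic = digest(fragment, "chymotrypsin")
print("trypsin:     ", tryptic)
print("chymotrypsin:", chymotryptic)
```

Because the two enzymes cut at different residues, their peptides overlap one another, and these overlaps are what later allow the peptides to be ordered into a continuous sequence.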
The purified peptides are analyzed by mass spectrometry, and commonly used techniques include electrospray ionization (ESI) mass spectrometry and matrix-assisted laser desorption ionization (MALDI) mass spectrometry. Mass spectrometry analysis provides precise mass and sequence information of the peptides.
Schematic overview of the sequencing pipeline (Figure from Adrian Guthals, 2016)
Raw sequencing data usually contain technical noise and need to be systematically preprocessed to improve reliability.
First, FastQC is used for quality assessment to identify low-quality bases, abnormal sequence lengths, and deviations in base composition.
Next, Cutadapt is used to remove adapter sequences and primer residues so that they do not introduce artifacts into the subsequent analysis. This stage is the foundation for ensuring data integrity and accuracy.
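To make the trimming step concrete, the sketch below removes a 3' adapter and a low-quality tail from a single read in plain Python. It is a simplified illustration, not Cutadapt's algorithm: it only finds exact, full-length adapter matches, assumes Phred+33 quality encoding, and uses a quality cutoff of 20 chosen for the example.

```python
ADAPTER = "AGATCGGAAGAGC"  # start of a common Illumina adapter, used here as an example


def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(char) - offset for char in quality_string]


def trim_adapter(seq, qual, adapter=ADAPTER):
    """Cut the read at the first exact occurrence of the adapter, if present."""
    index = seq.find(adapter)
    if index == -1:
        return seq, qual
    return seq[:index], qual[:index]


def trim_low_quality_tail(seq, qual, min_quality=20):
    """Remove bases from the 3' end while their quality is below the cutoff."""
    scores = phred_scores(qual)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return seq[:end], qual[:end]


# Toy read: 15 nt of insert (the last two bases are low quality), adapter, 4 nt of read-through.
raw_seq = "CAGGTGCAGCTGGTG" + ADAPTER + "ACGT"
raw_qual = "I" * 13 + "##" + "I" * 17
clean_seq, clean_qual = trim_low_quality_tail(*trim_adapter(raw_seq, raw_qual))
print(clean_seq, clean_qual)
```

Dedicated trimmers additionally tolerate mismatches and partial adapters at the read end, which this sketch does not attempt.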
The high-quality sequences obtained after preprocessing are then analyzed to resolve the structure of the antibody variable region. IgBLAST or the IMGT database is used for in-depth annotation to identify the complementarity-determining regions (CDRs) and framework regions (FRs) and to determine the V(D)J gene origin of the light and heavy chains.
For single-cell sequencing data, pairSEQ or similar algorithms are used to pair the light and heavy chains correctly on the basis of molecular tags or sequence overlap features. This reconstructs the native antibody pairing and provides a structural basis for subsequent functional studies.
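As a rough illustration of what region annotation delivers, the sketch below locates a heavy-chain CDR3 in a translated variable region. It is a motif heuristic chosen for this example and not how IgBLAST or IMGT work (those tools align the sequence against germline references): it assumes the CDR3 lies between the last cysteine of framework 3 and the conserved W/F-G-X-G motif at the start of framework 4. The example sequence is illustrative.

```python
import re

# Conserved motif at the start of framework 4 (W or F, then G, any residue, G).
J_MOTIF = re.compile(r"[WF]G.G")


def find_cdr3(variable_region):
    """Return the residues between the conserved Cys and the J motif, or None."""
    motifs = list(J_MOTIF.finditer(variable_region))
    if not motifs:
        return None
    motif_start = motifs[-1].start()  # the framework 4 motif is the last one in the region
    cys_position = variable_region.rfind("C", 0, motif_start)
    if cys_position == -1:
        return None
    return variable_region[cys_position + 1:motif_start]


heavy_chain = (
    "EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMSWVRQAPGKGLEWVSAISGSGGSTYYADSVKG"
    "RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAKDRGYSSGWYFDVWGQGTLVTVSS"
)
cdr3 = find_cdr3(heavy_chain)
print("CDR3:", cdr3)
```

A heuristic like this fails on truncated or unusual sequences, which is one reason alignment-based annotation against germline databases is used in practice.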
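Pairing by molecular tag can be sketched as a simple grouping step. The example assumes that every annotated chain carries the barcode of the cell it came from; the record layout and identifiers are hypothetical, and this is not the pairSEQ algorithm itself.

```python
from collections import defaultdict


def pair_chains(records):
    """Pair heavy and light chains that share a cell barcode.

    `records` is an iterable of (cell_barcode, chain_type, sequence_id),
    where chain_type is "heavy" or "light". Cells with exactly one chain of
    each type are paired; all other cells are reported for manual review.
    """
    by_cell = defaultdict(lambda: {"heavy": [], "light": []})
    for barcode, chain_type, sequence_id in records:
        by_cell[barcode][chain_type].append(sequence_id)

    pairs, unresolved = {}, []
    for barcode, chains in by_cell.items():
        if len(chains["heavy"]) == 1 and len(chains["light"]) == 1:
            pairs[barcode] = (chains["heavy"][0], chains["light"][0])
        else:
            unresolved.append(barcode)
    return pairs, sorted(unresolved)


records = [
    ("CELL01", "heavy", "IGH_001"),
    ("CELL01", "light", "IGK_014"),
    ("CELL02", "heavy", "IGH_002"),  # no light chain recovered
    ("CELL03", "heavy", "IGH_003"),
    ("CELL03", "light", "IGL_007"),
    ("CELL03", "light", "IGK_021"),  # two light chains: possible doublet
]
pairs, unresolved = pair_chains(records)
print(pairs)
print("needs review:", unresolved)
```

Keeping ambiguous cells separate rather than guessing a partner avoids reporting heavy/light combinations that never existed in a single B cell.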
To eliminate the effects of PCR amplification and sequencing errors, the redundant sequences of the same clone are aligned and compared to generate a high-confidence consensus sequence.
This step significantly improves sequence reliability and provides a solid basis for antibody engineering or mechanistic studies.
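Consensus building can be sketched as a per-position majority vote. The example assumes the redundant reads are already aligned and of equal length, and it uses an agreement threshold of 60% chosen for illustration; positions without sufficient agreement are reported as N.

```python
from collections import Counter


def consensus(reads, min_fraction=0.6):
    """Build a majority-vote consensus from aligned, equal-length reads."""
    if not reads:
        raise ValueError("at least one read is required")
    if any(len(read) != len(reads[0]) for read in reads):
        raise ValueError("reads must be aligned to the same length")

    bases = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        bases.append(base if count / len(reads) >= min_fraction else "N")
    return "".join(bases)


# Five reads of one clone; two of them carry a different single-base error.
clone_reads = [
    "CAGGTGCAGC",
    "CAGGTGCAGC",
    "CAGGAGCAGC",
    "CAGGTGCAGC",
    "CAGGTGCTGC",
]
clone_consensus = consensus(clone_reads)
print(clone_consensus)
```

Because independent errors rarely hit the same position in most reads, the vote recovers the original sequence as long as each clone is covered by several reads.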
Raw mass spectrometry data are matched against antibody sequence databases using tools such as Mascot, MaxQuant, or PEAKS.
To improve identification efficiency, a custom species-specific germline database covering all possible combinations of V/D/J gene segments needs to be constructed.
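Enumerating the segment combinations is straightforward to sketch. The segment names and short sequences below are toy placeholders, not real germline genes, and the simple concatenation ignores junctional trimming and non-templated additions, so it is only a starting point for such a database.

```python
from itertools import product


def build_vdj_database(v_segments, d_segments, j_segments):
    """Concatenate every V/D/J combination into a {name: sequence} dictionary."""
    database = {}
    for (v_name, v_seq), (d_name, d_seq), (j_name, j_seq) in product(
        v_segments.items(), d_segments.items(), j_segments.items()
    ):
        database[f"{v_name}|{d_name}|{j_name}"] = v_seq + d_seq + j_seq
    return database


# Placeholder segments for illustration only.
v_segments = {"V1": "EVQLVESGGG", "V2": "QVQLQESGPG"}
d_segments = {"D1": "GYSSGW"}
j_segments = {"J1": "YFDVWGQG", "J2": "AFDIWGQG"}

vdj_database = build_vdj_database(v_segments, d_segments, j_segments)
for name, sequence in vdj_database.items():
    print(f">{name}\n{sequence}")  # FASTA-style output
```

The number of entries is the product of the segment counts, so real germline sets produce databases that grow quickly in size.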
Antibody variable region sequences are identified by scoring how well the measured peptide fragment ion masses match their theoretical values. This approach is particularly suited to antibodies derived from known germline genes.
When the target antibody sequence is not in the database, a de novo sequencing strategy is required. Algorithms such as PEAKS or Byonic directly interpret tandem mass (MS/MS) spectra to deduce peptide sequences from fragmentation patterns.
High-confidence sequence tags are generated for individual peptide segments and then extended and assembled through their overlapping regions to reconstruct the complete antibody variable region sequence. This method is essential for discovering rare antibodies or antibodies from species that lack reference databases.
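The matching principle can be illustrated with a toy scorer. The sketch computes singly charged b- and y-ion m/z values from monoisotopic residue masses and reports the fraction found among the observed peaks; the 0.02 Da tolerance and the scoring function are assumptions for this example and do not reproduce the scoring of Mascot, MaxQuant, or PEAKS.

```python
# Monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as singly charged m/z values."""
    masses = [RESIDUE_MASS[residue] for residue in peptide]
    b_ions, y_ions = [], []
    total = 0.0
    for mass in masses[:-1]:  # N-terminal fragments b1 .. b(n-1)
        total += mass
        b_ions.append(total + PROTON)
    total = 0.0
    for mass in reversed(masses[1:]):  # C-terminal fragments y1 .. y(n-1)
        total += mass
        y_ions.append(total + WATER + PROTON)
    return b_ions, y_ions


def match_fraction(peptide, observed_mz, tolerance=0.02):
    """Fraction of theoretical b/y ions found among the observed peaks."""
    b_ions, y_ions = fragment_ions(peptide)
    theoretical = b_ions + y_ions
    matched = sum(
        1 for ion in theoretical
        if any(abs(ion - peak) <= tolerance for peak in observed_mz)
    )
    return matched / len(theoretical)


# Simulated spectrum: the ideal fragment ions of the peptide SAMPLER.
observed = sum(fragment_ions("SAMPLER"), [])
for candidate in ("SAMPLER", "SAMPELR"):
    print(candidate, round(match_fraction(candidate, observed), 2))
```

The candidate with two residues swapped still matches most ions, which shows why complete fragment coverage matters for distinguishing closely related sequences.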
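Assembly from overlapping tags can be sketched with a greedy merge. This is a simplified illustration under the assumptions that the tags are error-free and that a minimum overlap of three residues is meaningful; it is not the assembly method of any particular software. The tags are illustrative. Note that leucine and isoleucine have identical masses, so tags derived from mass alone cannot distinguish them.

```python
def overlap(left, right, min_length=3):
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_length - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def assemble(tags, min_length=3):
    """Greedily merge the pair of contigs with the largest overlap until none remain."""
    contigs = list(dict.fromkeys(tags))  # remove duplicate tags, keep their order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap(left, right, min_length)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # no remaining overlaps; leave the contigs separate
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


sequence_tags = ["AKDRGYSSGW", "SSGWYFDVWG", "DVWGQGTLVTVSS"]
assembled = assemble(sequence_tags)
print(assembled)
```

When tags do not overlap, the function returns several contigs instead of forcing a join, which marks the regions that need additional peptides, for example from a second protease.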
Antibody function is often regulated by post-translational modifications (e.g. glycosylation, oxidation).
With software such as Skyline or Proteome Discoverer, modification sites (e.g., N-glycosylation sites) can be accurately localized from fragment ions and quantified by integrating chromatographic peak areas.
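Quantification by peak area can be sketched with trapezoidal integration. The example uses made-up chromatogram points for the modified and unmodified forms of one peptide and assumes that both forms ionize with equal efficiency, which is a simplification; it does not reproduce the integration routines of Skyline or Proteome Discoverer.

```python
def peak_area(times, intensities):
    """Integrate a chromatographic peak with the trapezoidal rule."""
    if len(times) != len(intensities):
        raise ValueError("times and intensities must have the same length")
    area = 0.0
    for index in range(1, len(times)):
        width = times[index] - times[index - 1]
        area += width * (intensities[index] + intensities[index - 1]) / 2
    return area


def modified_fraction(area_modified, area_unmodified):
    """Relative abundance of the modified form, from the two peak areas."""
    total = area_modified + area_unmodified
    if total == 0:
        raise ValueError("both peak areas are zero")
    return area_modified / total


# Made-up extracted-ion chromatograms (retention time in minutes, arbitrary intensity).
times = [0.0, 1.0, 2.0, 3.0, 4.0]
unmodified_area = peak_area(times, [0.0, 30.0, 60.0, 30.0, 0.0])
modified_area = peak_area(times, [0.0, 10.0, 20.0, 10.0, 0.0])
occupancy = modified_fraction(modified_area, unmodified_area)
print(f"modified form: {occupancy:.0%} of total signal")
```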
This analysis reveals the dynamic correlation between antibody structure and function and provides key information for optimizing the efficacy of therapeutic antibodies.
For research use only, not intended for any clinical use.