Antibody Sequencing for Monoclonals, Hybridomas, Biosimilars

    Antibody sequencing plays a foundational role in biopharmaceutical innovation by enabling accurate analysis of antibody structure at the molecular level. As monoclonal antibody development, hybridoma platforms, and biosimilar research continue to expand, sequencing technologies have evolved from basic tools into essential components of R&D workflows.

    Traditional methods such as Edman degradation and Sanger sequencing have given way to advanced platforms like next-generation sequencing (NGS) and high-resolution mass spectrometry (MS), which offer higher throughput and more complete sequence coverage. In recent years, the integration of single-cell technologies and AI-driven analysis has further shortened discovery timelines and enhanced sequencing precision.

    Why Specialized Antibody Sequencing Is Critical

    Accurate Antibody Sequences Power Biologics Innovation and Therapeutics

    Precise antibody sequencing is foundational to every stage of biologics discovery and therapeutic development. The amino acid sequence of an antibody defines its three-dimensional structure, antigen-binding specificity, and overall functional integrity. Errors in sequence data can compromise expression, reduce therapeutic efficacy, or even introduce unexpected immunogenic responses.

    By using specialized, high-accuracy sequencing platforms, researchers can avoid such risks and ensure that antibody candidates are both functional and safe. Reliable sequence data is essential not just for product performance but also for regulatory compliance and intellectual property protection.

    Mass Spectrometry: A Key Tool for Direct, High-Resolution Antibody Sequencing

    Unlike genetic sequencing methods such as NGS, which infer the protein sequence from nucleic acid templates, mass spectrometry (MS) analyzes the antibody protein itself. This allows reference-independent (de novo) sequencing and shows the protein that was actually produced, which is especially critical when post-translational modifications (PTMs) are involved.

    Advanced tandem MS/MS techniques, paired with expert bioinformatics analysis, allow for confident identification of:

    • Amino acid sequences
    • Disulfide bond patterns
    • PTMs such as glycosylation, phosphorylation, and oxidation

    These insights are crucial for functional validation, comparability studies, and downstream engineering.

    Key Application Areas of Specialized Antibody Sequencing

    1. Monoclonal Antibody (mAb) Validation

    MS-based sequencing confirms that expressed antibody products match the intended design and helps detect cloning or expression-derived mutations early, ensuring product fidelity.

    2. Hybridoma Rescue & Intellectual Property Protection

    Sequencing antibodies directly from hybridoma supernatants enables preservation of sequence data even if cell lines are lost, securing both research value and patent rights.

    3. Biosimilar Development & Analytical Similarity

    For biosimilar programs, MS sequencing ensures that candidate molecules match originator antibodies not only in primary structure but also in PTM profiles (especially glycosylation), supporting regulatory comparability.

    Monoclonal Antibodies: Ensuring Sequence Integrity and Consistency

    Challenges

    For monoclonal antibodies (mAbs), accurate validation of the amino acid sequence is critical, but three challenges recur: sequence drift (accidental mutations during cell culture), loss of original records, and batch-to-batch variation. These issues can disrupt antibody function and delay development.

    Applications

    High-fidelity sequencing provides a verified reference for downstream engineering. Since antibody engineering depends on the precision of the input sequence, any error at this stage can result in design failure or deviation from intended product characteristics.

    In early-stage development, sequence confirmation serves as a critical quality control (QC) checkpoint. It ensures molecular consistency across cell line construction, process development, and early GMP manufacturing—forming a reliable foundation for subsequent characterization and comparability studies.

    How Mass Spec Helps

    Mass spectrometry is particularly valuable at this stage because it analyzes the expressed protein product directly. Through high-precision LC-MS/MS peptide mapping and de novo sequencing, we are able to:

    • Confirm the complete amino acid sequences of the variable and constant regions of the light and heavy chains
    • Identify any unintended amino acid substitutions, insertions, or deletions
    • Detect key post-translational modifications (e.g., N-glycosylation sites) in the same analysis

    Real-World Example

    Background

    Herceptin (trastuzumab) is a marketed humanized therapeutic monoclonal antibody targeting the HER2 receptor. In the study by Peng et al. (2021), it was used as a benchmark sample to validate the accuracy and reliability of mass spectrometry-based de novo antibody sequencing. The goal was to establish a standardized, reproducible workflow for sequencing antibodies of unknown sequence.

    Analytical methods

    A bottom-up LC-MS/MS strategy was employed, combining two key innovations with a dedicated data processing step:

    1. Parallel Protease Digestion

    Nine complementary proteases (e.g., trypsin, chymotrypsin, elastase) were used in parallel under unified buffer conditions. Because each protease cleaves at different sites, the digests produced overlapping peptides, which increased overall sequence coverage and reduced blind spots in difficult regions.
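
    The benefit of overlapping digests can be illustrated with a small in silico digestion. The sketch below is not the study's workflow: the cleavage rules are deliberately simplified assumptions (no missed cleavages, no sequence-context exceptions), only five proteases are modeled, and the sequence `SEQ` is a made-up toy input. It shows that junctions left unconfirmed by one protease are spanned by peptides from another.

```python
# Simplified cleavage rules, assumed for illustration only: real enzymes show
# missed cleavages and sequence-context exceptions that are ignored here.
# "C" cuts after the listed residues, "N" cuts before them.
RULES = {
    "trypsin": ("KR", "C"),
    "chymotrypsin": ("FYWL", "C"),
    "Glu-C": ("E", "C"),
    "Asp-N": ("D", "N"),
    "Lys-N": ("K", "N"),
}

# Made-up toy sequence, used only to exercise the rules above.
SEQ = "GSDKLFTEARVWQDNKSLYAPEGRMTK"


def cut_sites(seq, residues, side):
    """Internal positions where the chain is split (0 < position < len(seq))."""
    sites = set()
    for i, aa in enumerate(seq):
        if aa in residues:
            pos = i + 1 if side == "C" else i
            if 0 < pos < len(seq):
                sites.add(pos)
    return sites


def digest(seq, residues, side):
    """Peptides from a complete digest, in sequence order."""
    bounds = [0] + sorted(cut_sites(seq, residues, side)) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def unbridged_sites(seq, proteases):
    """Cut sites shared by every listed protease, i.e. junctions no peptide spans."""
    site_sets = [cut_sites(seq, *RULES[name]) for name in proteases]
    return set.intersection(*site_sets)


for name, rule in RULES.items():
    print(f"{name:13s}", digest(SEQ, *rule))

print("unbridged, trypsin only:", sorted(unbridged_sites(SEQ, ["trypsin"])))
print("unbridged, all proteases:", sorted(unbridged_sites(SEQ, list(RULES))))
```

    With trypsin alone, every tryptic junction is "unbridged": no peptide proves which tryptic peptide follows which. Adding proteases with different specificities removes those gaps, which is what makes assembly of an unknown sequence possible.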

    2. Dual Fragmentation Strategy

    Every peptide precursor was fragmented with both:

    • Stepped higher-energy collisional dissociation (HCD)
    • Electron-transfer/higher-energy collision dissociation (EThcD)

    This approach maximized sequence data by generating complementary ion types and leveraging different fragmentation mechanisms across all peptide precursors.

    3. Data Processing

    Raw MS/MS data were processed using Supernovo software. The software assembled sequences de novo, then iteratively refined them against antibody germline sequence templates to correct for somatic mutations.

    Key findings 

    • Achieved 100% sequence coverage of both light and heavy chains.
    • Variable regions were determined with 99% accuracy, with only 3 minor misassignments. For example, residues H91/Y92 in the light chain CDR3 were read as W91/N92 because the residue pairs H+Y and W+N have the same combined mass (they are isobaric), so they cannot be told apart unless a fragment ion falls between the two residues.
    • HCD or EThcD alone produced 2–3 times more errors than the combined fragmentation strategy, showing that the two fragmentation modes are complementary.
    • The study shows that a multi-protease, dual-fragmentation workflow can almost completely reconstruct a known antibody sequence, establishing a strong technological foundation for the de novo sequencing of unknown or novel antibodies.
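
    The H91/Y92 versus W91/N92 ambiguity can be checked with a short calculation. The sketch below uses standard monoisotopic residue masses; the helper name `pair_mass` is introduced here for illustration only.

```python
# Standard monoisotopic residue masses (Da) for the four residues involved.
MONO = {"H": 137.05891, "Y": 163.06333, "W": 186.07931, "N": 114.04293}


def pair_mass(residues):
    """Summed residue mass of a short stretch of amino acids."""
    return sum(MONO[aa] for aa in residues)


hy = pair_mass("HY")
wn = pair_mass("WN")
print(f"H+Y = {hy:.5f} Da, W+N = {wn:.5f} Da, difference = {abs(hy - wn):.5f} Da")
```

    Both pairs have the elemental composition C15H16N4O3, so no mass analyzer can separate them by precursor or flanking fragment masses alone. Only a fragment ion that cleaves between the two residues resolves the assignment.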

    Mass spectrometry-based de novo sequencing of the monoclonal antibody Herceptin. (Figure from Weiwei Peng, 2021)

    Hybridomas: Sequence Recovery Without Genetic Material

    Challenges

    Common challenges in hybridoma technology include accidental loss of cell lines, low antibody secretion, and missing sequence information due to genetic instability of cell clones.

    Solution

    The secreted antibody protein in hybridoma cell culture supernatant is analyzed directly. Even if a cell line fails to survive or expand, trace amounts of secreted antibody remaining in the supernatant can be sequenced using liquid chromatography-tandem mass spectrometry (LC-MS/MS).

    Enzymatic digestion, peptide separation, and mass spectrometry acquisition, combined with de novo sequencing algorithms, allow the complete antibody sequence to be reconstructed directly from the protein itself.

    Advantages

    The approach does not rely on intact cells or stable genetic material (RNA/DNA), making it especially valuable when viable cells are unavailable, nucleic acids are degraded, or antibody secretion is minimal.

    Application Value 

    Recombinant Antibody Production: Once the sequence has been determined, it can be cloned into expression vectors for mammalian hosts (e.g., CHO cells) for stable, high-titer, scalable recombinant antibody production.

    Preservation of Valuable Legacy Antibodies: For hybridomas that have important research value but are at risk of being lost, this method is an effective way to recover their sequences, protect the associated intellectual property, and support further development.

    Real-World Example

    Background

    139H2 is a hybridoma-derived monoclonal antibody directed against the plasma membrane of human breast cancer cells. It has been a valuable research tool for Western blotting, ELISA, IHC, and IF, but the lack of sequence information has hindered its wider use. Sequencing this functional antibody was therefore needed to enable recombinant production.

    Analytical methods

    LC-MS/MS-based bottom-up proteomics was the core analytical technique.

    • De novo sequencing: Peptide sequences were deduced directly from MS/MS spectra using PEAKS software, without relying on genomic databases.
    • Multiple protease digestion: Purified 139H2 IgG was digested in parallel with trypsin, chymotrypsin, α-lytic protease, and thermolysin to generate overlapping peptides for complete coverage.
    • Hybrid fragmentation: Stepped higher-energy collisional dissociation (sHCD) and electron-transfer/higher-energy collision dissociation (EThcD) were both applied to all peptide precursors to improve sequence coverage and accuracy, especially for the CDRs.
    • Template-based assembly: Using the authors' in-house software Stitch, the de novo sequenced peptides were assembled into full-length heavy and light chains with the IMGT mouse antibody database as a template. The result was validated manually.
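
    The idea behind template-based assembly can be sketched in a few lines. The example below is a toy illustration and not the algorithm used by Stitch: it places each de novo peptide read at its best ungapped position on a template and takes a per-position majority vote, so that residues supported by the reads override the template. The template fragment and the reads are hypothetical.

```python
from collections import Counter


def best_placement(read, template):
    """Return (offset, matches) for the ungapped placement with most identities."""
    best = (0, -1)
    for offset in range(len(template) - len(read) + 1):
        window = template[offset:offset + len(read)]
        matches = sum(r == t for r, t in zip(read, window))
        if matches > best[1]:
            best = (offset, matches)
    return best


def assemble(reads, template):
    """Majority-vote consensus; uncovered positions keep the template residue in lowercase."""
    columns = [Counter() for _ in template]
    for read in reads:
        offset, _ = best_placement(read, template)
        for i, aa in enumerate(read):
            columns[offset + i][aa] += 1
    return "".join(
        col.most_common(1)[0][0] if col else t.lower()
        for col, t in zip(columns, template)
    )


# Hypothetical germline-like template fragment and de novo reads.
# Two reads carry R where the template has K, mimicking a somatic mutation.
TEMPLATE = "QVQLQQSGAELVKPGASVKLSCKAS"
READS = ["QVQLQQSGAE", "SGAELVRPGA", "VRPGASVKLS", "KLSCKAS"]

print(assemble(READS, TEMPLATE))
```

    The template only decides where each read belongs; the consensus itself comes from the measured peptides, which is why mutated positions in the variable region are recovered rather than overwritten by the germline residue. Real tools additionally score reads, handle gaps, and treat the CDR3 junction separately.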

    Key findings 

    • The full amino acid sequence of the heavy and light chains of 139H2 was successfully determined. 139H2 was identified as a mouse IgG1 antibody with IGHV1-53 heavy chain and IGKV8-30 light chain, showing moderate somatic mutation.
    • Structural and functional testing confirmed that 139H2 binding is not disrupted by O-glycosylation of the key T4 residues in its epitope, unlike many other anti-MUC1 VNTR antibodies.
    • Sequence availability enables reliable recombinant production of 139H2 for ongoing research and potential therapeutic/diagnostic development.
    • The unique glycosylation-independent binding mode of 139H2 makes it an important tool for targeting MUC1 overexpression in tumors independent of glycosylation alterations commonly found in cancer.
    • This work demonstrates the power of MS-based de novo sequencing in rescuing and characterizing valuable but poorly defined historical antibody reagents, improving the reproducibility of scientific studies.

    De novo sequencing of the hybridoma 139H2 based on bottom-up proteomics. (Figure from Weiwei Peng, 2024)

    Biosimilars: Sequence Characterization for Analytical Comparison

    In biosimilar development, precise structural analysis of the original antibody (reference drug) is a core requirement to ensure similarity.

    Challenges

    While traditional methods are limited by patent information barriers or reverse engineering uncertainties, mass spectrometry provides a direct and objective means of analysis.

    Key Roles of Mass Spectrometry

    A bottom-up strategy based on high-resolution LC-MS/MS provides:

    • Sequence Identification: Direct determination of the complete amino acid sequences of the heavy and light chains, confirming concordance at key sites (e.g., CDRs) in the variable regions.
    • PTM Mapping: Quantification of key modifications such as glycosylation (e.g., at N297 in the Fc region), oxidation, and deamidation, with sensitivity down to a single-site difference.
    • Isoform Confirmation: Identification of charge variants (e.g., C-terminal lysine clipping) and glycoform distribution (G0F/G1F/G2F) to ensure functional similarity.
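
    The modifications listed above each correspond to a characteristic mass shift, which is how they are recognized in MS data. The sketch below uses standard monoisotopic values; the function names and the 0.01 Da tolerance are illustrative assumptions, not part of any specific software.

```python
# Monoisotopic residue masses (Da) of the monosaccharide building blocks.
HEXNAC, HEX, FUC = 203.07937, 162.05282, 146.05791

# Compositions (HexNAc, Hex, Fuc) of the common fucosylated biantennary glycoforms.
GLYCOFORMS = {"G0F": (4, 3, 1), "G1F": (4, 4, 1), "G2F": (4, 5, 1)}

# Monoisotopic mass shifts (Da) of other modifications named in the list above.
SHIFTS = {
    "oxidation": 15.99491,
    "deamidation": 0.98402,
    "C-terminal Lys loss": -128.09496,
    "one galactose (e.g. G0F -> G1F)": HEX,
}


def glycan_mass(hexnac, hexose, fucose):
    """Mass added to a peptide by an N-glycan of the given composition."""
    return hexnac * HEXNAC + hexose * HEX + fucose * FUC


def explain_delta(delta, tol=0.01):
    """Name the known shift(s) matching an observed mass difference (Da)."""
    return [name for name, shift in SHIFTS.items() if abs(shift - delta) <= tol]


for name, composition in GLYCOFORMS.items():
    print(f"{name}: +{glycan_mass(*composition):.4f} Da")

print(explain_delta(15.995))
```

    In a comparability setting, `explain_delta` would be called with the mass difference between matching peptides from the candidate and the reference product. A difference that matches no known shift is a prompt for closer inspection, for example a possible amino acid substitution.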

    Real-World Example

    Background

    The expiration of the original patent on recombinant human erythropoietin (rHuEPO) has led to the entry of a large number of biosimilars into the market, and these complex glycoproteins exhibit significant heterogeneity in glycosylation patterns due to different production systems and processes. This heterogeneity poses a challenge for drug quality control. Reliable methods to characterize, differentiate and identify these closely related rHuEPO products are therefore urgently needed.

    Analytical methods

    The core analysis used nanoLC-ESI-MS/MS (nano-liquid chromatography-electrospray ionization-tandem mass spectrometry) on the LTQ-Orbitrap Velos Pro platform.

    • Glycoproteome Analysis: Intact glycopeptides (peptide + glycan) were analyzed rather than released glycans alone, providing protein sequence data along with site-specific glycosylation information.
    • Dual Fragmentation: Both collision-induced dissociation (CID) and higher-energy collisional dissociation (HCD) were applied to the same sample; CID is gentler and preserves the glycan structure of the glycopeptide, while HCD provides higher-energy fragmentation, generating more peptide backbone ions that can be used for sequence identification.
    • Multiple Digestion Replicates: Three separate tryptic digests were performed on each rHuEPO to ensure robustness.
    • Bioinformatics: Database searches (Swiss-Prot) and supporting de novo sequencing were performed using PEAKS Studio, specifically searching for common and unique peptides/glycopeptides with defined PTMs.

    Key findings 

    • Unique Chromatograms: Each rHuEPO product gave a distinct base peak chromatogram (BPC), and the products could be differentiated visually by their elution patterns.
    • High Sequence Coverage: High protein sequence coverage (70-77% of the mature EPO sequence) was achieved by combining CID and HCD data.
    • Identification of Biomarkers: A large number of peptides and glycopeptides specific to each rHuEPO product were identified.
    • Impact of Glycosylation: Differences in glycosylation patterns were shown to have a significant impact on digestion efficiency, peptide/glycopeptide profiles, and chromatographic behavior.
    • Quality Control: The workflow provides a standardized MS-based approach to characterize rHuEPO biosimilars and differentiate them from reference products, supporting regulatory compliance and batch consistency.
    • Comprehensive characterization: Glycoproteomic approaches provide a more comprehensive view (protein sequence + site-specific glycosylation), which is critical to understanding the complexity of biosimilars.
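
    Sequence coverage figures such as those above are the percentage of residues in the protein that fall inside at least one identified peptide. A minimal sketch of that calculation follows; the sequence and peptides are made up for the example and are not EPO data.

```python
def coverage_percent(sequence, peptides):
    """Percentage of residues covered by at least one identified peptide."""
    covered = [False] * len(sequence)
    for peptide in peptides:
        if not peptide:
            continue
        start = sequence.find(peptide)
        while start != -1:  # mark every occurrence of the peptide
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = sequence.find(peptide, start + 1)
    return 100 * sum(covered) / len(sequence)


# Hypothetical 20-residue protein and three identified peptides (two overlap).
PROTEIN = "MKTAYIAKQRQISFVKSHFS"
IDENTIFIED = ["TAYIAK", "AKQR", "QISFVK"]

print(f"{coverage_percent(PROTEIN, IDENTIFIED):.1f}% sequence coverage")
```

    Overlapping peptides do not count twice, so combining data from two fragmentation modes raises coverage only where one mode identifies peptides the other misses.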

    Erythropoietin Biosimilar Profiling Flowchart (Figure from Mohd Afiq Hazlami Habib, 2019)

    Our Approach: Mass Spectrometry-Based Antibody Sequencing

    Overview of LC-MS/MS Strategy

    We use LC-MS/MS as the core strategy for in-depth analysis of intact antibody molecules or of the complex peptide mixtures produced by their enzymatic digestion. The experimental procedure typically includes reduction and alkylation of the sample, enzymatic digestion, peptide separation, and acquisition of accurate mass-to-charge (m/z) data for the peptide precursor ions and their fragment ions. This strategy captures key information about the primary sequence of the antibody and its post-translational modifications.

    Peptide Mapping and De novo Assembly

    Starting from the raw mass spectrometry data, we first perform peptide mapping, a preliminary step that confirms known sequence regions by comparing the observed peptide masses with theoretical masses calculated from the expected sequence or a database.
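
    The mass comparison at the heart of peptide mapping can be sketched as follows. This is a simplified illustration: peptides are treated as unmodified (in practice, alkylated cysteines and other modifications shift the mass), the 10 ppm tolerance is an assumed value, and the peptides are hypothetical.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "I": 113.08406, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified, neutral peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def map_peaks(observed_masses, expected_peptides, tol_ppm=10.0):
    """For each observed mass, list expected peptides within the ppm tolerance."""
    theoretical = {pep: peptide_mass(pep) for pep in expected_peptides}
    return {
        obs: [pep for pep, m in theoretical.items()
              if abs(obs - m) / m * 1e6 <= tol_ppm]
        for obs in observed_masses
    }


# Hypothetical expected peptides and two "observed" neutral masses:
# one 3 ppm away from LFTEAR, one that matches nothing.
EXPECTED = ["LFTEAR", "VWQDNK", "SLYAPEGR"]
observed = [peptide_mass("LFTEAR") * (1 + 3e-6), 800.0]

for mass, hits in map_peaks(observed, EXPECTED).items():
    print(f"{mass:.4f} Da -> {hits or 'no match'}")
```

    Masses that match nothing in the expected set are the ones passed on to de novo interpretation, since they may come from sequence variants or modified peptides.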

    For unknown sequences or regions with variations, we use de novo sequencing: working from high-quality MS/MS spectra, we analyze the b/y ion series generated by peptide backbone fragmentation and, with the help of bioinformatics algorithms, derive the order of amino acids and assemble the complete variable and constant regions.
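
    The principle of reading a sequence from a fragment ion ladder can be shown with a small example: the mass gap between neighbouring b ions (or neighbouring y ions) equals the mass of one residue. The sketch below assumes singly charged ions, an idealized complete ladder, and a 0.01 Da tolerance; the peptide is hypothetical and real de novo algorithms must also cope with missing peaks and noise.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "I": 113.08406, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.007276


def b_ions(peptide):
    """Singly charged b-ion m/z values, b1 .. b(n-1)."""
    total, ladder = PROTON, []
    for aa in peptide[:-1]:
        total += MONO[aa]
        ladder.append(total)
    return ladder


def y_ions(peptide):
    """Singly charged y-ion m/z values, y1 .. y(n-1)."""
    total, ladder = WATER + PROTON, []
    for aa in reversed(peptide[1:]):
        total += MONO[aa]
        ladder.append(total)
    return ladder


def read_ladder(mz_values, tol=0.01):
    """Translate the gaps between consecutive ladder peaks into residue calls."""
    calls = []
    for low, high in zip(mz_values, mz_values[1:]):
        gap = high - low
        hits = [aa for aa, mass in MONO.items() if abs(mass - gap) <= tol]
        calls.append("/".join(hits) if hits else "?")
    return calls


PEPTIDE = "VTITCR"  # hypothetical peptide
print("from b ions:", read_ladder(b_ions(PEPTIDE)))
print("from y ions:", read_ladder(y_ions(PEPTIDE)))
```

    The b ladder is read from the N-terminus and the y ladder from the C-terminus, so the two series cross-check each other. Isoleucine and leucine have identical masses and are reported as "I/L": this ladder-based reading cannot distinguish them, and extra evidence is needed to assign those positions.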

    Benefits: No Need for Cell lines or Genetic Material

    The protein itself is characterized directly, without relying on host cell lines, DNA templates, or RNA. This makes the approach especially valuable where traditional methods fail, such as when hybridoma cell lines are unstable, lost, or cannot be cultured, or when the final protein product has undergone complex engineering or post-translational modification and must be characterized as it is. Sequencing can be performed as long as trace amounts of the target protein are available.

    Sample Types Accepted (purified antibody, serum, hybridoma supernatant)

    Highly purified monoclonal antibodies are the ideal sample type. In addition, partially purified antibodies, hybridoma cell culture supernatants, and serum samples of polyclonal antibodies can be analyzed as effective starting materials.


    For research use only, not intended for any clinical use.
