What Is De Novo Peptide/Protein Sequencing
The sequence of a peptide or protein is essential for studying its biological function. However, complete characterization of peptides and proteins, including post-translational modifications (PTMs), sequence mutations, and variants, is very challenging. There are two mass spectrometry approaches to determining a peptide/protein sequence: database search and de novo sequencing. The database search approach compares acquired mass spectra against a database of known protein sequences to identify the protein. De novo sequencing interprets amino acid sequences directly from tandem mass spectra without the assistance of a database.
Although database search identification of proteins by mass spectrometry is well established, the method does not apply if the protein sequence is absent from the database. Of the two approaches, de novo sequencing is therefore the only one that can identify novel peptides, proteins from unsequenced organisms, and antibody drugs, which database search methods cannot detect. However, de novo sequencing is more challenging than the traditional database search approach. The difficulties include ambiguous assignment of fragment ions, insufficient product ions from incomplete fragmentation (leading to low sequence coverage), and difficulty in distinguishing ion series, notably N-terminal from C-terminal MS/MS product ions (b ions from y ions).
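To make the b/y ion ambiguity concrete, the following minimal sketch computes the singly charged b and y ion m/z values for an unmodified peptide. The function name `fragment_ions` is our own illustration, not part of any vendor software, and the table uses standard monoisotopic residue masses.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O
PROTON = 1.007276   # charge carrier


def fragment_ions(peptide):
    """Return (b, y) lists of singly charged m/z values.

    b[i] is the b ion with i+1 N-terminal residues;
    y[i] is the y ion with i+1 C-terminal residues.
    """
    masses = [MONO[aa] for aa in peptide]
    b, y = [], []
    for i in range(1, len(masses)):
        b.append(sum(masses[:i]) + PROTON)
        y.append(sum(masses[-i:]) + WATER + PROTON)
    return b, y


if __name__ == "__main__":
    b_ions, y_ions = fragment_ions("PEPTIDE")
    for i, (b, y) in enumerate(zip(b_ions, y_ions), start=1):
        print(f"b{i} = {b:.4f}   y{i} = {y:.4f}")
```

Both series are just sums of residue masses offset by a constant, so a peak in the spectrum does not by itself reveal which terminus it contains. That is the ambiguity the strategies below aim to resolve.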
Creative Proteomics Strategies
At Creative Proteomics, we tackle these challenges by using the following four strategies.
Firstly, Fourier transform ion cyclotron resonance mass spectrometry (FTICR-MS) offers the highest mass resolution and accuracy, which avoids false positives caused by low mass resolution and accuracy. With the 7T solariX XR FTICR-MS, the mass resolving power can reach 10,000,000. As mass accuracy increases, the resulting assignments become increasingly confident.
Secondly, both bottom-up and top-down mass spectrometry are used to analyze the same sample. Bottom-up analysis allows us to use different enzyme digestions to generate overlapping peptides, while top-down mass spectrometry offers intact mass data and provides protein fragmentation details. By combining these data, the peptide/protein sequence can be confirmed with greater confidence.
Thirdly, four fragmentation techniques are employed for peptide/protein fragmentation: collision induced dissociation (CID), electron transfer dissociation (ETD), electron capture dissociation (ECD), and high energy collisional dissociation (HCD). Together they provide more fragment ions from the same peptide/protein, and these complementary data improve ion assignment. For example, it is possible to distinguish C-terminal (z•) from N-terminal (c’) ECD product ions based on the ratio of prime to radical ion abundance in ECD vs activated-ion ECD (AI-ECD) MS/MS product ion mass spectra.
Finally, we also provide chemical derivatization to identify ion series. For example, introducing bromine at the C-terminus by oxazolone chemistry enables identification of y ions because of the distinctive bromine isotope peaks. Similarly, guanidination can increase the likelihood and selectivity of identification.
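The link between mass accuracy and confident assignment can be illustrated with a short calculation. The sketch below is illustrative only and the function names are our own. It computes the mass error in parts per million and the resolving power (m/Δm) needed to separate two near-isobaric species, using lysine and glutamine residues (which differ by about 0.036 Da) as the example.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Mass error of an observed peak in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def required_resolving_power(mz, mass_difference):
    """Resolving power m/delta-m needed to separate two peaks
    that are mass_difference apart near the given m/z."""
    return mz / abs(mass_difference)


# Monoisotopic residue masses (Da) of two near-isobaric amino acids.
LYS = 128.09496
GLN = 128.05858

if __name__ == "__main__":
    # A singly charged fragment near m/z 1000 containing K or Q.
    r = required_resolving_power(1000.0, LYS - GLN)
    print(f"Resolving power needed to separate K from Q at m/z 1000: {r:,.0f}")
    print(f"Error of a peak observed at 1000.001: {ppm_error(1000.001, 1000.0):.2f} ppm")
```

The tighter the ppm tolerance an instrument supports, the fewer candidate residues or residue combinations fit a given mass gap between fragment peaks.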
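Bromine has two stable isotopes, 79Br and 81Br, in roughly equal abundance and about 1.998 Da apart, so a fragment carrying one bromine atom appears as a doublet of near-equal intensity. The following sketch shows one simple way such doublets could be flagged in a peak list of singly charged ions. The function name, the m/z tolerance, and the intensity-ratio window are illustrative assumptions, not our production settings.

```python
BR_SPACING = 1.99795  # mass difference between 81Br and 79Br (Da)


def find_bromine_doublets(peaks, mz_tol=0.005, min_ratio=0.7, max_ratio=1.3):
    """Return (mz_light, mz_heavy) pairs that look like a one-bromine doublet.

    peaks: iterable of (mz, intensity) tuples for singly charged ions.
    The tolerance and ratio window are assumed values for illustration.
    """
    peaks = sorted(peaks)
    doublets = []
    for i, (mz1, int1) in enumerate(peaks):
        if int1 <= 0:
            continue
        for mz2, int2 in peaks[i + 1:]:
            delta = mz2 - mz1
            if delta > BR_SPACING + mz_tol:
                break  # peaks are sorted, so no later peak can match
            if abs(delta - BR_SPACING) <= mz_tol and min_ratio <= int2 / int1 <= max_ratio:
                doublets.append((mz1, mz2))
    return doublets


if __name__ == "__main__":
    spectrum = [(500.0, 100.0), (501.99795, 97.0), (600.0, 100.0), (601.0034, 30.0)]
    print(find_bromine_doublets(spectrum))
```

With the label placed on the C-terminus, peaks that pass this check are candidate y ions, and unlabeled singlets are candidate b ions.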
Protein De Novo Sequencing Services
Creative Proteomics is equipped with a high-quality Orbitrap Fusion Lumos mass spectrometer and has established a comprehensive database, which enables us to provide high-quality full-length protein sequencing services to a wide range of customers. For different experimental needs, we also offer Edman degradation to determine up to 67 amino acids at the N-terminus of a protein.
Whole protein de novo sequencing based on Orbitrap Fusion Lumos mass spectrometer complemented by Edman degradation.
After the customer sends the protein product to Creative Proteomics, we first identify and fragment the protein sample. We use six commonly used proteases, including Trypsin, Chymotrypsin, Asp-N, Glu-C, Lys-C and Lys-, to obtain a wide variety of peptide fragments and achieve full protein sequence coverage. Sequence information from peptide mass spectrometry sequencing is spliced by PEAKS Studio (protein product) and PEAKS Ab (antibody product) software with manual assistance to obtain correctly assembled protein or antibody primary sequences. For more information on de novo amino acid sequencing, please refer to the antibody de novo sequencing service.
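The idea of splicing overlapping peptides from different digests into one sequence can be sketched with a toy greedy assembler. This is not the algorithm used by PEAKS software. It is a simplified illustration with hypothetical function names, it assumes error-free peptide sequences, and the minimum overlap of three residues is an arbitrary choice.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b
    (at least min_len residues), or 0 if there is none."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def assemble(peptides, min_len=3):
    """Greedily merge peptides by their largest suffix/prefix overlap.

    Returns the list of contigs left when no further merge is possible.
    """
    unique = set(peptides)
    # Drop peptides fully contained in another peptide.
    frags = sorted(p for p in unique if not any(p != q and p in q for q in unique))
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # remaining fragments do not overlap
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


if __name__ == "__main__":
    # Hypothetical peptides, as if produced by different proteases.
    print(assemble(["MKTAYIAK", "YIAKQRQISF", "QISFVKSHFSRQ"]))
```

Using several proteases matters because each enzyme cuts at different sites, so peptides from one digest bridge the cleavage sites of another and provide the overlaps needed for assembly.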
Advantages of de novo sequencing technology
(1) Determination of the amino acid sequence of unknown peptides from scratch, independent of known peptide sequence databases.
(2) Localization of post-translational processing or chemical modifications of side chains, without being restricted by N-terminal blocking.
(3) Identification of non-standard amino acids.
(4) Improved detection sensitivity.
- Nano HPLC- 7T solariX XR FTICR-MS (equipped with ECD)
- Nano HPLC- Orbitrap Fusion™ Lumos™ Tribrid™ MS (equipped with ETD)
- Normal amount: 500 µg per protein
- Minimum amount: 200 µg per protein
- Determination of the protein sequence by HPLC-MS.
- A detailed technical report will be provided at the end of the project, including sample preparation and HPLC-MS/MS parameters.
- The protein sequence (FDR < 1%) and molecular weight will be included in the report.
FAQ on De Novo Peptide Sequencing
Q1: What is the difference between de novo sequencing and Edman degradation sequencing for protein sequencing?
A: De novo sequencing of proteins uses algorithms to deduce the peptide sequence directly from fragment ion information in mass spectra. Edman degradation sequencing, on the other hand, is generally suitable for peptides of 15-20 amino acids and requires relatively high sample purity (at least 97%). It also involves protein digestion, peptide separation, purification, and testing of each peptide individually. As a result, the time and cost of performing Edman degradation sequencing on a monoclonal antibody are relatively high.
Compared to the traditional Edman degradation method, mass spectrometry-based de novo sequencing is more efficient, high-throughput, and cost-effective. In cases where the amino acid sequence of a protein is already known, de novo sequencing can also identify new protein variants arising from unknown mutations, splicing events, and various post-translational modifications, providing more comprehensive information about the antibody sequence.
Q2: What types of samples can be used for de novo sequencing of proteins?
A: De novo sequencing of proteins is not dependent on the size or length of the protein. It can be performed on monoclonal antibodies, Fab/Fc, bispecific antibodies, multispecific antibodies, recombinant proteins, peptides, fluorescently labeled antibodies, cross-linked antibodies, and other types of proteins.
Q3: Can fluorescently labeled or cross-linked antibodies be subjected to de novo sequencing?
A: Proteins that are cross-linked on beads or those that are fluorescently labeled, such as flow cytometry antibodies labeled with FITC, Cy5, PE, etc., can also undergo de novo sequencing.
Q4: Do the sample buffer components affect the sequencing results?
A: For standard sample submission, we recommend dissolving the protein in a PBS or Tris buffer system. If the sample contains components such as glycerol, BSA, detergents, or salts, appropriate purification methods will be applied so that they do not affect the sequencing results.
Q5: Will sample glycosylation and post-translational modifications affect the sequencing?
A: During data analysis, possible post-translational modifications and glycosylation will be taken into consideration, and they will not affect the sequencing results of the sample itself.
Q6: What does the report of protein de novo sequencing include?
A: The report includes the following:
- Complete amino acid sequence of the protein, including heavy and light chain sequences for antibodies, constant and variable regions.
- Verification of the complete molecular weight, comparing the measured molecular weight to the theoretical sequence weight to validate sequence accuracy.
- Peptide coverage map of the complete sequence, with support from over ten different peptide fragments at each amino acid position.
- Supportive secondary mass spectrometry data for variable region peptide segments of antibodies.
- Reliability analysis of I/L identification.
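The molecular weight check and the need for a separate I/L analysis can both be illustrated with a short calculation. The sketch below uses hypothetical function names and monoisotopic masses for simplicity. Intact-protein comparisons often use average masses instead, and modifications are ignored here.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def monoisotopic_mass(sequence):
    """Theoretical monoisotopic mass of an unmodified, linear sequence."""
    return sum(MONO[aa] for aa in sequence) + WATER


def deviation_ppm(measured_mass, sequence):
    """Deviation of a measured mass from the theoretical mass, in ppm."""
    theoretical = monoisotopic_mass(sequence)
    return (measured_mass - theoretical) / theoretical * 1e6


if __name__ == "__main__":
    print(f"{monoisotopic_mass('PEPTIDE'):.4f}")
    # Isoleucine and leucine are isobaric: swapping them leaves the mass unchanged.
    print(monoisotopic_mass("PEPTIDE") == monoisotopic_mass("PEPTLDE"))
```

Because leucine and isoleucine have identical masses, agreement between measured and theoretical molecular weight cannot confirm I/L assignments, which is why the report treats them separately.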
Q7: How accurate are the results of protein de novo sequencing?
A: The sequencing results guarantee full sequence coverage and 100% sequence accuracy. Each amino acid position is supported by over ten different peptide fragments in the mass spectrometry data, providing strong evidence. The sequencing results are obtained through a combination of software algorithms and manual verification, ensuring the accuracy of the sequence.
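The per-position support described above can be expressed as a coverage depth: for each residue, the number of identified peptides that span it. The following is a minimal sketch with hypothetical function names. It matches peptides by exact substring, which a real pipeline would refine.

```python
def coverage_depth(protein, peptides):
    """Count, for each residue of protein, how many peptides cover it.

    Every occurrence of a peptide within the protein is counted.
    """
    depth = [0] * len(protein)
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:
            for pos in range(start, start + len(pep)):
                depth[pos] += 1
            start = protein.find(pep, start + 1)
    return depth


def positions_below(depth, threshold):
    """1-based residue positions supported by fewer than threshold peptides."""
    return [i + 1 for i, d in enumerate(depth) if d < threshold]


if __name__ == "__main__":
    protein = "MKTAYIAK"                    # hypothetical sequence
    peptides = ["MKTA", "TAYI", "YIAK"]     # hypothetical identified peptides
    depth = coverage_depth(protein, peptides)
    print(depth)
    print(positions_below(depth, 2))
```

Positions returned by `positions_below` are the ones that would need additional digests or manual inspection before the sequence is reported.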
Antigen Presentation Profiling Reveals Recognition of Lymphoma Immunoglobulin Neoantigens
The article integrated genomic and proteomic strategies to directly analyze tumor antigen peptides presented by MHC I and MHC II from tumor cells. It was found that these tumor neoantigen peptides originated from the variable regions of immunoglobulin light chains or heavy chains of lymphomas. For the MHC-bound tumor neoantigen peptides, over 24,000 unique peptides bound to MHC-I were identified through immunoprecipitation (IP) and mass spectrometry, while over 12,500 unique peptides bound to MHC-II were identified. Additionally, a protein sequence database was constructed for comprehensive proteomic analysis of the MCL cell line.
In terms of data analysis, the team combined de novo sequencing with database search and made improvements to the Percolator algorithm. PEAKS Studio 7.0 software was used to perform both de novo sequencing and database search, maximizing the number of peptides identified.
The analysis method based on de novo sequencing in the study used a credibility filtering approach to obtain de novo tags for database search. By combining de novo sequencing with database search, the efficiency of peptide identification was maximized, leading to improved sensitivity and accuracy in the database search. This approach facilitated the identification of a greater number of modifications, sequence mutations, and entirely novel peptides.
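The general idea of filtering de novo tags by confidence and then using them to query a sequence database can be sketched as follows. This is a toy illustration, not the workflow implemented in PEAKS Studio or in the cited study. The function names, the score scale, and the 0.8 threshold are assumptions, and I and L are treated as equivalent because they cannot be distinguished by mass.

```python
def normalize(seq):
    """Treat isoleucine and leucine as the same residue."""
    return seq.replace("I", "L")


def search_with_tags(tags, database, min_score=0.8):
    """Map each confident de novo tag to the database entries containing it.

    tags: iterable of (tag_sequence, confidence_score) pairs.
    database: dict of {protein_name: sequence}.
    """
    hits = {}
    for tag, score in tags:
        if score < min_score:
            continue  # discard low-confidence tags
        key = normalize(tag)
        matches = [name for name, seq in database.items() if key in normalize(seq)]
        if matches:
            hits[tag] = matches
    return hits


if __name__ == "__main__":
    # Hypothetical database and tags for illustration.
    database = {"protA": "MKTAYIAKQR", "protB": "GGSLVPRGS"}
    tags = [("AYLAK", 0.95), ("LVPR", 0.90), ("WWWW", 0.99), ("GGS", 0.30)]
    print(search_with_tags(tags, database))
```

Confident tags with no database match are the interesting leftovers: they are candidates for mutated or entirely novel peptides.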
- Khodadoust M S, Olsson N, Wagar L E, et al. Antigen presentation profiling reveals recognition of lymphoma immunoglobulin neoantigens. Nature, 2017, 543(7647):723-727.