De Novo protein sequence analysis

Peptide de novo sequencing is the analytical process that derives a peptide’s amino acid sequence from its tandem mass spectrum (MS/MS) without the assistance of a sequence database. This contrasts with the other popular peptide identification approach, database search, which looks for the target peptide in a given database. A clear advantage of de novo sequencing is that it works both for peptides that are in a database and for novel peptides.

Determining the sequence of a protein that cannot be found in a sequence database requires de novo sequencing. De novo protein sequencing is the process by which the amino acid sequence is deduced without prior knowledge of the DNA or protein sequence. This differs from sequence confirmation, where the protein or DNA sequence is already known and the sequence data obtained are used to confirm that it is correct. De novo sequencing of an intact protein needs careful experimental design, a combination of different analyses, and evaluation of data from mass spectrometry, protein chemistry and bioinformatics.

De novo peptide and protein sequencing are used to: find partial amino acid sequences for the design of DNA primers, cloning and DNA/mRNA sequencing; sequence monoclonal antibody variable regions; fully characterize immunoglobulins purified from an immunized organism or from hybridoma cells; support proteomics projects and protein identification in organisms whose genome sequences are not available; and characterize bioactive peptides containing modified or unusual amino acids.

A de novo protein sequencing project often includes: protein cleavage into peptides using specific proteases; de novo peptide sequencing using labeling with 4-sulfophenyl isothiocyanate (SPITC) and MS/MS peptide fragmentation; N-terminal peptide sequencing by Edman degradation; and top-down protein sequencing by MALDI-ISD.

The main idea of de novo sequencing is to use the mass difference between two adjacent fragment ions of the same type to calculate the mass of an amino acid residue on the peptide backbone. The process is repeated along the ion series until all the residues are determined.
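The mass-difference idea can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular software: the function names, the 0.01 Da tolerance and the example peak list are assumptions made here, and the peaks are taken to be singly charged ions that already belong to one series. The residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residues_for_delta(delta, tol=0.01):
    """Return the residues whose mass is within tol (Da) of delta, sorted."""
    return sorted(aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol)


def read_ladder(peaks, tol=0.01):
    """Label each gap between consecutive peaks of a single ion series.

    peaks: m/z values of singly charged ions assumed to be all b or all y.
    Returns one label per gap: "A", "I/L" when several residues fit,
    or "?" when no single residue explains the gap.
    """
    ordered = sorted(peaks)
    labels = []
    for low, high in zip(ordered, ordered[1:]):
        candidates = residues_for_delta(high - low, tol)
        labels.append("/".join(candidates) if candidates else "?")
    return labels


# Hypothetical ladder, constructed for this example from the table above.
ladder = [88.03931, 159.07642, 258.14483, 405.21324, 520.24018]
print(read_ladder(ladder))
```

For this constructed ladder the four gaps are read as A, V, F and D. A "?" label would mark a gap that no single residue explains, for example because a fragment ion is missing.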

In a tandem mass spectrometer, the peptide is fragmented along the peptide backbone and the resulting fragment ions are measured to produce the MS/MS spectrum. Different fragmentation methods produce different fragment ion types. The most widely used fragmentation methods today are Collision-Induced Dissociation (CID) and Electron-Transfer Dissociation (ETD). CID produces mostly b- and y-ions, whereas ETD produces mostly c- and z-ions. A good quality spectrum often contains many (but not necessarily all) of the theoretical fragment ions.
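As a rough sketch of what "theoretical fragment ions" means for CID, the following Python computes singly charged b- and y-ion m/z values for an unmodified peptide. The function name and the example peptide are illustrative assumptions; the masses are standard monoisotopic values, and modifications, higher charge states and other ion types are deliberately ignored.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O, monoisotopic mass in Da
PROTON = 1.007276   # proton mass in Da


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_ions[i-1] is b_i (the first i residues plus a proton);
    y_ions[i-1] is y_i (the last i residues plus water plus a proton).
    Only fragments from b1/y1 up to b(n-1)/y(n-1) are listed.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y_ions = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b_ions, y_ions


# Hypothetical example peptide.
b_ions, y_ions = fragment_ions("SAVFD")
for i, (b, y) in enumerate(zip(b_ions, y_ions), start=1):
    print(f"b{i} = {b:.4f}   y{i} = {y:.4f}")
```

Each b-ion and its complementary y-ion together account for the whole peptide, so b_i + y_(n-i) equals the neutral peptide mass plus two protons. This relationship is one way to cross-check ion assignments.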

In de novo protein sequence analysis, if one can identify either the y-ion or the b-ion series in the spectrum, the peptide sequence can be determined. However, the spectrum obtained from the mass spectrometer does not reveal the ion type of each peak; a human expert or a computer algorithm has to work this out during de novo sequencing. Several factors can leave de novo sequencing with only a partially correct sequence tag: incorrect assignment of y and b ions; missing fragment ions; the presence of other fragment ion types; noise peaks in the spectrum; residues with the same or similar mass, which cause ambiguity; and post-translational modifications (PTMs) on the residues, which can add to the mass ambiguity and complicate the peptide fragmentation pattern.
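The mass-ambiguity factor can be made concrete with a short Python sketch that lists every single residue or unordered residue pair that explains a given mass gap. The function name and tolerances are assumptions chosen for illustration, and PTMs are not modeled. The pair case matters when a fragment ion is missing, because the gap then spans two residues.

```python
from itertools import combinations_with_replacement

# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def explanations(delta, tol=0.01):
    """List single residues, then unordered residue pairs, matching delta.

    A pair such as "AG" stands for both orders (AG and GA), since the
    order of two residues cannot be read from their summed mass.
    """
    singles = sorted(
        aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol
    )
    pairs = [
        a + b
        for a, b in combinations_with_replacement(sorted(RESIDUE_MASS), 2)
        if abs(RESIDUE_MASS[a] + RESIDUE_MASS[b] - delta) <= tol
    ]
    return singles + pairs


print(explanations(113.08406))            # isoleucine and leucine are isomers
print(explanations(114.04293))            # Asn has the same mass as Gly + Gly
print(explanations(128.05858, tol=0.05))  # a loose tolerance also admits Lys
```

Under these assumptions, a gap of about 114.043 Da can be asparagine or two glycines, and a gap of about 128.06 Da can be glutamine, alanine plus glycine, or lysine if the mass accuracy is poor. Isoleucine and leucine cannot be separated by mass at any tolerance.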

De novo sequencing was historically thought to be slow, so it was mostly used when no protein database was available. With recent developments in computer algorithms, such as those implemented in PEAKS, speed is no longer an issue. This makes de novo sequencing a viable choice for every mass spectrometry analysis in proteomics. Even when a database is available, de novo sequencing can contribute to peptide identification in the following ways.

First, a match or close similarity between the de novo sequenced peptide and the peptide found by database search is a good indication that the database search result is correct. De novo sequencing can therefore be used to improve database search performance. Second, de novo sequenced peptides without significant database hits are possibly novel peptides in the sample and deserve further examination, for example to find unexpected PTMs or peptide mutations.
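One simple way to quantify the agreement between a de novo peptide and a database search peptide is a string similarity score. The sketch below uses Python's standard difflib module and treats isoleucine and leucine as the same residue, since they cannot be told apart by mass. The function names and this scoring choice are assumptions for illustration only; they are not the scoring used by PEAKS or any other software.

```python
from difflib import SequenceMatcher


def normalize(peptide):
    """Upper-case the sequence and map I to L (the two are isobaric)."""
    return peptide.upper().replace("I", "L")


def agreement(de_novo, db_hit):
    """Return a similarity score between 0.0 and 1.0 for two peptides.

    The score is difflib's ratio: twice the number of matched characters
    divided by the total length of both sequences.
    """
    matcher = SequenceMatcher(
        None, normalize(de_novo), normalize(db_hit), autojunk=False
    )
    return matcher.ratio()


# Hypothetical peptide pairs.
print(agreement("PEPTLDE", "PEPTIDE"))  # differs only by I/L
print(agreement("AVFD", "AVFE"))        # last residue disagrees
```

A score close to 1.0 supports the database hit. A confident de novo peptide that scores poorly against every database candidate is the kind of result the text flags for further examination.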

