Resource

Online Inquiry

Methods and Techniques for Protein Sequencing

Proteins, the workhorses of biology, are intricately involved in almost every cellular process, making them a focal point of biological research. The elucidation of a protein's sequence, or protein sequencing, lies at the heart of understanding its function, structure, and role in health and disease.

To appreciate the present and envision the future of protein sequencing, we must first delve into its history. The origins of protein sequencing can be traced back to the mid-20th century when Frederick Sanger pioneered the groundbreaking technique known as Edman Degradation. This method allowed scientists to determine the amino acid sequence of proteins with remarkable precision. Sanger's efforts earned him two Nobel Prizes in Chemistry and laid the foundation for modern protein sequencing techniques.

Over the years, protein sequencing has evolved significantly. Technological advancements, such as mass spectrometry and next-generation sequencing (NGS), have revolutionized the field, enabling scientists to analyze proteins with unprecedented speed and accuracy. This historical journey underscores the relentless pursuit of knowledge in the biological sciences and sets the stage for the in-depth exploration of protein sequencing techniques.

Protein Sequencing Techniques

Edman Degradation

Principle:

Edman Degradation is a classical method for determining the amino acid sequence of a protein. It is based on the selective cleavage of the N-terminal amino acid residue from a peptide chain without affecting the rest of the sequence. The cleaved amino acid is then identified, and the process can be repeated sequentially to determine the complete sequence of the protein.

Steps:

PITC (Phenylisothiocyanate) Reaction: The Edman Degradation process begins by reacting the protein of interest with PITC. PITC reacts specifically with the primary amine group (-NH2) at the N-terminus of the protein, forming a stable phenylthiocarbamyl (PTC) derivative.

Cleavage: After the N-terminal PTC derivative is formed, the next step involves cleaving the N-terminal amino acid from the rest of the peptide chain. This is typically done using an acid, commonly trifluoroacetic acid (TFA), which breaks the peptide bond adjacent to the N-terminus.

Isolation of Cleaved Amino Acid: The cleaved N-terminal amino acid is now in the form of a PTC derivative. It is isolated from the reaction mixture.

Identification: The isolated PTC amino acid is subjected to various analytical techniques, such as high-performance liquid chromatography (HPLC) or mass spectrometry (MS). These techniques determine the identity of the cleaved amino acid.

Repeat the Process: The remaining peptide is then subjected to another cycle of Edman Degradation. This sequential process can be repeated until the entire protein sequence is determined, one amino acid at a time.

Applications:

Edman Degradation is suitable for determining the N-terminal sequence of a protein or peptide. It is commonly used for proteins of moderate size (typically less than 50 amino acids) and is particularly valuable for small to medium-sized proteins.

Limitations:

  • Edman Degradation is time-consuming and labor-intensive, especially for larger proteins, as it involves multiple cycles of reactions and analysis.
  • It is limited to determining the N-terminal sequence and cannot be used to obtain the C-terminal sequence or the complete protein sequence.
  • The accuracy of Edman Degradation can be affected by certain amino acids, such as proline, which may require special treatment.

Advantages:

  • Edman Degradation provides accurate N-terminal sequencing information.
  • It has been widely used and established as a reliable method for protein sequencing.

Protein N-Terminal Sequencing using Mass Spectrometry:

Methods:

Enzymatic Cleavage: To ascertain the N-terminal sequence, proteins are often enzymatically cleaved at specific amino acid residues using proteolytic enzymes such as trypsin or chymotrypsin. These enzymes cleave the protein at precise amino acid positions, resulting in the generation of peptides with known N-termini.

MALDI-TOF MS Analysis: Following enzymatic digestion, the resultant peptide fragments can be subjected to analysis using MALDI-TOF MS (Matrix-Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry). This technique enables the direct determination of peptide masses, facilitating the identification of the protein's N-terminus.

Applications:

Protein N-Terminal Sequencing via Mass Spectrometry is a valuable tool for elucidating the amino acid sequence at the N-terminus of a protein. It is commonly employed to pinpoint the initiation point of a protein or to verify its identity.

Advantages:

Mass spectrometry offers high sensitivity and precision when conducting N-terminal sequencing. It is applicable to a broad spectrum of proteins and peptides.

Limitations:

The accuracy of N-terminal sequencing may be affected by the presence of specific amino acids and post-translational modifications. Furthermore, this method exclusively provides information about the N-terminus and does not yield the complete protein sequence.

Protein C-Terminal Sequencing using Mass Spectrometry:

Methods:

Enzymatic Cleavage: Much like N-terminal sequencing, the strategy involves cleaving the protein at specific amino acid positions using proteolytic enzymes. Enzymes such as Lys-C or Glu-C are used to cleave the protein at well-defined sites, yielding peptides with known C-termini.

MALDI-TOF MS Analysis: Following enzymatic digestion, the resulting peptide fragments undergo MALDI-TOF MS analysis to determine their mass. This direct mass analysis aids in the identification of the C-terminus.

Applications:

Protein C-Terminal Sequencing via Mass Spectrometry is invaluable for unveiling the amino acid sequence at the protein's C-terminus. It serves to pinpoint the termination point of a protein and validate its integrity.

Advantages:

Mass spectrometry provides exceptional precision and sensitivity in C-terminal sequencing. This technique is applicable to a wide array of proteins and peptides.

Limitations:

Similar to N-terminal sequencing, C-terminal sequencing may be influenced by specific amino acids and post-translational modifications. It exclusively furnishes information about the C-terminus and does not yield the complete protein sequence.

Full Protein Sequence Determination:

Protein sequencing aims to reveal the complete amino acid sequence of a protein, going beyond just examining its N- or C-termini. This process involves specific methods, has practical applications, and comes with its own set of advantages and limitations.

Methods:

Tandem Mass Spectrometry (MS/MS or MS2): This technique begins by ionizing the protein and then breaking it into smaller ions. These fragments are subsequently analyzed through mass spectrometry. By examining the mass-to-charge ratios of these fragments, the entire amino acid sequence of the protein can be determined.

De Novo Sequencing: When the protein's sequence is unknown, de novo sequencing comes into play. It involves reconstructing the protein's sequence from the data obtained through mass spectra, often with the help of computational algorithms. This method is particularly useful for discovering new proteins or post-

Applications:

Full protein sequence determination is essential for gaining a comprehensive understanding of a protein's amino acid sequence, including identifying any post-translational modifications. It plays a crucial role in proteomics for identifying unknown proteins and validating gene predictions.

Advantages:

Mass spectrometry-based techniques are effective and adaptable, capable of handling proteins of varying sizes. They are also sensitive to post-translational modifications, making them invaluable for studying protein alterations.

Limitations:

Performing mass spectrometry-based analysis requires specialized equipment and expertise in data analysis. The success of full protein sequence determination depends on the quality of the data and the availability of suitable software for data interpretation.

De Novo Protein Sequencing:

Principle:

De Novo Protein Sequencing aims to determine the complete amino acid sequence of a protein without relying on a known reference sequence. It relies on high-resolution mass spectrometry and computational algorithms.

Steps in De Novo Protein Sequencing:

Sample Preparation: The target protein is enzymatically digested into smaller peptides, typically using enzymes like trypsin. This digestion results in a mixture of peptide fragments.

Mass Spectrometry Analysis: The peptide mixture is subjected to mass spectrometry, such as liquid chromatography-mass spectrometry (LC-MS) or tandem mass spectrometry (MS/MS). MS/MS involves fragmenting selected peptides into smaller ions, and the mass-to-charge ratios (m/z) of these fragment ions are measured.

Protein sequencing using tandem MSProtein sequencing using tandem MS (Saraswathy et al., 2011)

Data Interpretation: Precursor Masses: Accurate masses of precursor ions (intact peptides) are determined. Fragmentation Patterns: Fragmentation patterns of the peptides are analyzed to identify the masses of the fragment ions generated during MS/MS.

Algorithmic Analysis: Sequence Tagging: Computational algorithms identify short amino acid sequence tags within the peptide based on observed fragment ions and their masses.

Spectrum Graph Construction: A spectrum graph is created, connecting sequence tags in a way that represents potential amino acid sequences.

Optimization: The algorithm optimizes the sequence by selecting the most likely path through the spectrum graph, considering factors like fragment ion masses, charge states, and mass errors.

Validation: The proposed amino acid sequence is validated by comparing calculated fragment ion masses with observed data. High confidence in the sequence is achieved when observed fragment ions match expected values.

Iterative Process: De Novo sequencing may involve multiple iterations and refinements to improve sequence accuracy, especially for longer or complex proteins.

Applications:

  • Identifying novel proteins or isoforms.
  • Characterizing post-translational modifications.
  • Investigating protein variations in diseases.
  • Studying non-model organisms with limited genomic data.

Challenges:

  • Computational intensity requires specialized software and hardware.
  • More challenging for larger or highly modified proteins.
  • Sequence accuracy may be lower for proteins with repetitive regions or low sequence complexity.

Reference

  1. Saraswathy, Nachimuthu, and Ponnusamy Ramalingam. Concepts and techniques in genomics and proteomics. Elsevier, 2011.
* For Research Use Only. Not for use in diagnostic procedures.
Our customer service representatives are available 24 hours a day, 7 days a week. Inquiry

Online Inquiry

Please submit a detailed description of your project. We will provide you with a customized project plan to meet your research requests. You can also send emails directly to for inquiries.

* Email
Phone
* Service & Products of Interest
Services Required and Project Description
* Verification Code
Verification Code