This repository is dedicated to cancer research. It highlights the theoretical knowledge base together with related deep learning projects.
The primary structure of a protein is essentially a sequence of amino acids linked by peptide bonds, forming a polypeptide chain. This sequence is not random but is precisely determined by the corresponding gene via messenger RNA. The diversity of polypeptide chains arises from various combinations of 20 distinct amino acids, though selenocysteine, the 21st amino acid, also appears in 25 human selenoproteins, some of which are involved in antioxidant defense and thyroid hormone regulation.
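To make the gene-to-sequence relationship concrete, here is a minimal Python sketch of translating an mRNA string into a primary structure. The function name `translate`, the example sequence, and the deliberately partial codon table are illustrative assumptions, not part of this repository. The sketch treats UGA as a plain stop codon and does not model its recoding to selenocysteine.

```python
# Partial excerpt of the standard genetic code (illustrative, not the full 64-codon table).
CODON_TABLE = {
    "AUG": "M",              # methionine, also the usual start codon
    "UUU": "F", "UUC": "F",  # phenylalanine
    "GGU": "G", "GGC": "G",  # glycine
    "GCU": "A",              # alanine
    "AAA": "K",              # lysine
    "GAU": "D",              # aspartate
    "UGC": "C",              # cysteine
    "UGG": "W",              # tryptophan
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons (UGA recoding to selenocysteine is not modeled)
}


def translate(mrna: str) -> str:
    """Translate an mRNA string, read in frame from its first base, until a stop codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]  # KeyError for codons missing from the partial table
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


# Hypothetical example sequence: AUG UUU GGC AAA UGC UAA
print(translate("AUGUUUGGCAAAUGCUAA"))  # MFGKC
```

The output is a string of one-letter amino acid codes, which is also the form that sequence-based deep learning models usually take as input.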
Within a polypeptide chain, backbone groups of the amino acids can form hydrogen bonds, which lay the foundation for the protein's secondary structure. This structure arises when the chain undergoes local folding, typically forming one of two patterns: the α-helix or the β-sheet. In an α-helix, a hydrogen bond forms between the carbonyl group (C=O) of one amino acid and the amino group (N-H) of the amino acid four residues further along the chain; repeating this pattern regularly produces the helical shape. In contrast, β-sheets are stabilized by hydrogen bonds between the carbonyl groups of one polypeptide segment (strand) and the amino groups of an adjacent strand, which may run parallel or antiparallel to it, aligning multiple strands side by side.
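The i to i+4 pattern of the α-helix described above can be sketched in a few lines of Python. This is an idealized illustration: the function name `helix_hbond_pairs` is hypothetical, indices are 0-based, and real helices are assigned from 3D coordinates rather than from residue counts alone.

```python
def helix_hbond_pairs(n_residues: int) -> list[tuple[int, int]]:
    """Return (acceptor, donor) index pairs for an idealized alpha-helix.

    The C=O of residue i accepts a hydrogen bond from the N-H of residue i + 4.
    Residues near the C-terminal end have no partner four positions ahead.
    """
    return [(i, i + 4) for i in range(n_residues - 4)]


# An 8-residue idealized helix has four backbone hydrogen bonds.
print(helix_hbond_pairs(8))  # [(0, 4), (1, 5), (2, 6), (3, 7)]
```

A segment with four or fewer residues yields no pairs under this idealization.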
Each amino acid has a unique side chain, or R group, which defines its properties, such as polarity, charge, and size. These R groups play a crucial role in folding the polypeptide into its tertiary structure through various interactions. For instance, ionic bonds form between oppositely charged R groups, stabilizing the structure through salt bridges [MB06; DKD11; Kuo+13]. Similarly, hydrogen bonds between polar side chains contribute to stabilization, and disulfide bonds between cysteine residues facilitate folding and structural integrity. Additionally, hydrophobic R groups tend to cluster inside the protein to avoid water, while hydrophilic R groups are typically found on the surface, interacting with water molecules.
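As a rough illustration of how R group properties can be summarized for a sequence, the sketch below counts residues by side-chain class. The grouping is one common textbook convention and is an assumption here; the placement of glycine, cysteine, tyrosine, and histidine varies between sources. The name `side_chain_profile` and the example sequence are hypothetical. Sequence composition alone does not show which residues actually interact in the folded protein.

```python
from collections import Counter

# One common textbook grouping of the 20 standard amino acids (assumed; conventions vary).
SIDE_CHAIN_CLASS = {
    **dict.fromkeys("AVLIMFWPG", "hydrophobic"),
    **dict.fromkeys("STNQYC", "polar"),
    **dict.fromkeys("KRH", "positive"),
    **dict.fromkeys("DE", "negative"),
}


def side_chain_profile(sequence: str) -> Counter:
    """Count residues per side-chain class for a one-letter amino acid sequence."""
    return Counter(SIDE_CHAIN_CLASS[residue] for residue in sequence.upper())


profile = side_chain_profile("MKDECLLS")  # hypothetical example peptide
print(profile["hydrophobic"], profile["polar"], profile["positive"], profile["negative"])  # 3 2 1 2
```

Oppositely charged residues counted here are only candidates for salt bridges, and cysteines only candidates for disulfide bonds; confirming either requires the 3D structure.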
The tertiary structure builds on the primary and secondary structures and yields the three-dimensional fold of a single polypeptide chain, a monomer. When multiple such chains interact, they form a quaternary structure, an organized assembly of subunits.