General Reference

INTERNATIONAL UNION OF PURE AND APPLIED CHEMISTRY (IUPAC)
and
INTERNATIONAL UNION OF BIOCHEMISTRY AND MOLECULAR BIOLOGY (IUBMB)
IUPAC-IUB Joint Commission on Biochemical Nomenclature (JCBN)

Amino Acids

  • Side Chains in TEXT format
    http://www.chem.qmw.ac.uk/iupac2/AminoAcid/AA1n2.html
  • Nomenclature and Symbolism for Amino Acids and Peptides (JCBN)
    http://pdb.pdb.bnl.gov/PPS2/course/section2/AminoAcid/the_twenty.txt
  • Amino Acids tutorials from the on-line "Principles of Protein Structure"
    http://pdb.pdb.bnl.gov/PPS2/course/section2/index.html

    Single-letter, 3-letter and ambiguity codes for amino acids

    The 20 naturally occurring amino acids:
    Alanine         Ala     A
    Cysteine        Cys     C
    Aspartic AciD   Asp     D
    Glutamic Acid   Glu     E
    Phenylalanine   Phe     F
    Glycine         Gly     G
    Histidine       His     H
    Isoleucine      Ile     I
    Lysine          Lys     K
    Leucine         Leu     L
    Methionine      Met     M
    AsparagiNe      Asn     N
    Proline         Pro     P       
    Glutamine       Gln     Q
    ARginine        Arg     R
    Serine          Ser     S
    Threonine       Thr     T
    Valine          Val     V
    Tryptophan      Trp     W
    TYrosine        Tyr     Y
    

    Amino acid ambiguity codes:

    Asparagine/Aspartic Acid    Asx   B 
    Glutamine/Glutamic Acid     Glx   Z 
    
    Not assigned to amino acids in the original recommendations
    (U and O were later adopted for selenocysteine and pyrrolysine):
    J O U
    
    Symbols for sequence analysis:
    deletion or gap    . (dot) 
    End or Terminator  * (star)
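
    The tables above map directly onto a lookup dictionary. The following
    is a minimal Python sketch, not part of the JCBN recommendations. The
    names THREE_TO_ONE, ONE_TO_THREE, to_one_letter and to_three_letter are
    hypothetical, chosen for illustration. Gap (.) and terminator (*)
    symbols are passed through unchanged, and letters outside the tables
    raise a KeyError.

```python
# Three-letter to one-letter codes, transcribed from the tables above.
THREE_TO_ONE = {
    "Ala": "A", "Cys": "C", "Asp": "D", "Glu": "E", "Phe": "F",
    "Gly": "G", "His": "H", "Ile": "I", "Lys": "K", "Leu": "L",
    "Met": "M", "Asn": "N", "Pro": "P", "Gln": "Q", "Arg": "R",
    "Ser": "S", "Thr": "T", "Val": "V", "Trp": "W", "Tyr": "Y",
    # Ambiguity codes
    "Asx": "B", "Glx": "Z",
}

# Reverse mapping: one-letter to three-letter codes.
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}

# Symbols for sequence analysis: gap and terminator.
PASS_THROUGH = {".", "*"}


def to_one_letter(residues):
    """Convert an iterable of three-letter codes to a one-letter string."""
    return "".join(THREE_TO_ONE[r.capitalize()] for r in residues)


def to_three_letter(sequence, sep=" "):
    """Convert a one-letter sequence to three-letter codes joined by sep."""
    out = []
    for symbol in sequence.upper():
        if symbol in PASS_THROUGH:
            out.append(symbol)  # keep gap/terminator symbols as they are
        else:
            out.append(ONE_TO_THREE[symbol])
    return sep.join(out)
```

    For example, to_one_letter(["Met", "Ala", "Asx", "Trp"]) gives "MABW",
    and to_three_letter("MK.Z*") gives "Met Lys . Glx *".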
    

Nucleic Acids

    From:
    Nomenclature Committee of the International Union of Biochemistry (NC-IUB)
    Nomenclature for Incompletely Specified Bases in Nucleic Acid Sequences
    http://www.chem.qmw.ac.uk/iupac2/misc/naseq.html

    Single-letter and ambiguity codes for nucleotides

    The present nomenclature, summarised in Table N1, has been formulated to deal with incomplete specification of bases in nucleic acid sequences. In cases where two or more bases are permitted at a particular position the nomenclature permits the allocation of a single-letter symbol. The nomenclature may also be applied where uncertainty exists as to extent and/or identity.

    Table N1. Summary of single-letter code recommendations

    Symbol   Meaning             Origin of designation
    G        G                   Guanine
    A        A                   Adenine
    T        T                   Thymine
    C        C                   Cytosine
    R        G or A              puRine
    Y        T or C              pYrimidine
    M        A or C              aMino
    K        G or T              Keto
    S        G or C              Strong interaction (3 H bonds)
    W        A or T              Weak interaction (2 H bonds)
    H        A or C or T         not-G, H follows G in the alphabet
    B        G or T or C         not-A, B follows A
    V        G or C or A         not-T (not-U), V follows U
    D        G or A or T         not-C, D follows C
    N        G or A or T or C    aNy
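
    Table N1 can be read as a mapping from each symbol to the set of bases
    it permits. The following is a minimal Python sketch of that reading,
    not part of the NC-IUB recommendations. The names IUPAC_BASES, expand
    and matches are hypothetical, chosen for illustration.

```python
from itertools import product

# Symbol -> permitted bases, transcribed from Table N1.
IUPAC_BASES = {
    "G": "G", "A": "A", "T": "T", "C": "C",
    "R": "GA", "Y": "TC", "M": "AC", "K": "GT",
    "S": "GC", "W": "AT",
    "H": "ACT", "B": "GTC", "V": "GCA", "D": "GAT",
    "N": "GATC",
}


def expand(sequence):
    """Return every fully specified sequence an ambiguous sequence permits."""
    choices = [IUPAC_BASES[symbol] for symbol in sequence.upper()]
    return ["".join(combo) for combo in product(*choices)]


def matches(pattern, sequence):
    """True if a fully specified sequence is permitted by the pattern."""
    if len(pattern) != len(sequence):
        return False
    return all(
        base in IUPAC_BASES[symbol]
        for symbol, base in zip(pattern.upper(), sequence.upper())
    )
```

    For example, expand("AR") yields the two sequences AG and AA, and
    matches("GAYW", "GATA") is True because Y permits T and W permits A.
    The number of expansions grows multiplicatively with each ambiguous
    position, so expand is only practical for short sequences.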

    For double-stranded nucleic acids Table N2 permits the allocation of symbols to the complementary strand.

    Table N2. Definition of complementary symbols

    Symbol      A   B   C   D   G   H   K   M   S   T   V   W   N
    Complement  T   V   G   H   C   D   M   K   S*  A   B   W*  N*

    * In certain cases the symbol and its complement are identical.
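
    Table N2 translates directly into a character mapping. The following
    is a minimal Python sketch, not part of the NC-IUB recommendations.
    The names COMPLEMENT, complement and reverse_complement are
    hypothetical. R and Y do not appear in Table N2 as given here; the
    sketch pairs them with each other as an assumption derived from
    Table N1 (R is G or A, whose complements C or T are Y).

```python
# Symbol -> complement, transcribed from Table N2.
# R <-> Y is NOT in Table N2 above; it is an assumption derived from Table N1.
COMPLEMENT = str.maketrans(
    "ABCDGHKMSTVWN" + "RY",
    "TVGHCDMKSABWN" + "YR",
)


def complement(sequence):
    """Return the complementary strand, symbol by symbol."""
    return sequence.upper().translate(COMPLEMENT)


def reverse_complement(sequence):
    """Return the complementary strand read in the opposite direction."""
    return complement(sequence)[::-1]
```

    For example, reverse_complement("GATTACA") gives "TGTAATC". S, W and
    N map to themselves, which corresponds to the starred entries in
    Table N2.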