One of the most important areas of biochemistry for which special symbols are essential is that of biopolymers. It is almost impossible to represent the name of even a simple protein, polynucleotide, or polysaccharide except by the use of logical and universally accepted abbreviations. The name of, e.g., one of the chains of insulin, expressed in terms of 30 amino-acid-radical names in order, is so unwieldy as to be useless. The symbolic representation gives the structure in two lines of print.
For some of the most important biochemical reagents, coenzymes, etc., even shorter abbreviations are universally employed, e.g., ATP, NAD, RNA. These abbreviations do not represent a chemical structure in the way that symbols do. The creation of such new abbreviations should therefore be restricted to an absolute minimum.
Symbols or abbreviations other than those listed in the IUPAC-IUB Rules should be used only where an objective case can be made for their necessity; none should be used where pronouns or similar short terms can replace a long word or phrase. Any such symbol or abbreviation should be defined in each paper. These ad hoc abbreviations and symbols should not conflict with established ones or with the general principles, and none should be introduced unless the term is used repeatedly. If, in exceptional circumstances, symbols or abbreviations are used in the Summary, they should be defined there as well as in the body of the paper.
There are three main series of symbols for monomeric units: those for amino acids, monosaccharides, and mononucleosides, of which the amino-acid series is the oldest. The monomeric units are generally designated by three-letter symbols (a capital letter followed by two lower-case letters). These symbols should not be used for the free monomers in the text of papers.
A standard treatment has been devised for the three groups of macromolecules built up from these units. Where the sequence of residues is known, the symbols are written in order and joined by hyphens. Where the sequence is not known, the symbols of the group are separated by commas and enclosed in parentheses. Example: Ala-Gly-(Met,Pro)-Lys means that the order of the methionine and proline residues between Gly and Lys is unknown.
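To make the hyphen/comma convention concrete, the following Python sketch reads a partial sequence such as Ala-Gly-(Met,Pro)-Lys into a list of positions, where a parenthesized group becomes a tuple whose internal order is unknown, and then checks whether a fully determined sequence is compatible with it. The function names `parse_sequence` and `is_consistent` are hypothetical, and the sketch assumes plain three-letter residue symbols only (no substituents such as Tos- or -OMe, and no nested groups).

```python
import re

# A monomer symbol: one capital letter followed by two lower-case letters.
SYMBOL = re.compile(r"[A-Z][a-z]{2}")


def parse_sequence(text):
    """Split hyphen-joined notation into positions.

    A plain string is a residue whose position is known; a tuple is a
    parenthesized group whose internal order is unknown (sorted so that
    the result does not depend on how the group was written).
    """
    positions = []
    for part in text.split("-"):
        if part.startswith("(") and part.endswith(")"):
            members = part[1:-1].split(",")
            if not all(SYMBOL.fullmatch(m) for m in members):
                raise ValueError(f"malformed group: {part!r}")
            positions.append(tuple(sorted(members)))
        elif SYMBOL.fullmatch(part):
            positions.append(part)
        else:
            raise ValueError(f"malformed symbol: {part!r}")
    return positions


def is_consistent(known, positions):
    """True if the fully known residue list could be the partial sequence."""
    i = 0
    for item in positions:
        if isinstance(item, str):
            if i >= len(known) or known[i] != item:
                return False
            i += 1
        else:
            # Any ordering of the residues in an unknown-order group is allowed.
            if sorted(known[i:i + len(item)]) != list(item):
                return False
            i += len(item)
    return i == len(known)


partial = parse_sequence("Ala-Gly-(Met,Pro)-Lys")
print(partial)  # ['Ala', 'Gly', ('Met', 'Pro'), 'Lys']
print(is_consistent(["Ala", "Gly", "Pro", "Met", "Lys"], partial))  # True
print(is_consistent(["Gly", "Ala", "Met", "Pro", "Lys"], partial))  # False
```

Sorting the members of each group is a design choice so that (Met,Pro) and (Pro,Met) compare equal, which mirrors the meaning of the notation.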
Macromolecules composed of repeating sequences may be represented by the prefix 'poly' or the subscript n, both meaning 'polymer of'. The symbols for the monomeric units of the sequence are enclosed in parentheses. Thus, poly(Lys) or (Lys)n is polylysine; poly(Ala-Lys) or (Ala-Lys)n is a linear polymer of alanine and lysine in regular alternating sequence; and poly(Ala,Lys) is the irregular, random copolymer of equal amounts of these amino acids. There is no intervening space or hyphen between 'poly' and the parenthesis. The n may be replaced by a definite number, an average, or a range (e.g. 8-12), as appropriate. 'Oligo' may replace 'poly' for short chains.
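The polymer conventions can be summarized as two small formatting rules: hyphens inside the parentheses for a regular repeating sequence, commas for a random copolymer, and no space or hyphen after the prefix. The Python sketch below applies those rules; `poly_name` and `repeat_name` are hypothetical helpers, and since plain text cannot show a subscript, the n (or number, or range) is simply appended.

```python
def poly_name(units, regular=True, prefix="poly"):
    """Write a repeating-sequence polymer in the 'poly' style.

    regular=True joins the units with hyphens (regular sequence);
    regular=False joins them with commas (irregular, random copolymer).
    """
    if prefix not in ("poly", "oligo"):
        raise ValueError("prefix must be 'poly' or 'oligo'")
    joiner = "-" if regular else ","
    # No space or hyphen between the prefix and the opening parenthesis.
    return f"{prefix}({joiner.join(units)})"


def repeat_name(units, n="n"):
    """Write the equivalent subscript style, e.g. (Ala-Lys)n.

    n may be the letter n, a definite number, or a range such as "8-12";
    it is appended as plain text rather than as a true subscript.
    """
    return f"({'-'.join(units)}){n}"


print(poly_name(["Lys"]))                        # poly(Lys)
print(poly_name(["Ala", "Lys"]))                 # poly(Ala-Lys)
print(poly_name(["Ala", "Lys"], regular=False))  # poly(Ala,Lys)
print(repeat_name(["Ala", "Lys"]))               # (Ala-Lys)n
print(poly_name(["Gly"], prefix="oligo"))        # oligo(Gly)
```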
When other abbreviations for chemical compounds are needed, the maximum use should be made of standard chemical symbols (C, H, O, N, P, S, Na, Cl, etc.), numerical multiples (subscripts 2 and 3, not di or D or T etc., as in Me2SO, Me3Si-) and of trivial names and their symbols (e.g. folate, P, Me, Pr, Bu, Ph, Ac).
Symbols may be combined to represent more complex structures, such as Tos-Arg-OMe, in which the basic structure (arginine) remains recognisable.
Names of enzymes are not to be abbreviated except in terms of substrates for which accepted abbreviations exist (hence ATPase and RNase, but not LDH, GPDH, ACE, etc.).
Peptide Hormones. The IUPAC-IUB Commission on Biochemical Nomenclature (CBN) has recommended trivial names short enough to make abbreviations unnecessary, e.g. corticotropin (for ACTH), follitropin (for FSH), folliberin (for FSH-RF), etc. (ref 1).
Class names, such as fatty acids, protein, virus, etc., or short terms (poly, furan, folate, etc.) are not to be abbreviated even when an associated term is abbreviated or symbolised (e.g. poly(X), not PX; H4folate, not THF).
No abbreviations should be used for terms such as 'central nervous system', 'red blood cells', or 'extra-cellular fluid'.
The following tables have been compiled to aid authors and readers. They list the symbols and abbreviations proposed in the various CBN documents already published. The biochemical journals accept most of the CBN recommendations.
Table 1. Symbols for amino acids
The symbols preceded by a plus sign may be used without definition. The one-letter symbols (in parentheses) should be restricted to comparisons of long sequences in tables, lists, or figures, and to such special uses as tagging three-dimensional models of proteins. They should not be used in papers that employ the single-letter system for nucleoside sequences, as in repeating codons. Di(α-amino acids) are listed in appendix B of reference 2.
|Alanine||+ Ala (A)|
|Arginine||+ Arg (R)|
|Asparagine||+ Asn (N)|
|Aspartic acid||+ Asp (D)|
|Aspartic acid or asparagine||+ Asx (B)|
|Cysteine (cf. half-cystine)||+ Cys (C)|
|Glutamic acid||+ Glu (E)|
|Glutamine||+ Gln (Q)|
|Glutamic acid or glutamine||+ Glx (Z)|
|Glycine||+ Gly (G)|
|Half-cystine (cf. cysteine)||+ Cys|
|Histidine||+ His (H)|
|Homoserine lactone||Hse >|
|Isoleucine||+ Ile (I)|
|Leucine||+ Leu (L)|
|Lysine||+ Lys (K)|
|Methionine||+ Met (M)|
|Phenylalanine||+ Phe (F)|
|Proline||+ Pro (P)|
|5-Pyrrolidone-2-carboxylic acid (pyroglutamic acid; oxoproline)||<Glu|
|Serine||+ Ser (S)|
|Threonine||+ Thr (T)|
|Tryptophan||+ Trp (W)|
|Tyrosine||+ Tyr (Y)|
|Valine||+ Val (V)|
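Because the one-letter symbols are meant for comparing long sequences, a conversion from the three-letter form is a common chore. The Python sketch below hard-codes the one-letter symbols exactly as listed in Table 1 (including Asx/B and Glx/Z) and rejects symbols that the table gives no one-letter code for, such as Hse> or <Glu. The names `THREE_TO_ONE` and `to_one_letter` are hypothetical, and the input is assumed to be a fully known, hyphen-joined sequence.

```python
# One-letter symbols exactly as given in Table 1 (including Asx/B and Glx/Z).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Asx": "B",
    "Cys": "C", "Glu": "E", "Gln": "Q", "Glx": "Z", "Gly": "G",
    "His": "H", "Ile": "I", "Leu": "L", "Lys": "K", "Met": "M",
    "Phe": "F", "Pro": "P", "Ser": "S", "Thr": "T", "Trp": "W",
    "Tyr": "Y", "Val": "V",
}


def to_one_letter(sequence):
    """Convert a fully known, hyphen-joined sequence to one-letter form."""
    letters = []
    for symbol in sequence.split("-"):
        if symbol not in THREE_TO_ONE:
            # e.g. Hse> or <Glu, for which Table 1 lists no one-letter code
            raise ValueError(f"no one-letter symbol for {symbol!r}")
        letters.append(THREE_TO_ONE[symbol])
    return "".join(letters)


# The first five residues of the insulin A chain:
print(to_one_letter("Gly-Ile-Val-Glu-Gln"))  # GIVEQ
```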
Table 2. Symbols for substituents of amino acids and of reagents used for their modification
These symbols should be defined
|Aminoethyl-||Act- or -(CH2)2NH2|
|Benzhydryl-||Bzh- or Ph2CH-|
|Benzoyl-||Bz- or PhCO-|
|Benzyl-||Bzl- or PhCH2-|
|Benzyloxy-||-OBzl or -OCH2Ph|
|Benzyloxycarbonyl-||Cbz- or Z-|
|Benzylthiomethyl-||Btm- or PhCH2SCH2-|
|tert-Butoxycarbonyl-||Boc- or ButOCO-|
|Carbamoyl-||Cbm- or NH2CO-|
|Carbamoylmethyl-||Cam- or -CH2CONH2|
|Carboxymethyl-||Cm- or -CH2CO2H|
|3-Carboxypropionyl- (cf. succinyl-)||Suc-|
|Cyclopentyloxycarbonyl-||Poc- or cPeOCO|
|Diazoacetyl-||N2Ac- or N2CHCO|
|Diisopropyl fluorophosphate||(PriO)2PO-F; PriP-F; iPr2P-F, or Dip-F|
|5-Dimethylaminonaphthalenesulfonyl-||Dns- or dansyl-|
|Dinitrophenyl-||N2ph- or Dnp-|
|Diphenylmethoxy-||-OBzh or -OCHPh2|
|Diphenylmethyl-||Bzh- or Ph2CH|
|Ethoxy-||-OEt or EtO-|
|Maleoyl-||Mal< or -Mal-|
|Methylthiocarbamoyl-||Mtc- or MeNHCS|
|Phenylthiocarbamoyl-||PhNHCS- or Ptc-|
|Phosphoric residue||P- or -P|
|Phthaloyl-||Pht< or -Pht-|
|Succinyl- (cf. 3-carboxypropionyl-)||Suc< or -Suc-|
|Triphenylmethyl-||Ph3C- or Trt-|
Table 3. Symbols for carbohydrates
This table lists the most commonly used symbols for carbohydrates; those preceded by a plus sign may be used without definition. Pyranose and furanose forms are designated where necessary by the suffixes p and f. Configurational symbols D and L (small Roman capital letters) and anomeric prefixes are shown where necessary as prefixes.
|Derivatives of, e.g., glucose|
Note: The prefix 'd' indicates a 2-deoxysugar. Other deoxysugars may be designated similarly with a positional numeral, e.g., 3-deoxyglucose: 3-dGlc.
Table 4. Symbols for bases
These symbols should be defined
Table 5. Symbols for nucleosides and nucleotides
The symbols preceded by a plus sign may be used without definition.
Two systems are recognised: one uses three-letter symbols for the common nucleosides and a capital italic P for the phosphoric residue; the other uses single capital letters for the common nucleosides and a lower-case p for the phosphoric residue. The three-letter symbols should be used whenever chemical changes involving nucleosides or nucleotides are being discussed. The one-letter symbols are intended only for nucleoside residues in sequences or partial sequences; in these they should always be connected by hyphens (representing internal 3'-5' phosphodiester linkages), and a terminal phosphoric residue should be indicated by p. The 2'-deoxyribonucleosides are indicated by the prefix 'd'.
|Adenosine||+ Ado||+ A|
|Cytidine||+ Cyd||+ C|
|Dihydrouridine||D or hU|
|Guanosine||+ Guo||+ G|
|Inosine||+ Ino||+ I|
|6-Mercaptopurine ribonucleoside (6-thioinosine)||Sno||M or sI|
|Pseudouridine||+ ψrd||+ ψ or Q (for computer work)|
|'a purine nucleoside'||Puo||R|
|'a pyrimidine nucleoside'||Pyd||Y|
|Ribosylthymine||+ Thd||+ T|
|Thiouridine||Srd||S or sU|
|Thymidine (2'-deoxyribosylthymine)||+ dThd||+ dT|
|Uridine||+ Urd||+ U|
|Xanthosine||+ Xao||+ X|
|Phosphoric residue||-P||p or - (For internal phosphodiester bonds)|
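The one-letter system for nucleotide sequences reduces to a few rules: one letter per nucleoside, a hyphen for each internal 3'-5' phosphodiester linkage, and a p placed before the first letter for a 5'-terminal phosphoric residue or after the last letter for a 3'-terminal one. The Python sketch below applies these rules. It assumes residues are supplied in 5' to 3' order, covers only the ribonucleosides listed in Table 5 (no 'd' prefix, modified bases, or 2',3'-cyclic phosphate), and the names `ONE_LETTER` and `format_oligonucleotide` are hypothetical.

```python
# One-letter symbols keyed by the three-letter symbols of Table 5
# (ribonucleosides only; the 'd' prefix for deoxy residues is not handled).
ONE_LETTER = {
    "Ado": "A", "Cyd": "C", "Guo": "G", "Ino": "I",
    "Thd": "T", "Urd": "U", "Xao": "X", "ψrd": "ψ",
}


def format_oligonucleotide(residues, phosphate_5=False, phosphate_3=False):
    """Write residues (assumed to be listed 5' to 3') in the one-letter system."""
    # Internal 3'-5' phosphodiester linkages are written as hyphens.
    text = "-".join(ONE_LETTER[r] for r in residues)
    if phosphate_5:
        text = "p" + text  # 5'-terminal phosphoric residue precedes the letter
    if phosphate_3:
        text += "p"        # 3'-terminal phosphoric residue follows the letter
    return text


print(format_oligonucleotide(["Guo", "Cyd", "Ado", "Urd"], phosphate_3=True))
# G-C-A-Up
print(format_oligonucleotide(["Ado", "Urd", "Guo"], phosphate_5=True))
# pA-U-G
```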
Table 6. Symbols for modified bases, sugars, or phosphoric acid residues in polynucleotides
a) Substituents on bases and internal sugars. These symbols, all in lower-case letters, generally precede the nucleoside letter for base substitution and follow it for sugar substitution. Locants are given as superscripts, multipliers as subscripts.
|Arabinose||a (Precedes the nucleoside letter.)|
|Deoxyribose||d (May precede the nucleoside letter or the whole chain, as appropriate.)|
|Dihydro-||h (not h2)|
|Hydroxy-||ho or oh|
|Lyxose||l (Precedes the nucleoside letter.)|
|Phosphoric residue||p (Precedes the nucleoside letter for 5'; follows the letter for 3'; > or >p for 2',3'-cyclic phosphoric acid residue; replaced by hyphen for internal phosphodiester bonds.)|
|Xylose||x (Precedes the nucleoside letter.)|
b) Substituents on terminal sugar hydroxyl groups, and phosphoric acid protecting groups. These symbols, generally placed in parentheses, follow the appropriate nucleoside symbol or adjoin the appropriate symbol for the phosphoric acid residue.
|5'-Cyanoethyl-; 3' (or 5')-cyanoethyl-||(CNEt)-; -(CNEt)|
Table 7. Symbols for specific preparations of nucleic acids
These symbols may be used without definition
|Specific transfer RNA species|
|Aminoacylated alanine-accepting tRNA||Ala-tRNA^Ala|
|Isoacceptor species of alanine-accepting tRNA||tRNA_1^Ala, tRNA_2^Ala, etc.|
|Formylatable methionine-accepting tRNA||tRNA_f^Met or tRNA^fMet|
|Formylaminoacylated formylatable methionine-accepting tRNA||fMet-tRNA_f^Met or fMet-tRNA^fMet|
Table 8. Miscellaneous symbols
These symbols should be defined
|Phosphoric residue||P- or -P|
|Pteroic acid (pteroyl-)||Pte|
|Nα-Tosylarginine methyl ester||Tos-Arg-OMe|
|N-Tosylphenylalanine chloromethyl ketoneᶜ||Tos-PheCH2Cl|
a See ref 3 for the special application of these symbols.
b Folate and folyl- are not abbreviated.
c Correctly (2-phenyl-1-tosylamido)ethyl chloromethyl ketone, or chloro-(N-tosylphenylalanyl)methane.
Table 9. Abbreviations for semisystematic or trivial names
Abbreviations preceded by a plus sign may be used without definition. The preceding tables list alternative symbols that some journals may prefer. Trivial names for peptide hormones have been recommended (ref 1).
|Acetyl-coenzyme A||+ CoASAc|
|Adenosine 5'-mono-, di-, and triphosphates||+ AMP, ADP, and ATPᵃ|
|Coenzyme A||+ CoA (or CoASH)|
|Corticotropin (adrenocorticotropin, adrenocorticotropic hormone)||ACTH|
|Cytidine 5'-mono-, di-, and triphosphates||+ CMP, CDP, and CTPᵃ|
|Deoxyribonucleic acid, or deoxyribonucleate||+ DNA|
|Diphosphothiamin (thiamin pyrophosphate)||DPT|
|Ethylenediaminetetraacetate||+ EDTA|
|Flavin-adenine dinucleotide||+ FAD|
|Glutathione and its oxidised form||+ GSH, GSSG|
|Guanosine 5'-mono-, di-, and triphosphates||+ GMP, GDP, and GTPᵃ|
|Haemoglobin, carbon monoxide haemoglobin, oxyhaemoglobin||Hb, HbCO, HbO2|
|Inorganic orthophosphate||+ Pi|
|Inorganic pyrophosphate||+ PPi|
|Inosine 5'-mono-, di-, and triphosphates||+ IMP, IDP, and ITPᵃ|
|Melanotropin (melanocyte-stimulating hormone)||MSH|
|Methemoglobin, metmyoglobin||MetHb, MetMb|
|Myoglobin, carbon monoxide myoglobin, oxymyoglobin||Mb, MbCO, MbO2|
|Nicotinamide-adenine dinucleotide and its oxidised and reduced forms||+ NAD, NAD+, and NADH|
|Nicotinamide-adenine dinucleotide phosphate and its oxidised and reduced forms||+ NADP, NADP+, and NADPH|
|Nicotinamide mononucleotide||+ NMN|
|Riboflavin 5'-phosphate||+ FMN|
|Ribonucleic acid or ribonucleate||+ RNA|
|Ribosylthymine 5'-mono-, di-, and triphosphates||+ TMP, TDP, and TTPᵃ|
|Thymidine 5'-mono-, di-, and triphosphates||+ dTMP, dTDP, and dTTPᵃ|
|Uridine 5'-mono-, di-, and triphosphates||+ UMP, UDP, and UTPᵃ|
a The d prefix may be used to represent the corresponding deoxyribonucleoside phosphates, e.g. dADP. The various isomers of adenosine monophosphate may be written 2'-AMP, 3'-AMP, or 5'-AMP where ambiguity is possible. A similar procedure may be applied to other nucleoside or deoxyribonucleoside monophosphates.
1. IUPAC-IUB Commission on Biochemical Nomenclature (1975) Nomenclature of Peptide Hormones, Recommendations, 1974, Eur. J. Biochem. 55, 485-486.
2. IUPAC Commission on the Nomenclature of Organic Chemistry and IUPAC-IUB Commission on Biochemical Nomenclature (1975) Nomenclature of α-Amino Acids, Recommendations, 1974, Eur. J. Biochem. 53, 1-14.
3. IUPAC-IUB Commission on Biochemical Nomenclature (1975) Nomenclature of Quinones with Isoprenoid Side-chains, Recommendations, 1973, Eur. J. Biochem. 53, 15-18.