FRONTIERS IN BIOSCIENCE; GUIDELINES FOR HUMAN GENE NOMENCLATURE (1997)
The following guidelines were prepared by J.A. White, P.J. McAlpine, S. Antonarakis, H. Cann, K.
Frazer, J. Frezal, D. Lancet, J. Nahmias, P. Pearson, J. Peters, A. Scott, H. and the attendees at the nomenclature meeting on 5 March 1997.
CONTENTS
1. General Rules for Gene Nomenclature
2. Recommendations for symbol construction
Hierarchical symbols, gene families and series
Homologies with other species
Genes identified from sequence information
Enzymes and proteins
Clinical disorders
Letters reserved for specific usage
3. Allele terminology
4. Printing Gene and Allele Symbols
5. Acknowledgements
1. General Rules for Gene Nomenclature
1.1. Requirements for designation by gene symbol
1.2. Gene symbols
1.2.1. Gene symbols are designated by upper case Latin letters or by a
combination of upper-case letters and Arabic numbers. Symbols should be short
in order to be useful, and should not attempt to represent all known
information about a gene. Ideally symbols should be no longer than six
characters in length. Based on classical genetic guidelines, gene symbols are
always either underlined or italicized when referring to genotypic information (phenotypic information is represented in standard fonts). Exceptions to this rule are in catalogs
of known genes, and when fragments or synthesized segments of genes are
referred to. New symbols must not duplicate existing gene symbols (check the
Genome Database, or the HUGO/GDB Nomenclature Committee list of approved gene
symbols).
1.3. Gene names
1.3.1. Gene names should be brief and specific and should convey the character or function of the gene.
1.4. DNA segments
Part I: D for DNA
Part II: 0,1,2,...22,X,Y,XY for the chromosomal assignment, where XY is for
segments homologous on the X and Y chromosomes, and 0 is for unknown
chromosomal assignment.
Part III: A symbol indicating the complexity of the DNA segment detected by the
probe, with S for a unique DNA segment, Z for repetitive DNA segments found
at a single chromosome site, or F for small undefined families of homologous
sequences found on multiple chromosomes.
Part IV: 1,2,3,..., a sequential number to give uniqueness to the above
concatenated characters.
Part V: When the DNA segment is known to be an expressed sequence, the suffix E
can be added to indicate this fact.
These numbers can now be generated automatically in the Genome Database,
following entry of clone details.
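The five-part structure above can be sketched as a small parser. This is only an illustration, assuming the parts are concatenated in order with no separators; the names `DSegment` and `parse_d_number` are hypothetical, and the sample symbols in the usage note are made up to exercise the pattern rather than taken from any database.

```python
import re
from typing import NamedTuple, Optional


class DSegment(NamedTuple):
    chromosome: str  # "0" to "22", "X", "Y" or "XY" (Part II)
    complexity: str  # "S", "Z" or "F" (Part III)
    number: int      # sequential number (Part IV)
    expressed: bool  # True when the optional E suffix is present (Part V)


# Part I is the literal "D". Two-digit chromosomes are listed before the
# single-digit alternative so that "17" is tried before "1".
_D_NUMBER = re.compile(r"D(XY|X|Y|1[0-9]|2[0-2]|[0-9])([SZF])([1-9][0-9]*)(E?)")


def parse_d_number(symbol: str) -> Optional[DSegment]:
    """Split a D-number into its parts, or return None if it does not fit."""
    match = _D_NUMBER.fullmatch(symbol)
    if match is None:
        return None
    chromosome, complexity, number, suffix = match.groups()
    return DSegment(chromosome, complexity, int(number), suffix == "E")
```

For instance, `parse_d_number("DXYS1E")` would yield a segment on the X/Y homologous region marked as expressed, while a string with chromosome 23 would be rejected.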
2.1.1. Every attempt should be made to represent information in a hierarchical form to facilitate retrieval of sets of related genes from computerized databases.
2.2.1. Homologous genes in different species (orthologs) should where possible have the same gene nomenclature.
Genes predicted from EST clusters or from genomic sequence alone are regarded
as putative, and are designated by the chromosome of origin and an arbitrary
number. Example: C2ORF1
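As a minimal sketch, a symbol of this kind could be assembled as follows. The layout (C, chromosome, ORF, number) is inferred from the single example C2ORF1 and is an assumption, as is the hypothetical function name `putative_gene_symbol`.

```python
# Chromosome labels accepted by this sketch: 1 to 22 plus X and Y.
VALID_CHROMOSOMES = {str(n) for n in range(1, 23)} | {"X", "Y"}


def putative_gene_symbol(chromosome: str, number: int) -> str:
    """Build a symbol shaped like C2ORF1 for a putative gene."""
    if chromosome not in VALID_CHROMOSOMES:
        raise ValueError(f"unknown chromosome: {chromosome!r}")
    if number < 1:
        raise ValueError("the arbitrary number must be positive")
    return f"C{chromosome}ORF{number}"
```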
Molecular technology has identified sequences (generally not transcribed) that
bear striking homologies to structural gene sequences. These sequences are
termed pseudogenes. In order to show the relatedness of pseudogenes to
functional genes, pseudogenes will be identified with the gene symbol of the
structural gene followed by a P for pseudogene. In order to reserve P for
pseudogenes, the use of P as the last character of a structural gene symbol
should be avoided where possible. Examples: HBBP1 (hemoglobin, beta pseudogene
1); ACTBP1 (actin, beta pseudogene 1); ACTBP2 (actin, beta pseudogene 2), etc.
Pseudogenes may be on different chromosomes or closely linked to the
functional gene and occur in varying numbers.
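The pseudogene rule can be illustrated with a short helper that appends P and a serial number to the structural gene symbol. This is a sketch only: the function name `pseudogene_symbol` is hypothetical, and requiring the symbol to start with a letter is an assumption layered on top of the upper-case letters and Arabic numbers rule in section 1.2.1.

```python
import re

# Upper-case Latin letters and Arabic numbers; the leading letter is an
# assumption of this sketch, not a stated rule.
_GENE_SYMBOL = re.compile(r"[A-Z][A-Z0-9]*")


def pseudogene_symbol(gene_symbol: str, index: int) -> str:
    """Return the symbol for the index-th pseudogene of a structural gene."""
    if _GENE_SYMBOL.fullmatch(gene_symbol) is None:
        raise ValueError(f"not a valid gene symbol: {gene_symbol!r}")
    if index < 1:
        raise ValueError("pseudogene numbering starts at 1")
    return f"{gene_symbol}P{index}"
```

With the examples from the text, `pseudogene_symbol("HBB", 1)` gives `HBBP1` and `pseudogene_symbol("ACTB", 2)` gives `ACTBP2`.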
Related sequences identified by cross-hybridisation and/or by computer
searching of sequence databases (BLAST, FASTA), where no other functional
information is available for the construction of a symbol, are designated with
the symbol of the known gene followed by an L for like (see also homology
section 2.3).
2.5.1. Inherited clinical disorders (monogenic Mendelian inheritance).
The first gene symbol allocated to an inherited clinical phenotype may be based
on an acronym which has been established as a name for the disorder, whilst
following the rules described in section 1. Example: ACH for achondroplasia.
However, it is usual for this symbol to change when the gene product or function
is identified. In some cases a gene symbol based on product or function will
already exist, and this will take precedence over the symbol derived from the
clinical disorder when the gene descriptions are merged. For example, in the
case of achondroplasia the symbol changed to FGFR3 and the name to fibroblast
growth factor receptor 3 (achondroplasia, thanatophoric dwarfism).
2.5.2. Complex/polygenic traits
Genome searches may suggest a contributing locus in a complex trait, which may
for convenience be given a gene symbol, although a proportion of these will
disappear in time. A symbol allocated to such a gene will not be re-used.
2.5.3. Contiguous gene syndromes.
Syndromes clearly associated with multiple loci should not be given gene
symbols. Syndromes associated with a regional deletion or duplication may be
assigned the letters CR (for chromosome region), in place of S for syndrome.
Examples: ANCR (Angelman syndrome chromosome region), DCR (Down syndrome chromosome region). However, as advances in database design have now increased
the possible ways of representing this type of information, we recommend that
such symbols are now classified as syndromic region symbols and not gene
symbols.
2.5.4. Loss of heterozygosity.
A chromosomal region in which the existence of genes may be inferred by loss of
heterozygosity can be designated by a symbol consisting of the letters LOH, the
chromosome number, CR (for chromosomal region) and then an arbitrary number.
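The construction described above can be sketched in a few lines. The function name `loh_region_symbol` is hypothetical, and the symbol in the usage note is invented purely to show the layout.

```python
def loh_region_symbol(chromosome: int, number: int) -> str:
    """Concatenate LOH, the chromosome number, CR and an arbitrary number."""
    if not 1 <= chromosome <= 22:
        raise ValueError("this sketch only handles autosomes 1 to 22")
    if number < 1:
        raise ValueError("the arbitrary number must be positive")
    return f"LOH{chromosome}CR{number}"
```

For example, `loh_region_symbol(18, 1)` produces `LOH18CR1`.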
2.6.1. Certain letters, or combinations of letters, are used as the last letters
in a symbol to represent a specific meaning. These are P for pseudogene (but
note also BP for binding protein), L for like (see 2.1.), R for receptor or
regulator, and N or NH for inhibitor. The use of these for other meanings should be
avoided where possible.
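A checker for these reserved endings might look like the sketch below. It is a hypothetical helper (`reserved_suffix_meanings` is not part of any existing tool), and stripping trailing digits before inspecting the last letters is an assumption. Because a trailing letter does not prove the reserved meaning applies, the function returns every candidate reading and leaves the decision to a curator.

```python
from typing import List, Tuple

# Reserved endings listed in section 2.6.1 and their meanings.
RESERVED_SUFFIXES: List[Tuple[str, str]] = [
    ("BP", "binding protein"),
    ("NH", "inhibitor"),
    ("P", "pseudogene"),
    ("L", "like"),
    ("R", "receptor or regulator"),
    ("N", "inhibitor"),
]


def reserved_suffix_meanings(symbol: str) -> List[Tuple[str, str]]:
    """List every reserved ending that the symbol could be read as using."""
    # Assumption: a trailing serial number (as in HBBP1) is ignored.
    stem = symbol.rstrip("0123456789")
    return [(suffix, meaning) for suffix, meaning in RESERVED_SUFFIXES
            if stem.endswith(suffix)]
```

Applied to `HBBP1`, the helper reports both the BP and the P readings, which is exactly the ambiguity the rule warns about; only the pseudogene reading is intended there.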
Allele terminology is now the responsibility of the Mutation Database
(ref/URL)
Gene and allele symbols are underlined in manuscript and italicized in
print. Italics need not be used in catalogs. It may be convenient in
manuscripts, computer printouts and in printed text to designate a gene symbol
by following it with an asterisk (e.g. PGM1*). When only allele symbols are
displayed they can be preceded by an asterisk. For example, for PGM1*1, the
allele is printed as *1.
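The asterisk convention lends itself to a simple split, sketched here under the assumption that a designation contains a single asterisk separating gene from allele. The name `split_allele` is hypothetical.

```python
from typing import Optional, Tuple


def split_allele(designation: str) -> Tuple[str, Optional[str]]:
    """Split a designation such as PGM1*1 into gene symbol and allele symbol.

    The allele keeps its leading asterisk, matching how it is printed when
    shown on its own. A bare gene designation such as PGM1* has no allele.
    """
    gene, asterisk, allele = designation.partition("*")
    if not asterisk or not gene:
        raise ValueError(f"expected GENE*ALLELE or GENE*: {designation!r}")
    return gene, ("*" + allele) if allele else None
```

Using the example from the text, `split_allele("PGM1*1")` returns `("PGM1", "*1")`, and `split_allele("PGM1*")` returns `("PGM1", None)`.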
Table 1: Species Abbreviations
Table 2: Greek-to-Latin alphabet conversion
Table 3: Single-letter amino acid symbols