Rules for Nomenclature of Genes,
Genetic Markers, Alleles, and Mutations in Mouse and Rat
Revised: January 2004
International Committee on Standardized Genetic Nomenclature for Mice
Chairperson: Dr. Janan T. Eppig (e-mail: jte@informatics.jax.org)
Rat Genome and Nomenclature Committee
Chairperson: Dr. Eberhard Günther (e-mail: eguenth@gwdg.de)
Rules for mouse genetic nomenclature were first published by Dunn, Gruneberg, and Snell (1940) and subsequently revised by the International Committee for Standardized Genetic Nomenclature in Mice (1963, 1973, 1981, 1989, 1993). In 2000 these Guidelines were completely rewritten in order to explain more clearly the existing rules. Some existing gene names and symbols may retain names derived from earlier conventions that no longer apply. Archived Guidelines (since 2000) are available.
Rules for rat genetic
nomenclature were first published by the Committee on Rat Nomenclature in 1992
and then by Levan et al. in 1995. In 2003, the
International Committee on Standardized Genetic Nomenclature for Mice and the
Rat Genome and Nomenclature Committee agreed to unify the rules and guidelines
for gene, allele, and mutation nomenclature in mouse and rat. This document
reflects that unification. Nomenclature guidelines are now reviewed and updated
annually by the two International Committees and current guidelines can be
found on the MGD and
RGD or RatMap
web sites. The Guidelines found here include all revisions approved at the
Table of Contents
1 Principles of Nomenclature
  1.1 Key Features
  1.2 Definitions
  1.3 Stability of Nomenclature
  1.4 Synonyms
2 Names and Symbols of Genes and Loci
  2.1 Laboratory Codes
  2.2 Identification of New Genes
  2.3 Gene and Locus Names and Symbols
    2.3.1 Gene Names
    2.3.2 Gene Symbols
  2.4 Structural Genes
    2.4.1 Genes with Homologs in Other Species
  2.5 Phenotype Names and Symbols
    2.5.1 Lethal Phenotypes
  2.6 Gene Families
    2.6.1 Families Identified by Hybridization
    2.6.2 Families Identified by Sequence Comparison
  2.7 ESTs
  2.8 Anonymous DNA Segments
    2.8.1 Mapped DNA Segments
    2.8.2 STSs Used in Physical Mapping
  2.9 Gene Trap Loci
  2.10 Quantitative Trait Loci, Resistance, and Immune Response Genes
    2.10.1 Names and Symbols of QTL
    2.10.2 Defining uniqueness in QTL
  2.11 Chromosomal Regions
    2.11.1 Telomeres
    2.11.2 Centromeres and Pericentric Heterochromatin
    2.11.3 Nucleolus Organizers
    2.11.4 Homogeneously Staining Region
    2.11.5 Chromosomal Rearrangements
  2.12 Genes Residing on the Mitochondria
3 Names and Symbols for Variant and Mutant Alleles
  3.1 Mutant Phenotypes
    3.1.1 Genes Known Only by Mutant Phenotypes
    3.1.2 Phenotypes Due to Mutations in Structural Genes
    3.1.3 Wild Type Alleles and Revertants
  3.2 Variants
    3.2.1 Biochemical Variants
    3.2.2 DNA Segment Variants
    3.2.3 Single Nucleotide Polymorphisms (SNPs)
  3.3 Variation in Quantitative Trait Loci and in Response and Resistance Genes
  3.4 Insertional and Induced Mutations
    3.4.1 Mutations of Structural Genes
    3.4.2 Transgenic Insertional Mutations
  3.5 Targeted Mutations
4 Transgenes
5 Definitions
  5.1 Gene
  5.2 Pseudogene
  5.3 Locus
  5.4 Marker
  5.5 Allele
  5.6 Allelic Variant
  5.7 Splice Variant or Alternative Splice
  5.8 Mutation
  5.9 Dominant and Recessive
  5.10 Genotype
  5.11 Phenotype
  5.12 Quantitative Trait Loci (QTLs)
  5.13 Haplotype
  5.14 Homolog
  5.15 Ortholog
  5.16 Paralog
6 References
1 Principles of Nomenclature
1.1 Key Features
The key component of nomenclature is the gene or locus name and symbol, which identifies a unit of inheritance. Other features, such as alleles, variants and mutations, are secondary to the gene name and become associated with it. Similarly, probes or assays used to detect a gene are not primary features and should not normally be used as names.
The primary purpose of a gene or locus name and symbol is to be a unique identifier so that information about the gene in publications, databases and other forms of communication can be unambiguously associated with the correct gene. These guidelines, therefore, are intended to aid the scientific community as a whole to use genetic information.
Other, secondary, functions of nomenclature for genes are to:
1.2 Definitions
It is important that the user understands what is being named and the principles underlying these guidelines. Section 5 presents definitions that will aid the user in distinguishing, for example, genes, loci, markers, and alleles.
1.3 Stability of Nomenclature
On the whole, gene names should be stable; that is, they should not be changed over time. However, there are certain circumstances where a change is desirable:
1.4 Synonyms
A gene can have several synonyms, which are names or symbols that have been applied to the gene at various times. These synonyms may be associated with the gene in databases and publications, but the established gene name and symbol should always be used as the primary identifier.
2 Names and Symbols of Genes and Loci
The prime function of a gene name is to provide a unique identifier. The Mouse Genome Database (MGD) serves as a central repository of gene names and symbols to avoid use of the same name for different genes or use of multiple names for the same gene (http://www.informatics.jax.org). The MGD Nomenclature Committee (nomen@informatics.jax.org) provides advice and assistance in assigning new names and symbols. A web tool for proposing a new locus symbol is located at the MGD site. For the rat, these functions are carried out by RatMap (http://ratmap.gen.gu.se) and RGD (http://rgd.mcw.edu) assisted by the International Rat Genome and Nomenclature Committee (RGNC). A web tool for proposing a new locus symbol is located at the RatMap and RGD sites.
2.1 Laboratory Codes
A
key feature of mouse and rat nomenclature is the Laboratory Registration Code or Laboratory code, which is a code of usually three to four letters (first letter upper case, followed by all lower case), that identifies a particular institute, laboratory, or investigator that produced, and may hold stocks of, for example, a DNA marker, a mouse or rat strain, or a mutation. Laboratory codes are also used in naming chromosomal aberrations and transgenes. Laboratory codes can be assigned by MGD or by the Institute for Laboratory Animal Research (ILAR) at http://dels.nas.edu/ilar/codes.asp?id=codes.
Examples:
J    The Jackson Laboratory
Mit    Massachusetts Institute of Technology
Leh    Hans Lehrach
Kyo
Ztm
2.2 Identification of New Genes
Identification of new genes generally comes about in one of two ways: identification of a novel protein or DNA sequence, or identification of a novel phenotype or trait. In the case of sequences, care should be taken in interpretation of database searches to establish novelty (for example, to distinguish between a new member of a gene family and an allele or alternative transcript of an existing family member). Novel mutant phenotypes or traits should be named according to their primary characteristic, but once the gene responsible for the phenotypic variation is identified, this gives the primary name of the gene and the mutant name becomes the name of the allele (see Section 2.3).
2.3 Gene Names and Symbols
2.3.1 Gene Names
Names
of genes should be brief, and convey accurate information about the gene. The name should not convey detailed information about the gene or assay used; this can be associated with the gene in publications or databases. While the gene name should ideally be informative as to the function or nature of the gene, care should be taken to avoid putting inaccurate information in the name. For example, a "liver-specific protein" may be shown by subsequent studies to be expressed elsewhere.
A gene name should:
• be specific and brief, conveying the character or function of the gene.
• begin with a lower case letter, unless it is a person’s name or is a typically capitalized word.
  Example:
  Blr1    Burkitt lymphoma receptor 1
  Acly    ATP citrate lyase
• use American spelling.
• not contain punctuation, except where necessary to separate the main part of the name from modifiers.
  Example:
  Acp1    acid phosphatase 1, soluble
  Pigq    phosphatidylinositol glycan, class Q
• include the name of the species from which the ortholog/homolog name was derived at the end of the name in parentheses only when that name is not in common usage.
  Example:
  Shh    sonic hedgehog [commonly used, does not include species name]
  Fjx1    four jointed [name includes species derivative]
• not include the word mouse (for a mouse gene name) or the word rat (for a rat gene name).
• follow the conventions of the established gene family if it is a recognizable member of that family by sequence comparison, structure (motifs/domains), and/or function.
• not contain potentially misleading information that may be experiment or assay specific, such as “kidney-specific” or “59 kDa”.
2.3.2 Gene Symbols
Genes are given short
symbols as convenient abbreviations for speaking and writing about the genes.
A gene symbol should:
• be unique within the species and should not match a symbol in another species that is not a homolog.
• be short, normally 3-5 characters, and not more than 10 characters.
• use only Roman letters and Arabic numbers.
• begin with an upper case letter (not a number), followed by all lower case letters / numbers (see exception below).
• not include tissue specificity or molecular weight designations.
• include punctuation only in specific special cases (see below).
• ideally have the same initial letter as the initial letter of its gene name to aid in indexing. However, letter order in a gene symbol need not follow word order in the name.
  Example:
  Plaur    urokinase plasminogen activator receptor
  Sta    autosomal striping
• be italicized in published articles. Because they may be difficult to read, depending on the browser, gene symbols are frequently not italicized when posted to a web page.
• use a common stem or root symbol when belonging to a gene family. Family member numbers or subunit designations should be placed at the end of the gene symbol.
  Example:
  Glra1    glycine receptor, alpha 1 subunit
  Glra2    glycine receptor, alpha 2 subunit
  Glra3    glycine receptor, alpha 3 subunit
• use the same symbol whenever possible for orthologs among human, mouse and rat.
Exceptions to the rule of upper case first letter and lower case remaining letters in a gene or locus symbol:
Use of hyphens within the symbol should be kept to a minimum. Situations where hyphens may be used include:
Example:
Hk1-rs1    hexose kinase-related sequence 1
Hba-ps3    hemoglobin alpha pseudogene 3
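The symbol rules in this section, together with the hyphenated -rs and -ps forms shown in the example above, can be sketched as a simple check. This is only an illustration under stated assumptions, not an official validator: the function name `check_gene_symbol` is hypothetical, the 10-character limit is assumed to count the hyphenated suffix, and none of the listed exceptions to the capitalization rule are handled.

```python
import re

# Base symbol: one upper-case Roman letter, then lower-case letters or digits.
# Optional suffix: -rs# (related sequence) or -ps# (pseudogene), as in the examples.
SYMBOL_PATTERN = re.compile(r"[A-Z][a-z0-9]*(-(rs|ps)[0-9]+)?")

# "not more than 10 characters"; counting the suffix toward the limit is an assumption.
MAX_SYMBOL_LENGTH = 10


def check_gene_symbol(symbol):
    """Return a list of rule violations found in `symbol` (empty if none)."""
    problems = []
    if len(symbol) > MAX_SYMBOL_LENGTH:
        problems.append("longer than 10 characters")
    if not SYMBOL_PATTERN.fullmatch(symbol):
        problems.append(
            "expected an upper-case letter followed by lower-case letters "
            "or digits, with an optional -rs# or -ps# suffix"
        )
    return problems


for candidate in ["Plaur", "Hk1-rs1", "Hba-ps3", "GLRA1", "1abc"]:
    print(candidate, check_gene_symbol(candidate) or "ok")
```

A check like this can only flag formatting problems. Uniqueness within the species and agreement with orthologs still require consulting MGD, RGD or RatMap.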
Example:
KitW-v    gene name: kit oncogene; allele name: viable dominant spotting
2.4 Structural Genes, Splice Variants, and Promoters
Ultimately, the majority of gene names will be for structural genes that encode protein. The gene should as far as possible be given the same name as the protein, whenever the protein is identified. If the gene is recognizable by sequence comparison as a member of an established gene family, it should be named accordingly (see Section 2.6).
Alternative transcripts that originate from the same gene are not normally given different gene symbols and names. However, alternative transcripts (splice variants) from a single locus can be differentiated by the addition of an underscore, "v", and a serial number (_v#) to the symbol and the addition of a comma, “variant”, and a serial number to the name.
Example:
Gene
Slc14a2    solute carrier family 14, member 2
Splice variants
Slc14a2_v1    solute carrier family 14, member 2, variant 1
Slc14a2_v2    solute carrier family 14, member 2, variant 2
Slc14a2_v3    solute carrier family 14, member 2, variant 3
Slc14a2_v4    solute carrier family 14, member 2, variant 4
Transcripts from the opposite strand that overlap another gene, or a transcript that is derived principally from the introns of another gene, or one that uses an alternative reading frame to another gene (and does not use the existing frame to a significant extent) should be given a different name.
In
addition to alternative transcripts based on splice variants, genes may use multiple promoters. The variant promoters can be designated in a similar way to splice variants, adding an underscore, "pr", and a serial number (_pr#) to the symbol and a comma, “promoter”, and a serial number to the name.
Example:
Gene
Slc14a2    solute carrier family 14, member 2
Promoter variants
Slc14a2_pr1    solute carrier family 14, member 2, promoter 1
Slc14a2_pr2    solute carrier family 14, member 2, promoter 2
It should be noted that, for both splice variants and promoters, serial numbers are assigned as these elements are discovered. The numbering sequence does not correspond to relative location in the genome, and the numbering series are independent among species. For these elements, no attempt is made to develop cross-species equivalency.
2.4.1 Genes with Homologs in Other Species
To
aid interspecific comparison of genetic and other information, a gene that is identifiable as a homolog of an already named gene in another species can be named as "-like", "-homolog" or "-related". (Note: this is not the same as "related sequence", which applies to related sequences within mouse or within rat.) The gene name or symbol should not include the name mouse or the abbreviation "M" for mouse or the name rat or the abbreviation “R” for rat. Where possible, genes that are recognizable orthologs of already-named human genes should be given the same name and symbol as the human gene.
2.5 Phenotype Names and Symbols
Genes named for phenotypes should aim to convey the phenotype briefly and accurately in a few words. It is accepted that the name may not cover all aspects of the phenotype; what is needed is a succinct, memorable and, most importantly, unique name. Bear in mind that identification of a variant or mutant phenotype is recognition of an allelic form of an as-yet unidentified gene that may already have or will be given a name.
2.5.1 Lethal Phenotypes
Genes identified solely by a recessive lethal phenotype with no heterozygous effect are named for the chromosomal assignment, a serial number and the name of the laboratory of origin (from the Laboratory code).
Examples:
l5H1    First lethal on Chromosome 5 at Harwell
l4Rn2    Second lethal on Chromosome 4 from laboratory of Gene Rinchik
2.6 Gene Families
Genes
that appear to be members of a family should be named as family members. Evidence of gene families comes in a variety of forms, e.g., from a probe detecting multiple bands on a Southern blot, but is principally based on sequence comparisons.
2.6.1 Families Identified by Hybridization
Historically, many gene families have been identified as fragments detected by hybridization to the same probe but which map to different loci. These family members may be functional genes or pseudogenes. The loci can be named "related sequence" of the founder gene with a serial number (symbol -rs1, -rs2, and so on).
Example:
Odc-rs1 to Odc-rs21    mouse ornithine decarboxylase-related sequences 1 to 21
If
the founder or functional gene cannot be identified, initially all the fragments are named "related sequence" until it is identified; then that particular "-rs" is dropped, without renumbering. If there is evidence that any loci are pseudogenes, they should be named as such and given serial numbers as in Section 2.6.2.
Once sequence evidence is accumulated on functional family members (which may or may not have been previously identified as members), a systematic naming scheme should be applied to the family as in Section 2.6.2.
2.6.2 Families Identified by Sequence Comparison
Sequencing can identify genes that are clearly members of a family (paralogs). Where possible, members of the family should be named and symbolized using the same stem followed by a serial number. The same family members in different mammalian species (orthologs) should, wherever possible, be given the same name and symbol.
Pseudogenes should be suffixed by -ps and a serial number if there are multiple pseudogenes. Note that the numbering of pseudogenes among species is independent and no relationship should be implied among mouse, rat, or human pseudogenes based on their serial numbering.
Examples:
In mouse, phosphoglycerate kinase 1, pseudogenes 1 to 7, Pgk1-ps1 to Pgk1-ps7
In rat, calmodulin pseudogene 1, Calm-ps1
Numerous gene families have been recognized and given systematic nomenclature. Information on these families can be found at family-specific web sites, some of which are linked from MGD and RGD or RatMap. Names and symbols of new members of these families should follow the rules of the particular family and ideally be assigned in consultation with the curator of that family. Nomenclature schemes and curation of new families benefit from examination of existing models.
2.7 ESTs
Expressed
Sequence Tags (ESTs) differ from other expressed sequences in that they are short, single pass sequences that are often convenient for PCR amplification from genomic DNA. ESTs that clearly derive from a known gene should be considered simply as an assay (marker) for that known gene. When anonymous ESTs are mapped onto genetic or physical maps, their designations should be symbolized using their sequence database accession number.
2.8 Anonymous DNA Segments
Only
anonymous DNA segments that are mapped should be given systematic names and symbols.
2.8.1 Mapped DNA Segments
Anonymous DNA segments are named and symbolized according to the laboratory identifying or mapping the segment as "DNA segment, chromosome N, Lab Name" and a serial number, where N is the chromosomal assignment (1-19, X, Y in the mouse and 1-20, X, Y in the rat) and is symbolized as DNLabcode#.
Examples:
D8Mit17    the 17th locus mapped to mouse Chromosome 8 by M.I.T.
D1Arb27    the 27th locus mapped to rat Chromosome 1 at the Arthritis and Rheumatism Branch, NIAMS.
The same convention is applied to DNA segments that are variant loci within known genes.
Examples:
D4Mit17    an SSLP within the mouse Orm1 gene
D20Wox37    an SSLP within the rat Tnf gene
Mouse or rat DNA segments that are detected by cross-hybridization to human segments are given the human name with "chromosome N, cross-hybridizing to human DNA segment" inserted between DNA segment and the human segment code (see symbols). The same applies for rat DNA segments detected by cross-hybridization to mouse segments (or vice versa).
Example:
D16H21S56    Mouse DNA segment on Chr 16 that cross-hybridizes with a DNA segment D21S56 from human Chr 21.
D1M7Mit236    Rat DNA segment on Chr 1 that cross-hybridizes with a DNA segment D7Mit236 from mouse Chr 7.
2.8.2 STSs Used in Physical Mapping
When
physical maps are assembled (YAC or BAC contigs, for example) many markers may be placed on the map in the form of Sequence Tagged Sites (STSs). These might be clone end-fragments, inter-repeat sequence PCR products, or random sequences from within clones. These markers serve to validate the contigs and appear on the maps, but their further utility may be limited. It is not necessary to give them names or symbols other than those assigned by the laboratory that produced and used them. If the STSs are used more widely, they should be assigned anonymous DNA segment names ("D-numbers").
2.9 Gene Trap Loci
Gene
trap experiments in embryonic stem (ES) cells produce cell lines in which integration into a putative gene is selected by virtue of its expression in ES cells. The trapped gene is usually (though not necessarily) mutated by the integration. The site of integration can be characterized by a number of means, including cloning or extension of cDNA products. The loci of integration of a series of gene trap lines, once characterized as potentially unique, can be named and symbolized as members of a series, using the prefix Gt (for gene trap), followed by a vector designation in parentheses, a serial number assigned by the laboratory characterizing the locus, and the laboratory ILAR code. For example, the 26th gene "trapped" by the
A gene trap designation becomes an allele of the gene into which it was inserted, once that gene is identified. For example, Gt(pGT1.8TM)629Ska is now known to disrupt the netrin 1 (Ntn1) gene; thus the full allele designation for this gene trap mutation is Ntn1Gt(pGT1.8TM)629Ska, with the gene trap designation written as a superscript, and its abbreviated form is Ntn1Gt629Ska.
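The construction of gene trap symbols described above can be sketched as follows. This is a minimal illustration, not an official tool: the function names are hypothetical, the gene symbol Ntn1 is taken from the text, and the angle brackets used to mark the superscripted allele portion in plain text are an assumption of this sketch.

```python
def gene_trap_locus(vector, serial, lab_code):
    """Symbol for a gene trap locus: Gt(vector) + serial number + laboratory code."""
    return f"Gt({vector}){serial}{lab_code}"


def gene_trap_allele(gene_symbol, vector, serial, lab_code, abbreviate=False):
    """Allele symbol used once the trapped gene has been identified.

    The <...> notation stands in for the superscript in plain text;
    that rendering is an assumption of this sketch.
    """
    if abbreviate:
        # Abbreviated form drops the vector designation in parentheses.
        designation = f"Gt{serial}{lab_code}"
    else:
        designation = gene_trap_locus(vector, serial, lab_code)
    return f"{gene_symbol}<{designation}>"


print(gene_trap_locus("pGT1.8TM", 629, "Ska"))
print(gene_trap_allele("Ntn1", "pGT1.8TM", 629, "Ska"))
print(gene_trap_allele("Ntn1", "pGT1.8TM", 629, "Ska", abbreviate=True))
```

The abbreviated form should only be used when it remains unique, which code of this kind cannot determine on its own.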
(Note: Abbreviate gene trap alleles by dropping the vector designation in parentheses, if the resulting abbreviation is unique).
2.10 Quantitative Trait Loci, Resistance Genes, and Immune Response Genes
Differences between inbred strains and the phenotype of offspring of crosses between strains provide evidence for the existence of genes affecting disease resistance, immune response, and many other quantitative traits (quantitative trait loci, QTL). Evidence for QTL is generally obtained through extensive genetic crossing and analysis that may uncover many genetic elements contributing to a phenotypic trait. Generally, the number and effects of QTL can only be deduced following experiments to map them. QTL should not be named until such mapping experiments have been performed.
2.10.1 Names and Symbols of QTL
Names and symbols for QTL
should be brief and descriptive and reflect the trait or phenotype measured. Those QTL affecting the same trait should be given the same stem and serially numbered. The series is separate for mouse and rat and no homology should be implied by the serial numbers. Some historically named QTL carry the name of the disease with which they are associated; these names are maintained, but newly identified QTL should be named for the measured trait and not a disease. The suffix "q" may be used optionally as the final letter preceding the serial number in QTL symbols. Naming and symbolizing QTL follow the same conventions as for naming and symbolizing genes (Section 2.3).
Specifically for a QTL, its name should include:
• a root name describing the measured trait
• the designation QTL (recommended)
• a serial number
Examples:
in mouse
Cafq1    caffeine metabolism QTL 1
Cafq2    caffeine metabolism QTL 2
Cafq3    caffeine metabolism QTL 3
in rat
Kw1    kidney weight QTL 1
Kw2    kidney weight QTL 2
Kw3    kidney weight QTL 3
To obtain the next available serial number for a new QTL with an already established root name, e.g., the next in the series of “liver weight QTL” in mouse (Lwq#) or the next in series of “blood pressure QTL” for rat (Bp#), users should submit their QTL on the “proposing a new locus symbol” form at MGD (for mouse) or RGD or RatMap (for rat). Note that examining the database content for a QTL is not sufficient, as a laboratory may have a QTL designation reserved and private, pending publication.
2.10.2 Defining uniqueness in QTL
Specific circumstances for
naming independent QTL include:
• Independent experiments study the same trait and map that trait to the same chromosomal region. Because QTL are detected in the context of specific strain combinations in specific crosses and generally in different laboratories using different assays, each experimentally detected QTL will be given a unique symbol/name even when the trait measured and region defined are superficially the same as those of an existing QTL.
  Example: In mouse, Obq1 (obesity QTL 1) was identified and mapped to Chromosome 7 in a cross between strains 129/Sv and EL/Suz. Another obesity QTL was also mapped to Chromosome 7, but because it involved distinct strains (NZO and SM), it was given a different QTL designation, Obq15.
• A chromosomal region containing many measured “traits”. If multiple traits are measured in a single experiment and mapped to a single chromosomal region, there may or may not be evidence that different QTL are involved. If the traits are physiologically related, the QTL name should be broad enough to represent all the measured traits or the name should reflect the trait showing the most significant LOD score/p-value. Conversely, if there is clear evidence that the traits are independent, each trait will constitute a unique QTL.
  Example: In mouse, Nidd1 (non-insulin-dependent diabetes mellitus 1) was associated with related measurements of plasma insulin, non-fasted blood glucose, and body weight and given a single QTL designation. In rats, Uae5 (urinary albumin excretion QTL 5) and Hw1 (heart weight QTL 1) are QTLs derived from the same experiment that map to overlapping regions of Chromosome 1. Because the measured traits are independent, different QTL designations are assigned.
2.11 Chromosomal Regions
Detailed
guidelines for nomenclature of chromosomes are given in the next section. However, certain cytological features of normal chromosomes (such as telomeres, centromeres, and nucleolar organizers) and abnormal chromosomes (such as homogeneously-staining regions and end-points of deletions, inversions, and translocations) are genetic loci that are given names and symbols.
2.11.1 Telomeres
The functional telomere should be denoted by the symbol Tel. A DNA segment that includes the telomere repeat sequence (TTAGGG)n and which maps to a telomeric location is symbolized in four parts:
For example, Tel14q1
2.11.2 Centromeres and Pericentric Heterochromatin
The
functional centromere should be denoted by the symbol Cen. Until the molecular nature of a functional mammalian centromere is defined, DNA segments that map to the centromere should be given anonymous DNA segment symbols as in Section 2.8.1.
Pericentric heterochromatin that is cytologically visible is given the symbol Hc#, in which # is the chromosome on which it is located.
Variation in heterochromatin band size can be denoted by superscripts to the symbol.
2.11.3 Nucleolus Organizers
The nucleolus organizer is a cytological structure that contains the ribosomal RNA genes. These genes are given the symbol Rnr and the number of the chromosome on which they are located.
If different Rnr loci can be genetically identified on the same chromosome, they are given serial numbers in order of identification.
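The Hc# and Rnr# symbols described above combine a fixed prefix with a chromosome designation, which can be sketched as below. This is an illustrative helper only: the function name `region_symbol` is hypothetical, the chromosome sets follow the ranges given in Section 2.8.1 (1-19, X, Y in the mouse and 1-20, X, Y in the rat), and the serial numbers for multiple Rnr loci on one chromosome are left out because the text does not show how they are written.

```python
# Chromosome designations per species, as listed in Section 2.8.1.
CHROMOSOMES = {
    "mouse": {str(n) for n in range(1, 20)} | {"X", "Y"},
    "rat": {str(n) for n in range(1, 21)} | {"X", "Y"},
}


def region_symbol(prefix, chromosome, species="mouse"):
    """Build a symbol such as Hc14 or Rnr12 from a prefix and a chromosome."""
    if species not in CHROMOSOMES:
        raise ValueError(f"unknown species: {species}")
    if chromosome not in CHROMOSOMES[species]:
        raise ValueError(f"{species} has no chromosome {chromosome}")
    return f"{prefix}{chromosome}"


print(region_symbol("Hc", "14"))          # pericentric heterochromatin
print(region_symbol("Rnr", "12"))         # ribosomal RNA genes
print(region_symbol("Rnr", "20", "rat"))  # chromosome 20 exists only in the rat set
```

The chromosome numbers used in the calls are arbitrary illustrations and do not claim that such loci have been assigned.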
2.11.4 Homogeneously Staining Regions
Homogeneously staining regions (HSRs) are amplified internal subchromosomal bands that are identified cytologically by their Giemsa staining. A DNA segment that maps within an HSR is given a conventional DNA segment symbol, when its locus is on a normal (unamplified) chromosome. When expanded into an HSR its symbol follows the guidelines for insertions, thus becoming Is(HSR;1)1Lub.
2.11.5 Chromosomal Rearrangements
Symbols for chromosomal deletions, inversions, and translocations are given in the chromosomal nomenclature section. The end points of each of these rearrangements, however, define a locus. Where there is only a single locus on a chromosome, the chromosome anomaly symbol serves to define it. However, where an anomaly gives two loci on a single chromosome they can be distinguished by the letters p and d for proximal and distal.