Sequence Analysis Words
Definitions
primary sequence
the linkage order of nucleotides (5'→3').  It is VERY BAD FORM to write RNA sequences with T's instead of U's!!!  DNAs or cDNAs have T's. RNAs have U's.  (e.g.: 5'-ACAAGCUUCUACAUG-3')
reverse of sequence
numbering of nucleotides is reversed relative to normal.  This is equivalent to counting 3'→5', rather than 5'→3'.  This trick is sometimes useful when making DOTPLOT comparisons. (e.g. reverse of above sequence is 5'-GUACAUCUUCGAACA-3')
complement of sequence
complement of each base is presented, but ORDER IS NOT INVERTED! Again, this technique is sometimes useful when plotting sequences on an axis during DOTPLOT comparisons.  (e.g. complement of above sequence is: 5'-UGUUCGAAGAUGUAC-3')
reverse-complement of sequence
THIS is the real minus strand sequence!!  Not only have the bases been converted to their pairing complements, but the order has been inverted so that the numbering is, again, 5'→3'. (e.g. the reverse-complement of above sequence is: 5'-CAUGUAGAAGCUUGU-3')
positive-strand
that strand of RNA that encodes the biggest, or most important ORFs.  With ssRNA viruses, this is not necessarily the strand that is packaged. (Also called the plus strand)
negative-strand
RNA strand that is the reverse-complement of a positive strand. (Also called the minus strand)   NOTE: in DNA, the minus strand is the template that is transcribed into mRNA.
inverted repeat
two tandem, identical sequences within a single linear segment of RNA (or DNA), but where the second segment is the reverse-complement of the first. (e.g. 5'-AAAUUCGCNNNNGCGAAUUU-3')
palindrome
inverted repeat within DNA or RNA duplex.  Effectively, the sequence can be read the same in either the positive or the negative strand. If intra-strand pairing does occur, a cruciform structure results from the apposed hairpins, at the junction of the 4 duplex regions
hairpin
antiparallel duplex structure that forms by pairing of inverted repeat sequences within a single-stranded RNA (or DNA).  The helical section(s) is called the stem and the unpaired base segment(s) at the end of the structure is called the loop
bulge loop
extra bases within a stem that do not have any possible pairing partners. (Causes "kinks" in the helix)
interior loop
bases within a stem for which pairing partners exist, but the partners are NOT of complementary sequence. (Causes distortion of the helix)
domain
one or more in a contiguous series of linked stems and loops which may possibly act in concert as a topological structure, or which collectively may contribute to functionality of a sequence region
dangling end
that portion of a sequence that is left-over after the energetically most-favorable segments have been identified and paired in computer-aided folding algorithms.  Certain algorithms may get "creative" with these segments, and predict potential pairings that have absolutely no biological relevance (Read: these segments can cause folding artifacts)
pseudoknot
a triple-stranded RNA structure formed when the loop at the top of a hairpin has sequences complementary to an unpaired segment near (or at) the base of the stem. Stringent topological rules determine the probability of pseudoknot formation
analogy(ous)
proteins or genes that perform or share a common function, but which arose from different developmental pathways (i.e. similarity NOT due to ancestry)
homology(ous)
proteins or genes that are descended from a common ancestor, and therefore share a significant degree of identity or similarity
identity
the degree (or percent) to which aligned protein or nucleotide sequences match each other exactly
similarity
the degree (or percent) that aligned proteins or nucleic acids are alike in some manner, even if they do not share exact sequences
scoring table
a data file used by sequence and comparison algorithms to "look up" whether any particular tested match, indeed has some significance
indel
any insertion or deletion in a sequence
gap
a spacer character (...) inserted into a sequence to indicate where an insertion may occur in an aligned sequence
gap length
the number of spacer characters inserted in a gap
gap weight
the mathematical penalty assessed by an algorithm for putting a gap into a sequence, when aligning that sequence to another
ORF
Open Reading Frame: a sequence segment that actually encodes a (real) protein
URF
Unidentified Reading Frame: any ORF for which it is not yet established that it really encodes a protein.  Statistically, a sequence might contain many (overlapping?) URFs, but few if any of these usually turn out to be genuine ORFs
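
As a worked illustration of the reverse, complement, reverse-complement, and inverted repeat entries above, the following is a minimal Python sketch. The function names are hypothetical (they do not come from any particular sequence-analysis package), and the sketch assumes plain upper-case RNA sequences containing only A, C, G and U.

```python
# Illustrative helpers for the RNA sequence terms defined above.
# Assumption: sequences are upper-case strings of A, C, G, U only.

RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")

def reverse(seq):
    """Same bases, order inverted."""
    return seq[::-1]

def complement(seq):
    """Pairing partner of each base; the order is NOT inverted."""
    return seq.translate(RNA_COMPLEMENT)

def reverse_complement(seq):
    """The minus-strand sequence: complement each base, then invert the order."""
    return complement(seq)[::-1]

def is_inverted_repeat(first, second):
    """True if the second segment is the reverse-complement of the first."""
    return second == reverse_complement(first)

plus = "ACAAGCUUCUACAUG"  # the example sequence from the 'primary sequence' entry
print(reverse(plus))             # GUACAUCUUCGAACA
print(complement(plus))          # UGUUCGAAGAUGUAC
print(reverse_complement(plus))  # CAUGUAGAAGCUUGU
print(is_inverted_repeat("AAAUUCGC", "GCGAAUUU"))  # True
```

A sequence that equals its own reverse-complement (for example AAGCUU) reads the same on either strand, which is the palindrome case described above.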

 

Phylogenetic Reconstruction Words
Definitions
additive tree a phylogenetic tree in which the distance between any 2 terminal nodes is equal to the sum of the branch lengths connecting them
analogy similarity by convergent evolution, but not by common evolutionary ancestry
bifurcation (dichotomy) the graphical representation in a phylogenetic tree of an evolutionary speciation event whereby an ancestral taxon splits into two
branch the graphical representation of an evolutionary relationship in a phylogenetic tree
character the position of a residue (aa or nucleotide) in an aligned sequence
character state the value that a character takes, that distinguishes it from all other characters
clade (1) According to the rigorous definition, a taxon consisting of a single species and all its descendants representing a monophyletic branch on an evolutionary tree. (2) In looser usage, as above, except that some descendants are not represented. (3) In reference to extant organisms, a subgroup of organisms from among a larger group under consideration, sharing a common ancestor not shared by the other organisms in the group
cladogram a graphic representation that portrays or attempts to portray the evolutionary relationships among a number of populations, species or higher taxa
degree of divergence the extent to which two homologous sequences differ from each other
distance matrix a matrix of genetic distances between taxa in a group under study
external nodes the graphic representation of extant taxonomic units (OTUs)
gene tree a phylogenetic tree that has been constructed from one or a few genes from each species
genetic distance (distance) broadly, any of several measures of the degree of genetic difference between individuals, populations, or species. In reference to molecular evolution, a measure of the number of nucleotide substitutions that have accumulated since divergence between the sequences
homology similarity by common ancestry or genetic relatedness
inferred tree a phylogenetic tree based on empirical data pertaining to extant taxa
informative site (diagnostic site) a site that is used to choose the most-parsimonious tree from among all possible phylogenetic trees. In molecular evolution, a site where there are at least two different kinds of nucleotides or amino acids, and each of them is represented in at least two sequences
internal node the graphical representation of an ancestral organism or gene in a phylogenetic tree
maximum parsimony (parsimony) the selection of the phylogenetic tree requiring the least number of substitutions from among all possible phylogenetic trees as the most likely to be the true phylogenetic tree
molecular clock (1) The rate at which mutations accumulate in a given genomic segment. (2) The hypothesis that in any given gene or DNA sequence, mutations accumulate at an approximately constant rate in all evolutionary lineages as long as the gene or the DNA sequence retains its original function. The extent to which the clock applies to all genes and all organisms is controversial
monophyletic sharing a common ancestor
multifurcation a graphic representation of an unknown branching order in a phylogenetic tree involving three or more taxa. Rarely, a graphic representation of a speciation event resulting in the simultaneous production of more than two species
multigene family a set of genes derived by duplication of an ancestral gene that display more than 50% similarity among them, frequently in close linkage with each other, and possessing similar or overlapping functions
neutral theory the proposal that evolution at the molecular level is primarily driven by mutational input and random genetic drift rather than by natural selection
node the graphical representation in a phylogenetic tree of an extant or ancestral operational taxonomic unit
operational taxonomic unit (OTU) any of the extant taxonomic units under study
orthology sequence similarity as a consequence of a speciation event
outgroup a species or set of species that is the least related to the others in a group of species. The taxon that diverged from a group of taxa before the others diverged from each other
parallel substitutions the independent occurrence of the same mutation at the same nucleotide site in two or more lineages
paralogy sequence similarity between the descendants of a duplicated ancestral gene
phenetics the study of relationships among a group of organisms on the basis of the degree of similarity between them, be that molecular, phenotypic, or anatomical
phenogram a tree-like diagram representing phenetic relationships. This is not necessarily identical to a cladogram, unless there is a direct linear relationship between the time of divergence and the degree of genetic (or morphological) divergence
phylogenetic tree a graphic representation of the phylogeny of a group of taxa or genes
phylogenetics the reconstruction of the evolutionary history of a group of taxa or genes
phylogeny the evolutionary history of a group of taxa or genes and their ancestry
polyphyletic descended from different ancestors
rate of gene substitution the number of gene substitutions per locus per unit time
root in rooted trees, the common ancestor of all taxa under study
rooted tree a phylogenetic tree that specifies ancestral and descendant species, thus indicating the direction of the evolutionary path
sequence divergence the differences between two homologous sequences due to independent accumulation of genetic changes in each lineage
sibling species species that are indistinguishable morphologically but are reproductively isolated
sister taxa (neighboring taxa) in general use, the pair of species among a group of species under study that are evolutionarily the closest to each other. In a phylogenetic tree, two taxa connected through a single internal node
speciation (cladogenesis) the splitting of one population into two or more populations that are reproductively isolated. The process by which new species arise
species a basic taxonomic category for which there are various definitions, among them: (1) a group of actually interbreeding individuals that is reproductively isolated from other such groups (biological species concept); (2) a lineage evolving separately from others (evolutionary species concept); (3) a group of organisms resembling each other more than they resemble any other organism outside the group (taxonomic species concept)
species tree a phylogenetic tree that represents the evolutionary relationships of a group of species
stochastic process a process, the outcome of which cannot be predicted exactly from the knowledge of initial conditions. However, given the initial conditions, each of the possible outcomes of the process can be assigned a certain probability
superfamily a collection of genes, all products of gene duplication that have diverged from each other to a considerable extent (i.e. less than 50% similar at the amino acid level, for a group of proteins)
taxon (taxa = plural) a taxonomic group of any rank (e.g. species, genus, kingdom, etc.) to which individual organisms are assigned
topology the branching pattern of a phylogenetic tree
tree a completely connected acyclic graph (in math terminology); in phylogeny, this term generically refers to any bifurcating graphic representation of phylogenetic data
true tree a phylogenetic tree that represents the true evolutionary history of a group of taxa
unrooted tree an evolutionary tree that specifies neither the root nor direction of the evolutionary path
UPGMA Unweighted pair group method with arithmetic mean
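
As a worked illustration of the distance matrix and informative site entries above, here is a minimal Python sketch. It is built on assumptions that are not part of the glossary: the function names are hypothetical, the four aligned sequences are made-up toy data, "-" and "." are both treated as gap characters, and the distance used is the simple uncorrected proportion of differing positions (one of the "several measures" of genetic distance, with no correction for multiple substitutions).

```python
from collections import Counter

GAPS = "-."  # assumption: either character marks a gap in the alignment

def p_distance(a, b):
    """Proportion of aligned, ungapped positions at which two sequences differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x not in GAPS and y not in GAPS]
    return sum(x != y for x, y in pairs) / len(pairs)

def distance_matrix(alignment):
    """Pairwise distances between all OTUs, keyed by (name, name)."""
    return {(m, n): p_distance(alignment[m], alignment[n])
            for m in alignment for n in alignment}

def informative_sites(alignment):
    """Indices of characters with at least two states, each seen in at least two sequences."""
    sites = []
    for i, column in enumerate(zip(*alignment.values())):
        counts = Counter(state for state in column if state not in GAPS)
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            sites.append(i)
    return sites

# Toy alignment (hypothetical data): four OTUs, six characters each.
alignment = {
    "A": "ACGUAC",
    "B": "ACGUGC",
    "C": "AUGUGC",
    "D": "AUGUAC",
}

matrix = distance_matrix(alignment)
print(matrix[("A", "B")])            # one difference in six positions
print(informative_sites(alignment))  # [1, 4]
```

A matrix like this is the usual input to distance-based tree-building methods such as UPGMA, whereas the informative sites are the only characters that help discriminate among trees under maximum parsimony.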

 

More Good Terms

Definitions

adaptation character that has been modified and is or was maintained as a result of selection for increased fitness
adaptive landscape Sewall Wright model describing a topography in which high relative fitness corresponds to peaks and low fitness to valleys: each position occupied by a population bearing a unique genotype
allele   variant forms of the same gene that are found in different members (or different genomes) of a species
allopatric   species or populations whose geographic distributions do not contact each other
anagenesis change (evolution) of a new species that takes place progressively over time within a single lineage (opposite of cladogenesis)
apomorphy character derived from, yet different from the ancestral condition (also, synapomorphy)
APM (Accepted Point Mutation) an exchange of one amino acid for another that is accepted by natural selection
balanced load decrease in overall genetic fitness of a population caused by defective genotypes (e.g.: deleterious recessives) whose alleles persist because they confer a selective advantage in certain genotypic combinations (e.g. as heterozygotes)
bottleneck effect genetic drift that occurs when a population is reduced in size, then later expands in numbers.  Gene frequencies before and after bottleneck are typically quite different
cladistics mode of classification based on grouping taxa by virtue of their shared possession of similar characters that differ from the ancestral condition
cladogenesis branching evolution involving the splitting and divergence of a lineage into two or more lineages
coevolution changes in one or more species as a response to changes in other species in the same community
concerted evolution process by which a series of nucleotide sequences or different members of a gene family remain similar or identical through time
conformations alternate, nonsuperimposable, three-dimensional arrangements of atoms within a linear polymer, that come about by rotation around single covalent bonds
convergence evolution of similar (analogous) characters in genetically unrelated species, mostly because they have been subjected to similar selective pressures (also, homoplasy)
deme local population of a species (interbreeding group)
directional selection selective pressure that causes the phenotype of a character to shift towards one of its phenotypic extremes
ecotype phenotypic and genotypic variant of a species associated with a particular environmental habitat
epistasis interactions between two or more genetic loci which produce phenotypes different from those expected if each locus were considered individually
evolution genetic changes in populations, through time, that lead to observable differences among them
fitness relative reproductive success
fixation achievement of a frequency of 100% (i.e. monomorphism) by an allele or genotype which began in a population at a lesser frequency (i.e. polymorphism)
founder effect random evolutionary process by which an individual's variant alleles are fixed into a new population, even if such alleles have no particular physiological significance, or even if they are slightly deleterious
frozen accident concept that an accidental event in the distant past was responsible for the presence of a universal feature in living organisms (i.e. survivors of a bottleneck, e.g. the standard genetic code)
gene a unit of genetic material providing a specific function to an organism.  It is found at a locus, and each variant is called an allele
gene frequency proportion of a particular allele among all alleles at a gene locus
gene pool sum of all genes present in a population during a given generation or period
genetic drift random process by which more than one neutral allele is maintained simultaneously in a population
Hardy-Weinberg principle conservation of allelic frequencies in large populations under conditions of random mating and in the absence of selective pressures (i.e. selection, migration, genetic drift) which might act to change gene frequencies
heterozygotes polyploid individuals (i.e. with more than one genomic copy) which have 2 or more distinctive alleles at a given genomic location
heritability degree to which variations in phenotype are caused by genetic differences
heterosis hybrid vigor
homeotic mutations regulatory mutations that cause development of tissue in an inappropriate location (e.g. bithorax in Drosophila)
homozygotes polyploid individuals (i.e. with more than one genomic copy) which have identical alleles at a given genomic location
inclusive fitness measurement of the fitness of a genotype based not only on the advantage to the individual bearing it, but also on its effect on related individuals (kin) that also possess it
K-selection selection, within multiple allele populations, for improved competitiveness, rather than for rapid numerical increase
macroevolution evolution of taxa higher than the species level, and commonly involving major morphological changes (also: punctuated equilibrium)
macromutation concept that there are single mutational events whose effects are large enough to produce an instantaneous new species or perhaps even the beginnings of new higher taxonomic categories
microevolution small changes that are usually responsible for differences between populations of a species.  Accumulation of such changes may be sufficient to explain the origin of most or all taxa
mutation any change in the nucleotide sequence of a gene
mutational load that portion of the genetic load caused by production of deleterious genes through recurrent mutation
natural selection differential reproduction or survival of replicating organisms caused by environmental agencies
neutral mutation sequences that differ from the "normal" but are functionally indistinguishable
nonsense mutation conversion of a coding triplet to a chain-termination triplet
orthogenesis concept that evolution of a group of related species proceeds in a particular direction (e.g. increase in size) because of forces other than selection
orthologous genes gene loci in different species that are sufficiently similar in sequence to suggest that they are homologs
PAM "Percent Accepted Mutations," a measurable unit of accepted point mutations, usually defined as # of changes per 100 amino acids of protein length
parallel evolution the process of convergence
paralogous genes gene loci in the same organism that are sufficiently similar in sequence to suggest that they are homologues (perhaps through gene duplication)
parsimony method choice of a phylogenetic tree that minimizes the number of evolutionary changes necessary to explain divergence
peptide a small number of amino acids linked together with a defined sequence
phenotype the observed structural and functional properties of an organism, whether genetically or environmentally determined
phyletic evolution changes within a single, nonbranching lineage over time (also: anagenesis)
phylogenic evolution branching of a single ancestral line into two or more lineages (also: cladogenesis)
phylogeny the evolutionary history of a species or group of species in terms of their derivations and relationships
pleiotropy instances when a single gene produces phenotypic effects on more than one character
plesiomorphy instances when a species character is similar to that character in an ancestral species
polypeptide longer chains of amino acids linked together, but with either the exact length or the sequence not defined
polyphyletic presumed derivation of a single taxonomic group from two or more different ancestral lines through convergent or parallel evolution
primary structure the specific order of residues in a biopolymer
protein polypeptide chains with specific length, sequence and folded conformation
punctuated equilibrium view that evolution of a lineage follows a pattern of long intervals of stasis punctuated by short bursts of speciation and macroevolution, during which new taxa arise
quantum evolution rapid increase in rate of mutation fixation over a relatively short time
random drift changes in population allele frequencies due to sampling errors or perhaps bottlenecks
recombination processes by which chromosomes or chromosomal segments are exchanged and which produces heterotypic offspring relative to the parental genotypes
red queen hypothesis view that adaptive change in one species of a community causes deterioration of the environment of other species
residue the repeated unit of a polypeptide (i.e. amino acid) or nucleic acid (i.e. base)
sampling error variability in gene frequencies caused by the fact that not all samples taken from a population have exactly the same frequency as the population itself
secondary structure the local arrangement of a polypeptide or nucleic acid backbone
sequence the primary structure of a polypeptide or nucleic acid
selection composite of all the forces that cause differential survival and reproduction among genetic variants
selection coefficient relative measure of the effect of selection, usually in terms of the loss of fitness by a genotype, given that the genotype with the greatest fitness has a value of 1
selfish DNA concept that the persistence of DNA sequences with no discernable cellular function arises from the likelihood that once fixed, such sequences are impossible to remove without the death of the organism
social Darwinism concept that social and cultural differences in human societies arise through natural selection processes similar to those that account for biological differences
sociobiology study of the biological basis for human behavior
stabilizing selection selection that favors the survival of organisms within a population that are of intermediate phenotype for a particular character, at the expense of the extreme phenotypes (also: centripetal or normalizing selection)
stasis period of equilibrium during which change seems to be very slow or absent (punctuated equilibrium)
synapomorphy possession by two or more related lineages of the same phenotypic character derived from a different but homologous character in the ancestral lineage
systematics the study of classification or taxonomy based on comparative and evolutionary data
tachytelic a relatively rapid rate of evolution
taxon (taxa) a named taxonomic unit or category (e.g.: species, genus, family, order, class, phylum, kingdom)
tertiary structure the overall three dimensional architecture of a protein or nucleic acid
translocation mutational aberration in which a sequence of nucleotides is moved to a different position in the genome
typology study of organic diversity based on the principle that all members of a taxonomic group conform to a basic plan and variation among them is of little or no significance
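
As a worked illustration of the Hardy-Weinberg principle, fitness, and selection coefficient entries in this list, here is a minimal Python sketch. It rests on assumptions that the glossary does not state: a single locus with two alleles (A and a), random mating, the standard p^2 : 2pq : q^2 genotype proportions, and selection acting only against the aa homozygote, whose fitness is taken as 1 - s relative to a fitness of 1 for the other genotypes. The function names are hypothetical.

```python
def hardy_weinberg(p):
    """Expected genotype frequencies (AA, Aa, aa) when allele A has frequency p."""
    q = 1.0 - p
    return p * p, 2 * p * q, q * q

def next_allele_frequency(p, s):
    """Frequency of A after one generation of selection against aa.

    Assumed relative fitnesses: AA = 1, Aa = 1, aa = 1 - s.
    """
    f_AA, f_Aa, f_aa = hardy_weinberg(p)
    mean_fitness = f_AA + f_Aa + f_aa * (1.0 - s)
    # A alleles come from all AA survivors and half of the Aa survivors.
    return (f_AA + f_Aa / 2.0) / mean_fitness

print(hardy_weinberg(0.5))              # (0.25, 0.5, 0.25)
print(next_allele_frequency(0.3, 0.0))  # no selection: stays at about 0.3
print(next_allele_frequency(0.5, 1.0))  # aa is lethal: rises to about 0.667
```

With s = 0 the allele frequency is conserved from one generation to the next, which is the Hardy-Weinberg case; any s greater than 0 shifts the frequency toward the favored allele.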