| Sequence Analysis Words | Definitions |
|---|---|
| primary sequence | the linkage order of nucleotides (5'→3'). It is VERY BAD FORM to write RNA sequences with T's instead of U's!!! DNAs or cDNAs have T's; RNAs have U's (e.g. 5'-ACAAGCUUCUACAUG-3'). |
| reverse of sequence | the numbering of nucleotides is reversed relative to normal. This is equivalent to counting 3'→5' rather than 5'→3'. This trick is sometimes useful when making DOTPLOT comparisons (e.g. the reverse of the above sequence is 3'-GUACAUCUUCGAACA-5'). |
| complement of sequence | the complement of each base is presented, but the ORDER IS NOT INVERTED! Again, this technique is sometimes useful when plotting sequences on an axis during DOTPLOT comparisons (e.g. the complement of the above sequence is 3'-UGUUCGAAGAUGUAC-5'). |
| reverse-complement of sequence | THIS is the real minus-strand sequence!! Not only have the bases been converted to their pairing complements, but the order has been inverted so that the numbering is again 5'→3' (e.g. the reverse-complement of the above sequence is 5'-CAUGUAGAAGCUUGU-3'). |
| positive-strand | the strand of RNA that encodes the biggest, or most important, ORFs. With ssRNA viruses, this is not necessarily the strand that is packaged (also called the plus strand). |
| negative-strand | the RNA strand that is the reverse-complement of a positive strand (also called the minus strand). NOTE: in DNA, the minus strand is the template from which mRNA is transcribed. |
| inverted repeat | two sequence segments within a single linear stretch of RNA (or DNA), where the second segment is the reverse-complement of the first; the two may be adjacent or separated by a spacer (e.g. 5'-AAAUUCGCNNNNGCGAAUUU-3'). |
| palindrome | an inverted repeat within a DNA or RNA duplex. Effectively, the sequence reads the same on the positive and the negative strand (each read 5'→3'). If intra-strand pairing does occur, the apposed hairpins form a cruciform structure, with the 4 duplex regions meeting at a junction. |
| hairpin | an antiparallel duplex structure that forms by pairing of inverted repeat sequences within a single-stranded RNA (or DNA). The helical section(s) is called the stem, and the unpaired bases at the end of the structure are called the loop. |
| bulge loop | extra bases on one side of a stem that do not have any possible pairing partners on the other side (causes "kinks" in the helix). |
| interior loop | bases within a stem for which pairing partners exist on the opposite side, but the partners are NOT of complementary sequence (causes distortion of the helix). |
| domain | one or more in a contiguous series of linked stems and loops that may act in concert as a topological structure, or that may collectively contribute to the functionality of a sequence region. |
| dangling end | the portion of a sequence that is left over after the energetically most favorable segments have been identified and paired by computer-aided folding algorithms. Certain algorithms may get "creative" with these segments and predict potential pairings that have absolutely no biological relevance (read: these segments can cause folding artifacts). |
| pseudoknot | an RNA structure formed when the loop at the top of a hairpin has sequences complementary to an unpaired segment near (or at) the base of the stem, so that a second stem forms. Stringent topological rules determine the probability of pseudoknot formation. |
| analogy (analogous) | proteins or genes that perform or share a common function, but which arose through different evolutionary pathways (i.e. similarity NOT due to ancestry). |
| homology (homologous) | proteins or genes that are descended from a common ancestor, and therefore share a significant degree of identity or similarity. |
| identity | the degree (or percent) to which aligned protein or nucleotide sequences match each other exactly. |
| similarity | the degree (or percent) to which aligned proteins or nucleic acids are alike in some manner, even if they do not share exact sequences. |
| scoring table | a data file used by sequence alignment and comparison algorithms to "look up" whether any particular tested match indeed has some significance. |
| indel | any insertion or deletion in a sequence. |
| gap | a spacer character (e.g. '.' or '-') inserted into a sequence to mark where an insertion in the other aligned sequence (or a deletion in this one) may have occurred. |
| gap length | the number of spacer characters inserted in a gap. |
| gap weight | the mathematical penalty assessed by an algorithm for putting a gap into a sequence when aligning that sequence to another. |
| ORF | Open Reading Frame: a sequence segment that actually encodes a (real) protein. |
|
URF
|
Unidentified Reading Frame- any ORF for which it is not yet established that it really encodes a protein. Statistically, a sequence might contain many (overlapping?) URFs, but few if any of these usually turn out to be genuine ORFs |
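The reverse, complement and reverse-complement operations defined above can be sketched in a few lines of Python. This is a minimal illustration, not code from any particular sequence package: the function names are hypothetical, and the pairing table assumes RNA input (U, not T), with N passed through unchanged.

```python
# Hypothetical helper names, for illustration only.
# Watson-Crick pairing partners for RNA; N (any base) maps to itself.
# A T in the input raises KeyError, in keeping with the rule that RNAs have U's.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G", "N": "N"}


def reverse(seq: str) -> str:
    """Reverse the order of the bases without complementing them."""
    return seq[::-1]


def complement(seq: str) -> str:
    """Replace each base by its pairing partner, keeping the original order."""
    return "".join(RNA_COMPLEMENT[base] for base in seq.upper())


def reverse_complement(seq: str) -> str:
    """Complement each base and invert the order: the minus strand, read 5'->3'."""
    return reverse(complement(seq))


if __name__ == "__main__":
    plus = "ACAAGCUUCUACAUG"
    print(reverse(plus))             # GUACAUCUUCGAACA
    print(complement(plus))          # UGUUCGAAGAUGUAC
    print(reverse_complement(plus))  # CAUGUAGAAGCUUGU
```

Run on the example sequence from the table, it reproduces all three results given there.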
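Inverted repeats, palindromes and hairpin stems all come down to finding a segment whose reverse-complement occurs nearby. The sketch below is only a toy under stated assumptions: RNA input, exact matching (no G-U wobble pairs or mismatches), and a user-chosen arm length and spacer range. The function names and parameters are hypothetical; real folding programs score candidate stems energetically rather than by exact matching.

```python
# Hypothetical helpers; exact Watson-Crick matching only (an assumption).
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse-complement of an RNA sequence (A, C, G, U only)."""
    return "".join(PAIR[base] for base in reversed(seq))


def is_palindrome(seq: str) -> bool:
    """True if the sequence reads the same on both strands (equals its reverse-complement)."""
    return seq == reverse_complement(seq)


def find_inverted_repeats(seq: str, arm: int, min_spacer: int = 0, max_spacer: int = 10):
    """Return (start, spacer) for every arm-length segment whose exact
    reverse-complement follows it after min_spacer..max_spacer bases.

    Segments containing anything other than A, C, G or U (e.g. N) are skipped.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 2 * arm + 1):
        left = seq[start:start + arm]
        if any(base not in PAIR for base in left):
            continue
        target = reverse_complement(left)
        for spacer in range(min_spacer, max_spacer + 1):
            right = start + arm + spacer
            if seq[right:right + arm] == target:
                hits.append((start, spacer))
    return hits
```

For the inverted-repeat example in the table, `find_inverted_repeats("AAAUUCGCNNNNGCGAAUUU", arm=8)` returns `[(0, 4)]`: an 8-base arm starting at position 0, separated from its partner by a 4-base spacer (the would-be hairpin loop). `is_palindrome("GAAUUC")` returns `True`.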
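Percent identity and percent similarity can be computed column by column from an existing alignment. In this sketch the residue groups that count as "similar" are a hypothetical, simplified classification chosen only for illustration; real programs usually derive similarity from a scoring table. Columns containing a gap ('.' or '-') are skipped, which is just one convention: programs differ in whether they divide by the number of aligned columns, the alignment length, or the length of the shorter sequence.

```python
# Hypothetical groups of amino acids treated as "alike" (illustration only,
# not taken from any published scoring table).
SIMILAR_GROUPS = ("ILMV", "FWY", "KRH", "DE", "ST", "NQ", "AG")
GAP_CHARS = ".-"


def _similar(a: str, b: str) -> bool:
    """Identical residues, or residues from the same group, count as similar."""
    return a == b or any(a in group and b in group for group in SIMILAR_GROUPS)


def identity_and_similarity(aln1: str, aln2: str) -> tuple[float, float]:
    """Return (percent identity, percent similarity) over gap-free aligned columns."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aln1.upper(), aln2.upper())
               if a not in GAP_CHARS and b not in GAP_CHARS]
    if not columns:
        return 0.0, 0.0
    identical = sum(a == b for a, b in columns)
    similar = sum(_similar(a, b) for a, b in columns)
    return 100.0 * identical / len(columns), 100.0 * similar / len(columns)
```

For example, `identity_and_similarity("MKV-LE", "MRIALD")` returns `(40.0, 100.0)`: only 2 of the 5 gap-free columns are identical, but under these groupings every pair is similar.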
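Gap length and gap weight come together when an alignment is scored. The sketch below assumes an affine gap model, in which each gap costs a fixed gap weight plus a length weight for every spacer character. This mirrors the gap-weight / length-weight idea found in several alignment programs, but the parameter names, default values and the simple match/mismatch scores (standing in for a real scoring table) are arbitrary choices for illustration. Some programs charge the length weight only for each character after the first.

```python
# Illustrative scoring only: parameter names and defaults are assumptions.
GAP_CHARS = ".-"


def alignment_score(aln1: str, aln2: str, match: int = 1, mismatch: int = -1,
                    gap_weight: int = 3, length_weight: int = 1) -> int:
    """Score a pairwise alignment with a match/mismatch table and affine gaps.

    Each gap costs gap_weight once, plus length_weight per spacer character,
    i.e. penalty = gap_weight + length_weight * gap_length.
    """
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    open_gap_in = None  # which sequence (1 or 2) had a gap in the previous column
    for a, b in zip(aln1.upper(), aln2.upper()):
        gap_a, gap_b = a in GAP_CHARS, b in GAP_CHARS
        if gap_a and gap_b:
            continue  # gap opposite gap: ignored
        if gap_a or gap_b:
            gap_in = 1 if gap_a else 2
            if gap_in != open_gap_in:
                score -= gap_weight   # a new gap is opened
            score -= length_weight    # each spacer character lengthens the gap
            open_gap_in = gap_in
        else:
            score += match if a == b else mismatch
            open_gap_in = None
    return score
```

With the defaults, `alignment_score("ACG--U", "ACGAAU")` gives -1, while `alignment_score("AC-G-U", "ACAGAU")` gives -4: two separate one-base gaps cost more than a single two-base gap, which is why affine penalties favor fewer, longer gaps.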
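A computer scan can only find candidate reading frames (URFs, in the terms used above); whether one is a genuine ORF must be established by other evidence. This minimal sketch looks for ATG...stop stretches in the three forward frames of a DNA (or cDNA) sequence, using the stop codons of the standard genetic code. The function name and the `min_codons` threshold are hypothetical; the reverse strand would be scanned by applying the same function to the reverse-complement.

```python
# Hypothetical scanner; standard genetic code stop codons assumed.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_reading_frames(dna: str, min_codons: int = 30) -> list[tuple[int, int, int]]:
    """Return (start, end, frame) for each ATG...stop stretch in the three forward frames.

    Positions are 0-based, end is exclusive and includes the stop codon, and the
    length threshold min_codons counts the stop codon too.
    """
    dna = dna.upper().replace("U", "T")  # accept RNA input by converting U to T
    hits = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first in-frame start codon
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    hits.append((start, i + 3, frame))
                start = None                   # look for the next start codon
    return hits
```

For example, `find_reading_frames("CCATGAAATTTTAGGG", min_codons=2)` returns `[(2, 14, 2)]`. A frame that starts but reaches the end of the sequence without a stop codon is not reported.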

| Phylogenetic Reconstruction Words | Definitions |
|---|---|
| additive tree | a phylogenetic tree in which the distance between any 2 terminal nodes is equal to the sum of the branch lengths connecting them |
| analogy | similarity by convergent evolution, but not by common evolutionary ancestry |
| bifurcation (dichotomy) | the graphical representation in a phylogenetic tree of an evolutionary speciation event whereby an ancestral taxon splits into two |
| branch | the graphical representation of an evolutionary relationship in a phylogenetic tree |
| character | the position of a residue (aa or nucleotide) in an aligned sequence |
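| character state | the value that a character takes in a particular sequence (e.g. the specific nucleotide or amino acid at that position), distinguishing it from the other possible states of that character |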
| clade | (1) According to the rigorous definition, a taxon consisting of a single species and all its descendants representing a monophyletic branch on an evolutionary tree. (2) In looser usage, as above, except that some descendants are not represented. (3) In reference to extant organisms, a subgroup of organisms from among a larger group under consideration, sharing a common ancestor not shared by the other organisms in the group |
| cladogram | a graphic representation that portrays or attempts to portray the evolutionary relationships among a number of populations, species or higher taxa |
| degree of divergence | the extent to which two homologous sequences differ from each other |
| distance matrix | a matrix of genetic distances between taxa in a group under study |
| external nodes | the graphical representation of extant operational taxonomic units (OTUs) |
| gene tree | a phylogenetic tree that has been constructed from one or a few genes from each species |
| genetic distance (distance) | broadly, any of several measures of the degree of genetic difference between individuals, populations, or species. In reference to molecular evolution, a measure of the number of nucleotide substitutions that have accumulated since divergence between the sequences |
| homology | similarity by common ancestry or genetic relatedness |
| inferred tree | a phylogenetic tree based on empirical data pertaining to extant taxa |
| informative site (diagnostic site) | a site that is used to choose the most-parsimonious tree from among all possible phylogenetic trees. In molecular evolution, a site where there are at least two different kinds of nucleotides or amino acids, and each of them is represented in at least two sequences |
| internal node | the graphical representation of an ancestral organism or gene in a phylogenetic tree |
| maximum parsimony (parsimony) | the selection, from among all possible phylogenetic trees, of the tree requiring the fewest substitutions as the one most likely to be the true phylogenetic tree |
| molecular clock | (1) The rate at which mutations accumulate in a given genomic segment. (2) The hypothesis that in any given gene or DNA sequence, mutations accumulate at an approximately constant rate in all evolutionary lineages as long as the gene or the DNA sequence retains its original function. The extent to which the clock applies to all genes and all organisms is controversial |
| monophyletic | sharing a common ancestor (strictly, describing a group that comprises a common ancestor and all of its descendants) |
| multifurcation | a graphic representation of an unknown branching order in a phylogenetic tree involving three or more taxa. Rarely, a graphic representation of a speciation event resulting in the simultaneous production of more than two species |
| multigene family | a set of genes derived by duplication of an ancestral gene that display more than 50% similarity among them, frequently in close linkage with each other, and possessing similar or overlapping functions |
| neutral theory | the proposal that evolution at the molecular level is primarily driven by mutational input and random genetic drift rather than by natural selection |
| node | the graphical representation in a phylogenetic tree of an extant or ancestral operational taxonomic unit |
| operational taxonomic unit (OTU) | any of the extant taxonomic units under study |
| orthology | sequence similarity as a consequence of a speciation event |
| outgroup | a species or set of species that is the least related to the others in a group of species. The taxon that diverged from a group of taxa before the others diverged from each other |
| parallel substitutions | the independent occurrence of the same mutation at the same nucleotide site in two or more lineages |
| paralogy | sequence similarity between the descendants of a duplicated ancestral gene |
| phenetics | the study of relationships among a group of organisms on the basis of the degree of similarity between them, be that molecular, phenotypic, or anatomical |
| phenogram | a tree-like diagram representing phenetic relationships. This is not necessarily identical to a cladogram, unless there is a direct linear relationship between the time of divergence and the degree of genetic (or morphological) divergence |
| phylogenetic tree | a graphic representation of the phylogeny of a group of taxa or genes |
| phylogenetics | the reconstruction of the evolutionary history of a group of taxa or genes |
| phylogeny | the evolutionary history of a group of taxa or genes and their ancestry |
| polyphyletic | descended from different ancestors |
| rate of gene substitution | the number of gene substitutions per locus per unit time |
| root | in rooted trees, the common ancestor of all taxa under study |
| rooted tree | a phylogenetic tree that specifies ancestral and descendant species, thus indicating the direction of the evolutionary path |
| sequence divergence | the differences between two homologous sequences due to independent accumulation of genetic changes in each lineage |
| sibling species | species that are indistinguishable morphologically but are reproductively isolated |
| sister taxa (neighboring taxa) | in general use, the pair of species among a group of species under study that are evolutionarily the closest to each other. In a phylogenetic tree, two taxa connected through a single internal node |
| speciation (cladogenesis) | the splitting of one population into two or more populations that are reproductively isolated. The process by which new species arise |
| species | a basic taxonomic category for which there are various definitions, among them: (1) a group of actually interbreeding individuals that is reproductively isolated from other such groups (biological species concept); (2) a lineage evolving separately from others (evolutionary species concept); (3) a group of organisms resembling each other more than they resemble any other organism outside the group (taxonomic species concept) |
| species tree | a phylogenetic tree that represents the evolutionary relationships of a group of species |
| stochastic process | a process, the outcome of which cannot be predicted exactly from the knowledge of initial conditions. However, given the initial conditions, each of the possible outcomes of the process can be assigned a certain probability |
| superfamily | a collection of genes, all products of gene duplication that have diverged from each other to a considerable extent (i.e. less than 50% similar at the amino acid level, for a group of proteins) |
| taxon (taxa = plural) | a taxonomic group of any rank (e.g. species, genus, kingdom, etc.) to which individual organisms are assigned |
| topology | the branching pattern of a phylogenetic tree |
| tree | a completely connected acyclic graph (in mathematical terminology); in phylogeny, this term generically refers to any bifurcating graphic representation of phylogenetic data |
| true tree | a phylogenetic tree that represents the true evolutionary history of a group of taxa |
| unrooted tree | an evolutionary tree that specifies neither the root nor direction of the evolutionary path |
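| UPGMA | Unweighted Pair Group Method with Arithmetic Mean: a distance-matrix clustering method that builds a rooted tree by repeatedly joining the two closest clusters; it assumes a constant rate of evolution (molecular clock) |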
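The genetic distance entry above describes distance as the number of substitutions accumulated since divergence. The observed proportion of differing sites (the p-distance) underestimates this, because multiple substitutions at the same site are invisible, so a model-based correction is commonly applied. The sketch below uses the Jukes-Cantor correction, which assumes equal base frequencies and equal rates for all substitution types; it is one of many models, chosen here only because it is the simplest. Function names are hypothetical.

```python
import math

# Hypothetical helpers; Jukes-Cantor model assumed, nucleotide sequences only.
GAP_CHARS = ".-"


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned, gap-free sites at which two nucleotide sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    sites = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a not in GAP_CHARS and b not in GAP_CHARS]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in sites) / len(sites)


def jukes_cantor(seq1: str, seq2: str) -> float:
    """Estimated substitutions per site: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        raise ValueError("p >= 0.75: distance undefined (sequences saturated)")
    return -0.75 * math.log(1 - 4 * p / 3)
```

For two 10-base sequences differing at one site, p = 0.1 and the Jukes-Cantor distance is about 0.107. A distance matrix filled with such values is the usual input to distance methods such as UPGMA.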
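The informative-site definition translates directly into a column count. This sketch treats gaps as missing data and ignores them, which is an assumption; some programs treat a gap as an extra character state instead. The function name is hypothetical.

```python
from collections import Counter

# Hypothetical helper; gaps are treated as missing data (an assumption).
GAP_CHARS = ".-"


def informative_sites(alignment: list[str]) -> list[int]:
    """Return 0-based columns with at least two character states that each
    occur in at least two sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    sites = []
    for col in range(length):
        counts = Counter(seq[col].upper() for seq in alignment if seq[col] not in GAP_CHARS)
        # count the states seen at least twice; two or more such states -> informative
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            sites.append(col)
    return sites
```

For the alignment `["AAGA", "AAGC", "ACTT", "ACTC"]` the result is `[1, 2]`. Column 3 is variable, but only C occurs more than once, so it cannot favor one tree over another.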
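Maximum parsimony needs, for each candidate tree, the minimum number of substitutions that tree requires. For a fixed binary tree this count can be obtained site by site with Fitch's algorithm, sketched below; searching over all possible topologies (the expensive part of a real parsimony program) is not shown. As an assumption for this example, trees are represented as nested 2-tuples of leaf names, and the function names are hypothetical.

```python
# Hypothetical helpers implementing Fitch's small-parsimony count.


def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one site.

    tree: a leaf name (str) or a 2-tuple (left_subtree, right_subtree)
    states: dict mapping each leaf name to its character state at this site
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # children can agree: no extra change needed at this node
        return shared, left_cost + right_cost
    # children disagree: take the union and count one substitution
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, alignment):
    """Total changes over all sites; alignment maps leaf name -> aligned sequence."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )
```

For the sequences w = AAGA, x = AAGC, y = ACTT, z = ACTC, the tree `(("w", "x"), ("y", "z"))` needs 4 changes and `(("w", "y"), ("x", "z"))` needs 6, so parsimony prefers the first. The last column costs 2 changes on both trees, which is exactly why non-informative sites cannot discriminate between topologies.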
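UPGMA builds a rooted tree from a distance matrix by repeatedly merging the two closest clusters, placing each new node at half their distance and averaging distances to the merged cluster, weighted by cluster size. The sketch below assumes distances are supplied as a dict keyed by pairs of taxon names and returns a Newick string; the function name is hypothetical and ties are broken by list order. Like UPGMA itself, it implicitly assumes a molecular clock, so it can give a wrong topology when rates differ among lineages.

```python
# Hypothetical UPGMA sketch; input format (dict of name pairs) is an assumption.


def upgma(names, distances):
    """Build a rooted UPGMA tree and return it as a Newick string.

    names: list of taxon labels
    distances: dict mapping a pair (a, b) of labels to their distance
    """
    d = {frozenset(pair): value for pair, value in distances.items()}
    size = {name: 1 for name in names}       # number of taxa in each cluster
    height = {name: 0.0 for name in names}   # node height above the tips
    active = list(names)
    while len(active) > 1:
        # closest pair of current clusters (ties: first in list order)
        closest = min(
            (frozenset((x, y)) for i, x in enumerate(active) for y in active[i + 1:]),
            key=d.__getitem__,
        )
        a, b = sorted(closest)
        h = d[closest] / 2.0                 # new node sits halfway between them
        node = f"({a}:{h - height[a]:g},{b}:{h - height[b]:g})"
        size[node] = size[a] + size[b]
        height[node] = h
        for other in active:
            if other not in (a, b):
                # size-weighted average of distances from the two merged clusters
                d[frozenset((node, other))] = (
                    size[a] * d[frozenset((a, other))]
                    + size[b] * d[frozenset((b, other))]
                ) / size[node]
        active = [x for x in active if x not in (a, b)] + [node]
    return active[0] + ";"
```

With distances AB = 2, AC = BC = 4 and AD = BD = CD = 6, the result is `(((A:1,B:1):1,C:2):1,D:3);`: every tip sits at the same depth, as the clock assumption requires.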

| More Good Terms | Definitions |
|---|---|
| adaptation | character that has been modified and is or was maintained as a result of selection for increased fitness |
| adaptive landscape | Sewall Wright's model describing a topography in which high relative fitness corresponds to peaks and low fitness to valleys, each position being occupied by a population bearing a unique genotype |
| allele | variant forms of the same gene that are found in different members (or different genomes) of a species |
| allopatric | species or populations whose geographic distributions do not contact each other |
| anagenesis | change (evolution) of a new species that takes place progressively over time within a single lineage (opposite of cladogenesis) |
| apomorphy | character derived from, yet different from the ancestral condition (also, synapomorphy) |
| accepted point mutation | an exchange of one amino acid for another that is accepted by natural selection |
| balanced load | decrease in overall genetic fitness of a population caused by defective genotypes (e.g.: deleterious recessives) whose alleles persist because they confer a selective advantage in certain genotypic combinations (e.g. as heterozygotes) |
| bottleneck effect | genetic drift that occurs when a population is reduced in size, then later expands in numbers. Gene frequencies before and after bottleneck are typically quite different |
| cladistics | mode of classification based on grouping taxa by virtue of their shared possession of similar characters that differ from the ancestral condition |
| cladogenesis | branching evolution involving the splitting and divergence of a lineage into two or more lineages |
| coevolution | changes in one or more species as a response to changes in other species in the same community |
| concerted evolution | process by which a series of nucleotide sequences or different members of a gene family remain similar or identical through time |
| conformations | alternate, nonsuperimposable, three-dimensional arrangements of atoms within a linear polymer, that come about by rotation around single covalent bonds |
| convergence | evolution of similar (analogous) characters in genetically unrelated species, mostly because they have been subjected to similar selective pressures (also, homoplasy) |
| deme | local population of a species (interbreeding group) |
| directional selection | selective pressure that causes the phenotype of a character to shift towards one of its phenotypic extremes |
| ecotype | phenotypic and genotypic variant of a species associated with a particular environmental habitat |
| epistasis | interactions between two or more genetic loci which produce phenotypes different from those expected if each locus were considered individually |
| evolution | genetic changes in populations, through time, that lead to observable differences among them |
| fitness | relative reproductive success |
| fixation | achievement of a frequency of 100% (i.e. monomorphism) by an allele or genotype which began in a population at a lesser frequency (i.e. polymorphism) |
| founder effect | random evolutionary process by which an individual's variant alleles are fixed into a new population, even if such alleles have no particular physiological significance, or even if they are slightly deleterious |
| frozen accident | concept that an accidental event in the distant past was responsible for the presence of a universal feature in living organisms (i.e. survivors of a bottleneck; e.g. the standard genetic code) |
| gene | a unit of genetic material providing a specific function to an organism. It is found at a locus, and each variant is called an allele |
| gene frequency | proportion of a particular allele among all alleles at a gene locus |
| gene pool | sum of all genes present in a population during a given generation or period |
| genetic drift | random changes in allele frequencies from generation to generation in a finite population; neutral alleles may persist together for a time as a polymorphism, but are eventually lost or fixed |
| Hardy-Weinberg principle | conservation of allelic frequencies in large populations under conditions of random mating and in the absence of forces (i.e. selection, mutation, migration, genetic drift) that might act to change gene frequencies |
| heterozygotes | diploid or polyploid individuals (i.e. with more than one genomic copy) that have 2 or more distinct alleles at a given genomic location |
| heritability | degree to which variations in phenotype are caused by genetic differences |
| heterosis | hybrid vigor |
| homeotic mutations | regulatory mutations that cause development of tissue in an inappropriate location (e.g. bithorax in Drosophila) |
| homozygotes | diploid or polyploid individuals (i.e. with more than one genomic copy) that have identical alleles at a given genomic location |
| inclusive fitness | measurement of fitness based not only on an individual's own reproductive success, but also on that of related individuals (kin) who share its genes |
| K-selection | selection, within multiple allele populations, for improved competitiveness, rather than for rapid numerical increase |
| macroevolution | evolution of taxa higher than the species level, and commonly involving major morphological changes (also: punctuated equilibrium) |
| macromutation | concept that there are single mutational events whose effects are large enough to produce an instantaneous new species or perhaps even the beginnings of new higher taxonomic categories |
| microevolution | small changes that are usually responsible for differences between populations of a species. Accumulation of such changes may be sufficient to explain the origin of most or all taxa |
| mutation | any change in the nucleotide sequence of a gene |
| mutational load | that portion of the genetic load caused by production of deleterious genes through recurrent mutation |
| natural selection | differential reproduction or survival of replicating organisms caused by environmental agencies |
| neutral mutation | a mutation producing a sequence that differs from the "normal" one but is functionally indistinguishable from it |
| nonsense mutation | conversion of a coding triplet to a chain-termination triplet |
| orthogenesis | concept that evolution of a group of related species proceeds in a particular direction (e.g. increase in size) because of forces other than selection |
| orthologous genes | gene loci in different species that are sufficiently similar in sequence to suggest that they are homologs |
| PAM | "Percent Accepted Mutations," a measurable unit of accepted point mutations, usually defined as # of changes per 100 amino acids of protein length |
| parallel evolution | the process of convergence |
| paralogous genes | gene loci in the same organism that are sufficiently similar in sequence to suggest that they are homologues (perhaps through gene duplication) |
| parsimony method | choice of a phylogenetic tree that minimizes the number of evolutionary changes necessary to explain divergence |
| peptide | a small number of amino acids linked together with a defined sequence |
| phenotype | the observed structural and functional properties of an organism, whether genetically or environmentally determined |
| phyletic evolution | changes within a single, nonbranching lineage over time (also: anagenesis) |
| phylogenic evolution | branching of a single ancestral line into two or more lineages (also: cladogenesis) |
| phylogeny | the evolutionary history of a species or group of species in terms of their derivations and relationships |
| pleiotropy | instances when a single gene produces phenotypic effects on more than one character |
| plesiomorphy | instances when a species character is similar to that character in an ancestral species |
| polypeptide | longer chains of amino acids linked together, but with either the exact length or the sequence not defined |
| polyphyletic | presumed derivation of a single taxonomic group from two or more different ancestral lines through convergent or parallel evolution |
| primary structure | the specific order of residues in a biopolymer |
| protein | polypeptide chains with specific length, sequence and folded conformation |
| punctuated equilibrium | view that evolution of a lineage follows a pattern of long intervals of stasis punctuated by short bursts of speciation and macroevolution, during which new taxa arise |
| quantum evolution | rapid increase in rate of mutation fixation over a relatively short time |
| random drift | changes in population allele frequencies due to sampling errors or perhaps bottlenecks |
| recombination | processes by which chromosomes or chromosomal segments are exchanged and that produce heterotypic offspring relative to the parental genotypes |
| red queen hypothesis | view that adaptive change in one species of a community causes deterioration of the environment of other species |
| residue | the repeated unit of a polypeptide (i.e. amino acid) or nucleic acid (i.e. base) |
| sampling error | variability in gene frequencies caused by the fact that not all samples taken from a population have exactly the same frequency as the population itself |
| secondary structure | the local arrangement of a polypeptide or nucleic acid backbone |
| sequence | the primary structure of a polypeptide or nucleic acid |
| selection | composite of all the forces that cause differential survival and reproduction among genetic variants |
| selection coefficient | relative measure of the effect of selection, usually in terms of the loss of fitness by a genotype, given that the genotype with the greatest fitness has a value of 1 |
| selfish DNA | concept that the persistence of DNA sequences with no discernable cellular function arises from the likelihood that once fixed, such sequences are impossible to remove without the death of the organism |
| social Darwinism | concept that social and cultural differences in human societies arise through natural selection processes similar to those that account for biological differences |
| sociobiology | study of the biological basis for human behavior |
| stabilizing selection | selection that favors the survival of organisms within a population that are of intermediate phenotype for a particular character, at the expense of the extreme phenotypes (also: centripetal or normalizing selection) |
| stasis | period of equilibrium during which change seems to be very slow or absent (punctuated equilibrium) |
| synapomorphy | possession by two or more related lineages of the same phenotypic character, derived from a different but homologous character in the ancestral lineage |
| systematics | the study of classification or taxonomy based on comparative and evolutionary data |
| tachytelic | a relatively rapid rate of evolution |
| taxon (taxa) | a named taxonomic unit or category (e.g.: species, genus, family, order, class, phylum, kingdom) |
| tertiary structure | the overall three dimensional architecture of a protein or nucleic acid |
| translocation | mutational aberration in which a sequence of nucleotides is moved to a different position in the genome |
| typology | study of organic diversity based on the principle that all members of a taxonomic group conform to a basic plan and that variation among them is of little or no significance |
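Gene frequency and the Hardy-Weinberg principle can be illustrated for a single locus with two alleles. The sketch estimates allele frequencies from genotype counts, computes the genotype counts expected under Hardy-Weinberg proportions (p², 2pq, q²), and gives a Pearson chi-square statistic for comparing them. The function names are hypothetical, and the two-allele case is an assumption.

```python
# Hypothetical helpers for one biallelic locus (alleles A and a).


def allele_frequencies(n_AA: int, n_Aa: int, n_aa: int) -> tuple[float, float]:
    """Frequencies (p, q) of alleles A and a; each individual carries two copies."""
    n_genes = 2 * (n_AA + n_Aa + n_aa)
    p = (2 * n_AA + n_Aa) / n_genes
    return p, 1 - p


def hardy_weinberg_expected(n_AA: int, n_Aa: int, n_aa: int) -> tuple[float, float, float]:
    """Genotype counts expected from p**2, 2pq, q**2 for the same population size."""
    n = n_AA + n_Aa + n_aa
    p, q = allele_frequencies(n_AA, n_Aa, n_aa)
    return p * p * n, 2 * p * q * n, q * q * n


def chi_square(observed, expected) -> float:
    """Pearson chi-square statistic (1 degree of freedom in the two-allele case)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

Genotype counts of 36 AA, 48 Aa and 16 aa give p = 0.6 and match the expected counts exactly. Counts of 50, 20 and 30 have the same allele frequencies but far too few heterozygotes, giving a chi-square of about 34, a strong departure from Hardy-Weinberg proportions.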
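Genetic drift, sampling error and fixation can be seen in a small simulation. The sketch below uses the Wright-Fisher model, which assumes constant population size, random mating, no selection, mutation or migration, and random sampling of gene copies each generation. The function name and parameters are illustrative.

```python
import random

# Hypothetical simulation under the Wright-Fisher assumptions stated above.


def wright_fisher(n_individuals: int, p0: float, seed=None, max_generations: int = 100_000):
    """Simulate neutral drift of one allele in a diploid Wright-Fisher population.

    Returns the allele frequency in every generation, stopping at loss (0.0),
    fixation (1.0) or after max_generations.
    """
    rng = random.Random(seed)
    n_genes = 2 * n_individuals                 # gene copies per generation
    count = round(p0 * n_genes)
    history = [count / n_genes]
    for _ in range(max_generations):
        if count == 0 or count == n_genes:
            break                               # allele lost or fixed
        p = count / n_genes
        # each gene copy of the next generation is drawn at random from the current pool
        count = sum(rng.random() < p for _ in range(n_genes))
        history.append(count / n_genes)
    return history
```

Repeated runs from the same starting frequency end sometimes in loss and sometimes in fixation. For a neutral allele the probability of eventual fixation equals its initial frequency, and smaller populations reach fixation or loss sooner.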
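The selection coefficient enters the standard one-locus model of selection: genotype fitnesses are expressed relative to the fittest genotype (fitness 1), and a genotype with selection coefficient s has fitness 1 - s. The sketch below iterates the allele-frequency recursion for directional selection against a recessive genotype. Random mating and an infinitely large population (no drift) are assumed, and the function names are hypothetical.

```python
# Hypothetical helpers; infinite population and random mating assumed.


def next_frequency(p: float, w_AA: float, w_Aa: float, w_aa: float) -> float:
    """One generation of selection: p' = p(p*w_AA + q*w_Aa) / mean fitness."""
    q = 1.0 - p
    mean_fitness = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    return p * (p * w_AA + q * w_Aa) / mean_fitness


def select_against_recessive(p0: float, s: float, generations: int) -> list[float]:
    """Frequency of allele A over time when fitnesses are AA = Aa = 1, aa = 1 - s."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], 1.0, 1.0, 1.0 - s))
    return freqs
```

Starting from p = 0.5 with s = 0.1, allele A rises every generation (to about 0.513 after the first), but the rise slows as allele a becomes rare, because most remaining copies of a sit in heterozygotes, where selection does not act on them.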