[Frontiers in Bioscience S3, 408-415, January 1, 2011]
Haplotype complementarity under mutational pressure

Dietrich Stauffer, Stanislaw Cebrat
1. ABSTRACT

Natural populations do not correspond to Mendelian populations. Effective populations are much smaller, inbreeding is higher, and the organization of a large number of genes into chromosomes with relatively low recombination rates invalidates the law of independent gene assortment. Under such conditions, large numbers of genes are inherited as clusters and evolve as genetic units. Computer simulations have shown that mutations inside clusters are not eliminated independently by purifying selection; instead, whole clusters tend to complement each other. This means that whenever one haplotype carries one of two possible alleles, the other haplotype carries the other allele at that locus; thus inherited recessive deleterious defects do not affect the health of the individual even if their fraction in the genome is high. This complementation seems to be a winning strategy in small or spatially distributed populations. We discuss possible consequences of this complementarity.

2. INTRODUCTION

This review deals with the concept of complementarity for diploid genomes, which was presumably first found in computer simulations (1) similar to the old bit-string model of biological ageing (2, 3). We hope that this review (partially taken from (4)) will encourage experimental biologists to look for such effects in reality. Presumably this complementarity is more likely to be found in small populations with low recombination rates during sexual reproduction, and obviously for recessive rather than dominant mutations (our mutations are regarded as detrimental, causing life-threatening hereditary phenotypic defects).

2.1. Definition: purification versus complementation

Usually Darwinian selection is thought to lower the number of detrimental mutations, but new inheritable mutations keep appearing through copying errors and other causes. Thus the average number of deleterious mutations fluctuates about some low fraction of the total number of alleles.
This mechanism is called "purification". However, for sexual reproduction of diploid organisms, another strategy is possible if (nearly) all mutations are recessive; we call this alternative "complementation". Let us take a simple model of only eight genes; the wild-type or functional allele is denoted by 0, and the deleterious one by 1. The diploid genome then consists of two sequences of 0 and 1, which are called bit-strings. An example may be

0 0 1 0 0 1 0 0
0 0 1 0 1 0 0 0

which means that only one locus, the third one, is non-functional, and all others are functional. Mutations affect the phenotype only if both alleles at the corresponding locus are mutated. Thus the mutations at loci 5 and 6, each present in only one of the two bit-strings, may be detrimental in future generations but not for this individual. This example is one of purification, since only one quarter of the bits are set to 1, and three quarters are in the wild form of 0. An alternative example, for complementation instead of purification, is

1 1 0 1 0 1 0 0
0 0 1 0 1 0 1 1

where now half of the alleles are in the non-functional state 1, and half are still wild (0). Nevertheless, this individual does not feel any of these mutations, since at no locus are both alleles set to 1. Here we have complete complementarity.

In general, one can locate a genome between the two extremes by measuring the heterozygosity, which is the fraction of genetic loci carrying different alleles in the two bit-strings. This heterozygosity is 2/8 = 0.25 in the first example and 8/8 = 1.0 in the second example. Thus full purification produces heterozygosity equal to zero, and full complementation heterozygosity equal to one. The Hamming distance is the number of loci at which the two bit-strings differ, and thus it is the heterozygosity multiplied by the number of investigated loci. (See (5) for a polymorphic generalisation to eight instead of only one bit per allele.)
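The two eight-locus examples above can be checked with a short Python sketch. It is only an illustration of the definitions of Hamming distance, heterozygosity and recessive expression given in the text; the function and variable names are our own choices and do not come from the simulation programs cited.

```python
def hamming_distance(a, b):
    """Number of loci at which the two bit-strings carry different alleles."""
    if len(a) != len(b):
        raise ValueError("bit-strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def heterozygosity(a, b):
    """Fraction of loci carrying different alleles in the two bit-strings."""
    return hamming_distance(a, b) / len(a)


def expressed_defects(a, b):
    """Recessive mutations show in the phenotype only if both alleles are 1."""
    return sum(x == 1 and y == 1 for x, y in zip(a, b))


# First example from the text (purification): only locus 3 is defective.
purified = ([0, 0, 1, 0, 0, 1, 0, 0],
            [0, 0, 1, 0, 1, 0, 0, 0])

# Second example from the text (complementation): no locus is defective.
complementary = ([1, 1, 0, 1, 0, 1, 0, 0],
                 [0, 0, 1, 0, 1, 0, 1, 1])

for name, (a, b) in (("purified", purified), ("complementary", complementary)):
    print(name, hamming_distance(a, b), heterozygosity(a, b),
          expressed_defects(a, b))
```

For the first pair this gives a Hamming distance of 2, a heterozygosity of 0.25 and one expressed defect; for the second pair it gives 8, 1.0 and no expressed defect, in agreement with the values quoted in the text.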
We are not aware that this seemingly trivial possibility of complementarity was found in computer models earlier than (1); in hindsight one might interpret Figure 1 in (6) as indicating an evolution towards complementarity. We now present some examples where it was seen in recent computer simulations (7-14).

2.2. Mutations

Mutations can happen due to errors in genome duplication during cell division, or due to external causes like ionizing radiation. They may happen in any cell of our body without being transferred to our children; then they are called somatic and are ignored here. Alternatively, they may appear in germline cells and be transmitted to the offspring by gametes; these are called inheritable. Most mutations make an allele non-functional and thus are deleterious; back mutations to the original "wild" state (reversions) are rare and are ignored here. We also do not deal with the rare positive mutations which transformed the first living cells over thousands of millions of years into the present authors. Thus we deal only with deleterious inheritable mutations.

At first, one may think that life would be better if mutations could be avoided. Indeed, sickness, ageing and death may come from such mutations. However, if we lived forever, there would be no place for our children, and biological evolution would not have happened. Indeed, some simulations (15) in a changing environment showed an optimal mutation rate that maximises the whole population. There are also other observations suggesting that mutational pressure is optimised: for instance, some free-living organisms with very sophisticated systems of replication and DNA repair lost their DNA repair systems when they adopted the strategy of parasitism, penetrating into the interior of host cells and reducing their genome size (16). The same effect has been observed during evolutionary reduction of free-living organisms (17).
The last phenomenon, decreasing the accuracy of the replication system while keeping the mutation rate per genome per generation, has been observed in many independent phylogenetic lines (for reviews see (18, 19)). Therefore, mutational pressure, though bad for the individual, is not necessarily bad for Nature as a whole. We deal here with models where ageing or death is caused by this mutational pressure.

3. SOME COMPUTER MODELS

The details of the models are less important than the emergence of complementarity in the genetic pools of the populations simulated by those models. The number of bits (genes) in each bit-string (haplotype) is called L, the minimum reproduction age R, the number of births for each pair after mating B, the mutation probability per haplotype (i.e. per bit-string replication) M, and the probability of recombination during gamete production r (for both father and mother; also called the crossover rate C in the literature). The Verhulst factor N/K is the probability of dying because of lack of food or space, where N is the current population size and K a "carrying capacity of the environment".

In the Penna model (2), the position of a locus corresponds to the age of an individual, and only mutations at that or at earlier positions affect the health of the individual. It is assumed that each bit corresponds to one "year" in the individual's lifetime, and consequently each individual can live for at most L "years". As an example, an individual with a genome 10100... would start to become sick during its first year of life and would become worse during its third year, when a new disease appears. In this way the bit-string represents in fact a "chronological genome". New mutations, introduced during gamete production, are transmitted to the offspring, not to the parent; the effect of somatic mutations is neglected. T active mutations kill the individual at that age. Typical values are L=64, R = 5L/8, B=4, M=1, 0.001 < r < 1, T=3.

3.1. Emerging complementarity

For the sexual Penna model, Figure 1 shows the two regimes of low and high recombination rates. Each curve has a gap in the middle where the population dies out, for r near some critical value rc. For low r the population survives with the help of the complementarity trick described above; for high r it survives through purification, considered as the usual Darwinian selection of the fittest with a small number of deleterious mutations. In the right part of Figure 1, purification happens, and the population grows with K. In the centre of Figure 1, a gap appears which shifts to the left with increasing K; to the left of the gap, complementarity appears. (Increasing the births from B=2 to B=4 avoids the gap.) Figure 2 shows the equilibrium distribution of Hamming distances, for purification and for complementation, when only the first R=40 of 64 bits are counted.

In such a population, under the complementation strategy, nearly all individuals have the same pair of bit-strings A and A' in their diploid genome, thus producing haploid gametes (ovum and sperm cells) of two types only, either A or A'. A zygote formed from an A sperm and an A ovum cannot survive, because it has many homozygous loci with recessive mutations which affect the phenotype. The same happens if ovum and sperm cell are both of type A'. But if one is of type A and one of type A', the A||A' zygote can survive even if half of the bits (alleles of the genome) are mutated, since a one-bit in A is always combined with a zero-bit in A' (and vice versa), and thus for recessive mutations the phenotype is not affected. Thus high numbers of mutations can be tolerated in this strategy. For purification, on the other hand, mutations are rare. (Warning: Sometimes changes are very slow; for L=512, R=320 we even had a case where the population first decayed very slowly, and after 700 million iterations went very fast to extinction.)

3.2. Gamete recognition

Somewhat related is gamete recognition (10), where the ovum rejects for fusion into a diploid zygote those sperm cells whose haploid genome is too similar to the haploid genome of the ovum. This effect is beneficial if the population, due to a low recombination rate, shows complementarity. If this gamete selection is added to the sexual Penna model, then complementarity survives for higher r: the population size to the left of the gap in Figure 1 (small r) is strongly enhanced, while the populations to the right of the gap (r closer to unity) barely change. Figure 3 also illustrates, through the Hamming distances, this balance between complementation at small r (upper data in Figure 3) and purification at large r (lower data in Figure 3), separated by extinction at intermediate r near rc. For these Hamming distances we take into account the first R=40 of the 64 bits.

For complementarity without gamete recognition, the whole diploid population has two bit-strings A and A', each with about 20 bits zero and 20 bits one. The zygotes thus are of type A||A and A'||A' with Hamming distances close to 0, and of type A||A' and A'||A with Hamming distances close to 40; the average Hamming distance therefore is close to 20, as shown in Figure 3 near t = 10,000. The A||A and A'||A' zygotes will die out in the next iteration; the A||A' and A'||A zygotes will survive. After 25,000 iterations, gamete recognition is switched on, neither A||A nor A'||A' is allowed to form a zygote, and the Hamming distances approach 40, as shown in the interval 26,000 to 100,000 iterations. For large r and purification, the number of mutated bits and thus the Hamming distance is much smaller, and the latter shows only a small jump from 9.6 to 10.3 (independent of K) when gamete recognition is switched on. (If one of the 64 mutations is made dominant, not much changes, but nine dominant mutations lead to catastrophe and population extinction (10).)

3.3. Role of mutational pressure in the emergence of complementarity

Mutations are needed to drive evolution, but also endanger survival. Figure 4 shows the effect of this mutational pressure: for M=2 mutations per generation and haplotype, the population died out, while for M=1 extinction was avoided. For mutation pressure below M=1 the population nearly reaches its maximum K. Figure 5 shows the slow emergence of the separation of complementation (small r) and purification (large r); here the simulation time t, measured in updates per individual, varies from one hundred (bottom curve, +) to ten million (top curve, o).

3.4. The role of inbreeding

As has been shown in Figures 1 and 3, reproductive success depends (among other parameters) on the interplay between the intragenomic recombination rate (crossover frequency) and the size of the population. Below a specific crossover rate, populations prefer to complement haplotypes rather than to eliminate defective alleles intensively. In Figure 6 we show how this critical crossover rate depends on the population size (11), where rc is defined as the crossover probability at which the number of mutations goes down drastically. Over a range of two decades in population size there is a power-law relation. However, the data shown in the plot were obtained in simulations of panmictic populations. In such populations females choose a sexual partner randomly from the whole population. In Nature the process of choosing the partner is usually non-random and, more importantly, it is spatially restricted: individuals look for partners in their neighbourhood. Thus, the effect of the population size should rather be considered as an effect of inbreeding. The inbreeding coefficient is a measure of the genetic relatedness between mates.
If the individuals live in small "inbreeding" groups, then the inbreeding coefficient is high and there is a high probability that the sexual partners share some undisrupted fragments of the same ancestral genome. To study the effect of inbreeding, the evolution was simulated on lattices; see (7) for inbreeding without lattices, obtained by dividing the population into groups. On lattices the level of inbreeding was set by declaring the maximum distance within which individuals can look for partners and place their offspring (1). The simulations were performed on a 1000x1000 square lattice. (Indeed, if the lattice size varies at fixed neighbourhood parameters, rc barely changes (11).) If the above distances within which partners are searched were set to 5, the critical crossover rate rc was around 0.2. Populations evolving under lower recombination rates or shorter distances prefer the strategy of complementing the haplotypes, while under higher recombination rates or longer distances they choose the strategy of purifying selection.

This choice has very important consequences. The complementarity evolves locally, and remote subpopulations on the same lattice can have different distributions of defective alleles in their haplotypes. By colouring the individuals according to the structure of their genomes, it has been shown that the lattice is occupied by individuals with different genotypes, but that they are clustered: individuals with the same genotype occupy the same territory (see http://www.smorfland.uni.wroc.pl/sympatry/ for some examples of simulations under different conditions). Further studies have shown that only the central part of the genome is responsible for sympatric speciation. The lateral part of the genome is much more polymorphic and determines biodiversity rather than speciation. That is why the Hamming distances between homologous haplotypes inside species are noticeable. These simulations show that sympatric speciation is possible and that there is no need for physical, geographical or even biological barriers for a new species to emerge inside the population of the older one.

However, complementarity is not always a strong function of the population size. The results of (5) are unclear; and in the simple model of (20), Figure 7 shows about the same transition from complementation to purification when the capacity K is increased by a factor of a thousand. We see practically no change whether K is 5000, half a million, or five million. In that simplification of (9), no age structure is involved, and an individual survives with probability x^n V, where V is the usual Verhulst factor for adults, n is the number of loci carrying the deleterious allele in both bit-strings (chromosomes) of the diploid genome, and x < 1 determines the damage made by a single such homozygous locus. Males and females are distinguished; new mutations occur at gamete production and are transferred only to the baby. One explanation of the transition to complementation in those simulations is a large fluctuation of the population size under such simulation conditions. Recent studies of population evolution on lattices have shown that fluctuations in population size induced by a changing environment enhance sympatric speciation (21; see also (5)).

4. CONSEQUENCES

Now that the reader may have understood the complementarity principle, what are the consequences of this possible survival strategy? First, complementarity is an advantage of sexual reproduction compared to the asexual haploid case, where complementarity is impossible. Now we discuss some further consequences.

4.1. Sympatric Speciation

If one species splits into two with largely overlapping geographical ranges, this is called sympatric speciation. It is facilitated by complementarity through the following effect.
Originally we have complementary haplotypes A and A' as discussed in section 3.1, leading to viable A||A' or A'||A zygotes even though only about half of the alleles are of the wild type. Slowly, for part of the population, A may change into B for one haplotype, and simultaneously A' into the complement B' of B. After some time, A||B' and A'||B zygotes may no longer be viable, and the subpopulation with B and B' has become reproductively isolated from that with the A and A' haplotypes. For purification, in contrast, we have only A changing into B while keeping most alleles in the wild type, thus still keeping A||B zygotes viable because of the low number of deleterious alleles. In this way, reproductive isolation and thus speciation is easier for complementarity than for purification (22).

4.2. Distribution of recombination events

Computer simulations have shown that a critical parameter for the emergence of complementarity is the recombination frequency. Human genome parameters suggest that the consequences of complementarity should be seen at least in some regions of human chromosomes, especially if one considers the uneven distribution of recombination events along them. There are so-called recombination hot spots, where recombinations happen relatively often, and recombination deserts, where recombinations are not observed at all. In these deserts complementing clusters of genes should be more likely to appear. Moreover, these regions seem to be non-randomly distributed on chromosomes. It has been noticed that the distribution of accepted recombination events in the genomes of simulated populations depends on the simulation parameters. If evolution is studied in small effective populations under relatively low recombination rates, the central parts of chromosomes start to form clusters of genes where recombinations have a deleterious effect on reproduction potential.
Gametes produced by recombination in these regions have a lower chance of producing surviving zygotes. As a result, the recombination events in gametes which succeeded in forming surviving individuals have a characteristic distribution, with higher recombination frequencies in the regions close to the ends of chromosomes and lower recombination rates in the central part of chromosomes (22), as observed in reality (23, 24).

4.3. The effect of gamete recognition

The complementation strategy assumes that two different (complementing) sequences of alleles fit each other, producing a better fitted genome. If we consider a set of chromosomes with only one pair of complementing clusters, then it would be more economical to recognize which chromosome has an identical cluster and which one has a complementing cluster of genes before two gametes fuse to form a zygote. Such systems of recognition, or of probing the information inside another cell, are known even in the bacterial world: for example, an entry exclusion system prevents a bacterium from engaging in conjugation if the partner cell already possesses the genetic information to be transferred (25). It has been suggested that in humans the Major Histocompatibility Complex (MHC) can play such a role in the preselection of partners (26, 27). This complex alone is not enough to guarantee the fusion of complementing haplotypes. The mechanism should be located at the level of gametes and, to be efficient, it should be independent for different pairs of chromosomes harbouring the complementing clusters of genes. There is a group of genes which could fulfil such a role: the Olfactory Receptor (OR) genes. This is the largest gene family in the human genome, composed of almost 1000 genes and pseudogenes, clustered in many different groups located on almost all chromosomes (excluding Y), and at least some of these genes are expressed during spermatogenesis (28).
If we assume that each of our 22 pairs of autosomes has complementing clusters of genes, then an ovum would have an extremely low chance (2^-22) of meeting a fully complementing sperm cell at random. For an ovum to be able to choose such a sperm cell, it should have a pool of at least 2^22 sperm cells. In fact this pool seems to be about 10 times larger.

[Figure caption: Comparison of the simulated critical recombination rate rc for effective population sizes 100, 200, 1000 (from top to bottom), for chromosomes containing L genes, with parameters of human chromosomes. For real chromosomes (+), data show the average number of crossovers per meiosis (y-axis) against the number of genes per chromosome (x-axis).]

Thus human genomic data suggest that at least some parts of our genome could evolve under the complementing regime.

5. DISCUSSION

The destruction of complementarity by high crossover rates r is easy to understand: the delicate emergence of two complementary bit-strings A and A' in the whole population is destroyed for each individual where a crossover in the middle of the chromosome leads to massive changes in the chromosome structure. The dependence on the (effective) size of the population seems more complicated. It is also possible that rc depends on the size of the chromosome, or that in one genome some chromosomes should complement while others follow purification (12). Figure 8 compares such simulations with reality; the order of magnitude seems to be realistic. Complementarity may also affect the distribution of crossing points along the bit-strings (13, 14). Complementarity requires that whole sequences of neighbouring genes are transmitted together after recombination; if for each locus the transmitted allele is selected randomly, one would hardly find nearly complementary haplotypes (14). In any case, the details of the models are not important; our important point is that allele complementarity is in principle plausible, was found in some computer simulations, and should be checked in reality.
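The opening argument of this discussion, that a crossover in the middle of the chromosome destroys complementarity, can be illustrated with a minimal Python sketch. The alternating 40-locus haplotype and all function names are assumptions made only for this illustration; the cited simulations use evolved haplotypes and the full Penna dynamics, which are not reproduced here.

```python
def complement(h):
    """Haplotype with every allele flipped (0 <-> 1)."""
    return [1 - b for b in h]


def crossover(h1, h2, point):
    """Single-point crossover: head of h1 joined to tail of h2."""
    return h1[:point] + h2[point:]


def homozygous_defects(h1, h2):
    """Loci where both haplotypes carry the deleterious allele 1."""
    return sum(a == 1 and b == 1 for a, b in zip(h1, h2))


L = 40                       # loci counted, as in the R=40 example of the text
A = [1, 0] * (L // 2)        # illustrative haplotype with 20 mutated loci
A_prime = complement(A)      # its perfect complement

# Zygote from unrecombined gametes A and A': no defect is expressed.
intact = homozygous_defects(A, A_prime)

# Gamete from an A||A' parent with one crossover in the middle.
recombinant = crossover(A, A_prime, L // 2)

# Whichever standard haplotype it meets, many loci become homozygous.
with_A = homozygous_defects(recombinant, A)
with_A_prime = homozygous_defects(recombinant, A_prime)

print(intact, with_A, with_A_prime)
```

With this particular haplotype the unrecombined A||A' zygote expresses no defect, while the recombinant gamete yields 10 homozygous defective loci with either A or A', well above the threshold T=3 quoted for the Penna model in section 3.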
This review dealt with complementarity. A referee pointed out modifications which should be investigated in the underlying models, independently as well as in association with complementarity:

1. Other variants of allelic interactions: a) incomplete dominance (when organisms with the "Aa" genotype and organisms with the "AA" genotype have different phenotypes); b) co-dominance (when both alleles are expressed and needed for survival, as in the case of the ABO blood-group antigens);
2. Variants of non-allelic interactions like epistasis;
3. The situation when organisms with the "Aa" genotype have a higher risk of a deleterious "A" to "a" mutation in their somatic cells than organisms with the "AA" genotype.

6. REFERENCES

1. Zawierta M, Biecek P, Waga W, Cebrat S: The role of intragenomic recombination rate in the evolution of population's genetic pool. Theory Biosci, 125, 123-132 (2007)

Abbreviations: MHC: major histocompatibility complex; OR: olfactory receptor genes

Key Words: Recombination, Recombination Hot Spot, Recombination Desert, Mutational Pressure, Recessive Mutation, Purifying Selection, Complementation, Sympatric Speciation, Computer Simulation, Population Evolution, Evolution On Lattice, Genetic Pool, Gene Clusters, Penna Model, Review

Send correspondence to: Stanislaw Cebrat, Department of Genomics, University of Wroclaw, ul. Przybyszewskiego 63/77, 51-148 Wroclaw, Poland, Tel: 48713756303, Fax: 48713252151, E-mail: cebrat@smorfland.uni.wroc.pl