[Frontiers in Bioscience E4, 2464-2475, June 1, 2012] |
|
|
A bivariate variance components model for mapping iQTLs underlying endosperm traits Gengxin Li1, Cen Wu1, Cintia Coelho2, Rongling Wu3,4, Brian A. Larkins2, Yuehua Cui1
1 TABLE OF CONTENTS
1. ABSTRACT Genomic imprinting plays a pivotal role in early stage development in plants. Linkage analysis has been proven to be useful in mapping imprinted quantitative trait loci (iQTLs) underlying imprinting phenotypic traits in natural populations or experimental crosses. For correlated traits, studies have shown that multivariate genetic linkage analysis can improve QTL mapping power and precision, especially when a QTL has a pleiotropic effect on several traits. In addition, the joint analysis of multiple traits can test a number of biologically interesting hypotheses, such as pleiotropic effects vs close linkage. Motivated by a triploid maize endosperm dataset, we extended the variance components linkage analysis model incorporating imprinting effect proposed by Li and Cui (2010) to a bivariate trait modeling framework, aimed to improve the mapping precision and to identify pleiotropic imprinting effects. We proposed to partition the genetic variance of a QTL into sex-specific allelic variance components, to model and test the imprinting effect of an iQTL on two traits. Both simulation studies and real data analysis show the power and utility of the method. 2. INTRODUCTION With the availability of linkage map and molecular markers in many species, coupled with the development of statistical and computational methods, enormous progress has been made in the identification of novel genes or quantitative trait loci (QTL) underlying various complex traits of interest (e.g. 1). Recent advances in biotechnology have enabled the generation of high throughput genome-wide dense single nucleotide polymorphism (SNP) data. Even though there have been large successes in genome-wide association studies (GWAS), GWAS still cannot substitute QTL mapping due to high false positive and false negative rates compared to linkage mapping (2). In linkage mapping, when multiple correlated traits are available, a number of studies have shown that jointly modeling correlated traits can significantly improve the power and mapping precision to detect QTLs (3-5). For a gene with a pleiotropic effect, often it might code for a product which has a signaling function on various targets. This is practically important as the gene might belong to a signaling pathway and could be a potential target for further functional validation. This makes linkage mapping with multiple traits particularly attractive as it can test the pleiotropic effect of a QTL (6). Genomic imprinting is an epigenetic phenomenon in which the expression of the same alleles could be different, owning to their parental origin (7). It plays a critical role in early stage development in many species, which makes it practically important to identify imprinted genes underlying various traits of interest (8). In addition to Mendelian traits, genomic imprinting has been routinely considered in genetic mapping, with the aim to identify genes with sex-specific expression due to epigenetic modification. So far, various statistical attempts have been made to map imprinted QTLs (iQTLs) underlying complex traits (e.g. 9-16). In current applications, statistical methods for iQTL mapping are predominantly focused on single trait analysis, termed univariate iQTL mapping. In real application, it is frequent to observe multivariate traits which are potentially controlled by imprinted genes. For example, the percentage of endoreduplication and mean ploidy level in maize endosperm are two highly correlated traits (17). The two traits describe the level of endoreduplication in endosperm, which is thought to be genetically controlled by imprinted genes (18). We have previously developed a variance components statistical framework for mapping iQTLs underlying the single endosperm trait (see 16). Considering the advantage of joint analysis of multiple traits in QTL mapping, in this study we propose a multivariate variance components model in mapping iQTLs underlying multivariate imprinted traits, and further determine if there is a pleiotropic iQTL effect (6, 19). Our work is based on a published endosperm mapping data set, and the biological application makes the study particularly attractive (17). Endosperm is a triploid tissue resulting from a unique double fertilization process in angiosperms. As a result, the endosperm genome carries two copies of chromosomes inherited from female parent and one copy from male parent. Maize endosperm cells undergoing endoreduplication are generally larger than other cells, which consequently results in larger fruits or seeds and is beneficial to human beings (20). It is thus particularly important from the breeding point of view to identify which genes control the endoreduplication process and where they are located in the genome. To our best knowledge, no study has been conducted to map iQTLs underlying the imprinting process with multivariate traits. Variance components models have been shown to be powerful tools in multi-trait linkage analysis for an outbred or human population (e.g., 3, 4). Due to the special inbreeding structure and unique genetic make-up of the endosperm genome, the current multi-trait linkage analysis methods cannot be directly applied to the endosperm genome. We have previously shown that the variance components model can be applied to an inbreeding population to identify imprinted QTLs underlying a univariate endosperm trait (16). As an extension to our previous method, in this work we propose a bivariate iQTL mapping method to target iQTLs with potential pleiotropic effects. This study will fill the gap in genetic mapping iQTL underlying multiple endosperm traits by considering the imprinting property of a QTL. 3. STATISTICAL METHOD 3.1. The genetic model and parent-specific allelic sharing We consider a backcross design initiated with two inbreeding parental lines with a large contrast in the phenotype of interest. Denote the genotype of two parental lines as Consider two phenotypic traits of interest. Let where the coefficient of the maternal allele effect is set as 2 due to the two identical copies. For the proposed backcross design, there are a total of three possible maternal genotypes, denoted as The above variance components model is built upon the basis of IBD sharing at a QTL. For a triploid inbreeding population, a unique decomposition of parent-specific allele sharing is illustrated in figure 1 of Li and Cui (16). Following the definition given in Li and Cui (16), the phenotypic variance-covariance corresponding to trait j (=1, 2) in family k can be expressed as: The IBD sharing probability mentioned above is calculated assuming that a QTL is located at a marker position. Unless markers are dense enough, a QTL can be anywhere in the genome bracketed by two flanking markers. Here we assume a QTL can be anywhere in the genome and calculate the IBD sharing probability based on the recombination information between a putative QTL and two flanking markers. In a genome-wide linkage scan, we search a QTL every 1 or 2cM throughout the entire genome and the conditional probability of a QTL conditional on two flanking markers can be calculated (see 22). These conditional probabilities are then considered when calculating the IBD probability of a putative QTL at a given genome position (see 16 for more details). 3.2. Parameter estimation Let where All data in where
3.3. Hypothesis testing We propose to search for iQTLs across the genome by assuming a putative QTL every 1 or 2cM by partitioning the whole linkage map into small intervals. At each putative QTL position, we test if there is a significant QTL effect on the bivariate traits by formulating the following hypotheses The significance of the above test is assessed through the likelihood ratio test (LRT). Let which, under the null hypothesis, is distributed as a mixture chi-square distribution with the form
determine the genome-wide significance threshold. Once a QTL is identified at a genomic position, its imprinting property for both phenotypic traits is assessed by the following imprinting hypothesis Again, the likelihood ratio test is applied and the test statistic (denoted as or due to complete silence of the paternal allele by testing
The likelihood ratio test statistic (denoted as We can also test whether a QTL controls trait 1 by testing or controls trait 2 by testing The likelihood ratio statistic corresponding to the above tests is denoted as asymptotically follows a mixture chi-square distribution with the form where Rejection of the null of the above two tests indicates the pleiotropic effect (i.e., one gene acts on two traits). But if two genes are closely linked at the detected (i)QTL (i.e., close linkage), the pleiotropic effect might be a false positive due to close linkage. Thus, it is essential to distinguish a pleiotropic effect vs close linkage. This is exactly the relative advantage of the multi-trait linkage analysis. To further distinguish close linkage against pleiotropic effect, we develop the following two tests for testing pleiotropic effect and for testing close linkage, where r
.¢
s are genetic correlation measures for different variance components between two traits. The null hypothesis in test Hypothesis (11) indicates that the additive effects for two traits are perfectly correlated and two traits are possibly controlled by a single gene. On the contrary, the null hypothesis in test Hypothesis (12) indicates two closely linkage genes at one (i)QTL location. The likelihood ratio test is denoted by 4. SIMULATION 4.1. Simulation design We designed a simulation study to evaluate the performance of the joint analysis as well as the effect of different genetic designs on testing power and parameter estimation. Six equally-spaced markers (M1-M6) were simulated for one linkage group assuming a backcross design. This linkage group covers a length of 100cM. Haldane map function was used to convert map distance to recombination rate. Assume there was one QTL located at 48cM away from the first marker which had effects on two phenotypic traits. Phenotypic values of two traits were generated from a multivariate normal distribution with variance-covariance specified in Equation (2). Parameter values used for simulation are given in Table 2-3. As described in Li and Cui (16), different combinations of family and offspring size could influence testing power and parameter estimation. To mimic the real data, we fixed the total sample size as 400 and considered two designs, i.e., 4 families with 100 offsprings each (denoted as 4�100) and 20 families with 20 offsprings each (denoted as 20�20). In each simulation scenario, the IBD value of any two siblings was calculated at every 2cM along the linkage group using the approach described in Li and Cui (16). The REML method was adopted to estimate unknown parameters of interests, and 100 simulation replicates were recorded. 4.2. Simulation results Figure 1 plots the likelihood ratio statistic averaged over 100 simulation replicates across the entire linkage group obtained with the proposed multivariate analysis and single trait analysis. The bivariate analysis shows consistently high LR values across the linkage group with a clear peak corresponding to the simulated QTL position, which implies the potential power gain in QTL identification with the joint analysis. This is further confirmed by the power analysis in Table 1 (82% power for the bivariate analysis vs 70% and 60% for each single trait analysis) with data simulated assuming no imprinting effect (i.e., The parameter estimation, power of QTL identification and type I error of the imprinting test are summarized in Table 1. Overall, the joint analysis gives larger statistical power (82%), smaller type I error rate for the imprinting test (6%) and less biased variance components estimation compared to the results obtained with the single trait analysis. We also noticed that the joint analysis gives less bias and smaller standard error in QTL location estimation compared to the single trait analysis. In addition, the joint analysis greatly improved the false positive control with the 4�100 design. The simulation results indicate that the joint mapping by incorporating bivariate phenotypic information greatly improves the efficiency of QTL detection. The above simulation results were based on a 4�100 design. Our previous investigation showed that this design is less powerful for single trait analysis compared to the 20�20 design when the sample size is fixed (16). Thus, we did additional simulation under the 20�20 design. The results showed improved power for QTL detection and also improved precision for parameter estimation (Table 2). In real application, our recommendation is to adopt a balanced design with a reasonably large family size. Focusing on the 20�20 design, we did additional simulation to evaluate the performance of imprinting analysis for the joint model under different imprinting mechanisms. The results are tabulated in Table 3. Four imprinting mechanisms were considered: complete paternal imprinting (i.e., 5. REAL DATA ANALYSIS Endosperm is created from the fusion of two polar nuclei and a sperm cell, resulting in a triploid tissue with two identical chromosomes inherited from the maternal parent and one chromosome from the paternal parent. The endosperm supplies nutrition to the embryo during seed germination, and serves as a major source of food for human beings (26). The function of the endosperm is far more complicated than we understand, and is beyond simple nutrient delivery to the embryo. As mentioned in the introduction section, maize endosperm cells undergoing endoreduplication are generally larger than other cells (20). Recent empirical study revealed that endoreduplication in maize endosperm may be associated with genes showing parent-of-origin effects (18). A study was recently launched to identify parent-of-origin effects associated with endoreduplication in maize endosperm (17). Two inbred lines (Sg18 and Mo17) were used to generate the F1 population. Four backcross segregation populations were then generated following a reciprocal backcross design, i.e., Sg18 The sample correlation between the two traits is 0.812 after combining the four backcross phenotype data together, which indicates the potential benefit of a joint analysis by using the proposed bivariate variance components model. To enhance the power of iQTL mapping and further identify the pleiotropic iQTL effect, we applied our newly derived method to this data set. Figure 2 shows the LR profile plot across the ten linkage groups. In addition to the results based on the joint analysis (solid curve), the single trait analysis results are also plotted (dotted line for trait Endo and dashed line for trait Ploid). The 5% genome-wide significance threshold corresponding to the joint analysis (horizontal dotted lines) was determined by permutations based on repeatedly shuffling the relationship between marker genotypes and phenotypes (27). The horizontal dotted line in G7 indicates the 5% chromosome-wide permutation threshold. Note that the single trait analysis was done without correcting for the neighborhood QTL effect, hence the LR curves obtained in Li and Cui are different from the ones presented in this paper (16). As shown in the figure, five QTLs were detected at the 5% genome-wide significance level and are located in linkage groups G2, G4, G6, G9 and G10. One QTL in G7 only passes the chromosome-wide threshold and is a suggestive QTL. Compared to the single trait analysis results, more QTLs were detected by the joint analysis. Table 3 lists the estimates of the QTL position, variance components, and p-values for testing overall QTL effect and various imprinting and pleiotropic effects. With the joint analysis, the identified QTLs in G4 and G6 are imprinted with imprinting p-values 0.02, and 0.04 respectively. Further tests show that the QTL in G4 shows paternal expression ( 6. DISCUSSION A number of studies have shown that for correlated traits, multivariate approaches for genetic linkage analysis can increase the power and precision to identify genetic effects, especially when a QTL has a pleiotropic effect on several traits (5, 6). Considering the importance of imprinted genes in endosperm development and the relative merit of multi-trait analysis, we developed a bivariate variance components model based on a reciprocal backcross design to identify iQTLs while incorporating the special genetic make-up of the triploid inbreeding population. Simulation studies showed the performance of the method under different sampling designs with finite sample size. Comparing the results of joint analysis with those of single trait analysis, the joint analysis greatly improves the performance in QTL position estimation, testing power, and type I error rate. We applied the joint model to a real data set with two endosperm traits, e.g., % of endoreduplication and mean ploidy level. Six QTLs were detected on G2, G4, G6, G7, G9, and G10 across the maize endosperm genome. Among the six QTLs, five showed genome-wide significance, two are iQTLs with maternal imprinting (on G4) and paternal imprinting (on G6). Compared with the single trait analysis, more QTLs were mapped in the joint analysis. In maize, several paternal imprinting genes have been identified. For example, there is the In addition to mapping several QTLs, we also identified significant pleiotropic effects. In maize, some vital genes displaying pleiotropic effects have been reported. For example, maize zfl regulatory genes have pleiotropic effects on structure traits in branching and inflorescence formation (32). The tb1 gene and its intergenic sequences illustrates the pleiotropic effects on maize morphology (33). A maize gene GLOSSY1 (GL1) expresses its effect on trichome size and cutin structure during epidermis development (34). Given the high correlation between the two endoreduplication traits, the identification of genes with pleiotropic effects is practically important. Further functional verification is needed to confirm the findings of this investigation. In the simulation study, the results indicate low power for imprinting detection with the 4�100 design. This result is consistent with the findings we found earlier for a single trait analysis (16). When we changed the design to 20�20 with a fixed total sample size, improved results were observed (Table 3). Even with the extremely unbalanced design (4�100), simulations also show reasonable false imprinting detection rate (6%). The real data setting is quite similar to the simulation design, thus the detected imprinting effects should have little chance to be false positives. Overall, the simulation studies provide practical guidance to real experimental designs: try to maintain a balanced sampling design and avoid extremely small families with large offspring and extremely large families with very small offspring. The current method was derived for single iQTL mapping. Extension to multiple iQTL mapping is in fact not straightforward. Further investigation is needed in this context. On the other hand, even though the method was developed for experimental crosses, extension to human genetic mapping studies is straightforward. The only modification is the IBD sharing probability of sibpairs, where the calculation should take the family structure into account (e.g., 9). In fact, the IBD calculation is simpler in humans, because the cross sharing probability reduces to zero for a natural population with random mating. We hope our method will shed light on human genetic mapping studies to identify imprinting effects with variance components models while considering multiple traits. 7. ACKNOWLEDGEMENT This work was supported by NSF grant DMS-0707031. We thank the two anonymous referees for their helpful comments that greatly improved the manuscript. 8. APPENDIX Derivation of Fisher scoring algorithm for REML estimation. Define the IBD sharing matrices corresponding to the three phenotypic vectors as
The IBD sharing matrices for
Let
9. REFERENCES 1. C Li, A Zhou, T Sang: Rice domestication by reducing shattering. Science 311, 1936-1939 (2006). Abbreviations: GWAS: genome-wide association studies, IBD: identical-by-descent, iQTL: imprinting quantitative trait loci, LR: likelihood ratio, QTL: quantitative trait loci, REML: restricted maximum likelihood, SNP: single nucleotide polymorphism Key Words: Backcross, Close linkage, Likelihood ratio test, Maize, Maximum likelihood, Pleiotropic effect Send correspondence to: Yuehua Cui, Department of Statistics and Probability, A-432 Wells Hall, Michigan State University, East Lansing, MI 48824, Tel: 517-432-7098, Fax: 517-432-1405, E-mail:cui@stt.msu.edu |