[Frontiers in Bioscience E4, 2670-2682, June 1, 2012] |
|
|
A score-statistic approach for determining threshold values in QTL mapping Chen-Hung Kao1, Hsiang-An Ho1
1 TABLE OF CONTENTS
1. ABSTRACT Issues in determining the threshold values of QTL mapping are often investigated for the backcross and F2 populations with relatively simple genome structures so far. The investigations of these issues in the progeny populations after F2 (advanced populations) with relatively more complicated genomes are generally inadequate. As these advanced populations have been well implemented in QTL mapping, it is important to address these issues for them in more details. Due to an increasing number of meiosis cycle, the genomes of the advanced populations can be very different from the backcross and F2 genomes. Therefore, special devices that consider the specific genome structures present in the advanced populations are required to resolve these issues. By considering the differences in genome structure between populations, we formulate more general score test statistics and Gaussian processes to evaluate their threshold values. In general, we found that, given a significance level and a genome size, threshold values for QTL detection are higher in the denser marker maps and in the more advanced populations. Simulations were performed to validate our approach. 2. INTRODUCTION The statistical model of interval mapping (IM) proposed by Lander and Botstein (1) is generally a normal mixture model, as the genotypes of the quantitative trait locus (QTL) are not observable and needed to be inferred from its flanking markers. In the parameter estimation of the normal mixture model, the maximum likelihood estimation is commonly implemented to obtain the maximum likelihood estimates (MLE) through the EM algorithm (2) by treating the model as an incomplete-data problem. Typically, the presence of a QTL, i.e. the null hypothesis of no QTL, is tested over the all possible positions in the whole genome by using likelihood ratio test (LRT) statistics, and the position with the significantly maximum LRT statistic is regarded as the estimated QTL location. Under this framework, it has been recognized that the determination of the threshold values for claiming significant QTL detection (rejecting the null hypothesis) along the genomes is one of the complicated and important issues in QTL mapping (3-4) for the following reasons. One is that the QTL position is unidentified under the null hypothesis, and the maximum LRT statistic does not follow the standard χ2 distribution asymptotically (4). Further, various factors, such as the number and size of intervals, population genome structures, and informativeness of markers, will involve in and should be considered in determining the threshold value for claiming QTL detection (5-8). Besides, because multiple correlated and uncorrelated tests are performed in searching for QTLs on the whole genome, the common pointwise significance level is not appropriate and genomewise significance level should be considered in QTL mapping (1,4,6,8).
Several theoretical and simulation approximations have been proposed to determine the threshold values of QTL detection. Lander and Botstein (1) suggested using Bonferroni argument for sparse-map case and ORENSTEIN-UHLENBECK diffusion for dense-map case to determine the threshold value. For intermediate situations, extensive numerical simulations have been used to determine the thresholds (1,9). Churchill and Doerge (10) suggested using a permutation test for determining an appropriate threshold values for specific data sets. Rebai, Goffinet and Mangin (11) used Davies's bound (12) to derive a conservative formulas for calculating the approximate thresholds for backcross and F2 populations. They demonstrated good performance of their formulas using simulation. Dupuis and Siegmund (13) provided approximate formulas to calculate the threshold for the case of very dense markers in the population, but they did not take interval mapping into account in their approximation. Piepho (14) also used Davies's bound to propose a quick method for computing the approximate thresholds for general designs. The quick method is computationally inexpensive and claimed to be an alternative to permutation procedure. Zou et al. (8) and Chang et al. (4) proposed a score-statistic framework to assess the threshold values. As the maximum of the square of score test statistics is asymptotically equivalent to the maximum LRT statistics, the threshold values derived from the score test statistics can be used as those for the LRT statistics in the population (4, 8, 15-16).
On the basis of score test statistics, Zou et al. (8) proposed a resampling approach to obtain the threshold values mainly for the F2 population by simulating the F2 genome structure. Chang et al. (4) also devised a score-statistic method for computing the threshold values in a backcross population by analytically analyzing the backcross genome structure. Chang et al. (4) showed that score test statistics along the genomes is a Gaussian stochastic process with mean zero and well-structured covariance, and they used them to compute the threshold values of QTL detection in the backcross population. The score-statistic method not only can be less computationally demanding than the permutation test and numerical simulation, but also can be more accurate than previous approximate formulas in the computation of the threshold values (4,8). So far, most of the studies of assessing the threshold values for QTL detection are investigated in the backcross and F2 populations (8, 11 and 14 discussed the threshold values for F3 populations), still they are generally lacking or inadequate for the progeny populations after F2 (advanced populations). These advanced populations, such as recombinant inbred (RI) and advanced intercrossed (AI) populations, have been well devised and implemented in genetic studies. For example, Bai et al. (17) used RI populations in rice and Kelly et al. (18) implemented AI populations in mice for investigating the genetic architectures of quantitative traits in their studies. For specific populations where time is not an issue, the advanced populations can have some very useful properties in that their genomic structures allow researchers to yield better results in their investigations. Therefore, it is of importance to address the issues of assessing the threshold values for these populations in more details in QTL mapping study. Due to the fact that these advanced populations undergo multiple meiosis cycles, their genomic structures, such as homozygosity, genotypic frequency and variance components, are differing and can be very different from the backcross and F2 genomes. In this article, by distinguishing between different population genome structures, we formulate more general score test statistics and Gaussian processes under the framework of interval mapping to compute the threshold values and to study their behaviors for various populations. One of the keys to our approach is to devise the genotypic distributions of two, three and four genes of the populations into the formulations, so that their specific genome structures can be well described to address these issues across various populations. Simulation studies are performed to evaluate our approach to assessing the threshold values. The R program of our approach is available on http://www.stat.sinica.edu.tw/~chkao/.
3. EXPERIMENAL POPULATIONS
3.1. Advanced populations
Various experimental populations have been designed for the study of QTL mapping. Among these populations, the backcross and F2 populations have been the most widely used designs in the studies. Besides, the progeny populations from the F2, which are called advanced populations, are also very common. These experimental populations are produced as follows. A cross between two parental inbred lines, P1 and P2, is performed to produce an F1 population. If the F1 individuals are backcrossed to P1 or P2, it produces the backcross population. If the F1 individuals are selfed or randomly mated, it produces an F2 population. If the F2 population is further selfed and/or randomly mated for generations, the produced progeny populations will be called advanced populations. These advanced populations include recombinant inbred (RI) and advanced intercrossed (AI) populations. As the RI populations or AI populations is obtained by repeatedly selfing (inbreeding) or randomly intermating the F2 individuals for several generations, their genomes will be subject to more meiosis cycles as compared to the backcross and F2 populations. Therefore, the genomic constitutions, such as homozygosity, genotypic distribution and linkage disequilibrium, of the advanced populations will be different from each other (19) and no longer similar to those of the backcross and F2 populations. In general, selfing will increase the homozygosity at the expense of heterozygotes. Also, further meiosis cycles tend to accumulate crossovers so that the proportion of recombinants will increase and linkage disequilibrium between genes will reduce in the populations. These features in the advanced populations can be very useful and may allow for more productive investigations. For example, the RI populations can increase the homozygosity to assist the detection of additive effects in QTL mapping (20). Further, the AI populations can harbor more recombination events in a short chromosome segment for genetic fine mapping and may provide better power in the separation of closely linked QTL (20-21).
3.2. Genotypic distributions
As will be shown and explained later, an important key to successfully compute the threshold values for these populations is to well devise the genotypic distributions of one, two, three and four genes of the populations into the formulations, so that the specific genome structures of the populations can be well considered under our proposed framework. We now explain briefly how these genotypic distributions are obtained in different populations. Consider an F2, AI or RI population used for the QTL mapping studies. In general, for m genes, there are 2m different gametic genotypes and 2(2m-1)+2m/2 zygotic genotypes. Therefore, for m=1, 2, 3 and 4, there are 2, 4, 8 and 16 gametic genotypes and 3, 10, 36 and 136 zygotic genotypes. As the different populations undergo different numbers of meiosis cycle, they will have different distributions of gametic and zygotic genotypes in the genomic constitutions. For one gene, there are three possible genotypes, P1 homozygote, heterozygote and P2 homozygote. The frequencies of these three genotypes are expected to be 1/4, 1/2 and 1/4, respectively, in the AI populations. In the RI populations, the frequency of heterozygote is halved in each selfing cycle. For m=2, 3 or 4, the genotypic frequencies in different AI and RI populations can be obtained by using the transition equations of Haldane and Waddington (22), Geiringer (23), and Kao and Zeng (20, 24). These transition equations, which are derived under the assumptions of Haldane map function and equal crossover value in two sexes, aim to obatin the genotypic distributions of the populations in different generations when individuals are subject to random mating or selfing process. In RI populations, the 5, 20 and 72 transition equations in Haldane and Waddington (22) and Kao and Zeng (20, 24) can be used to compute the frequencies of the 10, 36 and 136 genotypes of two, three and four genes. In AI populations, the gametic frequencies are first computed, and then the genotypic frequencies can be obtained from the product of gametic frequencies. To obtain the frequencies of the 4, 8 and 16 gametic frequencies for m=2, 3 and 4, Geiringer's approach (23) can be used to formulate the transition equations of gametic frequencies or the sets of transition equations provided by Kao and Zeng (20, 24) can be directly used. Conceptually, it is also possible to obtain the genotypic distributions for any number of genes in any AI and RI populations by extending their approaches. However, there may be too many equations, and each equation contains numerous terms.
4. STATISTCAL MOEL OF INTERVAL MAPPING FO QTL
4.1. Statistical model
The data structure of QTL mapping generally consists of two parts, where are the coded variables of genotypes of Q, and 4.2. Conditional probabilities
Although Q is not observable, its genotypic distribution can be inferred from its flanking marker genotype according to the principle of conditional probability as
Therefore, obtaining the above probability involves in the use of the genotypic distributions of two and three genes in the experimental populations. For any advanced population under consideration, the two flanking markers can have nine different genotypes, MN/MN, MN/Mn, Mn/Mn, MN/mN, MNmn (MN/mn or Mn/mN), Mn/mn, mN/mN, mN/mn and mn/mn. For each one of the nine marker genotypes, the genotype of Q can be 5. LIKLIHOOD OF THE STATISTICAL MODEL
For an individual Note that where 6. SCORE TEST STATISTICS
6.1. Score functions
The score functions for the additive and dominance effects are the derivatives of the log likelihood (Equation (4)) with respective to the parameters, If the and
where and
and the covariance between 6.2. Score test statistics
If only additive or dominance effect under the null, If both additive and dominance effects are considered at a time, we may define the score function as where 6.3. Asymptotic forms of score test statistics
To understand the null distributions of the score test statistics, the asymptotically equivalent forms of where
Also, due to symmetry, the term where
Note that the term 7. GAUSSIAN STOCHASTIC PROCESS
7.1. Covariance between test statistics
The previous discussions mainly focus on deriving the score test statistics at a fixed position where and
Similarly, the remaining and the covariance between As where 7.2. Covariance between trait means
For where Similarly, we can obtain the other components of covariances as
Since which depends on the genotypic frequencies of four loci. Similarly, for neighboring intervals, the covariance between The other covariances, such as 8. SIMULATING THE NULL DISTRIBUTION
The scheme of simulating the null distributions of When simulating To simulate the null distribution of 1.Generate ( 2.Compute the remaining three trait means, e.g., 3. Compute the score test statistics, The above steps are repeated many times, say 10,000 times, to obtain the approximate distribution of 9. REAL EXAMPLE AND SIMULATION STUDIES
9.1. Real example
As a real data application, we considered a backcross model to compute the threshold value of QTL mapping in a pseudo-testcross population of Radiata pine. One hundred and twenty markers contributed genotypic information across twelve linkage groups and covered ~1679.3 cM. The traits considered are tree diameter, branch quality scores and brown cone number. The average spacing of the 107 marker intervals was 13.5 cM. The maximum distance between two adjacent markers is 74.8 cM, and the minimum is 1.5 cM. For this practice of QTL detection, Kao et al. (7) used Bonferroni argument to choose a value of 12.12 (χ21,0.05/107) as a threshold in the analysis. By using our approach, a larger threshold value 12.43 is obtained. By using 12.43 as a threshold of QTL mapping, the numbers of detected QTL for the three traits are the same as those by using 12.12 as a threshold in this case.
9.2. Simulation studies
Simulations were performed to evaluate the performance of the proposed method and study the behaviors of the threshold values under various marker densities in several experimental populations. For each population, one chromosome with a total length of 100 cM was simulated. On the chromosome, we assume that there are 101, 51, 21, 11, 6, 3 and 2 evenly spaced markers, i.e. the marker distances are 1, 2, 5, 10, 20, 50 and 100 cM, respectively. The experimental populations considered include F2, AI (RI) F3, AI (RI) F4, AI (RI) F5, AI (RI) F6 and AI (RI) F10 populations. Except for the RI F10 population, both additive and dominance effects are considered. For the RI F10 population, only the additive effect is considered, as there are very few heterozygotes due to continuous self. For each case considered, score test statistics are computed every 1 cM, and the number of simulated replicate is 10,000. The 10,000 maximums of the score test statistics along the chromosome are recorded. To validate our method, we also simulate 10,000 sets of the traits and markers from 200 individuals for each marker density and experimental population under the null hypothesis. Each data set is then analyzed by the interval mapping approach, and the LRT statistics were computed every 1 cM and the 10,000 maximums were obtained for comparison. Results of the threshold values at α=0.05 from the maximums of LRT statistics and score test statistics are given in Table 2. It is worth pointing out that the score-statistic approach is several hundred times faster than the LRT-statistic approach to complete the results in Table 2. This advantage of greatly saving computation time has been identified by Chang et al. (4).
Table 2 shows that the threshold values obtained from the two different approach are generally very close to each other, especially, in the F2 population. For example, the values by the score-statistic approach are 8.103, 8.966, 9.853, 10.574, 11.145, 11.732 and 11.959, respectively, for the sever marker densities in the F2 population, and those from the LRT statistic are 8.128, 8.978, 9.998, 10.481, 11.120, 11.750 and 12.318, respectively. For more advanced AI or RI populations, the differences between the threshold values seem to become relatively larger, especially for the sparse marker density in the more advanced AI populations. For example, in the AI F10 population, the threshold values from the score test statistic are 9.410 and 8.194 in the 50- and 100 cM-marker spacing, and those from the LRT statistic are 10.401 and 9.879. Also, the threshold values are increasing for denser marker maps, which is consistent with the findings in previous studies (8, 11). Such an increasing trend becomes more obvious in the more advanced populations as compared with that in the earlier populations. For example, the values for 100- and 1-cM marker spacing are 8.200 and 13.518 (8.273 and 13.257), respectively, in the AI (RI) F6 population, and they are 8.183 and 12.580 (8.076 and 12.447) in the AI (RI) F3 population. Besides, the threshold values are higher in the more advanced populations. For example, the values for 10-cM marker spacing are 10.574, 10.887 (10.859), 11.132 (11.186), 11.533 (11.315), 11.871 in the F2, AI (RI) F3, AI (RI) F4, AI (RI) F5, AI (RI) F6 and AI F10 populations. The threshold values in the AI populations are generally larger as compared to those in the RI populations (except for the case of 100-cM spacing in the F6 population). For example, the threshold values are 8.075, 9.080, 10.090, 10.859, 11.487, 12.212 and 12.447 for the seven marker densities in the RI F3 population, and they are 8.183, 9.100, 10.130, 10.887, 11.638, 12.325 and 12.580 in the AI F3 population. The similar trends can be also observed in the other AI and RI Ft populations.
10. DISCUSSION
When applying the interval mapping procedure to search the whole genomes for QTL, typically, the LRT statistics are constructed over the all possible positions to test for the null hypothesis of no QTL, and the position with the significant maximum of LRT statistic is regarded as the estimated QTL position. Under such a procedure, the determination of the threshold values for the test statistics to declare significance has been a central issue in QTL mapping. As the maximums of the score test and LRT statistics are asymptotically equivalent (4, 8, 26) and the computation cost of the score test statistics is much cheaper (4, 8), we propose a general score-statistic approach to computing the threshold values of QTL detection for various experimental populations. These experimental populations include F2, AI and RI populations. In our approach, the score test statistics are formulated in terms of the trait means of marker classes, mixing proportions, and the genotypic distributions of two and three genes of the populations. The asymptotic distribution of the score test statistics along the genomes is characterized by a Gaussian process with mean zero and well-structured variance-covariance matrix. We devise the genotypic distributions of three and four genes into the variance-covariance matrix to take care of the changes in the genomic structure between different populations, so that the threshold values for the different populations can be computed and their behaviors can be explored in various marker densities and genome sizes. The validity of our approach is compared with the LRT statistics by Monte Carlo simulations. In general, the threshold values obtained by our approach are very close to those by the LRT statistics in the various advanced populations as shown in Table 2. Given a significance level and a genome size, the threshold values should be enhanced in denser marker maps and in more advanced populations.
The different advanced populations are subject to different numbers of meiosis cycle either by inbreeding and/or random mating. They will produce different genomic structures, and their genotypic distributions will be different from each other. By recognizing such differences between populations, we incorporate the distributions of two, three and four genes of these populations into the score test statistics and Gaussian processes (equations~(12), (13), (14) and (15)). Therefore, our approach can consider their specific genome structures to well compute their threshold values and investigate their behaviors. More importantly, it has to be pointed out that the genotypic frequencies of three and four genes can be directly obtained by the genotypic frequencies of pairwise genes in the F2 populations, as the F2 genomes have a Markovian structure under Haldane map function (27). However, for advanced populations, this Markovian property disappears, and obtaining the genotypic frequencies of three and four genes is not straightforward. Here, we use the sets of transition equations proposed by Kao and Zeng (20, 24) to obtain these frequencies. If these frequencies are approximated by using Jiang and Zeng's method (28), which implicitly assumes a Markovian property, and the approximate frequencies are used in the construction of the test statistics and Gaussian processes in the computation of the threshold values. We found that the threshold values obtained by using the approximate genotypic frequencies are generally larger, especially in the denser marker maps and in the more advanced populations (results not shown), as compared to those by using exact frequencies.
The proposed score-statistic approach for computing the threshold value is mainly devised for the interval mapping model, i.e. a single QTL model. Under a single QTL model, the score test statistics for the additive and dominance effects in equations (12) and (13) are not mixtures and are functions of the nine trait means. The nine trait means correspond to the nine flanking marker genotypes whose proportions are adjusted to different population structures. In the setup of the variance-covariance matrices of Gaussian processes, the elements are obtained from the covariances between the test statistics (trait means). Therefore, it needs to consider the distributions of three (for neighboring intervals) and four genes (for non-neighboring intervals) at a time, and the dimension of the variance-covariance matrix is usually large. For example, the dimensions of the matrices are 63�63 (9+(10-1)�6 = 63) and 123�123 (9+(20-1)x6 = 123) for a chromosome with 10 and 20 intervals, respectively. If composite interval mapping (CIM; 6) or multiple interval mapping (MIM; 7) is considered, it will need to evaluate many more genotypic frequencies of more genes to construct a much higher-dimensional variance-covariance matrix in the processes. Taking a two-QTL MIM model, yi = μ+a1x1*+ a2x2* +εi, as an example, if searching a chromosome segment with 10 marker intervals for the first QTL (testing H0: a1=0 and a2 ≠ 0) by conditioning on the second QTL in the other region, the score test statistic for a1 can be classified into 81 different categories according to the 81 marker genotypes of the two flanking intervals (92=81), and each category contains mixture components. As the score statistic is a mixture, the derivations of its variance and asymptotic form are not simple. Furthermore, the dimension of the variance-covariance matrix in the Gaussian process will increase to 567x567 (81+(10-1)x54=567), and obtaining the covariance elements needs to evaluate the 2080 genotypic frequencies of six genes for this MIM model. If the second QTL is coincident with a marker, the two-QTL MIM model reduces to a CIM model. The corresponding score test statistic can be classified into 27 categories corresponding to 27 marker genotypes. The variance-covariance matrix of the Gaussian process has a 189x189 (27+(10-1)x18=189) dimension, and obtaining the covariance elements needs to evaluate the 528 genotypic frequencies of five genes for this CIM model. Certainly, if more QTL are considered at a time in the model, the obtaining of the score test statistics and Gaussian processes will be even more complicated as they will involve using the distributions of a large number of genes at a time in the populations. Consequently, the issues in determining the threshold values for the multiple-QTL models remain challenging. The approaches to unraveling these issues will not be in a straightforward manner and are worth pursuing in the future.
11. ACKNOWLEDGEMENS
This work was supported by grants NSC99-2118-M-001-011 from the National Science Council, Taiwan, Republic of China.
12. APPENDIX
Evaluation of the covariances between where If the two intervals are neighboring, the covariances are functions of the genotypic frequencies of two and three genes, and they can be obtained in a similar way. The frequencies of the 10, 36 and 136 genotypes for two, three and four genes in the different advanced populations can be obtained by using the transition equations of Haldane and Waddington (1931) and Kao and Zeng (2009, 2010).
13. REFERENCES
1. Lander E, D Botstein: Mapping Mendelian factors underlying quantitative traits using RELP linkage maps. Genetics 121, 185-199 (1989).
2. Dempster A, N Laird, D Rubin: Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc. Ser. B 39, 1-38 (1977).
3. Lander E, N Schork. Genetic dissection of complex traits: Science 265, 2037-2048 (1994). doi:10.1126/science.8091226
4. Chang M, RL Wu, S Wu, G Casella: Score statistics for mapping quantitative trait loci. Statistical Applications in Genetics and Molecular Biology 8 Article 16 (2009). doi:10.2202/1544-6115.1386
5. Jensen R: Interval mapping of multiple quantitative trait loci. Genetics 135, 205-211 (1113).
6. Zeng ZB: Precision mapping of quantitative trait loci. Genetics 136, 1457-1468 (1994).
7. Kao CH, ZB Zeng, R Teasdale: Multiple interval mapping for Quantitative Trait Loci. Genetics 152, 1203-1216 (1999). doi:10.1017/S0016672399004255
8. Zou F, J Fine, J Hu, D Lin: An Efficient Resampling Method for Assessing Genome-Wide Statistical Significance in Mapping Quantitative Trait Loci Genetics 168, 2307-2316 (2004). doi:10.1534/genetics.104.031427
9. Van Ooijen J: Accuracy of mapping quantitative trait loci in autogamous species. Theor Appl Genet 84, 803-811 (1992). doi:10.1007/BF00227388
10. Churchill G, R Doerge: Empirical threshold values for quantitative trait mapping. Genetics 138, 963-971 (1994).
11. Rebai A, B Goffinet, B Mangin: Approximate thresholds of interval mapping tests for QTL detection. Genetics 138, 235-240 (1994).
12. Davies R. Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika 64, 247-254 (1977). doi:10.2307/2335690
13. Dupuis J, D Siegmund: Statistical methods for mapping quantitative trait loci from a dense set of markers. Genetics 151, 373-386 (1999).
14. Piepho HP: A quick method for computing approximate thresholds for quantitative trait loci detection. Genetics 157, 425-432 (2001). doi:10.1007/s001220100606
15. Schaid D, C Rowland, D Tines, R Jacobson, G Poland: Score tests for association between traits and haplotypes when linkage phase is ambiguous. Am J Hum Genet 70, 425-434 (2002). doi:10.1086/338688
16. Wang K & J Huang: A score-statistic approach for mapping quantitative trait loci with sibships of arbitrary size. Am J Hum Genet 70, 412-424 (2002). doi:10.1086/338659
17. Bai X, L Luo, W Yan, M Rao Kovi, W Zhan, Y Xing: Genetic dissection of rice grain shape using a recombinant inbred line population derived from two contrasting parents and fine mapping a pleiotropic quantitative trait locus qGL7. BMC Genetics, 11:16 (2010). doi:10.1186/1471-2156-11-16
18. Kelly S, D Nehrenberg, J Peirce, K Hua, B Steffy, T Wiltshire, F Villena, T Garland Jr., Daniel Pomp: Genetic architecture of voluntary exercise in an advanced intercross line of mice. Physiol Genomics, 42:190-200 (2010). doi:10.1152/physiolgenomics.00028.2010
19. Weir B: Genetic Data Analysis II. Sinauer Associates, Inc. Sunderland, Massachusetts (1996).
20. Kao CH, MH Zeng: An Investigation of the Power for Separating Closely Linked QTL in Experimental Populations. Genet Res 92, 283-294 (2010). doi:10.1017/S0016672310000273
21. Darvasi A: Experimental strategies for the genetic dissection of complex traits in animal models. Nat Genet 18, 19 24 (1998). doi:10.1038/ng0198-19
22. Haldane JBS, C Waddington: Inbreeding and linkage. Genetics 16, 357- 374 (1931).
23. Geiringer H: On the probability theory of linkage in Mendelian heredity. The Annals of Mathematical Statistics 15, 25 57 (1944). doi:10.1214/aoms/1177731313
24. Kao CH, MH Zeng: A study on the Mapping of Quantitative Trait Loci in Advanced Populations Derived from Two Inbred Lines. Genet Res 91, 85-99 (2009). doi:10.1017/S0016672309000081
25. Kao CH, ZB Zeng: General formulae for obtaining the maximum likelihood estimates and the asymptotic variance-covariance matrix in QTL mapping when using the EM algorithm. Biometrics 53, 653-665 (1997). doi:10.2307/2533965
26. Cox D, D Hinkley. Theoretical Statistics. Chapman & Hall: London (1974). doi:10.2307/2347138
27. Haldane JBS: The combination of linkage values and the calculation of distances between the loci of linked factors. J Genet 8, 299- 309 (1919).
28. Jiang C, ZB Zeng: Mapping quantitative trait loci with dominant and missing markers in various populations from inbred lines. Genetica 101, 47- 85 (1997). doi:10.1023/A:1018394410659
Abbreviations: QTL: quantitative trait loci, AI: advanced intercrossed, RI: recombinant inbred, MLE: maximum likelihood estimate, LRT: Likelihood ratio test
Key Words Advanced populations, Gaussian process, QTL mapping, Score statistics, Threshold values, Interval mapping, genotypic distribution, Review
Send correspondence to: Chen-Hung Kao, Institute of Statistical Science, Academia Sinica, Taipei 11529, Taiwan, ROC, Tel: 886-2-27835611x418, Fax: 886-2-27831523, E-mail:chkao@stat.sinica.edu.tw
|