Abstract
Genome-wide association study (GWAS) is a powerful tool to identify genomic loci underlying complex traits. However, the application in natural populations comes with challenges, especially power loss due to population stratification. Here, we introduce a bivariate analysis approach to a GWAS dataset of Arabidopsis thaliana. We demonstrate the efficiency of dual-phenotype analysis to uncover hidden genetic loci masked by population structure via a series of simulations. In real data analysis, a common allele, strongly confounded with population structure, is discovered to be associated with late flowering and slow maturation of the plant. The discovered genetic effect on flowering time is further replicated in independent datasets. Using Mendelian randomization analysis based on summary statistics from our GWAS and expression QTL scans, we predicted and replicated a candidate gene AT1G11560 that potentially causes this association. Further analysis indicates that this locus is co-selected with flowering-time-related genes. The discovered pleiotropic genotype-phenotype map provides new insights into understanding the genetic correlation of complex traits.
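As an illustration of the Mendelian randomization step on summary statistics, the sketch below follows the single-instrument test described by Zhu et al. (2016): the effect of expression on the trait is estimated as the ratio of the GWAS effect to the eQTL effect at one variant, and the test statistic combines the two z-scores. This is a minimal Python sketch under stated assumptions, not the authors' pipeline; the function name `smr_test` and the input values are hypothetical.

```python
import math


def smr_test(b_gwas, se_gwas, b_eqtl, se_eqtl):
    """Summary-data MR at one variant used as the instrument (illustrative).

    b_gwas, se_gwas: variant effect on the trait and its standard error.
    b_eqtl, se_eqtl: variant effect on gene expression and its standard error.
    Assumes the variant is a strong eQTL, so b_eqtl is well away from zero.
    """
    z_gwas = b_gwas / se_gwas
    z_eqtl = b_eqtl / se_eqtl
    # Ratio estimate of the effect of expression on the trait.
    b_xy = b_gwas / b_eqtl
    # Approximate chi-square statistic with 1 degree of freedom.
    t_smr = (z_gwas ** 2 * z_eqtl ** 2) / (z_gwas ** 2 + z_eqtl ** 2)
    # Survival function of a chi-square(1) variable: P(X > t) = erfc(sqrt(t / 2)).
    p_value = math.erfc(math.sqrt(t_smr / 2.0))
    return b_xy, t_smr, p_value


# Hypothetical summary statistics for a single variant.
b_xy, t_smr, p_value = smr_test(b_gwas=0.5, se_gwas=0.1, b_eqtl=1.0, se_eqtl=0.1)
print(f"b_xy={b_xy:.3f}, T_SMR={t_smr:.2f}, p={p_value:.2e}")
```

A significant result at one variant does not by itself separate causality or pleiotropy from linkage, so such a test is normally only a first filter for candidate genes.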
Data availability
All genotype and phenotype data used in this study are publicly available from Atwell et al. (2010), Kawakatsu et al. (2016), Schmitz et al. (2013), and The 1001 Genomes Consortium (2016). The Atwell et al. dataset, which comprises 199 natural Arabidopsis thaliana inbred lines with 107 phenotypes and the corresponding genotypes, is available at https://github.com/Gregor-Mendel-Institute/atpolydb/blob/master/miscellaneous_data/phenotype_published_raw.tsv and https://github.com/Gregor-Mendel-Institute/atpolydb/blob/master/250k_snp_data/call_method_75.tar.gz. The genotypes and the flowering-time phenotype at 10 °C for A. thaliana from the 1001 Genomes project are available at https://1001genomes.org and https://arapheno.1001genomes.org/phenotype/261. The A. thaliana expression data are available from the NCBI Gene Expression Omnibus (GEO) database under accession numbers GSE80744 and GSE43858.
Code availability
The multivariate analysis method was implemented in the MultiABEL package. The source code is available on GitHub (https://github.com/xiashen/MultiABEL).
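To make the idea of a dual-trait test on summary statistics concrete, the sketch below shows one generic way to combine two single-trait z-scores per variant into a joint chi-square test with 2 degrees of freedom. It is written in Python for illustration only and is not the MultiABEL interface (MultiABEL is an R package); the function names are hypothetical, and estimating the correlation between traits from genome-wide z-scores assumes that most variants have no effect on either trait.

```python
import numpy as np


def estimate_z_correlation(z1, z2):
    """Correlation of the two z-score vectors across variants.

    Assumption: most variants are null, so this approximates the
    correlation between the two test statistics under the null.
    """
    return float(np.corrcoef(z1, z2)[0, 1])


def bivariate_test(z1, z2, r):
    """Joint test of both traits per variant: z' R^{-1} z with R = [[1, r], [r, 1]]."""
    z1 = np.asarray(z1, dtype=float)
    z2 = np.asarray(z2, dtype=float)
    # Closed form of the quadratic form for a 2 x 2 correlation matrix.
    stat = (z1 ** 2 - 2.0 * r * z1 * z2 + z2 ** 2) / (1.0 - r ** 2)
    # Survival function of a chi-square(2) variable is exp(-x / 2).
    p_value = np.exp(-stat / 2.0)
    return stat, p_value


# Simulated null z-scores with a true correlation of 0.6 (illustrative only).
rng = np.random.default_rng(0)
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=20000)
r_hat = estimate_z_correlation(z[:, 0], z[:, 1])

# A variant with modest, opposite-signed z-scores on positively correlated traits.
stat, p_value = bivariate_test([2.0], [-2.0], r=0.5)
print(f"r_hat={r_hat:.3f}, stat={stat[0]:.2f}, p={p_value[0]:.2e}")
```

The example variant has |z| = 2 for each trait, which is unremarkable in either single-trait scan, but its effects run against the positive correlation between the traits, so the joint statistic is much larger. This is the kind of signal a single-trait analysis can miss.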
References
Alexander DH, Novembre J, Lange K (2009) Fast model-based estimation of ancestry in unrelated individuals. Genome Res 19(9):1655–1664
Atwell S, Huang YS, Vilhjálmsson BJ, Willems G, Horton M, Li Y (2010) Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 465(7298):627–631
Aulchenko YS, Ripke S, Isaacs A, van Duijn CM (2007) GenABEL: an R library for genome-wide association analysis. Bioinformatics 23(10):1294–1296
Baduel P, Leduque B, Ignace A, Gy I, Gil J, Loudet O (2021) Genetic and environmental modulation of transposition shapes the evolutionary potential of Arabidopsis thaliana. Genome Biol 22:138
Brachi B, Faure N, Horton M, Flahauw E, Vazquez A, Nordborg M (2010) Linkage and association mapping of Arabidopsis thaliana flowering time in nature. PLOS Genet 6(5):e1000940
Bulik-Sullivan B, Finucane HK, Anttila V, Gusev A, Day FR, Loh PR (2015) An atlas of genetic correlations across human diseases and traits. Nat Genet 47:1236–1241
Burgess S, Scott RA, Timpson NJ, Davey Smith G, Thompson SG, EPIC-InterAct Consortium (2015) Using published data in Mendelian randomization: a blueprint for efficient identification of causal risk factors. Eur J Epidemiol 30:543
Casellas MJA, Pérez-Martín L, Busoms S, Boesten R, Llugany M, Aarts MGM (2023) A genome-wide association study identifies novel players in Na and Fe homeostasis in Arabidopsis thaliana under alkaline-salinity stress. Plant J 113:225–245
Chan EKF, Rowe HC, Kliebenstein DJ (2010) Understanding the evolution of defense metabolites in Arabidopsis thaliana using genome-wide association mapping. Genetics 185:991–1007
Crawley MJ (2009) Plant Ecology. John Wiley & Sons, Chichester
Dittmar EL, Oakley CG, Ågren J, Schemske DW (2014) Flowering time QTL in natural populations of Arabidopsis thaliana and implications for their adaptive value. Mol Ecol 23(17):4291–4303
El-Soda M, Malosetti M, Zwaan BJ, Koornneef M, Aarts MG (2014) Genotype × environment interaction QTL mapping in plants: lessons from Arabidopsis. Trends Plant Sci 19:390–398
Ferrero-Serrano A, Assmann SM (2019) Phenotypic and genome-wide association with the local environment of Arabidopsis. Nat Ecol Evol 3:274–285
Grotzinger AD, Rhemtulla M, de Vlaming R, Ritchie SJ, Mallard TT, Hill WD (2019) Genomic structural equation modelling provides insights into the multivariate genetic architecture of complex traits. Nat Hum Behav 3:513–525
Hirschhorn JN, Daly MJ (2005) Genome-wide association studies for common diseases and complex traits. Nat Rev Genet 6(2):95–108
Huang X, Wei X, Sang T, Zhao Q, Feng Q, Zhao Y (2010) Genome-wide association studies of 14 agronomic traits in rice landraces. Nat Genet 42(11):961–967
Kang M, Wu H, Liu H, Liu W, Zhu M, Han Y (2023) The pan-genome and local adaptation of Arabidopsis thaliana. Nat Commun 14:6259
Kawakatsu T, Huang SSC, Jupe F, Sasaki E, Schmitz RJ, Urich MA (2016) Epigenomic diversity in a global collection of Arabidopsis thaliana accessions. Cell 166(2):492–505
Kim J, Zhang Y, Pan W (2016) Powerful and adaptive testing for multi-trait and multi-SNP associations with GWAS and sequencing data. Genetics 203:715–731
Korte A, Farlow A (2013) The advantages and limitations of trait analysis with GWAS: a review. Plant Methods 9:29
Li M, Zhang YW, Xiang Y, Liu MH, Zhang YM (2022a) IIIVmrMLM: The R and C++ tools associated with 3VmrMLM, a comprehensive GWAS method for dissecting quantitative traits. Mol Plant 15:1251–1253
Li M, Zhang YW, Zhang ZC, Xiang Y, Liu MH, Zhou YH (2022b) A compressed variance component mixed model for detecting QTNs and QTN-by-environment and QTN-by-QTN interactions in genome-wide association studies. Mol Plant 15:630–650
Li T, Ning Z, Yang Z, Zhai R, Zheng C, Xu W (2021) Total genetic contribution assessment across the human genome. Nat Commun 12:2845
Li Y, Huang Y, Bergelson J, Nordborg M, Borevitz JO (2010) Association mapping of local climate-sensitive quantitative trait loci in Arabidopsis thaliana. Proc Natl Acad Sci 107:21199–21204
Liang Z, Qiu Y, Schnable JC (2020) Genome-phenome wide association in maize and Arabidopsis identifies a common molecular and evolutionary signature. Mol Plant 13:907–922
Liu X, Tian D, Li C, Tang B, Wang Z, Zhang R (2023) GWAS Atlas: an updated knowledgebase integrating more curated associations in plants and animals. Nucleic Acids Res 51:D969–D976
Long Q, Rabanal FA, Meng D, Huber CD, Farlow A, Platzer A (2013) Massive genomic variation and strong selection in Arabidopsis thaliana lines from Sweden. Nat Genet 45(8):884–890
Ning Z, Pawitan Y, Shen X (2020) High-definition likelihood inference of genetic correlations across human complex traits. Nat Genet 52:859–864
Ning Z, Tsepilov YA, Sharapov SZ, Wang Z, Grishenko AK, Feng X (2021) Nontrivial replication of loci detected by multi-trait methods. Front Genet 12:627989
Sasaki E, Köcher T, Filiault DL, Nordborg M (2021) Revisiting a GWAS peak in Arabidopsis thaliana reveals possible confounding by genetic heterogeneity. Heredity 127:245–252
Sasaki E, Zhang P, Atwell S, Meng D, Nordborg M (2015) "Missing" G x E variation controls flowering time in Arabidopsis thaliana. PLOS Genet 11:e1005597
Schmitz RJ, Schultz MD, Urich MA, Nery JR, Pelizzola M, Libiger O (2013) Patterns of population epigenomic diversity. Nature 495(7440):193–198
Shen X, De Jonge J, Forsberg SKG, Pettersson ME, Sheng Z, Hennig L (2014) Natural CMT2 variation is associated with genome-wide methylation changes and temperature seasonality. PLOS Genet 10(12):e1004842
Shen X, Klarić L, Sharapov S, Mangino M, Ning Z, Wu D (2017) Multivariate discovery and replication of five novel loci associated with Immunoglobulin G N-glycosylation. Nat Commun 8:447
Shen X, Pettersson M, Rönnegård L, Carlborg O (2012) Inheritance beyond plain heritability: variance-controlling genes in Arabidopsis thaliana. PLOS Genet 8(8):e1002839
The 1001 Genomes Consortium (2016) 1135 Genomes reveal the global pattern of polymorphism in Arabidopsis thaliana. Cell 166(2):481–491
Turley P, Walters RK, Maghzian O, Okbay A, Lee JJ, Fontana MA (2018) Multi-trait analysis of genome-wide association summary statistics using MTAG. Nat Genet 50:229–237
Uffelmann E, Huang QQ, Munung NS, de Vries J, Okada Y, Martin AR (2021) Genome-wide association studies. Nat Rev Methods Prim 1:59
Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA (2017) 10 years of GWAS discovery: biology, function, and translation. Am J Hum Genet 101:5–22
Visscher PM, Yang J (2016) A plethora of pleiotropy across complex traits. Nat Genet 48(7):707–708
Wang B, Li Z, Xu W, Feng X, Wan Q, Zan Y (2017) Bivariate genomic analysis identifies a hidden locus associated with bacteria hypersensitive response in Arabidopsis thaliana. Sci Rep 7:45281
Watanabe K, Stringer S, Frei O, Mirkov MU, de Leeuw C, Polderman TJC (2019) A global overview of pleiotropy and genetic architecture in complex traits. Nat Genet 51:1339–1348
Wellenreuther M, Hansson B (2016) Detecting polygenic evolution: problems, pitfalls, and promises. Trends Genet 32(3):155–164
Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR (2010) Common SNPs explain a large proportion of the heritability for human height. Nat Genet 42(7):565–569
Yang J, Zaitlen NA, Goddard ME, Visscher PM, Price AL (2014) Advantages and pitfalls in the application of mixed-model association methods. Nat Genet 46(2):100–106
Zan Y, Carlborg O (2019) A polygenic genetic architecture of flowering time in the worldwide Arabidopsis thaliana population. Mol Biol Evol 36:141–154
Zan Y, Shen X, Forsberg SKG, Carlborg O (2016) Genetic regulation of transcriptional variation in natural Arabidopsis thaliana accessions. G3 Genes Genomes Genet 6(8):2319–2328
Zeng J, Xue A, Jiang L, Lloyd-Jones LR, Wu Y, Wang H (2021) Widespread signatures of natural selection across human complex traits and functional genomic categories. Nat Commun 12:1164
Zhou X, Stephens M (2012) Genome-wide efficient mixed-model analysis for association studies. Nat Genet 44:821–824
Zhou X, Stephens M (2014) Efficient multivariate linear mixed model algorithms for genome-wide association studies. Nat Methods 11:407–409
Zhu X, Feng T, Tayo BO, Liang J, Young JH, Franceschini N (2015) Meta-analysis of correlated traits via summary statistics from GWASs with an application in hypertension. Am J Hum Genet 96(1):21–36
Zhu Z, Zhang F, Hu H, Bakshi A, Robinson MR, Powell JE (2016) Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat Genet 48(5):481–487
Acknowledgements
XS was in receipt of a National Key Research and Development Program grant (No. 2022YFF1202100 & No. 2022YFF1202105), a National Natural Science Foundation of China (NSFC) grant (No. 12171495), a Natural Science Foundation of Guangdong Province grant (No. 2021A1515010866), and Swedish Research Council (Vetenskapsrådet) grants (No. 2014-00371, No. 2017-02543, No. 2022-01309). International collaboration within this work was partly supported by the Swedish Foundation for International Cooperation in Research and Higher Education (STINT) initiation grant to XS (No. IB2015-6000) and Karolinska Institutet travel grant (No. 2017-00534). The work from XF was supported by the China Postdoctoral Science Foundation (2023M740690). The work from TL was supported by the China Postdoctoral Science Foundation (2023M740696). The funders had no role in study design, data collection and analysis, the decision to publish, or the preparation of the manuscript.
Author information
Contributions
XS initiated and coordinated the study. XS and YL supervised the study. XF and YZ performed the data analysis. ZN and XS contributed to statistical modeling and interpretation. TL, YY, JL, HC, WX, QW, DZ and ZZ contributed to data processing. XF, YZ and XS wrote the manuscript. All authors approved the final version of the manuscript.
Ethics declarations
Competing interests
XS is the founder of Quantix BioSciences. The other authors declare no competing interests.
Research Ethics statement
No approval from research ethics committees is required for this study because it utilizes publicly available data on the plant Arabidopsis thaliana.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Associate editor: Yuan-Ming Zhang.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Feng, X., Zan, Y., Li, T. et al. Dual-trait genomic analysis in highly stratified Arabidopsis thaliana populations using genome-wide association summary statistics. Heredity 133, 11–20 (2024). https://doi.org/10.1038/s41437-024-00688-z
DOI: https://doi.org/10.1038/s41437-024-00688-z