Dual-trait genomic analysis in highly stratified Arabidopsis thaliana populations using genome-wide association summary statistics

Abstract

Genome-wide association study (GWAS) is a powerful tool to identify genomic loci underlying complex traits. However, its application in natural populations comes with challenges, especially power loss due to population stratification. Here, we introduce a bivariate analysis approach to a GWAS dataset of Arabidopsis thaliana. Through a series of simulations, we demonstrate the efficiency of dual-phenotype analysis in uncovering hidden genetic loci masked by population structure. In real data analysis, a common allele, strongly confounded with population structure, is discovered to be associated with late flowering and slow maturation of the plant. The discovered genetic effect on flowering time is further replicated in independent datasets. Using Mendelian randomization analysis based on summary statistics from our GWAS and expression QTL scans, we predicted and replicated a candidate gene, AT1G11560, that potentially causes this association. Further analysis indicates that this locus is co-selected with flowering-time-related genes. The discovered pleiotropic genotype-phenotype map provides new insights into understanding the genetic correlation of complex traits.
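To illustrate why a dual-trait test can recover a locus that each single-trait scan misses, the sketch below combines two per-SNP Z-scores into a single chi-square statistic with 2 degrees of freedom. It is a minimal illustration in Python, not the MultiABEL implementation: the function names, the example Z-scores, and the assumed null correlation of 0.6 between the two traits' Z-scores are all hypothetical. In practice that correlation would have to be estimated, for example from the genome-wide Z-scores of the two scans.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (e.g. genome-wide Z-scores)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def single_trait_p(z):
    """Two-sided p-value of a standard normal Z-score."""
    return math.erfc(abs(z) / math.sqrt(2))


def bivariate_test(z1, z2, r):
    """Joint test of one SNP on two traits from summary statistics.

    Computes the quadratic form z' R^{-1} z with R = [[1, r], [r, 1]],
    where r is the (assumed known) correlation of the Z-scores under the null.
    Under the null the statistic is chi-square with 2 degrees of freedom,
    whose survival function is exactly exp(-x / 2).
    """
    stat = (z1 * z1 - 2.0 * r * z1 * z2 + z2 * z2) / (1.0 - r * r)
    return stat, math.exp(-stat / 2.0)


if __name__ == "__main__":
    # Hypothetical SNP: moderate, opposite-signed Z-scores on two traits
    # whose Z-scores are positively correlated across the genome.
    z_trait1, z_trait2, r_null = 3.0, -3.0, 0.6
    stat, p_joint = bivariate_test(z_trait1, z_trait2, r_null)
    print("single-trait p:", single_trait_p(z_trait1), single_trait_p(z_trait2))
    print("bivariate statistic and p:", stat, p_joint)
```

With these illustrative inputs, neither Z-score would pass a genome-wide threshold on its own, while the joint statistic is much larger because the two effects run against the direction expected from the correlation between the traits.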

Fig. 1: Bivariate genome-wide association analysis of two developmental traits.
Fig. 2: Hexbin scatter plot comparing all Z-scores of the two traits across the genome, showing the bivariate statistical significance of the detected locus.
Fig. 3: The discovered locus is highly confounded with population structure.
Fig. 4: Comparisons of single-trait and dual-trait GWAS results of the target SNP under different simulated scenarios using 199 natural Arabidopsis thaliana inbred lines genotype data.
Fig. 5: Performance and computational efficiency comparisons between MultiABEL and GEMMA using simulations.
Fig. 6: Prioritized candidate genes at the detected locus for flowering time using SMR analysis.
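The SMR analysis named in the Fig. 6 caption follows the summary-data-based Mendelian randomization approach of Zhu et al. (2016), which uses one top eQTL SNP as the instrument. The sketch below shows the approximate SMR test statistic as described in that reference. It is an illustration only: the function name and the input effect sizes and standard errors are hypothetical and are not results from this study.

```python
import math


def smr_test(b_gwas, se_gwas, b_eqtl, se_eqtl):
    """Approximate SMR test for one gene, using its top eQTL SNP as instrument.

    b_gwas, se_gwas: SNP effect on the trait and its standard error (GWAS).
    b_eqtl, se_eqtl: effect of the same SNP on gene expression and its
                     standard error (eQTL scan).
    Returns the estimated expression-to-trait effect, the test statistic,
    and a p-value from a chi-square distribution with 1 degree of freedom.
    """
    z_gwas = b_gwas / se_gwas
    z_eqtl = b_eqtl / se_eqtl
    # Ratio estimate of the effect of expression on the trait.
    b_xy = b_gwas / b_eqtl
    # T_SMR = z_gwas^2 * z_eqtl^2 / (z_gwas^2 + z_eqtl^2)
    t_smr = (z_gwas ** 2 * z_eqtl ** 2) / (z_gwas ** 2 + z_eqtl ** 2)
    # Chi-square (1 df) survival function expressed via erfc.
    p_smr = math.erfc(math.sqrt(t_smr / 2.0))
    return b_xy, t_smr, p_smr


if __name__ == "__main__":
    # Hypothetical summary statistics for a single SNP-gene pair.
    print(smr_test(b_gwas=0.5, se_gwas=0.1, b_eqtl=1.0, se_eqtl=0.1))
```

Because the statistic is bounded by the smaller of the two squared Z-scores, a gene is prioritized only when the instrument SNP is strongly associated with both expression and the trait.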

Data availability

All genotype and phenotype data used in this study are publicly available from Atwell et al. (2010), Kawakatsu et al. (2016), Schmitz et al. (2013), and The 1001 Genomes Consortium (2016). The dataset of Atwell et al., which comprises 107 phenotypes and the corresponding genotypes for 199 natural Arabidopsis thaliana inbred lines, is publicly available at https://github.com/Gregor-Mendel-Institute/atpolydb/blob/master/miscellaneous_data/phenotype_published_raw.tsv and https://github.com/Gregor-Mendel-Institute/atpolydb/blob/master/250k_snp_data/call_method_75.tar.gz. The genotypes and phenotypes related to flowering time at 10 °C in A. thaliana from the 1001 Genomes project are publicly available at https://1001genomes.org and https://arapheno.1001genomes.org/phenotype/261. The expression data for A. thaliana are publicly available in the NCBI Gene Expression Omnibus (GEO) database under accession numbers GSE80744 and GSE43858.

Code availability

The multivariate analysis method was implemented in the MultiABEL package. The source code is available on GitHub (https://github.com/xiashen/MultiABEL).

References

  • Alexander DH, Novembre J, Lange K (2009) Fast model-based estimation of ancestry in unrelated individuals. Genome Res 19(9):1655–1664

  • Atwell S, Huang YS, Vilhjálmsson BJ, Willems G, Horton M, Li Y (2010) Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 465(7298):627–631

  • Aulchenko YS, Ripke S, Isaacs A, van Duijn CM (2007) GenABEL: an R library for genome-wide association analysis. Bioinformatics 23(10):1294–1296

  • Baduel P, Leduque B, Ignace A, Gy I, Gil J, Loudet O (2021) Genetic and environmental modulation of transposition shapes the evolutionary potential of Arabidopsis thaliana. Genome Biol 22:138

  • Brachi B, Faure N, Horton M, Flahauw E, Vazquez A, Nordborg M (2010) Linkage and association mapping of Arabidopsis thaliana flowering time in nature. PLOS Genet 6(5):e1000940

  • Bulik-Sullivan B, Finucane HK, Anttila V, Gusev A, Day FR, Loh PR (2015) An atlas of genetic correlations across human diseases and traits. Nat Genet 47:1236–1241

  • Burgess S, Scott RA, Timpson NJ, Davey Smith G, Thompson SG, EPIC-InterAct Consortium (2015) Using published data in Mendelian randomization: a blueprint for efficient identification of causal risk factors. Eur J Epidemiol 30:543

  • Casellas MJA, Pérez-Martín L, Busoms S, Boesten R, Llugany M, Aarts MGM (2023) A genome-wide association study identifies novel players in Na and Fe homeostasis in Arabidopsis thaliana under alkaline-salinity stress. Plant J 113:225–245

  • Chan EKF, Rowe HC, Kliebenstein DJ (2010) Understanding the evolution of defense metabolites in Arabidopsis thaliana using genome-wide association mapping. Genetics 185:991–1007

  • Crawley MJ (2009) Plant Ecology. John Wiley & Sons, Chichester

  • Dittmar EL, Oakley CG, Ågren J, Schemske DW (2014) Flowering time QTL in natural populations of Arabidopsis thaliana and implications for their adaptive value. Mol Ecol 23(17):4291–4303

  • El-Soda M, Malosetti M, Zwaan BJ, Koornneef M, Aarts MG (2014) Genotype × environment interaction QTL mapping in plants: lessons from Arabidopsis. Trends Plant Sci 19:390–398

  • Ferrero-Serrano A, Assmann SM (2019) Phenotypic and genome-wide association with the local environment of Arabidopsis. Nat Ecol Evol 3:274–285

  • Grotzinger AD, Rhemtulla M, de Vlaming R, Ritchie SJ, Mallard TT, Hill WD (2019) Genomic structural equation modelling provides insights into the multivariate genetic architecture of complex traits. Nat Hum Behav 3:513–525

  • Hirschhorn JN, Daly MJ (2005) Genome-wide association studies for common diseases and complex traits. Nat Rev Genet 6(2):95–108

  • Huang X, Wei X, Sang T, Zhao Q, Feng Q, Zhao Y (2010) Genome-wide association studies of 14 agronomic traits in rice landraces. Nat Genet 42(11):961–967

  • Kang M, Wu H, Liu H, Liu W, Zhu M, Han Y (2023) The pan-genome and local adaptation of Arabidopsis thaliana. Nat Commun 14:6259

  • Kawakatsu T, Huang SSC, Jupe F, Sasaki E, Schmitz RJ, Urich MA (2016) Epigenomic diversity in a global collection of Arabidopsis thaliana accessions. Cell 166(2):492–505

  • Kim J, Zhang Y, Pan W (2016) Powerful and adaptive testing for multi-trait and multi-SNP associations with GWAS and sequencing data. Genetics 203:715–731

  • Korte A, Farlow A (2013) The advantages and limitations of trait analysis with GWAS: a review. Plant Methods 9:29

  • Li M, Zhang YW, Xiang Y, Liu MH, Zhang YM (2022a) IIIVmrMLM: The R and C++ tools associated with 3VmrMLM, a comprehensive GWAS method for dissecting quantitative traits. Mol Plant 15:1251–1253

  • Li M, Zhang YW, Zhang ZC, Xiang Y, Liu MH, Zhou YH (2022b) A compressed variance component mixed model for detecting QTNs and QTN-by-environment and QTN-by-QTN interactions in genome-wide association studies. Mol Plant 15:630–650

  • Li T, Ning Z, Yang Z, Zhai R, Zheng C, Xu W (2021) Total genetic contribution assessment across the human genome. Nat Commun 12:2845

  • Li Y, Huang Y, Bergelson J, Nordborg M, Borevitz JO (2010) Association mapping of local climate-sensitive quantitative trait loci in Arabidopsis thaliana. Proc Natl Acad Sci 107:21199–21204

  • Liang Z, Qiu Y, Schnable JC (2020) Genome-phenome wide association in maize and Arabidopsis identifies a common molecular and evolutionary signature. Mol Plant 13:907–922

  • Liu X, Tian D, Li C, Tang B, Wang Z, Zhang R (2023) GWAS Atlas: an updated knowledgebase integrating more curated associations in plants and animals. Nucleic Acids Res 51:D969–D976

  • Long Q, Rabanal FA, Meng D, Huber CD, Farlow A, Platzer A (2013) Massive genomic variation and strong selection in Arabidopsis thaliana lines from Sweden. Nat Genet 45(8):884–890

  • Ning Z, Pawitan Y, Shen X (2020) High-definition likelihood inference of genetic correlations across human complex traits. Nat Genet 52:859–864

  • Ning Z, Tsepilov YA, Sharapov SZ, Wang Z, Grishenko AK, Feng X (2021) Nontrivial replication of loci detected by multi-trait methods. Front Genet 12:627989

  • Sasaki E, Köcher T, Filiault DL, Nordborg M (2021) Revisiting a GWAS peak in Arabidopsis thaliana reveals possible confounding by genetic heterogeneity. Heredity 127:245–252

  • Sasaki E, Zhang P, Atwell S, Meng D, Nordborg M (2015) "Missing" G x E variation controls flowering time in Arabidopsis thaliana. PLOS Genet 11:e1005597

  • Schmitz RJ, Schultz MD, Urich MA, Nery JR, Pelizzola M, Libiger O (2013) Patterns of population epigenomic diversity. Nature 495(7440):193–198

  • Shen X, De Jonge J, Forsberg SKG, Pettersson ME, Sheng Z, Hennig L (2014) Natural CMT2 variation is associated with genome-wide methylation changes and temperature seasonality. PLOS Genet 10(12):e1004842

  • Shen X, Klarić L, Sharapov S, Mangino M, Ning Z, Wu D (2017) Multivariate discovery and replication of five novel loci associated with Immunoglobulin G N-glycosylation. Nat Commun 8:447

  • Shen X, Pettersson M, Rönnegård L, Carlborg O (2012) Inheritance beyond plain heritability: variance-controlling genes in Arabidopsis thaliana. PLOS Genet 8(8):e1002839

  • The 1001 Genomes Consortium (2016) 1135 Genomes reveal the global pattern of polymorphism in Arabidopsis thaliana. Cell 166(2):481–491

  • Turley P, Walters RK, Maghzian O, Okbay A, Lee JJ, Fontana MA (2018) Multi-trait analysis of genome-wide association summary statistics using MTAG. Nat Genet 50:229–237

  • Uffelmann E, Huang QQ, Munung NS, de Vries J, Okada Y, Martin AR (2021) Genome-wide association studies. Nat Rev Methods Prim 1:59

  • Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA (2017) 10 years of GWAS discovery: biology, function, and translation. Am J Hum Genet 101:5–22

  • Visscher PM, Yang J (2016) A plethora of pleiotropy across complex traits. Nat Genet 48(7):707–708

  • Wang B, Li Z, Xu W, Feng X, Wan Q, Zan Y (2017) Bivariate genomic analysis identifies a hidden locus associated with bacteria hypersensitive response in Arabidopsis thaliana. Sci Rep 7:45281

  • Watanabe K, Stringer S, Frei O, Mirkov MU, de Leeuw C, Polderman TJC (2019) A global overview of pleiotropy and genetic architecture in complex traits. Nat Genet 51:1339–1348

  • Wellenreuther M, Hansson B (2016) Detecting polygenic evolution: problems, pitfalls, and promises. Trends Genet 32(3):155–164

  • Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR (2010) Common SNPs explain a large proportion of the heritability for human height. Nat Genet 42(7):565–569

  • Yang J, Zaitlen NA, Goddard ME, Visscher PM, Price AL (2014) Advantages and pitfalls in the application of mixed-model association methods. Nat Genet 46(2):100–106

  • Zan Y, Carlborg O (2019) A polygenic genetic architecture of flowering time in the worldwide Arabidopsis thaliana population. Mol Biol Evol 36:141–154

  • Zan Y, Shen X, Forsberg SKG, Carlborg O (2016) Genetic regulation of transcriptional variation in natural Arabidopsis thaliana accessions. G3 Genes Genomes Genet 6(8):2319–2328

  • Zeng J, Xue A, Jiang L, Lloyd-Jones LR, Wu Y, Wang H (2021) Widespread signatures of natural selection across human complex traits and functional genomic categories. Nat Commun 12:1164

  • Zhou X, Stephens M (2012) Genome-wide efficient mixed-model analysis for association studies. Nat Genet 44:821–824

  • Zhou X, Stephens M (2014) Efficient multivariate linear mixed model algorithms for genome-wide association studies. Nat Methods 11:407–409

  • Zhu X, Feng T, Tayo BO, Liang J, Young JH, Franceschini N (2015) Meta-analysis of correlated traits via summary statistics from GWASs with an application in hypertension. Am J Hum Genet 96(1):21–36

  • Zhu Z, Zhang F, Hu H, Bakshi A, Robinson MR, Powell JE (2016) Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat Genet 48(5):481–487

Acknowledgements

XS was in receipt of a National Key Research and Development Program grant (No. 2022YFF1202100 & No. 2022YFF1202105), a National Natural Science Foundation of China (NSFC) grant (No. 12171495), a Natural Science Foundation of Guangdong Province grant (No. 2021A1515010866), and Swedish Research Council (Vetenskapsrådet) grants (No. 2014-00371, No. 2017-02543, No. 2022-01309). International collaboration within this work was partly supported by the Swedish Foundation for International Cooperation in Research and Higher Education (STINT) initiation grant to XS (No. IB2015-6000) and Karolinska Institutet travel grant (No. 2017-00534). The work from XF was supported by the China Postdoctoral Science Foundation (2023M740690). The work from TL was supported by the China Postdoctoral Science Foundation (2023M740696). The funders had no role in study design, data collection and analysis, the decision to publish, or the preparation of the manuscript.

Author information

Contributions

XS initiated and coordinated the study. XS and YL supervised the study. XF and YZ performed the data analysis. ZN and XS contributed to statistical modeling and interpretation. TL, YY, JL, HC, WX, QW, DZ and ZZ contributed to data processing. XF, YZ and XS wrote the manuscript. All authors approved the final version of the manuscript.

Corresponding authors

Correspondence to Yang Liu or Xia Shen.

Ethics declarations

Competing interests

XS is the founder of Quantix BioSciences. The other authors declare no competing interests.

Research Ethics statement

No approval from research ethics committees is required for this study because it utilizes publicly available data on the plant Arabidopsis thaliana.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Associate editor: Yuan-Ming Zhang.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Feng, X., Zan, Y., Li, T. et al. Dual-trait genomic analysis in highly stratified Arabidopsis thaliana populations using genome-wide association summary statistics. Heredity 133, 11–20 (2024). https://doi.org/10.1038/s41437-024-00688-z

  • DOI: https://doi.org/10.1038/s41437-024-00688-z
