LINE-1 transcription activates long-range gene expression

An Author Correction to this article was published on 09 July 2024

This article has been updated

Abstract

Long interspersed nuclear element-1 (LINE-1 or L1) is a retrotransposon group that constitutes 17% of the human genome and shows variable expression across cell types. However, the control of L1 expression and its function in gene regulation are incompletely understood. Here we show that L1 transcription activates long-range gene expression. Genome-wide CRISPR–Cas9 screening using a reporter driven by the L1 5′ UTR in human cells identifies functionally diverse genes affecting L1 expression. Unexpectedly, altering L1 expression by knockout of regulatory genes impacts distant gene expression. L1s can physically contact their distal target genes, with these interactions becoming stronger upon L1 activation and weaker when L1 is silenced. Remarkably, L1s contact and activate genes essential for zygotic genome activation (ZGA), and L1 knockdown impairs ZGA, leading to developmental arrest in mouse embryos. These results characterize the regulation and function of L1 in long-range gene activation and reveal its importance in mammalian ZGA.

Fig. 1: Genome-wide screen for genes that control L1 expression in K562 cells.
Fig. 2: L1 transcription is associated with long-range gene expression.
Fig. 3: Systematic perturbation of L1s affects long-range gene expression.
Fig. 4: L1 co-activates with the L1-cognate genes in human embryogenesis.
Fig. 5: L1 transcription activates mouse ZGA gene expression.
Fig. 6: L1 transcription boosts ZGA in mouse early embryos.

Data availability

All sequencing samples reported have been deposited at the Genome Sequence Archive (https://ngdc.cncb.ac.cn/gsa-human/) under the accession HRA003719. ENCSR384BDV, ENCSR423ERM, ENCSR608IXR, ENCSR563YIS, ENCSR844QNT, ENCSR129ROE, ENCSR858MPS, ENCSR895FDL, ENCSR064KUD, ENCSR547SBZ, ENCSR983SZZ, ENCSR135NXN and ENCSR201NQZ were downloaded from ENCODE (https://www.encodeproject.org/). GSE213909, GSE143546, GSE36552, GSE101571, GSE86938, GSE45719, GSE64332, GSE151227, GSE126242, GSE157262, GSE145309, GSE98063, GSE100939, GSE144400 and GSE82185 were downloaded from the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo). HRA002355 was downloaded from the Genome Sequence Archive. The accession numbers are also listed in Supplementary Table 9. The human reference genome (hg38) and the mouse reference genome (mm10) were downloaded from GENCODE (https://www.gencodegenes.org). The macaque reference genome (rheMac8) was downloaded from the University of California, Santa Cruz (https://genome.ucsc.edu). Additional experimental materials used in this study, including plasmids and engineered cell lines, are available upon request. Source data are provided with this paper.

Code availability

All the data were analyzed using published pipelines with parameters described in Methods. The code used for figure generation is available on GitHub (https://github.com/AssumeAssume/Li_et_al_2024) and Zenodo (https://doi.org/10.5281/zenodo.11113925) (ref. 78).

References

  1. Beck, C. R., Garcia-Perez, J. L., Badge, R. M. & Moran, J. V. LINE-1 elements in structural variation and disease. Annu. Rev. Genom. Hum. Genet. 12, 187–215 (2011).

  2. Ivancevic, A. M., Kortschak, R. D., Bertozzi, T. & Adelson, D. L. LINEs between species: evolutionary dynamics of LINE-1 retrotransposons across the eukaryotic tree of life. Genome Biol. Evol. 8, 3301–3322 (2016).

  3. Philippe, C. et al. Activation of individual L1 retrotransposon instances is restricted to cell-type dependent permissive loci. eLife 5, e13926 (2016).

  4. Faulkner, G. J. et al. The regulated retrotransposon transcriptome of mammalian cells. Nat. Genet. 41, 563–571 (2009).

  5. Belancio, V. P., Roy-Engel, A. M., Pochampally, R. R. & Deininger, P. Somatic expression of LINE-1 elements in human tissues. Nucleic Acids Res. 38, 3909–3922 (2010).

  6. Stow, E. C. et al. Organ-, sex- and age-dependent patterns of endogenous L1 mRNA expression at a single locus resolution. Nucleic Acids Res. 49, 5813–5831 (2021).

  7. Kaul, T., Morales, M. E., Sartor, A. O., Belancio, V. P. & Deininger, P. Comparative analysis on the expression of L1 loci using various RNA-seq preparations. Mob. DNA 11, 2 (2020).

  8. Jachowicz, J. W. et al. LINE-1 activation after fertilization regulates global chromatin accessibility in the early mouse embryo. Nat. Genet. 49, 1502–1510 (2017).

  9. Percharde, M. et al. A LINE1-nucleolin partnership regulates early development and ESC identity. Cell 174, 391–405 (2018).

  10. De Cecco, M. et al. L1 drives IFN in senescent cells and promotes age-associated inflammation. Nature 566, 73–78 (2019).

  11. Thomas, C. A., Paquola, A. C. M. & Muotri, A. R. LINE-1 retrotransposition in the nervous system. Annu. Rev. Cell Dev. Biol. 28, 555–573 (2012).

  12. Payer, L. M. & Burns, K. H. Transposable elements in human genetic disease. Nat. Rev. Genet. 20, 760–772 (2019).

  13. Lu, J. Y. et al. Homotypic clustering of L1 and B1/Alu repeats compartmentalizes the 3D genome. Cell Res. 31, 613–630 (2021).

  14. Han, J. S., Szak, S. T. & Boeke, J. D. Transcriptional disruption by the L1 retrotransposon and implications for mammalian transcriptomes. Nature 429, 268–274 (2004).

  15. Attig, J. et al. Heteromeric RNP assembly at LINEs controls lineage-specific RNA processing. Cell 174, 1067–1081 (2018).

  16. Liu, N. et al. Selective silencing of euchromatic L1s revealed by genome-wide screens for L1 regulators. Nature 553, 228–232 (2018).

  17. Fueyo, R., Judd, J., Feschotte, C. & Wysocka, J. Roles of transposable elements in the regulation of mammalian transcription. Nat. Rev. Mol. Cell Biol. 23, 481–497 (2022).

  18. Moran, J. V. et al. High frequency retrotransposition in cultured mammalian cells. Cell 87, 917–927 (1996).

  19. Hong, Y. et al. SAFB restricts contact domain boundaries associated with L1 chimeric transcription. Mol. Cell 84, 1637–1650.e10 (2024).

  20. Mohanta, A. & Chakrabarti, K. Dbr1 functions in mRNA processing, intron turnover and human diseases. Biochimie 180, 134–142 (2021).

  21. Szczepińska, T. et al. DIS3 shapes the RNA polymerase II transcriptome in humans by degrading a variety of unwanted transcripts. Genome Res. 25, 1622–1633 (2015).

  22. Calo, E. & Wysocka, J. Modification of enhancer chromatin: what, how, and why? Mol. Cell 49, 825–837 (2013).

  23. Field, A. & Adelman, K. Evaluating enhancer function and transcription. Annu. Rev. Biochem. 89, 213–234 (2020).

  24. Arnold, C. D. et al. Genome-wide quantitative enhancer activity maps identified by STARR-seq. Science 339, 1074–1077 (2013).

  25. Li, W., Notani, D. & Rosenfeld, M. G. Enhancers as non-coding RNA transcription units: recent insights and future perspectives. Nat. Rev. Genet. 17, 207–223 (2016).

  26. Schoenfelder, S. & Fraser, P. Long-range enhancer–promoter contacts in gene expression control. Nat. Rev. Genet. 20, 437–455 (2019).

  27. Chuong, E. B., Elde, N. C. & Feschotte, C. Regulatory evolution of innate immunity through co-option of endogenous retroviruses. Science 351, 1083–1087 (2016).

  28. Pontis, J. et al. Hominoid-specific transposable elements and KZFPs facilitate human embryonic genome activation and control transcription in naive human ESCs. Cell Stem Cell 24, 724–735 (2019).

  29. Fuentes, D. R., Swigut, T. & Wysocka, J. Systematic perturbation of retroviral LTRs reveals widespread long-range effects on human gene regulation. eLife 7, e35989 (2018).

  30. Todd, C. D., Deniz, Ö., Taylor, D. & Branco, M. R. Functional evaluation of transposable elements as enhancers in mouse embryonic and trophoblast stem cells. eLife 8, e44344 (2019).

  31. Deniz, Ö. et al. Endogenous retroviruses are a source of enhancers with oncogenic potential in acute myeloid leukaemia. Nat. Commun. 11, 3506 (2020).

  32. Boxer, L. D., Barajas, B., Tao, S., Zhang, J. & Khavari, P. A. ZNF750 interacts with KLF4 and RCOR1, KDM1A, and CTBP1/2 chromatin regulators to repress epidermal progenitor genes and induce differentiation genes. Genes Dev. 28, 2013–2026 (2014).

  33. Chinnadurai, G. CtBP, an unconventional transcriptional corepressor in development and oncogenesis. Mol. Cell 9, 213–224 (2002).

  34. Shi, Y. et al. Coordinated histone modifications mediated by a CtBP co-repressor complex. Nature 422, 735–738 (2003).

  35. Son, H. et al. Neuritin produces antidepressant actions and blocks the neuronal and behavioral deficits caused by chronic stress. Proc. Natl Acad. Sci. USA 109, 11378–11383 (2012).

  36. Lee, J.-E. et al. H3K4 mono- and di-methyltransferase MLL4 is required for enhancer activation during cell differentiation. eLife 2, e01503 (2013).

  37. Liu, N. & Pan, T. N6-methyladenosine-encoded epitranscriptomics. Nat. Struct. Mol. Biol. 23, 98–102 (2016).

  38. Zhang, S. et al. DeepLoop robustly maps chromatin interactions from sparse allele-resolved or single-cell Hi-C data at kilobase resolution. Nat. Genet. 54, 1013–1025 (2022).

  39. Busslinger, G. A. et al. Cohesin is positioned in mammalian genomes by transcription, CTCF and Wapl. Nature 544, 503–507 (2017).

  40. Henn, A. et al. SIX2 gene haploinsufficiency leads to a recognizable phenotype with ptosis, frontonasal dysplasia, and conductive hearing loss. Clin. Dysmorphol. 27, 27–30 (2018).

  41. Liu, Z. et al. A NIK-SIX signalling axis controls inflammation by targeted silencing of non-canonical NF-κB. Nature 568, 249–253 (2019).

  42. Rodriguez-Terrones, D. & Torres-Padilla, M.-E. Nimble and ready to mingle: transposon outbursts of early development. Trends Genet. 34, 806–820 (2018).

  43. Macfarlan, T. S. et al. Embryonic stem cell potency fluctuates with endogenous retrovirus activity. Nature 487, 57–63 (2012).

  44. Abe, K. et al. Minor zygotic gene activation is essential for mouse preimplantation development. Proc. Natl Acad. Sci. USA 115, E6780–E6788 (2018).

  45. Liu, B. et al. The landscape of RNA Pol II binding reveals a stepwise transition during ZGA. Nature 587, 139–144 (2020).

  46. Lee, M. T. et al. Nanog, Pou5f1 and SoxB1 activate zygotic gene expression during the maternal-to-zygotic transition. Nature 503, 360–364 (2013).

  47. Qiu, J. J. et al. Delay of ZGA initiation occurred in 2-cell blocked mouse embryos. Cell Res. 13, 179–185 (2003).

  48. Deng, Q., Ramsköld, D., Reinius, B. & Sandberg, R. Single-cell RNA-seq reveals dynamic, random monoallelic gene expression in mammalian cells. Science 343, 193–196 (2014).

  49. Du, Z. et al. Allelic reprogramming of 3D chromatin architecture during early mammalian development. Nature 547, 232–235 (2017).

  50. Zhang, K. et al. Analysis of genome architecture during SCNT reveals a role of cohesin in impeding minor ZGA. Mol. Cell 79, 234–250 (2020).

  51. Le, R. et al. Dcaf11 activates Zscan4-mediated alternative telomere lengthening in early embryos and embryonic stem cells. Cell Stem Cell 28, 732–747 (2021).

  52. Yan, Y.-L. et al. DPPA2/4 and SUMO E3 ligase PIAS4 opposingly regulate zygotic transcriptional program. PLoS Biol. 17, e3000324 (2019).

  53. Falco, G. et al. Zscan4: a novel gene expressed exclusively in late 2-cell embryos and embryonic stem cells. Dev. Biol. 307, 539–550 (2007).

  54. Pal, D. et al. H4K16ac activates the transcription of transposable elements and contributes to their cis-regulatory function. Nat. Struct. Mol. Biol. 30, 935–947 (2023).

  55. Meng, S. et al. Young LINE-1 transposon 5′ UTRs marked by elongation factor ELL3 function as enhancers to regulate naïve pluripotency in embryonic stem cells. Nat. Cell Biol. 25, 1319–1331 (2023).

  56. Buttler, C. A., Ramirez, D., Dowell, R. D. & Chuong, E. B. An intronic LINE-1 regulates IFNAR1 expression in human immune cells. Mob. DNA 14, 20 (2023).

  57. Andersson, R. & Sandelin, A. Determinants of enhancer and promoter activities of regulatory elements. Nat. Rev. Genet. 21, 71–87 (2020).

  58. Li, G. et al. Extensive promoter-centered chromatin interactions provide a topological basis for transcription regulation. Cell 148, 84–98 (2012).

  59. Heinz, S., Romanoski, C. E., Benner, C. & Glass, C. K. The selection and function of cell type-specific enhancers. Nat. Rev. Mol. Cell Biol. 16, 144–154 (2015).

  60. Beraldi, R., Pittoggi, C., Sciamanna, I., Mattei, E. & Spadafora, C. Expression of LINE-1 retroposons is essential for murine preimplantation development. Mol. Reprod. Dev. 73, 279–287 (2006).

  61. Doench, J. G. et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR–Cas9. Nat. Biotechnol. 34, 184–191 (2016).

  62. Li, W. et al. Quality control, modeling, and visualization of CRISPR screens with MAGeCK-VISPR. Genome Biol. 16, 281 (2015).

  63. Zhou, Y. et al. Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nat. Commun. 10, 1523 (2019).

  64. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).

  65. Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).

  66. Szklarczyk, D. et al. STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 47, D607–D613 (2019).

  67. Kuleshov, M. V. et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 44, W90–W97 (2016).

  68. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).

  69. Korotkevich, G. et al. Fast gene set enrichment analysis. Preprint at bioRxiv https://doi.org/10.1101/060012 (2021).

  70. Karimzadeh, M., Ernst, C., Kundaje, A. & Hoffman, M. M. Umap and Bismap: quantifying genome and methylome mappability. Nucleic Acids Res. 46, e120 (2018).

  71. Kent, W. J., Zweig, A. S., Barber, G., Hinrichs, A. S. & Karolchik, D. BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics 26, 2204–2207 (2010).

  72. Zou, Z., Ohta, T., Miura, F. & Oki, S. ChIP-Atlas 2021 update: a data-mining suite for exploring epigenomic landscapes by fully integrating ChIP–seq, ATAC–seq and bisulfite-seq data. Nucleic Acids Res. 50, W175–W182 (2022).

  73. Peng, T. et al. STARR-seq identifies active, chromatin-masked, and dormant enhancers in pluripotent mouse embryonic stem cells. Genome Biol. 21, 243 (2020).

  74. Rao, S. S. P. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).

  75. Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).

  76. Krijger, P. H. L., Geeven, G., Bianchi, V., Hilvering, C. R. E. & de Laat, W. 4C-seq from beginning to end: a detailed protocol for sample preparation and data analysis. Methods 170, 17–32 (2020).

  77. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).

  78. Li, X. LINE-1 transcription activates long-range gene expression. Zenodo https://doi.org/10.5281/zenodo.11113925 (2024).

  79. De Melo Costa, V. R., Pfeuffer, J., Louloupi, A., Ørom, U. A. V. & Piro, R. M. SPLICE-q: a Python tool for genome-wide quantification of splicing efficiency. BMC Bioinformatics 22, 368 (2021).

  80. DeLuca, D. S. et al. RNA-SeQC: RNA-seq metrics for quality control and process optimization. Bioinformatics 28, 1530–1532 (2012).

  81. Taggart, A. J. et al. Large-scale analysis of branchpoint usage across species and cell lines. Genome Res. 27, 639–649 (2017).

  82. Wang, G. et al. CRISPR-GEMM pooled mutagenic screening identifies KMT2D as a major modulator of immune checkpoint blockade. Cancer Discov. 10, 1912–1933 (2020).

  83. Xu, W. et al. METTL3 regulates heterochromatin in mouse embryonic stem cells. Nature 591, 317–321 (2021).

  84. Chen, C. et al. Nuclear m6A reader YTHDC1 regulates the scaffold function of LINE1 RNA in mouse ESCs and early embryos. Protein Cell 12, 455–474 (2021).

  85. Chelmicki, T. et al. m6A RNA methylation regulates the fate of endogenous retroviruses. Nature 591, 312–316 (2021).

  86. Park, S.-J., Shirahige, K., Ohsugi, M. & Nakai, K. DBTMEE: a database of transcriptome in mouse early embryos. Nucleic Acids Res. 43, D771–D776 (2015).

Acknowledgements

We thank Z. Chen, J. Yan and all members of the Liu Laboratory for their comments and discussions. We thank GenePlus for the next-generation sequencing support. This work was supported by the National Key Research and Development Program (grant 2022YFA1302700 to N.L.) and the National Natural Science Foundation of China (grants 32350011, 32270593 and 32070631 to N.L. and 32300448 to L.B.). N.L. acknowledges support from the Benyuan Fund—Young Investigator Exploration Grant in Life Sciences, Tsinghua University Initiative Scientific Research Program, Beijing Frontier Research Center for Biological Structure and Tsinghua-Peking Center for Life Sciences.

Author information

Contributions

X.L., L.B., Y.W., Y.H., Z.Z., Y.F., X.Y., Y.T., C.H., Y.Z., X. Sun, J.X.H.L., J.Z., Z.C., Q.X., A.M., X. Shen, W.X. and N.L. designed and performed experiments and analyzed data. N.L., X.L., L.B., Y.W. and Y.H. wrote the paper with input from other authors. N.L. directed and supervised the study.

Corresponding author

Correspondence to Nian Liu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Genome-wide CRISPR–Cas9 screening for genes that control L1 expression in K562 cells.

a, Heatmap showing the relative expression of individual full-length L1s across three indicated human cell lines. Z-score transformed transcripts per million (TPM) values were used to represent L1 expression. b, L1 RNA quantification in K562 using the strand-specific polyA+ RNA-seq. Data are representative of two independent experiments. The center value is the mean. P, two-tailed t-test. n.s., non-significant. c, Fluorescence-activated cell sorting results show that over 99% of the L1-GFP K562 cell clones express blue fluorescent protein (BFP)-Cas9 and L1-GFP. d, Chromosome maps showing the genomic distribution of 51 L1-GFP reporter insertions in the K562 screening cell line, measured by long-read sequencing. The individual genomic locations are listed in Supplementary Table 1. e, Normalized L1-GFP expression (left) and line plots showing GFP intensity (right) in control and mutant K562 as indicated. The center value is the mean. n = 2 biological replicates for MPP8, n = 3 biological replicates for SAFB and control. Knockout of known L1 suppressors (SAFB, MPP8) increased L1 expression. f, Correlation of the sgRNA counts of each sample in K562 genome-wide screens, related to Fig. 1a. g, Enrichment values of the top 80 L1 suppressors (blue) and the top 80 activators (red) were calculated from two independent K562 genome-wide screens with 4 independent sgRNAs per gene. Diamond represents the mean value; whisker represents the fold range for sgRNAs. Previously characterized L1 regulators are marked with stars. h, Cytoscape network visualizing protein–protein interactions for L1 suppressors (blue) and L1 activators (red). i,j, Venn diagram showing the number of candidate L1 activators (i) and suppressors (j) identified in this screen against L1 expression (blue), and those identified from one previous screen against L1 retrotransposition (ref. 16; yellow) in K562. k,l, Gene ontology (biological process) enrichment analysis of the unique activators (k) and unique suppressors (l) identified in this screen against L1 expression, and those identified from one previous screen against L1 retrotransposition in K562 (ref. 16). P, one-tailed hypergeometric test.
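
The one-tailed hypergeometric test named for panels k,l can be illustrated with a short calculation. The following is a minimal sketch, not the authors' enrichment pipeline: the function name and the toy numbers are hypothetical, and the gene universe and annotation sets used in the study are not reproduced here.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """Upper-tail P value, P(X >= k), for a one-tailed hypergeometric test.

    N: size of the gene universe (assumed background)
    K: genes in the universe annotated with the term
    n: genes in the hit list
    k: hit-list genes annotated with the term
    """
    total = comb(N, n)
    upper = min(n, K)
    # Sum the probabilities of observing k, k+1, ..., upper annotated hits.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Toy example (hypothetical numbers): universe of 10 genes, 4 annotated,
# hit list of 3 genes, 2 of them annotated. The exact value is 40/120 = 1/3.
p_value = hypergeom_enrichment_p(k=2, n=3, K=4, N=10)
```

A small P value here would mean that the hit list contains more annotated genes than expected from random sampling of the background; the choice of background strongly affects the result.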

Extended Data Fig. 2 Intricate control of L1 expression in K562.

a, Bar plots showing the percentage of unique-mapped and multiple-mapped reads in indicated full-length L1PA subfamilies. b, Box plot showing the percentage of unique-mapped reads in individual full-length L1PAs (left), and the numbers of all elements and those that surpass the 5-read-count cutoff within each L1 subfamily (right). Center lines represent the median value, box limits represent the 25th and 75th percentiles and whiskers denote minima and maxima (1.5× the interquartile range). n = 2 biological replicates. c, Representative genome browser snapshots of the RNA-seq read alignments over L1s. The color scale indicates the mapping quality score MAPQ for each read pair. MAPQ = −10 log10(p), where p is the probability that the true alignment belongs elsewhere. d, Heatmaps showing expression change of significantly changed L1PAs using unique mapping (top) or multiple mapping (bottom) in indicated mutant K562. n = 2 biological replicates per gene. P < 0.05, DESeq2 analysis. e, Mean splicing efficiency of the RNA-seq samples, calculated using SPLICE-q (ref. 79). f, Intronic read fraction of the RNA-seq samples, calculated using RNA-SeQC (ref. 80). Together with e, these results indicate the high quality of our RNA-seq libraries. g, Bubble plots showing expression change of L1, LTR and SINE subfamilies in indicated mutant K562. Multi-mapped reads were used to quantify transposons at the subfamily level. Colored circles represent the log2 fold change, with areas proportional to P value (DESeq2 analysis). LTR, long terminal repeat; SINE, short interspersed nuclear element. n = 2 or 3 biological replicates per gene. h, Heatmap showing expression change at individual L1 loci in indicated mutant K562 (labeled and aligned as in d). Hierarchical clustering was performed by the ‘ward.D2’ method. RNA-seq reads were unique-mapped to individual L1 loci. Only full-length L1s significantly changed in at least one KO experiment were included (n = 222). P < 0.05, DESeq2 analysis. i,j, RNA-seq genome browser snapshots at six L1 loci from control and indicated mutant K562, illustrating that candidate L1 suppressors (i) and activators (j) selectively and cooperatively control L1 expression. The experiment was repeated once with similar results.
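
As a worked illustration of the MAPQ definition given for panel c, the conversion between a mapping error probability and a MAPQ score can be sketched as follows. This is a minimal sketch under stated assumptions, not part of the authors' pipeline: the function names are hypothetical, and aligners typically report MAPQ as a rounded, capped integer rather than the raw value computed here.

```python
import math


def mapq_from_error_prob(p):
    """MAPQ = -10 * log10(p), with p the probability that the alignment is wrong."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -10.0 * math.log10(p)


def error_prob_from_mapq(mapq):
    """Invert the definition: p = 10 ** (-MAPQ / 10)."""
    return 10.0 ** (-mapq / 10.0)


# A 1-in-100 chance of a wrong placement corresponds to MAPQ of about 20;
# MAPQ 30 corresponds to an error probability of about 0.001.
mapq_example = mapq_from_error_prob(0.01)
prob_example = error_prob_from_mapq(30)
```

Reads that align equally well to several L1 copies carry a high error probability and therefore a low MAPQ, which is why the score is a natural way to separate unique from multiple mapping.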

Extended Data Fig. 3 L1 restriction by DBR1 and DIS3.

a, Western blotting showing the level of DBR1 and endogenous L1 ORF1p in the K562 cells infected with two independent Cas9/sgRNAs targeting DBR1 and two independent non-target control Cas9/sgRNAs, independently repeated once with similar results. HSP90 is the loading control. b, Donut chart showing that DBR1-regulated L1s (P < 0.05, DESeq2 analysis) are predominantly localized in introns in K562. c, Size distribution of DBR1-regulated L1s (P < 0.05, DESeq2 analysis) and all genomic L1s (based on hg38 RepeatMasker annotation) in K562. P value, two-tailed Kolmogorov–Smirnov test. d, The read counts of total lariats and L1-containing lariats, obtained from the ribosome-depleted RNA-seq data by a lariat read-count analysis strategy (ref. 81) and normalized against unmapped singlets in control and DBR1 knockout K562 cells. n = 2 biological replicates. The center value is the mean. e, A working model in which DBR1 restricts intronic L1 expression by debranching lariats into linear RNAs that are subsequently degraded. In DBR1 mutant cells, the level of L1-containing lariats is elevated, and the L1 within the lariat is proposed to be translated into L1 proteins. f, Donut chart showing genomic distribution of the DIS3-restricted L1s (P < 0.05, DESeq2 analysis) in K562 cells. g, Density plot showing size distribution of the DIS3-regulated L1s (P < 0.05, DESeq2 analysis) and all genomic L1s (based on hg38 RepeatMasker annotation) in K562 cells. P value, two-tailed Kolmogorov–Smirnov test. h, Schematic of DIS3 domains and the two point mutations to inactivate the PIN and RNB domains, respectively (ref. 21). i, RNA-seq showing the relative expression level of full-length L1s in DIS3 knockdown HEK293 cells, rescued with the intact DIS3 or DIS3 mutants with inactive PIN or RNB domains, as indicated. RNA-seq reads were unique-mapped to individual L1 loci. As indicated, the PIN domain with endonucleolytic activity accounts for the L1 restriction. j, Aggregated line plot showing average enrichment of the DIS3 PAR-CLIP signal over full-length L1s. Genome-wide mappability displays genome-wide 100-bp-short-read alignability (data from UCSC). The results indicate that DIS3 directly binds L1 RNAs.

Extended Data Fig. 4 Pervasive L1s harbor enhancer-like features.

a,b, Dot plot/box plot of STARR-seq read counts (left) and enrichment (right) at the L1 subfamily (a) and L1 individual (b) level using K562 STARR-seq datasets (2 PE100 and 1 PE150). The STARR-seq reads were unique-mapped to individual L1 loci. Center lines represent the median value, box limits represent the 25th and 75th percentiles and whiskers denote minima and maxima (1.5× the interquartile range). n = 2 biological replicates. CPM, read counts per million mapped reads. c, Scatter plot showing counts of STARR-seq peaks and −log10(P value) for all repeats at the subfamily level in indicated cell lines. Dots, repeat subfamilies; large dots, L1 subfamilies; color, repeat classes. P, hypergeometric test. d, Aggregated line plots showing STARR-seq signals over full-length L1s at the subfamily level in indicated cell lines. e, Heatmaps of STARR-seq signals over full-length L1s in indicated cell lines, sorted and aligned as in Fig. 2d. f, Heatmap showing the proportion of full-length and non-full-length L1s that exhibit STARR-seq signals in six indicated human cell lines. The L1 subfamilies are classified into two groups according to their length: longer than 6 kb (full-length) or shorter (non-full-length). While nearly all full-length L1s contain the L1 5′ UTR and overlap with STARR-seq signals, some non-full-length L1s (such as L1P2 and L1PBa1) also contain the L1 5′ UTR and overlap with STARR-seq peaks. g, Enrichment of DNA binding motifs at the full-length L1s with STARR-seq signals in K562, with the full-length L1s not overlapping STARR-seq peaks as control. Plotted is the log2 transformed fold enrichment and log10 transformed P value of each protein binding motif (hypergeometric test). The enriched motifs for transcription factors are specified. Zf, zinc finger. h, Analysis of the ChIP-Atlas data showing that transcription factors bind full-length L1s. Colored circles represent log2 fold enrichment, with areas proportional to Q value (hypergeometric test). For a single protein, log2 fold enrichment is calculated using the median value from different cells or experiments. Q values are adjusted with the Benjamini–Hochberg method.
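
The Benjamini–Hochberg adjustment mentioned for panel h converts a set of P values into Q values that control the false discovery rate. A minimal sketch of the standard step-up procedure is given below; the function name and input values are hypothetical, and this is not the implementation used by ChIP-Atlas or by the authors.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted values (Q values) in the input order."""
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonic Q values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical P values for four transcription factors.
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

For these toy inputs the adjusted values are approximately 0.02, 0.04, 0.04 and 0.02, showing how the adjustment scales each P value by the number of tests divided by its rank and then enforces monotonicity.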

Extended Data Fig. 5 L1 silencing by CTBP1 restricts distal gene expression.

a, Western blotting showing that CTBP1 KO increases L1 protein expression in K562. b, FACS showing that CTBP1 KO increases L1-GFP expression in K562. Center line, mean. n = 2 biological replicates. c, RT–qPCR data showing normalized expression of L1Hs, L1PA15-16, L1PB and L1M subfamilies in control (n = 3) and CTBP1 KO cells (n = 3) with or without reverse transcription (RT). P, two-tailed t-test. d, Box plot showing that CTBP1 KO mainly derepresses L1PA1–3 among all repeats in K562. Box plots show median and interquartile range (IQR), and whiskers are 1.5-fold IQR. e, Density plot showing size distribution of the CTBP1-regulated L1s (P < 0.05, DESeq2 analysis) and other L1s in K562. P, two-tailed Kolmogorov–Smirnov test. f, Aggregated line plots showing indicated ChIP enrichments over the CTBP1-regulated L1PAs (P < 0.05, DESeq2 analysis) and all full-length (FL) L1s in K562. g, CTBP1 KO increases expression of genes surrounding the derepressed L1s. Plots show median and interquartile range (IQR) of gene expression change alongside derepressed L1s (black box plots), compared with background (red). h, Bar plot showing the number of full-length L1s detected by H3K27ac ChIP–seq and polyA+ RNA-seq (TPM > 0.05) in K562. i, Zoom-in views of the RNA-seq genome browser tracks at NRN1, its cognate distal L1PA2 and the nearby FARS2 locus, in control and CTBP1 KO K562, independently repeated once with similar results. j, RT–qPCR showing normalized expression of NRN1 and the nearby FARS2 in control (n = 3) and CTBP1 KO cells (n = 3). P, two-tailed t-test. n.s., not significant. k,l, Deleting the L1PA2 element, as shown in i and confirmed by the gel image (k), decreases NRN1 expression in NCCIT (l). n = 4 biological replicates × 3 technical replicates. ***P < 0.001; unpaired two-tailed t-test. Error bars, standard deviation (s.d.). m,n, Representative Hi-C interaction matrix and ChIP–seq IGV tracks around NRN1-L1 in wild-type (WT) condition (m) and Sanger sequencing results showing the L1 deletion sites with sgRNA target sequences in red (n).

Source data

Extended Data Fig. 6 Diverse control of L1 influences distal gene expression.

a, Western blotting showing KMT2D KO increases L1 protein expression in K562. b, FACS showing KMT2D KO increases L1-GFP expression in K562. Center line, mean. n = 2 biological replicates. c, Density plot showing size distribution of the KMT2D-regulated L1s (P < 0.05, DESeq2 analysis) and other L1s in K562. P, two-tailed Kolmogorov–Smirnov test. d, Bubble plot showing expression changes of different L1 subfamilies in mouse MA1L cells. Data from ref. 82. e, Heatmaps showing indicated ChIP–seq enrichments in control and KMT2D KO K562 cells, sorted by the increase of H3K4me1 signals upon KMT2D KO and centered on the H3K4me1 peaks. f, Heatmaps showing H3K4me1 ChIP–seq enrichment over transposable elements, centered and sorted as in e. The yellow rectangle indicates the full-length L1s with increased H3K4me1 peaks upon KMT2D knockout in K562. g, Heatmaps showing indicated ChIP enrichments in control and KMT2D KO K562 over full-length L1s, sorted by WT H3K4me1 ChIP and aligned at the L1 5′ end. h, Frequency histogram of absolute distances from each L1 to the nearest gene with increased (red) and decreased expression (blue) upon KMT2D KO in K562. Background expectations of genome-wide L1 distribution to the nearest upregulated (green) and downregulated (black) gene upon KMT2D KO are shown, with dashed line indicating 95% confidence interval. P, two-tailed Kolmogorov–Smirnov test; n.s., not significant. i, Box plots showing expression change for genes located within the indicated genomic distance of an L1 that shows increased H3K27ac ChIP signals upon KMT2D KO in K562. Plots show median and interquartile range (IQR), and whiskers are 1.5-fold IQR. Related statistical significance analysis is in Supplementary Table 5. j, Bubble plots showing expression changes of different L1 subfamilies in mESC. Data from refs. 83,84,85. k, Scatter plot showing METTL3/14 ChIP–seq enrichment over repeats in MCF-7 cells. Dots represent repeat subfamilies, colored by repeat classes. P, hypergeometric test. l, Box plots showing expression changes for genes located within the indicated genomic distance of an L1 that shows decreased expression upon METTL3 KO in K562. Plots show median and interquartile range (IQR), and whiskers are 1.5-fold IQR. Related statistical significance analysis is in Supplementary Table 5.

Source data

Extended Data Fig. 7 Systematic L1 perturbation influences pervasive long-range gene expression.

a, Schematic of triple sgRNAs targeting L1Hs 5′ UTR, used in CRISPRa/i. b, RT–qPCR results showing L1Hs expression upon CRISPRa/i in NCCIT. P, unpaired two-tailed t-test. Data represent median ± SEM. n = 2 biological replicates × 3 technical replicates. RT, reverse transcription. NC, non-target control. c, Bar plots showing counts of indicated L1s (blue + yellow + gray), counts of L1s targeted by triple sgRNAs (blue + yellow) and counts of L1s with increased H3K27ac peaks upon L1 CRISPRa (blue) in NCCIT. d, Bar plots showing counts of indicated L1s with H3K27ac peaks in NCCIT (blue + yellow + gray), counts of L1s targeted by triple sgRNAs (blue + yellow) and counts of L1s with decreased H3K27ac peaks upon L1 CRISPRi (blue). e, Gene expression change upon L1 CRISPRa/i in NCCIT. Red dots, genes whose expression increased upon CRISPRa (padj < 0.1, DESeq2 analysis) and decreased or remained silenced upon CRISPRi (termed as the L1-regulated genes). f, Plotted as in e, with genes separated into deciles by distance from the nearest L1 with increased H3K27ac peaks upon L1 CRISPRa. g, Box plot showing gene expression change in different deciles, binned as in f. n = 2 biological replicates. Center lines represent the median value, box limits represent the 25th and 75th percentiles and whiskers denote minima and maxima (1.5× the interquartile range). h,i, Representative examples of the L1-initiated chimeric transcripts in NCCIT. j, Box plots showing expression changes for the genes with Mll3/4 binding sites within indicated genomic distances in mESCs. Plots show median and interquartile range (IQR), and whiskers are 1.5-fold IQR. This suggests that the enhancer activity of L1 (shown in Fig. 3d,e) is comparable to that of canonical enhancers regulated by Mll3/4. k, RNA-seq browser tracks at NRN1 and its cognate L1PA2 locus. l, RNA-seq browser tracks at SESN3, its cognate L1Hs and nearby gene locus. 
m, RT–qPCR results show that CRISPRa targeting the SESN3-L1, rather than one adjacent locus, increases SESN3 expression. Data represent median ± SEM. n = 2 biological replicates × 3 technical replicates. P, two-tailed t-test. ****P < 0.0001. n, IGV tracks in NCCIT after CRISPRa against L1 (data from this study) and LTR5Hs (result from ref. 29). This suggests the importance of L1 transcription, rather than the adjacent LTR5Hs, in the long-range gene activation of HMGCS1.
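The decile analysis in panels f and g can be sketched as below. This is an illustrative sketch only: the function name and the input layout are hypothetical, and equal-count bins assigned by distance rank are an assumption, because the legend does not describe how the deciles were constructed.

```python
from statistics import median


def median_change_by_distance_decile(genes):
    """Summarize expression change per distance decile.

    genes: list of (distance_to_nearest_L1_bp, log2_fold_change) pairs.
    Returns 10 medians, nearest decile first; a decile with no genes
    (possible when fewer than 10 genes are given) yields None.
    Assumption: deciles are equal-count bins by distance rank.
    """
    ranked = sorted(genes, key=lambda g: g[0])
    n = len(ranked)
    bins = [[] for _ in range(10)]
    for rank, (_, log2fc) in enumerate(ranked):
        # Integer arithmetic maps ranks 0..n-1 onto bins 0..9.
        bins[rank * 10 // n].append(log2fc)
    return [median(b) if b else None for b in bins]
```

If genes closest to an activated L1 respond most strongly, the first entries of the returned list would be expected to exceed the later ones.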

Source data

Extended Data Fig. 8 L1 transcription facilitates the long-range L1–gene contacts.

a,b, Scatter plot comparing genome-wide compartment scores upon L1 CRISPRa (a) and CRISPRi (b) in NCCIT. NC, non-target control. c, Stacked bar plot showing compartment changes upon L1 CRISPRa/i in NCCIT cells. d, Distribution of the distance between L1s and their looped cognate genes in NCCIT. e, Pie-chart representing number and proportion of L1s with the same or opposite orientation (i), and upstream or downstream (ii) of their looped genes identified in L1 CRISPRa (up) and CRISPRi (down) NCCIT. f, Enrichment analysis of transcription factor binding motifs at the 5′ UTRs of the active L1s associated with genes compared with those that do not. g,h, Box plot showing expression change of L1-looped genes compared to non-looped genes within 2 Mb of the H3K27ac- increased or decreased L1s in L1 CRISPRa (g) and CRISPRi (h), respectively. Plot shows the median and interquartile range (IQR), and whiskers are 1.5 × IQR. P, two-tailed Mann–Whitney–Wilcoxon test. i,j, Scatter plot showing correlation between the fold change of Pol II ChIP–seq signal at individual L1 and the expression fold change of the L1-paired genes in NCCIT upon L1 CRISPRa (i) and CRISPRi (j). k,l, Aggregated line plots showing ChIP enrichments of Pol II (k) and RAD21 (l) over the full-length L1s that contact their cognate gene promoters (left), and the respective overall ChIP–seq peaks (right) in L1 CRISPRa (blue) and non-target control (NC, gray) NCCIT cells. m, Bar plot showing the ChIP–seq enrichment of transcription factors and histone marks at the L1s that make physical contacts with their cognate genes. ChIP–seq datasets in the human embryonic stem cells (H9 hESC) were from ENCODE. n, Hallmark enrichment analysis for the L1-regulated ZGA genes. P, hypergeometric test. o,p, Scatter plot showing correlation between the H3K27ac fold changes at individual L1 locus and the fold changes of paired gene expression in NCCIT upon L1 CRISPRa (o) and CRISPRi (p).

Extended Data Fig. 9 L1 transcription enhances mouse 2-cell (2C) gene expression.

a, Scatter plot showing STARR-seq peak counts and −log10(P) for all repeats in mESCs. Dots, repeat subfamilies; large dots, L1 subfamilies; color, repeat classes. P, hypergeometric test. b, Aggregated contacts between L1s and their looped genes in the indicated mouse embryos using the Hi-C data (ref. 49). c, Box plot quantifying the L1–gene contacts as described in b. Plots show median and interquartile range (IQR), and whiskers are 1.5-fold IQR. **P < 0.01, two-tailed t-test. d, Bubble plot showing overlap between the L1-contacted genes (c) and the list of 2C genes (ref. 43) and DBTMEE v2 transcriptome categories (ref. 86). P, hypergeometric test. e, Volcano plots showing repeat expression changes upon L1 perturbation in mESCs. f, Aggregated Hi-C contacts between L1s and their looped genes and all looping events in control and L1 CRISPRi mESCs. g, Bubble plot showing overlap between all DEGs (differentially expressed genes) by indicated L1 perturbation with the list of 2C genes (ref. 43) and DBTMEE v2 transcriptome categories (ref. 86). P, hypergeometric test. h, Gene set enrichment analysis (GSEA) among all Hallmark gene sets upon indicated L1 perturbation. NES, normalized enrichment scores. i, Gene set enrichment analysis (GSEA) showing enrichment of 2C-specific genes (defined in ref. 43) in mESCs after indicated L1 perturbation. j, Heatmap showing expression changes of 2C-specific genes and MERVLs upon indicated L1 perturbation. MERVL counted at subfamily level. k, Representative bar plots showing gene expression changes for indicated 2C genes upon indicated L1 perturbation. Data are representative of two or three independent experiments and are median ± SEM. *P < 0.1, **P < 0.01, ***P < 0.001, ****P < 0.0001, one-tailed t test. l, Box plot showing distance distribution of 2C gene sets to the nearest full-length L1 elements. Plots show median and interquartile range (IQR), and whiskers are 1.5-fold IQR. P, two-tailed Mann–Whitney–Wilcoxon test. m, Deleting an L1MdA element, as shown in Fig. 5f and validated by gel image (left), decreases Zscan4a/4c/ps-1 expression. n = 2 biological replicates × 3 technical replicates. ***P < 0.001; unpaired two-tailed t-test. Error bars, standard deviation (s.d.).
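The hypergeometric overlap tests cited in panels d and g can be illustrated with the sketch below. The function name and arguments are hypothetical, and the one-sided upper-tail form (probability of an overlap at least as large as observed) is an assumption; the legend does not state which tail or which gene universe was used.

```python
from math import comb


def hypergeom_overlap_pvalue(universe_size, set_a_size, set_b_size, overlap):
    """Upper-tail hypergeometric P value for the overlap of two gene sets.

    Models drawing set_b_size genes without replacement from a universe
    of universe_size genes, of which set_a_size belong to set A, and
    returns P(overlap >= observed overlap).
    """
    max_overlap = min(set_a_size, set_b_size)
    total = comb(universe_size, set_b_size)
    tail = sum(
        comb(set_a_size, k) * comb(universe_size - set_a_size, set_b_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total
```

Here set A would correspond to, for example, the list of 2C genes, and set B to the L1-contacted genes; the universe would be all genes considered in the analysis.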

Extended Data Fig. 10 L1 knockdown impairs ZGA and arrests mouse embryonic development at 2–4 cell stage.

a, Gene set enrichment analysis (GSEA) showing that 2C-specific genes were significantly downregulated in the 2-cell mouse embryos after L1 ASO treatment. NES, normalized enrichment score. b, Scatter diagram and box plot (inside) showing the relationship between the distance of gene transcription start site (TSS) to the nearest full-length L1s and the gene expression change upon L1 ASO treatment in mouse 2C embryos. The two vertical lines indicate ±512 kb, respectively. The top density plot shows the distance distribution of the expressed 2C-specific genes (red) and other genes (gray). The box plot shows the median and interquartile range (IQR) of gene expression change, and whiskers are 1.5× IQR. P, two-tailed Mann–Whitney–Wilcoxon test. The 2C-specific genes are significantly enriched within ±512 kb of the full-length L1s, and 96 of them are significantly downregulated upon L1 ASO in mouse 2C embryos. c, Examples of gene clusters of 2C-specific genes, with the derepressed ones in L1-ASO 2-cell mouse embryos labeled in red. d, Chromosome maps illustrating the genomic distribution of the 2C-specific genes, with the derepressed ones in L1-ASO 2-cell mouse embryos labeled in red. e, Representative phase-contrast images of 2.5 and 3.5 dpc mouse embryos in uninjected, control ASO and L1 ASO conditions, from 3 independent experiments. Scale bars, 100 µm. f, Percentage of mouse embryos at 4.5 dpc in SCR (n = 45), RC-L1 ASO (n = 44) and L1 ASO (n = 39). ***P < 0.0001, chi-square test. n.s., not significant. SCR ASO, scramble non-targeting ASO. RC-L1 ASO, the ASO with reverse complementary sequence to the L1 ASO sequence. g, MA plots of expression change of the 2C-specific genes in 2-cell mouse embryos after microinjection of RC-L1 ASO (left) and L1 ASO (right) into zygote, relative to the SCR ASO control. h, Model showing that L1 knockdown causes ZGA defects and development arrest at 2–4 cell stage before blastocyst formation in mouse early embryos. 
The figure is created with BioRender.com.

Source data

Supplementary information

Supplementary Information

Supplementary Methods and Supplementary Figs. 1 and 2.

Reporting Summary

Peer Review File

Supplementary Tables

Supplementary Table 1: The list of 51 L1-reporter insertion sites in the screening cell line. Supplementary Table 2: The list of genes that suppress L1 expression in K562, as revealed by our genome-wide CRISPR screen. Supplementary Table 3: The list of genes that activate L1 expression in K562 cells, as revealed by our genome-wide CRISPR–Cas9 screen. Supplementary Table 4: The sequences of oligos used in this paper. Supplementary Table 5: Statistical analysis for gene expression changes. Supplementary Table 6: The mapping information for RNA-seq datasets generated in this paper. Supplementary Table 7: The mapping information for ChIP–seq datasets generated in this paper. Supplementary Table 8: The mapping information for Hi-C datasets generated in this paper. Supplementary Table 9: The list of public datasets used in this paper.

Source data

Source Data Figs. 2, 3 and 5 and Extended Data Figs. 3 and 7

Statistical source data for Figs. 2c, 3h and 5a,c,d,i, and Extended Data Figs. 3c,g and 7m.

Source Data Fig. 3

Unprocessed western blots and/or gels for Fig. 3g.

Source Data Extended Data Fig. 3

Unprocessed western blots for Extended Data Fig. 3a.

Source Data Extended Data Fig. 5

Unprocessed western blots for Extended Data Fig. 5a,k.

Source Data Extended Data Fig. 6

Unprocessed western blots for Extended Data Fig. 6a.

Source Data Extended Data Fig. 10

Unprocessed western blots for Extended Data Fig. 10m.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, X., Bie, L., Wang, Y. et al. LINE-1 transcription activates long-range gene expression. Nat Genet 56, 1494–1502 (2024). https://doi.org/10.1038/s41588-024-01789-5

  • DOI: https://doi.org/10.1038/s41588-024-01789-5
