Abstract
Long interspersed nuclear element-1 (LINE-1 or L1) is a retrotransposon group that constitutes 17% of the human genome and shows variable expression across cell types. However, the control of L1 expression and its function in gene regulation are incompletely understood. Here we show that L1 transcription activates long-range gene expression. Genome-wide CRISPR–Cas9 screening using a reporter driven by the L1 5′ UTR in human cells identifies functionally diverse genes affecting L1 expression. Unexpectedly, altering L1 expression by knockout of regulatory genes impacts distant gene expression. L1s can physically contact their distal target genes, with these interactions becoming stronger upon L1 activation and weaker when L1 is silenced. Remarkably, L1s contact and activate genes essential for zygotic genome activation (ZGA), and L1 knockdown impairs ZGA, leading to developmental arrest in mouse embryos. These results characterize the regulation and function of L1 in long-range gene activation and reveal its importance in mammalian ZGA.
Data availability
All sequencing data generated in this study have been deposited in the Genome Sequence Archive (https://ngdc.cncb.ac.cn/gsa-human/) under accession HRA003719. ENCSR384BDV, ENCSR423ERM, ENCSR608IXR, ENCSR563YIS, ENCSR844QNT, ENCSR129ROE, ENCSR858MPS, ENCSR895FDL, ENCSR064KUD, ENCSR547SBZ, ENCSR983SZZ, ENCSR135NXN and ENCSR201NQZ were downloaded from ENCODE (https://www.encodeproject.org/). GSE213909, GSE143546, GSE36552, GSE101571, GSE86938, GSE45719, GSE64332, GSE151227, GSE126242, GSE157262, GSE145309, GSE98063, GSE100939, GSE144400 and GSE82185 were downloaded from Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo). HRA002355 was downloaded from the Genome Sequence Archive. These accession numbers are also listed in Supplementary Table 9. The human (hg38) and mouse (mm10) reference genomes were downloaded from GENCODE (https://www.gencodegenes.org), and the macaque reference genome (rheMac8) was downloaded from the University of California, Santa Cruz (UCSC) Genome Browser (https://genome.ucsc.edu). Additional experimental materials used in this study, including plasmids and engineered cell lines, are available upon request. Source data are provided with this paper.
Code availability
All data were analyzed using published pipelines with the parameters described in Methods. The code used for figure generation is available on GitHub (https://github.com/AssumeAssume/Li_et_al_2024) and Zenodo (https://doi.org/10.5281/zenodo.11113925; ref. 78).
References
Beck, C. R., Garcia-Perez, J. L., Badge, R. M. & Moran, J. V. LINE-1 elements in structural variation and disease. Annu. Rev. Genom. Hum. Genet. 12, 187–215 (2011).
Ivancevic, A. M., Kortschak, R. D., Bertozzi, T. & Adelson, D. L. LINEs between species: evolutionary dynamics of LINE-1 retrotransposons across the eukaryotic tree of life. Genome Biol. Evol. 8, 3301–3322 (2016).
Philippe, C. et al. Activation of individual L1 retrotransposon instances is restricted to cell-type dependent permissive loci. eLife 5, e13926 (2016).
Faulkner, G. J. et al. The regulated retrotransposon transcriptome of mammalian cells. Nat. Genet. 41, 563–571 (2009).
Belancio, V. P., Roy-Engel, A. M., Pochampally, R. R. & Deininger, P. Somatic expression of LINE-1 elements in human tissues. Nucleic Acids Res. 38, 3909–3922 (2010).
Stow, E. C. et al. Organ-, sex- and age-dependent patterns of endogenous L1 mRNA expression at a single locus resolution. Nucleic Acids Res. 49, 5813–5831 (2021).
Kaul, T., Morales, M. E., Sartor, A. O., Belancio, V. P. & Deininger, P. Comparative analysis on the expression of L1 loci using various RNA-seq preparations. Mob. DNA 11, 2 (2020).
Jachowicz, J. W. et al. LINE-1 activation after fertilization regulates global chromatin accessibility in the early mouse embryo. Nat. Genet. 49, 1502–1510 (2017).
Percharde, M. et al. A LINE1-nucleolin partnership regulates early development and ESC identity. Cell 174, 391–405 (2018).
De Cecco, M. et al. L1 drives IFN in senescent cells and promotes age-associated inflammation. Nature 566, 73–78 (2019).
Thomas, C. A., Paquola, A. C. M. & Muotri, A. R. LINE-1 retrotransposition in the nervous system. Annu. Rev. Cell Dev. Biol. 28, 555–573 (2012).
Payer, L. M. & Burns, K. H. Transposable elements in human genetic disease. Nat. Rev. Genet. 20, 760–772 (2019).
Lu, J. Y. et al. Homotypic clustering of L1 and B1/Alu repeats compartmentalizes the 3D genome. Cell Res. 31, 613–630 (2021).
Han, J. S., Szak, S. T. & Boeke, J. D. Transcriptional disruption by the L1 retrotransposon and implications for mammalian transcriptomes. Nature 429, 268–274 (2004).
Attig, J. et al. Heteromeric RNP assembly at LINEs controls lineage-specific RNA processing. Cell 174, 1067–1081 (2018).
Liu, N. et al. Selective silencing of euchromatic L1s revealed by genome-wide screens for L1 regulators. Nature 553, 228–232 (2018).
Fueyo, R., Judd, J., Feschotte, C. & Wysocka, J. Roles of transposable elements in the regulation of mammalian transcription. Nat. Rev. Mol. Cell Biol. 23, 481–497 (2022).
Moran, J. V. et al. High frequency retrotransposition in cultured mammalian cells. Cell 87, 917–927 (1996).
Hong, Y. et al. SAFB restricts contact domain boundaries associated with L1 chimeric transcription. Mol. Cell 84, 1637–1650.e10 (2024).
Mohanta, A. & Chakrabarti, K. Dbr1 functions in mRNA processing, intron turnover and human diseases. Biochimie 180, 134–142 (2021).
Szczepińska, T. et al. DIS3 shapes the RNA polymerase II transcriptome in humans by degrading a variety of unwanted transcripts. Genome Res. 25, 1622–1633 (2015).
Calo, E. & Wysocka, J. Modification of enhancer chromatin: what, how, and why? Mol. Cell 49, 825–837 (2013).
Field, A. & Adelman, K. Evaluating enhancer function and transcription. Annu. Rev. Biochem. 89, 213–234 (2020).
Arnold, C. D. et al. Genome-wide quantitative enhancer activity maps identified by STARR-seq. Science 339, 1074–1077 (2013).
Li, W., Notani, D. & Rosenfeld, M. G. Enhancers as non-coding RNA transcription units: recent insights and future perspectives. Nat. Rev. Genet. 17, 207–223 (2016).
Schoenfelder, S. & Fraser, P. Long-range enhancer–promoter contacts in gene expression control. Nat. Rev. Genet. 20, 437–455 (2019).
Chuong, E. B., Elde, N. C. & Feschotte, C. Regulatory evolution of innate immunity through co-option of endogenous retroviruses. Science 351, 1083–1087 (2016).
Pontis, J. et al. Hominoid-specific transposable elements and KZFPs facilitate human embryonic genome activation and control transcription in naive human ESCs. Cell Stem Cell 24, 724–735 (2019).
Fuentes, D. R., Swigut, T. & Wysocka, J. Systematic perturbation of retroviral LTRs reveals widespread long-range effects on human gene regulation. eLife 7, e35989 (2018).
Todd, C. D., Deniz, Ö., Taylor, D. & Branco, M. R. Functional evaluation of transposable elements as enhancers in mouse embryonic and trophoblast stem cells. eLife 8, e44344 (2019).
Deniz, Ö. et al. Endogenous retroviruses are a source of enhancers with oncogenic potential in acute myeloid leukaemia. Nat. Commun. 11, 3506 (2020).
Boxer, L. D., Barajas, B., Tao, S., Zhang, J. & Khavari, P. A. ZNF750 interacts with KLF4 and RCOR1, KDM1A, and CTBP1/2 chromatin regulators to repress epidermal progenitor genes and induce differentiation genes. Genes Dev. 28, 2013–2026 (2014).
Chinnadurai, G. CtBP, an unconventional transcriptional corepressor in development and oncogenesis. Mol. Cell 9, 213–224 (2002).
Shi, Y. et al. Coordinated histone modifications mediated by a CtBP co-repressor complex. Nature 422, 735–738 (2003).
Son, H. et al. Neuritin produces antidepressant actions and blocks the neuronal and behavioral deficits caused by chronic stress. Proc. Natl Acad. Sci. USA 109, 11378–11383 (2012).
Lee, J.-E. et al. H3K4 mono- and di-methyltransferase MLL4 is required for enhancer activation during cell differentiation. eLife 2, e01503 (2013).
Liu, N. & Pan, T. N6-methyladenosine-encoded epitranscriptomics. Nat. Struct. Mol. Biol. 23, 98–102 (2016).
Zhang, S. et al. DeepLoop robustly maps chromatin interactions from sparse allele-resolved or single-cell Hi-C data at kilobase resolution. Nat. Genet. 54, 1013–1025 (2022).
Busslinger, G. A. et al. Cohesin is positioned in mammalian genomes by transcription, CTCF and Wapl. Nature 544, 503–507 (2017).
Henn, A. et al. SIX2 gene haploinsufficiency leads to a recognizable phenotype with ptosis, frontonasal dysplasia, and conductive hearing loss. Clin. Dysmorphol. 27, 27–30 (2018).
Liu, Z. et al. A NIK-SIX signalling axis controls inflammation by targeted silencing of non-canonical NF-κB. Nature 568, 249–253 (2019).
Rodriguez-Terrones, D. & Torres-Padilla, M.-E. Nimble and ready to mingle: transposon outbursts of early development. Trends Genet. 34, 806–820 (2018).
Macfarlan, T. S. et al. Embryonic stem cell potency fluctuates with endogenous retrovirus activity. Nature 487, 57–63 (2012).
Abe, K. et al. Minor zygotic gene activation is essential for mouse preimplantation development. Proc. Natl Acad. Sci. USA 115, E6780–E6788 (2018).
Liu, B. et al. The landscape of RNA Pol II binding reveals a stepwise transition during ZGA. Nature 587, 139–144 (2020).
Lee, M. T. et al. Nanog, Pou5f1 and SoxB1 activate zygotic gene expression during the maternal-to-zygotic transition. Nature 503, 360–364 (2013).
Qiu, J. J. et al. Delay of ZGA initiation occurred in 2-cell blocked mouse embryos. Cell Res. 13, 179–185 (2003).
Deng, Q., Ramsköld, D., Reinius, B. & Sandberg, R. Single-cell RNA-seq reveals dynamic, random monoallelic gene expression in mammalian cells. Science 343, 193–196 (2014).
Du, Z. et al. Allelic reprogramming of 3D chromatin architecture during early mammalian development. Nature 547, 232–235 (2017).
Zhang, K. et al. Analysis of genome architecture during SCNT reveals a role of cohesin in impeding minor ZGA. Mol. Cell 79, 234–250 (2020).
Le, R. et al. Dcaf11 activates Zscan4-mediated alternative telomere lengthening in early embryos and embryonic stem cells. Cell Stem Cell 28, 732–747 (2021).
Yan, Y.-L. et al. DPPA2/4 and SUMO E3 ligase PIAS4 opposingly regulate zygotic transcriptional program. PLoS Biol. 17, e3000324 (2019).
Falco, G. et al. Zscan4: a novel gene expressed exclusively in late 2-cell embryos and embryonic stem cells. Dev. Biol. 307, 539–550 (2007).
Pal, D. et al. H4K16ac activates the transcription of transposable elements and contributes to their cis-regulatory function. Nat. Struct. Mol. Biol. 30, 935–947 (2023).
Meng, S. et al. Young LINE-1 transposon 5′ UTRs marked by elongation factor ELL3 function as enhancers to regulate naïve pluripotency in embryonic stem cells. Nat. Cell Biol. 25, 1319–1331 (2023).
Buttler, C. A., Ramirez, D., Dowell, R. D. & Chuong, E. B. An intronic LINE-1 regulates IFNAR1 expression in human immune cells. Mob. DNA 14, 20 (2023).
Andersson, R. & Sandelin, A. Determinants of enhancer and promoter activities of regulatory elements. Nat. Rev. Genet. 21, 71–87 (2020).
Li, G. et al. Extensive promoter-centered chromatin interactions provide a topological basis for transcription regulation. Cell 148, 84–98 (2012).
Heinz, S., Romanoski, C. E., Benner, C. & Glass, C. K. The selection and function of cell type-specific enhancers. Nat. Rev. Mol. Cell Biol. 16, 144–154 (2015).
Beraldi, R., Pittoggi, C., Sciamanna, I., Mattei, E. & Spadafora, C. Expression of LINE-1 retroposons is essential for murine preimplantation development. Mol. Reprod. Dev. 73, 279–287 (2006).
Doench, J. G. et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR–Cas9. Nat. Biotechnol. 34, 184–191 (2016).
Li, W. et al. Quality control, modeling, and visualization of CRISPR screens with MAGeCK-VISPR. Genome Biol. 16, 281 (2015).
Zhou, Y. et al. Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nat. Commun. 10, 1523 (2019).
Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
Szklarczyk, D. et al. STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 47, D607–D613 (2019).
Kuleshov, M. V. et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 44, W90–W97 (2016).
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
Korotkevich, G. et al. Fast gene set enrichment analysis. Preprint at bioRxiv https://doi.org/10.1101/060012 (2021).
Karimzadeh, M., Ernst, C., Kundaje, A. & Hoffman, M. M. Umap and Bismap: quantifying genome and methylome mappability. Nucleic Acids Res. 46, e120 (2018).
Kent, W. J., Zweig, A. S., Barber, G., Hinrichs, A. S. & Karolchik, D. BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics 26, 2204–2207 (2010).
Zou, Z., Ohta, T., Miura, F. & Oki, S. ChIP-Atlas 2021 update: a data-mining suite for exploring epigenomic landscapes by fully integrating ChIP–seq, ATAC–seq and bisulfite-seq data. Nucleic Acids Res. 50, W175–W182 (2022).
Peng, T. et al. STARR-seq identifies active, chromatin-masked, and dormant enhancers in pluripotent mouse embryonic stem cells. Genome Biol. 21, 243 (2020).
Rao, S. S. P. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).
Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).
Krijger, P. H. L., Geeven, G., Bianchi, V., Hilvering, C. R. E. & de Laat, W. 4C-seq from beginning to end: a detailed protocol for sample preparation and data analysis. Methods 170, 17–32 (2020).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Li, X. LINE-1 transcription activates long-range gene expression. Zenodo https://doi.org/10.5281/zenodo.11113925 (2024).
De Melo Costa, V. R., Pfeuffer, J., Louloupi, A., Ørom, U. A. V. & Piro, R. M. SPLICE-q: a Python tool for genome-wide quantification of splicing efficiency. BMC Bioinformatics 22, 368 (2021).
DeLuca, D. S. et al. RNA-SeQC: RNA-seq metrics for quality control and process optimization. Bioinformatics 28, 1530–1532 (2012).
Taggart, A. J. et al. Large-scale analysis of branchpoint usage across species and cell lines. Genome Res. 27, 639–649 (2017).
Wang, G. et al. CRISPR-GEMM pooled mutagenic screening identifies KMT2D as a major modulator of immune checkpoint blockade. Cancer Discov. 10, 1912–1933 (2020).
Xu, W. et al. METTL3 regulates heterochromatin in mouse embryonic stem cells. Nature 591, 317–321 (2021).
Chen, C. et al. Nuclear m6A reader YTHDC1 regulates the scaffold function of LINE1 RNA in mouse ESCs and early embryos. Protein Cell 12, 455–474 (2021).
Chelmicki, T. et al. m6A RNA methylation regulates the fate of endogenous retroviruses. Nature 591, 312–316 (2021).
Park, S.-J., Shirahige, K., Ohsugi, M. & Nakai, K. DBTMEE: a database of transcriptome in mouse early embryos. Nucleic Acids Res. 43, D771–D776 (2015).
Acknowledgements
We thank Z. Chen, J. Yan and all members of the Liu Laboratory for their comments and discussions. We thank GenePlus for the next-generation sequencing support. This work was supported by the National Key Research and Development Program (grant 2022YFA1302700 to N.L.) and the National Natural Science Foundation of China (grants 32350011, 32270593 and 32070631 to N.L. and 32300448 to L.B.). N.L. acknowledges support from the Benyuan Fund—Young Investigator Exploration Grant in Life Sciences, Tsinghua University Initiative Scientific Research Program, Beijing Frontier Research Center for Biological Structure and Tsinghua-Peking Center for Life Sciences.
Author information
Contributions
X.L., L.B., Y.W., Y.H., Z.Z., Y.F., X.Y., Y.T., C.H., Y.Z., X. Sun, J.X.H.L., J.Z., Z.C., Q.X., A.M., X. Shen, W.X. and N.L. designed and performed experiments and analyzed data. N.L., X.L., L.B., Y.W. and Y.H. wrote the paper with input from other authors. N.L. directed and supervised the study.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Genetics thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Genome-wide CRISPR–Cas9 screening for genes that control L1 expression in K562 cells.
a, Heatmap showing the relative expression of individual full-length L1s across three indicated human cell lines. Z-score-transformed transcripts per million (TPM) values represent L1 expression. b, L1 RNA quantification in K562 using strand-specific polyA+ RNA-seq. Data are representative of two independent experiments. The center value is the mean. P, two-tailed t-test. n.s., not significant. c, Fluorescence-activated cell sorting (FACS) results showing that over 99% of the L1-GFP K562 cell clones express blue fluorescent protein (BFP)-Cas9 and L1-GFP. d, Chromosome maps showing the genomic distribution of 51 L1-GFP reporter insertions in the K562 screening cell line, mapped by long-read sequencing. The individual genomic locations are listed in Supplementary Table 1. e, Normalized L1-GFP expression (left) and line plots showing GFP intensity (right) in control and mutant K562 cells as indicated. The center value is the mean. n = 2 biological replicates for MPP8; n = 3 biological replicates for SAFB and control. Knockout of the known L1 suppressors SAFB and MPP8 increased L1 expression. f, Correlation of sgRNA counts between samples in the K562 genome-wide screens, related to Fig. 1a. g, Enrichment values of the top 80 L1 suppressors (blue) and the top 80 L1 activators (red), calculated from two independent K562 genome-wide screens with 4 independent sgRNAs per gene. Diamonds represent the mean value; whiskers represent the fold range across sgRNAs. Previously characterized L1 regulators are marked with stars. h, Cytoscape network visualizing protein–protein interactions among L1 suppressors (blue) and L1 activators (red). i,j, Venn diagrams showing the numbers of candidate L1 activators (i) and suppressors (j) identified in this screen for L1 expression (blue) and in a previous screen for L1 retrotransposition in K562 (ref. 16; yellow).
k,l, Gene ontology (biological process) enrichment analysis of the activators (k) and suppressors (l) uniquely identified either in this screen for L1 expression or in a previous screen for L1 retrotransposition in K562 (ref. 16). P, one-tailed hypergeometric test.
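Panel a summarizes L1 expression as z-score-transformed TPM values. The sketch below illustrates that standard transformation on toy numbers. It assumes raw read counts per L1 locus and element lengths in base pairs, and it z-scores each locus across cell lines with the sample standard deviation (the convention of R's `scale()`). The function names are hypothetical and this is not the authors' pipeline.

```python
from statistics import mean, stdev

def tpm(counts, lengths_bp):
    """Transcripts per million for one sample (one value per feature)."""
    # Reads per kilobase: correct each count for feature length first.
    rpk = [c / (length / 1000) for c, length in zip(counts, lengths_bp)]
    # Then rescale so that the sample's values sum to one million.
    total = sum(rpk)
    return [r * 1e6 / total for r in rpk]

def zscore_rows(matrix):
    """Z-score each row (one L1 locus) across columns (cell lines)."""
    out = []
    for row in matrix:
        mu, sd = mean(row), stdev(row)  # sample s.d., as in R's scale()
        # A locus with identical values in every cell line carries no relative signal.
        out.append([0.0 if sd == 0 else (x - mu) / sd for x in row])
    return out

# Usage on toy values: two features in one sample, then one locus across three cell lines.
print(tpm([30, 20], [1000, 2000]))      # [750000.0, 250000.0]
print(zscore_rows([[1.0, 2.0, 3.0]]))   # [[-1.0, 0.0, 1.0]]
```

Z-scoring per row removes differences in absolute abundance between loci, so the heatmap colors reflect in which cell line each L1 is relatively high or low.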
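Panels k,l report one-tailed hypergeometric P values for GO term enrichment. As a hedged illustration of that test (not the authors' code), the upper-tail probability of seeing at least k annotated genes among the hits can be computed exactly with integer binomial coefficients; the argument names are hypothetical. A library routine such as SciPy's `scipy.stats.hypergeom.sf(k - 1, M, n, N)` returns the same quantity.

```python
from math import comb

def hypergeom_upper_tail(k, n_hits, n_term, n_background):
    """One-tailed P(X >= k) for X ~ Hypergeometric.

    n_background: genes in the background universe
    n_term:       background genes annotated with the GO term
    n_hits:       genes in the hit list (e.g. screen activators)
    k:            hit genes annotated with the term
    """
    total = comb(n_background, n_hits)
    # Sum the probability of every overlap at least as large as the observed one.
    # math.comb returns 0 when the lower argument exceeds the upper one.
    tail = sum(
        comb(n_term, i) * comb(n_background - n_term, n_hits - i)
        for i in range(k, min(n_hits, n_term) + 1)
    )
    return tail / total

# Usage: all 3 hits fall in a term that covers 5 of 10 background genes.
print(hypergeom_upper_tail(3, 3, 5, 10))  # 0.08333333333333333
```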
Extended Data Fig. 2 Intricate control of L1 expression in K562.
a, Bar plots showing the percentage of uniquely mapped and multimapped reads in the indicated full-length L1PA subfamilies. b, Box plot showing the percentage of uniquely mapped reads in individual full-length L1PAs (left), and the number of all elements and of those that pass the 5-read-count cutoff within each L1 subfamily (right). Center lines represent the median, box limits represent the 25th and 75th percentiles and whiskers denote the minima and maxima within 1.5× the interquartile range. n = 2 biological replicates. c, Representative genome browser snapshots of RNA-seq read alignments over L1s. The color scale indicates the mapping quality score (MAPQ) of each read pair, where MAPQ = −10 log10(p) and p is the probability that the true alignment lies elsewhere. d, Heatmaps showing expression changes of significantly changed L1PAs quantified by unique mapping (top) or multiple mapping (bottom) in the indicated mutant K562 cells. n = 2 biological replicates per gene. P < 0.05, DESeq2 analysis. e, Mean splicing efficiency of the RNA-seq samples, calculated using SPLICE-q (ref. 79). f, Intronic read fraction of the RNA-seq samples, calculated using RNA-SeQC (ref. 80). Together with e, these results indicate the high quality of our RNA-seq libraries. g, Bubble plots showing expression changes of L1, LTR and SINE subfamilies in the indicated mutant K562 cells. Multimapped reads were used to quantify transposons at the subfamily level. Colored circles represent the log2 fold change, with areas proportional to P value (DESeq2 analysis). LTR, long terminal repeat; SINE, short interspersed nuclear element. n = 2 or 3 biological replicates per gene. h, Heatmap showing expression changes at individual L1 loci in the indicated mutant K562 cells (labeled and aligned as in d). Hierarchical clustering was performed with the ‘ward.D2’ method. RNA-seq reads were uniquely mapped to individual L1 loci. Only full-length L1s significantly changed in at least one KO experiment were included (n = 222). P < 0.05, DESeq2 analysis.
i,j, RNA-seq genome browser snapshots at six L1 loci in control and the indicated mutant K562 cells, illustrating that candidate L1 suppressors (i) and activators (j) selectively and cooperatively control L1 expression. The experiment was repeated once with similar results.
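Panel c defines MAPQ on the Phred scale. The conversion in both directions is sketched below, together with a hypothetical helper that computes the share of reads at or above a MAPQ cutoff, in the spirit of the uniquely mapped fractions in panels a,b. The cutoff is an assumption: the legend does not state which MAPQ threshold, if any, separates unique from multiple mapping, and aligners such as Bowtie 2 assign MAPQ heuristically rather than from an exact probability.

```python
import math

def mapq_from_error_prob(p):
    """Phred-scaled mapping quality: MAPQ = -10 * log10(p)."""
    return -10 * math.log10(p)

def error_prob_from_mapq(mapq):
    """Probability that the true alignment lies elsewhere, given a MAPQ value."""
    return 10 ** (-mapq / 10)

def unique_fraction(mapqs, min_mapq):
    """Share of reads whose MAPQ reaches min_mapq (a hypothetical cutoff)."""
    return sum(q >= min_mapq for q in mapqs) / len(mapqs)

# Usage: a 1-in-1,000 chance of misplacement corresponds to MAPQ 30.
print(round(mapq_from_error_prob(0.001)))   # 30
print(unique_fraction([0, 1, 30, 42], 10))  # 0.5
```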
Extended Data Fig. 3 L1 restriction by DBR1 and DIS3.
a, Western blot showing the levels of DBR1 and endogenous L1 ORF1p in K562 cells infected with two independent Cas9/sgRNAs targeting DBR1 or two independent non-targeting control Cas9/sgRNAs; the experiment was independently repeated once with similar results. HSP90 was used as a loading control. b, Donut chart showing that DBR1-regulated L1s (P < 0.05, DESeq2 analysis) are predominantly located in introns in K562. c, Size distribution of DBR1-regulated L1s (P < 0.05, DESeq2 analysis) and all genomic L1s (based on hg38 RepeatMasker annotation) in K562. P, two-tailed Kolmogorov–Smirnov test. d, Read counts of total lariats and L1-containing lariats, obtained from ribosome-depleted RNA-seq data using a lariat read-count analysis strategy (ref. 81) and normalized to unmapped singlets, in control and DBR1 knockout K562 cells. n = 2 biological replicates. The center value is the mean. e, Working model in which DBR1 restricts intronic L1 expression by debranching lariats into linear RNAs that are subsequently degraded. In DBR1 mutant cells, the level of L1-containing lariats is elevated, and the L1 sequence within the lariat is proposed to be translated into L1 proteins. f, Donut chart showing the genomic distribution of DIS3-restricted L1s (P < 0.05, DESeq2 analysis) in K562 cells. g, Density plot showing the size distribution of DIS3-regulated L1s (P < 0.05, DESeq2 analysis) and all genomic L1s (based on hg38 RepeatMasker annotation) in K562 cells. P, two-tailed Kolmogorov–Smirnov test. h, Schematic of DIS3 domains and the two point mutations that inactivate the PIN and RNB domains, respectively (ref. 21). i, RNA-seq showing the relative expression of full-length L1s in DIS3-knockdown HEK293 cells rescued with intact DIS3 or with DIS3 mutants carrying an inactive PIN or RNB domain, as indicated. RNA-seq reads were uniquely mapped to individual L1 loci. These results indicate that the PIN domain, which has endonucleolytic activity, accounts for L1 restriction.
j, Aggregated line plot showing the average enrichment of DIS3 PAR-CLIP signal over full-length L1s. The genome-wide mappability track shows the alignability of 100-bp short reads across the genome (data from UCSC). These results indicate that DIS3 directly binds L1 RNAs.
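Panels c and g compare L1 size distributions with a two-tailed Kolmogorov–Smirnov test. A minimal sketch of the two-sample statistic and the standard asymptotic two-sided P value follows. It is illustrative only: the asymptotic formula is an approximation that is poor for small samples, and the software the authors used is not stated in this legend. SciPy's `scipy.stats.ks_2samp` computes the same statistic and uses an exact P value for small samples.

```python
import math
from bisect import bisect_right

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov D: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # The empirical CDFs only change at observed values, so checking those is sufficient.
    for x in set(a) | set(b):
        gap = abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        d = max(d, gap)
    return d

def ks_pvalue_asymptotic(d, n, m, terms=100):
    """Approximate two-sided P value from the limiting Kolmogorov distribution."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        return 1.0  # the tail probability is indistinguishable from 1 here
    s = sum((-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam) for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2 * s))

# Usage: completely separated toy size distributions give D = 1.
sizes_regulated = [1, 2, 3]
sizes_all = [4, 5, 6]
print(ks_statistic(sizes_regulated, sizes_all))  # 1.0
```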
Extended Data Fig. 4 Pervasive L1s harbor enhancer-like features.
a,b, Dot plots/box plots of STARR-seq read counts (left) and enrichment (right) at the L1 subfamily (a) and individual L1 (b) levels using K562 STARR-seq datasets (2 PE100 and 1 PE150). The STARR-seq reads were uniquely mapped to individual L1 loci. Center lines represent the median, box limits represent the 25th and 75th percentiles and whiskers denote the minima and maxima within 1.5× the interquartile range. n = 2 biological replicates. CPM, read counts per million mapped reads. c, Scatter plot showing the number of STARR-seq peaks and −log10(P value) for all repeats at the subfamily level in the indicated cell lines. Dots, repeat subfamilies; large dots, L1 subfamilies; color, repeat classes. P, hypergeometric test. d, Aggregated line plots showing STARR-seq signals over full-length L1s at the subfamily level in the indicated cell lines. e, Heatmaps of STARR-seq signals over full-length L1s in the indicated cell lines, sorted and aligned as in Fig. 2d. f, Heatmap showing the proportion of full-length and non-full-length L1s that exhibit STARR-seq signals in six indicated human cell lines. L1 subfamilies are classified into two groups by length: longer than 6 kb (full-length) or shorter (non-full-length). Whereas nearly all full-length L1s contain the L1 5′ UTR and overlap with STARR-seq signals, some non-full-length L1s (such as L1P2 and L1PBa1) also contain the L1 5′ UTR and overlap with STARR-seq peaks. g, Enrichment of DNA-binding motifs at full-length L1s with STARR-seq signals in K562, with full-length L1s not overlapping STARR-seq peaks as control. Plotted are the log2-transformed fold enrichment and log10-transformed P value for each protein-binding motif (hypergeometric test). Enriched transcription factor motifs are labeled. Zf, zinc finger. h, Analysis of ChIP-Atlas data showing transcription factor binding at full-length L1s. Colored circles represent log2 fold enrichment, with areas proportional to Q value (hypergeometric test).
For each protein, log2 fold enrichment is the median value across different cells or experiments. Q values are adjusted with the Benjamini–Hochberg method.
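Panel h reports Q values adjusted with the Benjamini–Hochberg method. The standard step-up procedure is sketched below for reference. It is intended to match what R's `p.adjust(p, method = "BH")` returns, but it is an illustration, not the ChIP-Atlas or authors' implementation.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted values (Q values), returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices by ascending P
    qvalues = [0.0] * n
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest P to the smallest, keeping the adjusted values monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues

# Usage with four motif P values.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))  # approximately [0.02, 0.04, 0.04, 0.02]
```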
Extended Data Fig. 5 L1 silencing by CTBP1 restricts distal gene expression.
a, Western blot showing that CTBP1 KO increases L1 protein expression in K562. b, FACS showing that CTBP1 KO increases L1-GFP expression in K562. Center line, mean. n = 2 biological replicates. c, RT–qPCR data showing normalized expression of L1Hs, L1PA15–16, L1PB and L1M subfamilies in control (n = 3) and CTBP1 KO cells (n = 3) with or without reverse transcription (RT). P, two-tailed t-test. d, Box plot showing that CTBP1 KO mainly derepresses L1PA1–3 among all repeats in K562. Box plots show the median and interquartile range (IQR); whiskers are 1.5× IQR. e, Density plot showing the size distribution of CTBP1-regulated L1s (P < 0.05, DESeq2 analysis) and other L1s in K562. P, two-tailed Kolmogorov–Smirnov test. f, Aggregated line plots showing the indicated ChIP enrichments over CTBP1-regulated L1PAs (P < 0.05, DESeq2 analysis) and all full-length (FL) L1s in K562. g, CTBP1 KO increases the expression of genes surrounding derepressed L1s. Plots show the median and IQR of gene expression changes near derepressed L1s (black box plots), compared with background (red). h, Bar plot showing the number of full-length L1s detected by H3K27ac ChIP–seq and polyA+ RNA-seq (TPM > 0.05) in K562. i, Zoomed-in RNA-seq genome browser tracks at NRN1, its cognate distal L1PA2 and the nearby FARS2 locus in control and CTBP1 KO K562, independently repeated once with similar results. j, RT–qPCR showing normalized expression of NRN1 and the nearby FARS2 in control (n = 3) and CTBP1 KO cells (n = 3). P, two-tailed t-test. n.s., not significant. k,l, Deletion of the L1PA2 element shown in i, confirmed by the gel image (k), decreases NRN1 expression in NCCIT (l). n = 4 biological replicates × 3 technical replicates. ***P < 0.001; unpaired two-tailed t-test. Error bars, standard deviation (s.d.).
m,n, Representative Hi-C interaction matrix and ChIP–seq IGV tracks around the NRN1-L1 region in wild-type (WT) cells (m), and Sanger sequencing results showing the L1 deletion sites, with sgRNA target sequences in red (n).
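Several panels (for example, d and g) summarize distributions as box plots with the median, IQR and whiskers at 1.5× IQR. The sketch below computes those summary values under two assumptions: quartiles use the linear interpolation that R's `quantile()` applies by default (Python's `method="inclusive"`), and whiskers follow the Tukey convention of extending to the most extreme observation inside the 1.5× IQR fences. The plotting software behind the figures is not stated in this legend, and `box_stats` is a hypothetical name.

```python
from statistics import quantiles

def box_stats(values):
    """Median, quartiles, Tukey whiskers (1.5x IQR) and outliers for a box plot."""
    # "inclusive" matches R's default quantile() interpolation (type 7).
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers stop at the most extreme data points within the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }

# Usage: toy expression changes with one extreme value.
print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

With these inputs the quartiles are 3.25 and 7.75, the whiskers reach 1 and 9, and 100 is drawn as an outlier.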
Extended Data Fig. 6 Diverse control of L1 influences distal gene expression.
a, Western blot showing that KMT2D KO increases L1 protein expression in K562. b, FACS showing that KMT2D KO increases L1-GFP expression in K562. Center line, mean. n = 2 biological replicates. c, Density plot showing the size distribution of KMT2D-regulated L1s (P < 0.05, DESeq2 analysis) and other L1s in K562. P, two-tailed Kolmogorov–Smirnov test. d, Bubble plot showing expression changes of different L1 subfamilies in mouse MA1L cells. Data from ref. 82. e, Heatmaps showing the indicated ChIP–seq enrichments in control and KMT2D KO K562 cells, sorted by the increase in H3K4me1 signal upon KMT2D KO and centered on H3K4me1 peaks. f, Heatmaps showing H3K4me1 ChIP–seq enrichment over transposable elements, centered and sorted as in e. The yellow rectangle indicates full-length L1s with increased H3K4me1 peaks upon KMT2D KO in K562. g, Heatmaps showing the indicated ChIP enrichments in control and KMT2D KO K562 over full-length L1s, sorted by WT H3K4me1 ChIP signal and aligned at the L1 5′ end. h, Frequency histogram of absolute distances from each L1 to the nearest gene with increased (red) or decreased (blue) expression upon KMT2D KO in K562. Background expectations for the distance from genome-wide L1s to the nearest upregulated (green) and downregulated (black) gene upon KMT2D KO are shown, with dashed lines indicating the 95% confidence interval. P, two-tailed Kolmogorov–Smirnov test; n.s., not significant. i, Box plots showing expression changes for genes with an L1 that shows increased H3K27ac ChIP signal upon KMT2D KO in K562 within the indicated genomic distance. Plots show the median and interquartile range (IQR); whiskers are 1.5× IQR. The related statistical analysis is in Supplementary Table 5. j, Bubble plots showing expression changes of different L1 subfamilies in mESCs. Data from refs. 83–85. k, Scatter plot showing METTL3/14 ChIP–seq enrichment over repeats in MCF-7 cells. Dots represent repeat subfamilies, colored by repeat class.
P, hypergeometric test. l, Box plots showing expression changes for genes with an L1 that shows decreased expression upon METTL3 KO in K562 within the indicated genomic distance. Plots show the median and interquartile range (IQR); whiskers are 1.5× IQR. The related statistical analysis is in Supplementary Table 5.
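Panel h is built from the absolute distance between each L1 and the nearest up- or downregulated gene. A minimal sketch of that nearest-neighbor lookup with binary search is shown below. It assumes each L1 and each gene is reduced to a single coordinate (for instance, the L1 5′ end and the gene TSS) on the same chromosome; that reduction, the tuple layout and the function name are assumptions, not the authors' definitions. The resulting distances can then be binned into a frequency histogram.

```python
from bisect import bisect_left
from collections import defaultdict

def nearest_gene_distances(l1_sites, gene_sites):
    """Absolute distance (bp) from each L1 to the closest gene on the same chromosome.

    l1_sites, gene_sites: lists of (chromosome, position) tuples.
    Returns None for an L1 on a chromosome with no gene in gene_sites.
    """
    genes_by_chrom = defaultdict(list)
    for chrom, pos in gene_sites:
        genes_by_chrom[chrom].append(pos)
    for positions in genes_by_chrom.values():
        positions.sort()

    distances = []
    for chrom, pos in l1_sites:
        positions = genes_by_chrom.get(chrom)  # .get does not create empty entries
        if not positions:
            distances.append(None)
            continue
        i = bisect_left(positions, pos)
        # The nearest gene is the first one at/after pos or the last one before it.
        candidates = []
        if i < len(positions):
            candidates.append(positions[i] - pos)
        if i > 0:
            candidates.append(pos - positions[i - 1])
        distances.append(min(candidates))
    return distances

# Usage with toy coordinates for upregulated genes and L1s.
up_genes = [("chr1", 100), ("chr1", 500), ("chr2", 50)]
l1s = [("chr1", 120), ("chr1", 450), ("chr2", 1000), ("chr3", 5)]
print(nearest_gene_distances(l1s, up_genes))  # [20, 50, 950, None]
```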
Extended Data Fig. 7 Systematic L1 perturbation influences pervasive long-range gene expression.
a, Schematic of triple sgRNAs targeting the L1Hs 5′ UTR, used in CRISPRa/i. b, RT–qPCR results showing L1Hs expression upon CRISPRa/i in NCCIT cells. P, unpaired two-tailed t-test. Data represent median ± SEM. n = 2 biological replicates × 3 technical replicates. RT, reverse transcription. NC, non-target control. c, Bar plots showing counts of the indicated L1s (blue + yellow + gray), counts of L1s targeted by triple sgRNAs (blue + yellow) and counts of L1s with increased H3K27ac peaks upon L1 CRISPRa (blue) in NCCIT cells. d, Bar plots showing counts of the indicated L1s with H3K27ac peaks in NCCIT cells (blue + yellow + gray), counts of L1s targeted by triple sgRNAs (blue + yellow) and counts of L1s with decreased H3K27ac peaks upon L1 CRISPRi (blue). e, Gene expression changes upon L1 CRISPRa/i in NCCIT cells. Red dots, genes whose expression increased upon CRISPRa (padj < 0.1, DESeq2 analysis) and decreased or remained silent upon CRISPRi (termed L1-regulated genes). f, Plotted as in e, with genes separated into deciles by distance from the nearest L1 with increased H3K27ac peaks upon L1 CRISPRa. g, Box plot showing gene expression changes in the different deciles, binned as in f. n = 2 biological replicates. Center lines represent the median value, box limits represent the 25th and 75th percentiles and whiskers denote minima and maxima (1.5× the interquartile range). h,i, Representative examples of L1-initiated chimeric transcripts in NCCIT cells. j, Box plots showing expression changes for genes with Mll3/4 binding sites within the indicated genomic distances in mESCs. Plots show median and interquartile range (IQR), and whiskers are 1.5-fold IQR. This suggests that the enhancer activity of L1 (shown in Fig. 3d,e) is comparable to that of canonical enhancers regulated by Mll3/4. k, RNA-seq browser tracks at NRN1 and its cognate L1PA2 locus. l, RNA-seq browser tracks at SESN3, its cognate L1Hs and a nearby gene locus. m, RT–qPCR results showing that CRISPRa targeting the SESN3-L1, rather than an adjacent locus, increases SESN3 expression. Data represent median ± SEM. n = 2 biological replicates × 3 technical replicates. P, two-tailed t-test. ****P < 0.0001. n, IGV tracks in NCCIT cells after CRISPRa against L1 (data from this study) and LTR5Hs (data from ref. 29). This suggests the importance of L1 transcription, rather than of the adjacent LTR5Hs, in the long-range activation of HMGCS1.
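Panels f and g bin genes into deciles by their distance to the nearest L1 with increased H3K27ac and compare expression changes across bins. A minimal sketch of that binning, assuming equal-count bins by distance rank and per-gene log2 fold changes as input; the helper name and the equal-count convention are illustrative assumptions, since the legend does not state how ties or remainders were handled.

```python
from statistics import median


def median_change_by_decile(distances, log2_fold_changes, n_bins=10):
    """Group genes into n_bins equal-count bins by distance to the nearest
    L1 (closest bin first) and return each bin's median log2 fold change.

    Hypothetical helper. Equal-count binning by rank is an assumption; when
    the gene count is not divisible by n_bins, bin sizes differ by at most one.
    """
    if len(distances) != len(log2_fold_changes):
        raise ValueError("distances and fold changes must have equal length")
    if len(distances) < n_bins:
        raise ValueError("need at least one gene per bin")
    # Gene indices ordered from the closest to the farthest L1.
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    n = len(order)
    medians = []
    for b in range(n_bins):
        start, stop = b * n // n_bins, (b + 1) * n // n_bins
        medians.append(median([log2_fold_changes[i] for i in order[start:stop]]))
    return medians


# Hypothetical data in which genes closer to an activated L1 change more.
dist = [5_000, 900_000, 40_000, 1_500_000, 12_000, 300_000,
        2_000_000, 75_000, 600_000, 150_000]
fc = [1.8, 0.1, 1.2, 0.0, 1.5, 0.3, -0.1, 0.9, 0.2, 0.6]
print(median_change_by_decile(dist, fc, n_bins=5))
```

Binning by rank rather than by fixed distance cut-offs keeps every decile equally populated, so the per-bin medians are estimated from comparable numbers of genes.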
Extended Data Fig. 8 L1 transcription facilitates the long-range L1–gene contacts.
a,b, Scatter plots comparing genome-wide compartment scores upon L1 CRISPRa (a) and CRISPRi (b) in NCCIT cells. NC, non-target control. c, Stacked bar plot showing compartment changes upon L1 CRISPRa/i in NCCIT cells. d, Distribution of distances between L1s and their looped cognate genes in NCCIT cells. e, Pie charts showing the number and proportion of L1s in the same or opposite orientation to (i), and upstream or downstream of (ii), their looped genes identified in L1 CRISPRa (top) and CRISPRi (bottom) NCCIT cells. f, Enrichment analysis of transcription factor binding motifs at the 5′ UTRs of active L1s associated with genes compared with those that are not. g,h, Box plots showing expression changes of L1-looped genes compared with non-looped genes within 2 Mb of L1s with increased (g, L1 CRISPRa) or decreased (h, L1 CRISPRi) H3K27ac. Plot shows the median and interquartile range (IQR), and whiskers are 1.5 × IQR. P, two-tailed Mann–Whitney–Wilcoxon test. i,j, Scatter plots showing the correlation between the fold change of Pol II ChIP–seq signal at individual L1s and the expression fold change of the L1-paired genes in NCCIT cells upon L1 CRISPRa (i) and CRISPRi (j). k,l, Aggregated line plots showing ChIP enrichments of Pol II (k) and RAD21 (l) over the full-length L1s that contact their cognate gene promoters (left), and over all respective ChIP–seq peaks (right), in L1 CRISPRa (blue) and non-target control (NC, gray) NCCIT cells. m, Bar plot showing ChIP–seq enrichment of transcription factors and histone marks at L1s that make physical contacts with their cognate genes. ChIP–seq datasets from human embryonic stem cells (H9 hESCs) were obtained from ENCODE. n, Hallmark enrichment analysis for the L1-regulated ZGA genes. P, hypergeometric test. o,p, Scatter plots showing the correlation between H3K27ac fold changes at individual L1 loci and fold changes in paired gene expression in NCCIT cells upon L1 CRISPRa (o) and CRISPRi (p).
Extended Data Fig. 9 L1 transcription enhances mouse 2-cell (2C) gene expression.
a, Scatter plot showing STARR-seq peak counts and −log10(P) for all repeats in mESCs. Dots, repeat subfamilies; large dots, L1 subfamilies; color, repeat classes. P, hypergeometric test. b, Aggregated contacts between L1s and their looped genes in mouse embryos at the indicated stages, using Hi-C data49. c, Box plot quantifying the L1–gene contacts described in b. Plots show median and interquartile range (IQR), and whiskers are 1.5-fold IQR. **P < 0.01, two-tailed t-test. d, Bubble plot showing overlap between the L1-contacted genes (c) and the list of 2C genes43 and DBTMEE v2 transcriptome categories86. P, hypergeometric test. e, Volcano plots showing repeat expression changes upon L1 perturbation in mESCs. f, Aggregated Hi-C contacts between L1s and their looped genes, and for all looping events, in control and L1 CRISPRi mESCs. g, Bubble plot showing overlap between all DEGs (differentially expressed genes) upon the indicated L1 perturbation and the list of 2C genes43 and DBTMEE v2 transcriptome categories86. P, hypergeometric test. h, Gene set enrichment analysis (GSEA) across all Hallmark gene sets upon the indicated L1 perturbation. NES, normalized enrichment score. i, GSEA showing enrichment of 2C-specific genes (defined in ref. 43) in mESCs after the indicated L1 perturbation. j, Heatmap showing expression changes of 2C-specific genes and MERVLs upon the indicated L1 perturbation. MERVLs counted at the subfamily level. k, Representative bar plots showing expression changes for the indicated 2C genes upon the indicated L1 perturbation. Data are representative of two or three independent experiments and are median ± SEM. *P < 0.1, **P < 0.01, ***P < 0.001, ****P < 0.0001, one-tailed t-test. l, Box plot showing the distribution of distances from 2C gene sets to the nearest full-length L1 elements. Plots show median and interquartile range (IQR), and whiskers are 1.5-fold IQR. P, two-tailed Mann–Whitney–Wilcoxon test. m, Deleting an L1MdA element, as shown in Fig. 5f and validated by gel image (left), decreases Zscan4a/4c/ps-1 expression. n = 2 biological replicates × 3 technical replicates. ***P < 0.001; unpaired two-tailed t-test. Error bars, standard deviation (s.d.).
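Panels d and g test the overlap between two gene lists with a hypergeometric test. A sketch of the one-sided (enrichment) p-value, assuming a background of N genes of which K belong to the reference list (for example 2C genes), a query list of n genes (for example L1-contacted genes or DEGs) and an observed overlap of k. The parameter names are hypothetical, and the choice of background gene universe is an assumption the legend does not specify.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(N=population, K=successes, n=draws).

    P(X >= k) = sum_{i=k}^{min(K, n)} C(K, i) * C(N - K, n - i) / C(N, n)

    Hypothetical helper for a one-sided overlap (enrichment) test; exact
    integer arithmetic is used until the final division.
    """
    if not (0 <= successes <= population and 0 <= draws <= population):
        raise ValueError("invalid hypergeometric parameters")
    k = max(k, 0)
    upper = min(successes, draws)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible overlaps contribute nothing to the sum.
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(population, draws)


# Hypothetical numbers: 20,000 background genes, 500 reference genes,
# 300 query genes, 30 of which are in the reference list.
p = hypergeom_upper_tail(30, 20_000, 500, 300)
print(f"{p:.3g}")
```

The expected overlap under independence is n·K/N (7.5 in this made-up example), so an observed overlap well above that yields a small upper-tail probability.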
Extended Data Fig. 10 L1 knockdown impairs ZGA and arrests mouse embryonic development at the 2–4-cell stage.
a, Gene set enrichment analysis (GSEA) showing that 2C-specific genes were significantly downregulated in 2-cell mouse embryos after L1 ASO treatment. NES, normalized enrichment score. b, Scatter plot and box plot (inset) showing the relationship between the distance from each gene's transcription start site (TSS) to the nearest full-length L1 and the gene's expression change upon L1 ASO treatment in mouse 2-cell embryos. The two vertical lines mark −512 kb and +512 kb. The top density plot shows the distance distribution of expressed 2C-specific genes (red) and other genes (gray). The box plot shows the median and interquartile range (IQR) of gene expression change, and whiskers are 1.5× IQR. P, two-tailed Mann–Whitney–Wilcoxon test. 2C-specific genes are significantly enriched within ±512 kb of full-length L1s, and 96 of them are significantly downregulated upon L1 ASO treatment in mouse 2-cell embryos. c, Examples of 2C-specific gene clusters, with those derepressed in L1 ASO 2-cell mouse embryos labeled in red. d, Chromosome maps illustrating the genomic distribution of 2C-specific genes, with those derepressed in L1 ASO 2-cell mouse embryos labeled in red. e, Representative phase-contrast images of 2.5 and 3.5 dpc mouse embryos under uninjected, control ASO and L1 ASO conditions, from 3 independent experiments. Scale bars, 100 µm. f, Percentage of mouse embryos at 4.5 dpc in the SCR (n = 45), RC-L1 ASO (n = 44) and L1 ASO (n = 39) groups. ***P < 0.0001, chi-square test. n.s., not significant. SCR ASO, scrambled non-targeting ASO. RC-L1 ASO, ASO whose sequence is the reverse complement of the L1 ASO. g, MA plots of expression changes of 2C-specific genes in 2-cell mouse embryos after microinjection of RC-L1 ASO (left) or L1 ASO (right) into zygotes, relative to the SCR ASO control. h, Model showing that L1 knockdown causes ZGA defects and developmental arrest at the 2–4-cell stage, before blastocyst formation, in early mouse embryos. Created with BioRender.com.
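Panel b splits genes at ±512 kb from the nearest full-length L1 and summarizes expression changes with box plots whose whiskers are 1.5× IQR. A sketch of both steps, assuming Tukey-style whiskers (each whisker ends at the most extreme value within 1.5× IQR of the box, a common convention that the legend does not spell out) and linearly interpolated quartiles; the helper names and the window constant are illustrative assumptions.

```python
from statistics import quantiles

WINDOW_BP = 512_000  # ±512 kb window (assumed symmetric around the TSS)


def split_by_window(tss_to_l1_distances, log2_fold_changes, window=WINDOW_BP):
    """Hypothetical helper: split fold changes into genes whose TSS lies
    within +/-window of the nearest full-length L1 ('near') and the rest."""
    pairs = list(zip(tss_to_l1_distances, log2_fold_changes))
    near = [fc for d, fc in pairs if abs(d) <= window]
    far = [fc for d, fc in pairs if abs(d) > window]
    return near, far


def box_stats(values):
    """Median, quartiles and Tukey-style whiskers for one box.

    Quartiles use linear interpolation (method="inclusive"); whiskers stop
    at the most extreme data points inside Q1 - 1.5*IQR and Q3 + 1.5*IQR.
    Needs at least two values.
    """
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {"q1": q1, "median": med, "q3": q3,
            "whisker_low": min(inside), "whisker_high": max(inside)}


# Hypothetical TSS-to-L1 distances (bp) and log2 fold changes.
near, far = split_by_window([-600_000, -100_000, 0, 512_000, 700_000],
                            [0.1, -1.2, -0.8, -0.5, 0.2])
print(near, far)  # prints [-1.2, -0.8, -0.5] [0.1, 0.2]
print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

The `near` and `far` groups are the two samples that the two-tailed Mann–Whitney–Wilcoxon test named in the legend would compare; the test itself is omitted from this sketch.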
Supplementary information
Supplementary Information
Supplementary Methods and Supplementary Figs. 1 and 2.
Supplementary Tables
Supplementary Table 1: The list of 51 L1-reporter insertion sites in the screening cell line.
Supplementary Table 2: The list of genes that suppress L1 expression in K562 cells, as revealed by our genome-wide CRISPR–Cas9 screen.
Supplementary Table 3: The list of genes that activate L1 expression in K562 cells, as revealed by our genome-wide CRISPR–Cas9 screen.
Supplementary Table 4: The sequences of oligos used in this paper.
Supplementary Table 5: Statistical analysis for gene expression changes.
Supplementary Table 6: The mapping information for RNA-seq datasets generated in this paper.
Supplementary Table 7: The mapping information for ChIP–seq datasets generated in this paper.
Supplementary Table 8: The mapping information for Hi-C datasets generated in this paper.
Supplementary Table 9: The list of public datasets used in this paper.
Source data
Source Data Figs. 2, 3 and 5 and Extended Data Figs. 3 and 7
Statistical source data for Figs. 2c, 3h and 5a,c,d,i, and Extended Data Figs. 3c,g and 7m.
Source Data Fig. 3
Unprocessed western blots and/or gels for Fig. 3g.
Source Data Extended Data Fig. 3
Unprocessed western blots for Extended Data Fig. 3a.
Source Data Extended Data Fig. 5
Unprocessed western blots for Extended Data Fig. 5a,k.
Source Data Extended Data Fig. 6
Unprocessed western blots for Extended Data Fig. 6a.
Source Data Extended Data Fig. 10
Unprocessed western blots for Extended Data Fig. 10m.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, X., Bie, L., Wang, Y. et al. LINE-1 transcription activates long-range gene expression. Nat Genet 56, 1494–1502 (2024). https://doi.org/10.1038/s41588-024-01789-5
DOI: https://doi.org/10.1038/s41588-024-01789-5
This article is cited by
- LINE1 mediates long-range DNA interactions. Nature Genetics (2024)