
Distinctive origin and evolution of endemic thistle of Korean volcanic island: Structural organization and phylogenetic relationships with complete chloroplast genome

  • Bongsang Kim,

    Roles Data curation, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing

    Affiliations Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea, eGnome, Inc, Seoul, Republic of Korea

  • Yujung Lee,

    Roles Data curation, Software

    Affiliation eGnome, Inc, Seoul, Republic of Korea

  • Bomin Koh,

    Roles Conceptualization, Resources

    Affiliation eGnome, Inc, Seoul, Republic of Korea

  • So Yun Jhang,

    Roles Writing – review & editing

    Affiliations eGnome, Inc, Seoul, Republic of Korea, Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea

  • Chul Hee Lee,

    Roles Resources

    Affiliation County Office of Ulleung-gun, Gyeongsangbuk-do, Korea

  • Soonok Kim,

    Roles Conceptualization, Supervision

    Affiliation Microorganism Resources Division, National Institute of Biological Resources, Incheon, Republic of Korea

  • Won-Jae Chi,

    Roles Conceptualization

    Affiliation Microorganism Resources Division, National Institute of Biological Resources, Incheon, Republic of Korea

  • Seoae Cho,

    Roles Conceptualization

    Affiliation eGnome, Inc, Seoul, Republic of Korea

  • Heebal Kim,

    Roles Supervision

    heebal@snu.ac.kr (HK); jwyu@egnome.co.kr (JY)

    Affiliations Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea, eGnome, Inc, Seoul, Republic of Korea, Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea

  • Jaewoong Yu

    Roles Project administration, Supervision

    heebal@snu.ac.kr (HK); jwyu@egnome.co.kr (JY)

    Affiliation eGnome, Inc, Seoul, Republic of Korea

Abstract

Unlike other Cirsium in Korea, Cirsium nipponicum (island thistle) is distributed only on Ulleung Island, a volcanic island off the east coast of the Korean Peninsula, and is a unique thistle with no or very small thorns. Although many researchers have questioned the origin and evolution of C. nipponicum, little genomic information is available to estimate them. We thus assembled the complete chloroplast genome of C. nipponicum and reconstructed the phylogenetic relationships within the genus Cirsium. The chloroplast genome was 152,586 bp, encoding 133 genes consisting of 8 rRNA genes, 37 tRNA genes, and 88 protein-coding genes. By calculating nucleotide diversity, we found 833 polymorphic sites and eight highly variable regions in the chloroplast genomes of six Cirsium species, as well as 18 specific variable regions that distinguish C. nipponicum from other Cirsium. In the phylogenetic analysis, C. nipponicum was closer to C. arvense and C. vulgare than to the native Cirsium in Korea, C. rhinoceros and C. japonicum. These results indicate that C. nipponicum was likely introduced through the north Eurasian route, not the mainland, and evolved independently on Ulleung Island. This study contributes to further understanding the evolutionary process and the biodiversity conservation of C. nipponicum on Ulleung Island.

Introduction

Cirsium nipponicum (Maxim.) Makino is a perennial flowering plant that can be found near the seashore and belongs to the Carduoideae subfamily in Asteraceae. Among eight Cirsium species that grow naturally in Korea [1], C. nipponicum, also known as island thistle, is predominantly found only on Ulleung Island, an oceanic volcanic island off the east coast of the Korean Peninsula, and has no or very small thorns on its leaves. Like other Cirsium species traditionally used as medicinal plants in East Asia for their bioactivities, including hepatoprotective, antioxidant, and antidiabetic activities [2–7], dried C. nipponicum has also been used as a medicinal source. It is an abundant producer of polyphenols and flavonoids such as cirsimarin and pectolinarin with antioxidant and anti-inflammatory activity [3, 8, 9]. In addition, the leaves, which are known to differ from those of other Cirsium, are also used as a vegetable. Because other Carduoideae species, such as milk thistle, have been studied for their medicinal effects [10–12], similar studies have also been conducted on C. nipponicum [3, 8, 13, 14].

Although several Cirsium species are distributed in Korea and neighboring countries (Fig 1A), the origin of the Korean C. nipponicum, which is distributed only on Ulleung Island, is not yet clear. Previous studies on phylogenetic relationships have shown that C. nipponicum is distinct from other endemic Cirsium [1, 15]. However, genomic understanding of the biological differences among Cirsium species is limited, as few studies have used the DNA of C. nipponicum in recent decades. Furthermore, although other comparative analyses have included C. nipponicum, their phylogenetic analyses were limited to combinations of morphological characteristics and only small portions of genomic DNA, such as DNA barcode regions, which can be problematic for inferring the evolutionary process [16, 17].

Fig 1. Cirsium species distribution map and chloroplast genetic map.

(a) Geographical distribution of Cirsium species around Korea (source: Natural Earth). (b) Genetic map of the C. nipponicum chloroplast genome. Genes drawn outside the circle are transcribed counterclockwise, and those inside are transcribed clockwise. (c) C. vulgare distributed near Ulleung Island, provided by Bio Resource Information Service (BRIS). (d) C. rhinoceros distributed near Ulleung Island, provided. (e) C. japonicum distributed near Ulleung Island, provided by National Institute of Biological Resources (NIBR). (f) C. arvense distributed near Ulleung Island, provided by NIBR.

https://doi.org/10.1371/journal.pone.0277471.g001

Islands are considered rich regions in terms of plant species diversity, and Ulleung Island is one of the biodiversity hot spots in Korea [18, 19]. Nonetheless, island species are currently under threat from the loss of native habitats and climate change, and many plants on Ulleung Island are suffering from various forms of development [20–22]. Under these circumstances, conservation work on endemic species of Ulleung Island, including C. nipponicum, has just begun, and genome construction for these species is needed at the same time. Since the development of next-generation sequencing (NGS) technology [23] has enabled researchers to study the genome from a broader and deeper perspective, the acquisition of genetic resources has accelerated and their quality has improved. In addition, many projects involving genomic data, such as genome skimming or DNA barcoding, have followed. Therefore, we aimed to present the chloroplast genomic data of C. nipponicum as a basis for future studies, as genomic data can help address the remaining challenges and provide an accurate means of understanding the biology and biodiversity of Ulleung Island.

Plastid genomes were sequenced before the nuclear genome in most plants because of their conserved traits, such as gene content, low recombination, self-replication, genome structure, small compact size, maternal inheritance, and moderate substitution rates suitable for comparative analysis within related species [24–26]. For those reasons, the chloroplast genome is regarded as a valuable resource for phylogenetic analysis, population genetics, and plant systematics. For example, previous studies using the chloroplast genome have inferred phylogenetic relationships in traditionally intricate groups of the tribe Cardueae [27, 28]. Moreover, variable regions such as repeat sequences or intergenic spacers (IGSs) in the chloroplast genomes of many species have been explored as helpful information for effective strategies to conserve endangered species [29]. Hence, constructing the chloroplast genome of C. nipponicum will be of great help in studying the evolutionary process of Cirsium and its adaptation to specific environments.

In this study, we assembled the complete chloroplast genome of C. nipponicum for the first time from NGS paired-end data and compared it with previously published chloroplast genomes. We then identified the genetic structure of the C. nipponicum chloroplast genome and performed comparative analyses with other Cirsium species. As a result, we detected repeat elements and highly variable regions within Cirsium species that distinguish C. nipponicum from the others, and constructed phylogenetic trees to observe the evolutionary relationships among Carduoideae.

Methods

Plant material, DNA extraction, and sequencing

Fresh leaves of C. nipponicum were collected from a conservation garden in Ulleunggun Agriculture Technology Center, Ulleung-gun, Gyeongsangbuk-do, Korea (37°27’37.0"N 130°52’29.9"E) under the guidance of Chul Hee Lee (research officer of Ulleunggun Agriculture Technology Center). The plant materials produced and used in this study comply with Korean guidelines and legislation. All experiments were carried out in accordance with national and international guidelines. Genomic DNA of C. nipponicum was extracted from leaf tissues using a cetyl trimethylammonium bromide (CTAB)-based protocol [30]. A 2 x 151 paired-end (PE) library was constructed following the manufacturer’s instructions (Illumina, USA) and sequenced on a HiSeq platform.

Read data processing and chloroplast genome assembly

Quality control, including the removal of low-quality reads and adapter sequences, was performed using FastQC and Trimmomatic [31, 32]. Adapter sequences were removed, and read ends with a Phred score below 20 were trimmed. Afterward, high-quality reads were assembled using GetOrganelle-1.7.1 [33] and annotated using PGA v3 and GeSeq based on four reference chloroplast genomes: C. rhinoceros (NC_044423.1), C. eriophorum (NC_036966.1), C. vulgare (NC_036967.1), and C. arvense (NC_036965.1) [34, 35]. The tRNA genes were verified with tRNAscan-SE v2.0.5, and further manual adjustment was performed with BLASTN and BLASTX [36, 37]. The annotated chloroplast genome of C. nipponicum was submitted to GenBank under accession number MW248139. The genome map was drawn with Organellar Genome DRAW (OGDRAW) [38]. For genomes without IR annotations, inverted repeat regions [39] were identified with irScan and IRscope [40, 41]. Sequences of all protein-coding genes were used to analyze codon preference. Relative synonymous codon usage (RSCU) was calculated based on the following equation [42]:

$$\mathrm{RSCU}_{ij} = \frac{x_{ij}}{\frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}}$$

where $x_{ij}$ is the number of occurrences of the jth codon for the ith amino acid, and $n_i$ is the number of alternative codons for the ith amino acid.
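As an illustration of this definition, RSCU divides the observed count of a codon by the mean count over all synonymous codons of the same amino acid. The following is a minimal Python sketch, not the script used in this study; the function name `rscu`, the use of the standard genetic code, and the plain-string input format are assumptions made for the example.

```python
from collections import Counter, defaultdict

# Standard genetic code built from the conventional TCAG ordering (assumed here).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(cds_sequences):
    """Return {codon: RSCU} for sense codons, given in-frame coding sequences."""
    counts = Counter()
    for seq in cds_sequences:
        seq = seq.upper().replace("U", "T")
        # Walk the sequence codon by codon, ignoring a trailing partial codon.
        for pos in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[pos:pos + 3]
            # Skip stop codons and codons with ambiguous bases.
            if CODON_TABLE.get(codon, "*") != "*":
                counts[codon] += 1

    # Group synonymous codons by the amino acid they encode.
    synonymous = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            synonymous[aa].append(codon)

    values = {}
    for codons in synonymous.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the input
        mean = total / len(codons)  # expected count under equal usage
        for c in codons:
            values[c] = counts[c] / mean
    return values
```

With this definition, amino acids encoded by a single codon (methionine and tryptophan) always have RSCU = 1, so they cannot show codon preference.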

Repeat sequence identification

Simple-sequence repeats (SSRs) in the C. nipponicum chloroplast genome were determined using MISA with the minimal repeat numbers set to 10, 5, 4, 3, 3, and 3 for mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide repeats, respectively [43]. REPuter was used to identify dispersed repeats, including forward, reverse, complement, and palindromic repeat sequences, with a minimum size of 30 bp and a Hamming distance of 3 [44].
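The SSR thresholds above can be illustrated with a small regular-expression search. This is a hedged sketch of the general idea only and is not MISA itself: the function names `find_ssrs` and `is_redundant` are hypothetical, only perfect repeats of A, C, G, and T are considered, and compound SSRs are not merged.

```python
import re

# Minimum repeat counts per motif length (mono- to hexa-nucleotide), as stated in the text.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_redundant(motif):
    """True if the motif is itself a repetition of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(sequence):
    """Return sorted (start, end, motif, repeat_count) tuples, 0-based and half-open."""
    sequence = sequence.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # A motif of `size` bases followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        pos = 0
        while True:
            match = pattern.search(sequence, pos)
            if match is None:
                break
            if is_redundant(match.group(1)):
                # e.g. 'AA' repeated: already covered by the mono-nucleotide search.
                pos = match.start() + 1
                continue
            repeats = (match.end() - match.start()) // size
            hits.append((match.start(), match.end(), match.group(1), repeats))
            pos = match.end()
    return sorted(hits)
```

For example, `find_ssrs("G" + "A" * 10 + "C" + "AT" * 5 + "G")` reports one mono-nucleotide and one di-nucleotide repeat, while a tri-nucleotide motif repeated only three times would fall below the threshold of four.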

Divergent hotspot identification

Sequences were aligned with MAFFT [45] and then analyzed with DnaSP [46] to compare the chloroplast genome of C. nipponicum with those of the following five Cirsium species: C. japonicum, C. rhinoceros, C. eriophorum, C. vulgare, and C. arvense. To identify divergent regions, nucleotide diversity was calculated from the multiple sequence alignment with a window length of 600 and a step size of 200.
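The sliding-window calculation can be sketched as follows. This is an illustrative approximation rather than DnaSP's implementation: nucleotide diversity (Pi) is taken here as the mean number of pairwise differences per site, alignment columns containing gaps or ambiguous bases are skipped, and the function names `nucleotide_diversity` and `sliding_pi` are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(aligned, start, end):
    """Mean pairwise differences per site over alignment columns [start, end)."""
    # Keep only columns where every sequence has an unambiguous base.
    columns = [
        col
        for col in zip(*(seq[start:end].upper() for seq in aligned))
        if all(base in "ACGT" for base in col)
    ]
    pairs = list(combinations(range(len(aligned)), 2))
    if not columns or not pairs:
        return 0.0
    diffs = sum(col[i] != col[j] for col in columns for i, j in pairs)
    return diffs / (len(pairs) * len(columns))


def sliding_pi(aligned, window=600, step=200):
    """Yield (window_midpoint, pi) pairs along an alignment of equal-length strings."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")
    for start in range(0, max(length - window, 0) + 1, step):
        end = min(start + window, length)
        yield (start + end) // 2, nucleotide_diversity(aligned, start, end)
```

Windows whose Pi exceeds a chosen threshold can then be flagged as candidate highly variable regions, for example with `[mid for mid, pi in sliding_pi(alignment) if pi > threshold]`.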

Phylogenetic analysis

Phylogenetic analyses were conducted using the Cirsium species together with other Cardueae tribe species, with Gerbera jamesonii as an outgroup. The multiple sequence alignment of the 20 sequences listed in S1 Table was performed using MAFFT with a gap penalty of 1.53 and the default FFT-NS-2 method [45]. PAUP and Modeltest were used to select the substitution model for Bayesian inference [47, 48]. MrBayes [49] was run for 1,000,000 generations with a burn-in of 250,000 generations. Maximum likelihood trees were estimated with IQ-TREE using 1,000 bootstrap replicates [50].

Results

Chloroplast genome of C. nipponicum

Whole-genome paired-end sequencing of C. nipponicum yielded 16,415,067,154 bp. After trimming adapters and low-quality sequences, a total of 3,739,051,830 high-quality reads were used as input to GetOrganelle-1.7.1 [33] for chloroplast genome assembly. Based on the seed reads identified by GetOrganelle, with 88,093,650 bp in length and 577x sequencing depth, the chloroplast genomic DNA was assembled into a circular form (Fig 1B). The assembled genome of C. nipponicum was 152,586 bp long with a quadripartite structure, consisting of a large single-copy (LSC) region of 83,520 bp and a small single-copy (SSC) region of 18,701 bp separated by two inverted repeats (IRa, IRb) of 25,191 bp each. The GC content of the C. nipponicum chloroplast genome was 37.69%, and those of the LSC, SSC, and IR regions were 35.83%, 37.49%, and 43.11%, respectively. The LSC exhibited the lowest GC content among the four regions of the chloroplast genome, and the IR regions had the highest.
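The reported depth is consistent with dividing the total length of the seed reads by the assembled genome length. The short Python sketch below reproduces that arithmetic and shows how a GC percentage can be computed from a sequence string; the source does not state how either value was calculated, so the helper names `mean_depth` and `gc_content` and the formulas themselves are assumptions.

```python
def mean_depth(total_read_bases, genome_length):
    """Average sequencing depth, assuming depth = total read bases / genome length."""
    return total_read_bases / genome_length


def gc_content(sequence):
    """GC percentage over unambiguous bases (A, C, G, T) of a sequence string."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


# Figures taken from the text: seed reads of 88,093,650 bp, genome of 152,586 bp.
depth = mean_depth(88_093_650, 152_586)
print(f"{depth:.1f}x")  # about 577x, in line with the reported depth
```

Applying `gc_content` separately to the LSC, SSC, and IR subsequences would give per-region values comparable to those reported above.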

Using the PGA [35] and GeSeq [34] annotation tools, 133 genes were annotated in the chloroplast genome of C. nipponicum, consisting of 8 rRNA genes, 37 tRNA genes, and 88 protein-coding genes (Table 1). Of the 133 genes, 18 genes, including 7 tRNA genes (trnI-CAU, trnL-CAA, trnV-GAC, trnI-GAU, trnA-UGC, trnR-ACG, trnN-GUU), 4 rRNA genes (rrn4.5, rrn5, rrn16, rrn23), and 7 protein-coding genes (rpl2, rpl23, rps7, rps12, ycf2, ycf15, ndhB), were duplicated in the IR regions. Also, 12 protein-coding genes (rpl2, rpl16, rps12, rps16, rpoC1, atpF, ycf3, clpP, petB, petD, ndhA, and ndhB) contained exons and introns. The small subunit ribosomal protein 12 (rps12) gene was trans-spliced, with the first exon located in the LSC region and the others in the IR regions.

Table 1. List of annotated genes in the C. nipponicum chloroplast genome.

https://doi.org/10.1371/journal.pone.0277471.t001

To distinguish the C. nipponicum chloroplast genome from those of other Cirsium species, we compared it with five well-known chloroplast genomes from NCBI RefSeq and reassigned their quadripartite structures: C. arvense (NC_036965.1), C. vulgare (NC_036967.1), C. eriophorum (NC_036966.1), C. rhinoceros (NC_044423.1), and C. japonicum var. spinosissimum (NC_050046.1). All of these species, except for C. eriophorum, were found in Korea. C. vulgare and C. arvense were exotic species distributed worldwide, including Russia, China, and Japan, and the remaining two, C. rhinoceros and C. japonicum, were endemic to Korea. In the basic statistics comparing the chloroplast genomes, the C. nipponicum chloroplast genome showed the lowest GC content across the whole chloroplast genome among the six Cirsium species, whereas its GC content in the SSC region was the highest (Table 2).

Table 2. Basic features of six Cirsium chloroplast genomes.

https://doi.org/10.1371/journal.pone.0277471.t002

Expansion and contraction of IR regions

Many studies have identified variations in chloroplast genome length when comparing IR regions, including their boundary junctions, among species of the same genus. Considering that the chloroplast genome is regarded as highly conserved, expansion and contraction of IR regions could be part of genome evolution. Thus, we ran IRscope [41] on six Cirsium species to investigate differences in the IR regions (Fig 2). The rps19 gene spanned the junction between the LSC and IR regions in C. nipponicum, C. arvense, C. eriophorum, and C. japonicum, whereas the rpl2 gene spanned the junction in C. vulgare and C. rhinoceros. The gene pattern around the IR junction of C. nipponicum was similar to that of C. arvense and C. eriophorum. Subsequently, multiple sequence alignment of the chloroplast genomes based on the C. nipponicum IR regions revealed four deletions (two in the ycf2 gene, one in the trnI-GAU gene, and one in the intergenic region between rrn5 and trnR-ACG) and two insertions, in the ycf2 gene and in the same intergenic region as the deletion (S2 Table).

Fig 2. Comparison of the IR regions and the junctions of LSC, IR, and SSC regions among chloroplast genomes of six Cirsium.

C. arvense, C. vulgare, C. eriophorum, C. rhinoceros, and C. japonicum have the accession numbers NC_036965.1, NC_036967.1, NC_036966.1, NC_044423.1, and NC_050046.1, respectively.

https://doi.org/10.1371/journal.pone.0277471.g002

Codon preference analysis

We analyzed codon usage frequency using the protein-coding genes of C. nipponicum and the other five Cirsium species. Isoleucine and leucine were the most abundant amino acids (10.86% and 10.63%), while cysteine was the least encoded (1.12%) in C. nipponicum (S3 Table). The amino acid percentages in the other five Cirsium species showed the same pattern as in C. nipponicum (S1 Fig). Furthermore, all amino acids were found in the six Cirsium chloroplast genomes and exhibited codon preference, except methionine and tryptophan. When we calculated the relative synonymous codon usage (RSCU) of C. nipponicum to measure the extent of codon bias, there were 30 codons with high preference (RSCU > 1) and 32 codons with low preference (RSCU < 1) among the 64 codons encoding 20 amino acids. The codon with the highest RSCU value was UUA (1.80–1.83), and the codon with the lowest was AGC (0.35–0.38) in all six chloroplast genomes. The patterns of RSCU values were similar to those of C. vulgare (Fig 3). Twenty-nine of the 30 codons with RSCU values greater than 1 ended with A or U, whereas 29 of the 32 codons with RSCU values less than 1 ended with G or C.

Fig 3. Heat map of relative synonymous codon usage values of chloroplast protein coding genes among six Cirsium species.

https://doi.org/10.1371/journal.pone.0277471.g003

Repeat sequence analysis

Repeat elements play essential roles in characterizing genomes. Given the conserved nature of the chloroplast genome in particular, they can be helpful in species identification. We identified dispersed repeats in six Cirsium species using REPuter [44] (Fig 4A and 4B). Three types of dispersed repeats (forward, reverse, palindromic) were detected, ranging from 30 to 58 bp in length. Among these species, C. nipponicum contained the largest number of repeats and was the only one carrying reverse repeats. The total number of dispersed repeats in C. nipponicum was 49, consisting of 28 forward, two reverse, and 19 palindromic repeat sequences. These repeats were located in various regions: three non-coding genes, 24 coding genes, 18 intergenic, and four intergenic spacers (IGSs) (S4 Table).

Fig 4. The number and type of repeats in six Cirsium species.

a. Frequency of three types of dispersed repeats; b. Frequency of dispersed repeats by length; c. Frequency of simple sequence repeat (SSR) motifs of different types; d. Frequency of four SSR types.

https://doi.org/10.1371/journal.pone.0277471.g004

In addition to dispersed repeats, simple sequence repeats (SSRs), also known as microsatellites, were investigated using MISA [43]. There were 40 to 54 SSRs in the Cirsium species, and mono-, di-, tri-, and tetra-nucleotide repeats were detected in all Cirsium chloroplast genomes (Fig 4C and 4D). Most SSRs were mono-nucleotide repeats with the A/T motif, and the C/G motif was present only in C. arvense and C. japonicum. C. nipponicum showed 43 SSRs, with 26 mono-, 4 di-, 4 tri-, and 9 tetra-nucleotide repeats. They were located mostly in the LSC (74%) and SSC (21%) regions, with only a few in the IR regions (5%) (S5 Table).

Divergence of hotspot regions

Highly variable regions in chloroplast genomes have been widely used in species identification studies. Since morphological characteristics alone limit the distinction between Cirsium species, we performed multiple sequence alignment of six Cirsium species to find highly variable regions. There were 833 polymorphic sites, and nucleotide diversity was calculated over the whole chloroplast genome (Fig 5). Among the six Cirsium species, Pi values ranged from 0 to 0.01367 with an average of 0.00195. Regions containing polymorphic sites were considered highly variable when their Pi values were greater than 0.00743. Eight regions exceeded this threshold, and all of them were located in the LSC and SSC regions (S6 Table). Three of the eight highly variable regions were located in coding sequences (trnD-GUC, ndhF, and ycf1), and the remaining five spanned intergenic regions. Moreover, 18 specific variations that distinguish C. nipponicum from the other species were identified (S7 Table). The regions that contained these specific substitutions were also in the LSC and SSC regions.

Fig 5. Sliding window of nucleotide diversity from the alignment of six Cirsium plastomes.

https://doi.org/10.1371/journal.pone.0277471.g005

Phylogenetic analysis and species resolution

For a better understanding of the phylogenetic relationships between C. nipponicum and other species across tribes, phylogenetic analysis was conducted with two methods, Bayesian inference (BI) and maximum likelihood (ML), using Gerbera jamesonii as an outgroup. First, we obtained 20 complete chloroplast genomes from NCBI and then estimated the substitution model, also known as the DNA sequence evolution model. Based on the best substitution models, TVM+F+I+G4 for the ML method and GTR+I+G for the BI method were applied to construct phylogenetic trees, and both results showed the same topology (Fig 6A). At the subtribe level, three Carlininae species and 17 Carduinae species were separated into their respective clades, and C. vulgare was the closest to C. nipponicum. In addition, we used matK and rbcL sequences to obtain more information about the relationships between species and the resolution of speciation (Figs 6B and S2 and S3). Phylogenetic trees constructed from matK gene sequences showed results similar to the complete chloroplast genome trees. The matK gene trees based on the BI and ML methods had the same topology, and species were clustered at the subtribe and genus levels. However, Cirsium species were split with low bootstrap values in the ML tree. For the rbcL gene, we obtained six more sequences (four from Sanger and two from Illumina sequencing platforms) from the NCBI database to examine the relationship of C. nipponicum sampled from Ulleung Island with others, especially with other Korean Cirsium and C. nipponicum KC589829.1, distributed in Japan. The two rbcL trees were similar but had low bootstrap values and especially low posterior probabilities around Cirsium species (S3 Fig). Moreover, Japanese C. nipponicum KC589829.1 was close to Japanese C. tanakae and Korean C. japonicum, not to Ulleung Island C. nipponicum; however, C. nipponicum from Ulleung Island was still close to C. vulgare and C. arvense. The trees built from the three sequence types revealed that C. nipponicum was phylogenetically farther from C. japonicum and C. rhinoceros than from C. vulgare and C. arvense.

Fig 6. Phylogenetic trees based on the whole chloroplast genomes and the rbcL.

(a) Phylogenetic relationship based on whole chloroplast genomes inferred by maximum likelihood (ML), with numbers beside the nodes representing the ML bootstrap values and Bayesian inference posterior probabilities; (b) Phylogenetic relationship based on the rbcL inferred by ML, with numbers beside the nodes representing the ML bootstrap values.

https://doi.org/10.1371/journal.pone.0277471.g006

Discussion

Although advances in high-throughput sequencing technologies have facilitated rapid progress in genomics as well as chloroplast genetics [23], only a limited number of chloroplast genomes of Cirsium species have been available. Herein, we present the complete chloroplast genome of C. nipponicum for the first time and provide convincing evidence for the distinctive origin and evolution of C. nipponicum by analyzing genome structure and phylogenetic relationships among Cirsium species. The GC contents of the IR regions of the six Cirsium species were higher than those of both the LSC and SSC regions, indicating the presence of rRNA genes [51, 52]. Besides, considering that the GC content of the SSC region in C. nipponicum is relatively higher than in the other species, GC-biased gene conversion (gBGC) related to intraplastomic recombination could be proposed as another cause of the GC content pattern [53–55]. These GC content patterns and repeat elements are helpful in identifying species because of their polymorphism [56]. Identifying species based on a molecular marker, such as a barcode system, is important for efficient species protection and management [29]. As DNA primer candidates, we found repeats in several genes, including ndhA and ycf1, and near rRNA genes and IGSs (S5 and S6 Tables). Furthermore, as SSRs and dispersed repeats inform genetic investigations such as population or phylogenetic relationships [15, 57], this study suggests their applicability to studying the evolutionary mechanism of Cirsium, especially the genetic structure of chloroplast genomes.

Codon usage bias is commonly observed in the genomes of all organisms, including plants, and understanding the evolutionary significance of this phenomenon has been a common interest among biologists. The usage of synonymous codons for amino acids is not random but biased [58], which is related to highly expressed genes and even plays a role in the evolution of chloroplast genomes [59, 60]. Since the chloroplast genomes of plants are well known to show codon usage bias, the analysis of RSCU in the chloroplast genome of C. nipponicum can help in understanding its genetic features and evolutionary process [61, 62]. Our results showed that the patterns of RSCU in C. nipponicum were more similar to those of C. vulgare than to those of C. rhinoceros and C. japonicum (Fig 3). Hence, the preference for synonymous codons may reflect part of chloroplast genome evolution in Cirsium species.

We used five whole chloroplast genomes of Cirsium species available in the NCBI RefSeq database, which is validated and updated to reflect current knowledge, to perform comparative analyses. Compared with a previous study of three Carduus species, which belong to the same subtribe as Cirsium and showed nucleotide diversity with an average of 0.003442 and a peak of 0.0171 [63], our results (an average of 0.00195 and a peak of 0.01367) suggest that Cirsium chloroplast genomes are more stable and conserved than those of Carduus. Furthermore, the variation analysis was consistent with the general feature that IR regions are the most conserved regions in angiosperm chloroplast genomes (S6 Table). Interestingly, when comparing the IR regions, C. nipponicum was close to C. japonicum, whereas in the whole chloroplast genome it was close to C. vulgare. Although expansion and contraction of IR regions are essential to the evolution of chloroplast genome size [64, 65], variation across the whole genome was more related to speciation within Cirsium species than variation in the IR regions. Recently, many researchers have used barcode systems for species separation using meta-barcodes or the universal mini-barcodes matK and rbcL [66]. However, our phylogenetic trees constructed separately with the matK and rbcL genes presented low ML bootstrap values and BI posterior probabilities, which indicate an unreliable topology, especially for matK (S2A Fig). Thus, we believe that phylogenetic trees based on mini-barcodes may not be an appropriate method for resolving species within Cirsium.

As C. nipponicum is predominantly located on Ulleung Island, we initially thought it could be evolutionarily similar to species from the nearby mainland or Japan, just like other plants growing on Ulleung Island. Ulleung Island is located about 137 km off the east coast of the Korean peninsula and was formed approximately 2 million years ago (Mya) [67, 68]. About 600 taxa of vascular plants are known on Ulleung Island, and they are suggested to have derived and evolved from founder populations from land close to the island, a mode of speciation known as anagenetic speciation [69]. However, our results showed that C. nipponicum was not grouped with two Korean species, C. rhinoceros and C. japonicum, or two Japanese species, C. nipponicum and C. tanakae (Figs 6 and S3). Moreover, C. nipponicum from Ulleung Island was more closely related to C. vulgare than to the others. The patterns of morphological characters in C. nipponicum are also distinct from those of other Cirsium species, such as C. japonicum and C. rhinoceros [1]. Additionally, the leaf shape of C. nipponicum is morphologically most similar to that of C. vulgare among the Cirsium species around Ulleung Island (Fig 1C–1F). Therefore, C. nipponicum on Ulleung Island may not have originated from endemic species of Japan or Korea, but may instead be derived from Russia [70], given that C. vulgare is not distributed in Korea.

Given that Cirsium is known as a cosmopolitan genus [71], its dispersal to Ulleung Island can be inferred in several ways. Wind has been suggested as one of the most effective means of seed dispersal in the family Asteraceae [72]. Although westerly winds are the dominant winds on Ulleung Island, dispersal by wind may be limited considering that there is no C. vulgare in the Korean peninsula, which is registered as invasive species by the Korean government [73]. Ocean currents are another possible means of dispersal, as dispersal of Fangus via floating masses from the north and south to Ulleung Island has been suggested to be possible [69]. Lastly, dispersal by migratory birds traveling to Ulleung Island is another possibility. It has been reported that transport of seeds by birds may occur in Northeast China, Far East Russia, and Southern Korea and Japan [69], and waterfowl passing through Ulleung Island have been reported [74]. Some of these waterfowl are regarded as important vectors of exotic plant species [75]. Thus, endozoochory by waterfowl can be suggested as a factor explaining the dispersal of C. nipponicum to Ulleung Island. This study suggests that C. nipponicum of Ulleung Island originated from Cirsium other than the Korean or Japanese endemic Cirsium and has adapted to the Ulleung Island environment.

Supporting information

S1 Fig. Amino acid percentages of six Cirsium chloroplast genomes.

https://doi.org/10.1371/journal.pone.0277471.s001

(TIF)

S2 Fig. Phylogenetic tree based on the matK.

(a) Phylogenetic tree based on the matK inferred by maximum likelihood (ML), with numbers beside the nodes representing the ML bootstrap values. (b) Phylogenetic tree based on the matK inferred by Bayesian inference (BI), with numbers beside the nodes representing the BI posterior probabilities.

https://doi.org/10.1371/journal.pone.0277471.s002

(TIF)

S3 Fig. Phylogenetic tree based on the rbcL inferred by BI.

Numbers beside the nodes represent the BI posterior probabilities.

https://doi.org/10.1371/journal.pone.0277471.s003

(TIF)

S1 Table. The list of sequences used in phylogenetic analysis.

https://doi.org/10.1371/journal.pone.0277471.s004

(XLSX)

S2 Table. Result of multiple sequence alignment in IR regions based on C. nipponicum.

Ins and Del denote insertion and deletion, respectively.

https://doi.org/10.1371/journal.pone.0277471.s005

(XLSX)

S3 Table. Codon frequency of all protein coding genes in chloroplast genome of C. nipponicum.

https://doi.org/10.1371/journal.pone.0277471.s006

(XLSX)

S4 Table. List of dispersed repeats in C. nipponicum.

F, P, and R denote the direction of the repeats: forward, palindromic, and reverse, respectively.

https://doi.org/10.1371/journal.pone.0277471.s007

(XLSX)

S5 Table. The list of simple sequence repeats in C. nipponicum.

https://doi.org/10.1371/journal.pone.0277471.s008

(XLSX)

S6 Table. Highly variable regions detected in C. nipponicum.

https://doi.org/10.1371/journal.pone.0277471.s009

(XLSX)

S7 Table. C. nipponicum specific variable regions.

https://doi.org/10.1371/journal.pone.0277471.s010

(XLSX)

References

  1. Song M-J, Kim H. Taxonomic study on Cirsium Miller (Asteraceae) in Korea based on external morphology. Korean Journal of Plant Taxonomy. 2007;37(1):17–40.
  2. Sung CK, Kimura T. Northeast Asia: World Scientific; 1996.
  3. Lee J-H, Lee K-R. Phytochemical constituents of Cirsium nipponicum (MAX.) Makino. Korean Journal of Pharmacognosy. 2005;36(2):145–50.
  4. Yin J, Heo SI, Wang MH. Antioxidant and antidiabetic activities of extracts from Cirsium japonicum roots. Nutr Res Pract. 2008;2(4):247–51. pmid:20016726
  5. Liao Z, Chen X, Wu M. Antidiabetic effect of flavones from Cirsium japonicum DC in diabetic rats. Archives of Pharmacal Research. 2010;33(3):353–62. pmid:20361298
  6. Ge H, Turhong M, Abudkrem M, Tang Y. Fingerprint analysis of Cirsium japonicum DC. using high performance liquid chromatography. Journal of pharmaceutical analysis. 2013;3(4):278–84. pmid:29403828
  7. Peng-Cheng L, Lin-Lin J, Zhong X-J, Jin-Jie L, Xin W, Shang X-Y, et al. Taraxastane-type triterpenoids from the medicinal and edible plant Cirsium setosum. Chinese journal of natural medicines. 2019;17(1):22–6. pmid:30704619
  8. Jeong DM, Jung HA, Choi JS. Comparative antioxidant activity and HPLC profiles of some selected Korean thistles. Archives of pharmacal research. 2008;31(1):28–33. pmid:18277604
  9. Lim H, Son KH, Chang HW, Bae K, Kang SS, Kim HP. Anti-inflammatory activity of pectolinarigenin and pectolinarin isolated from Cirsium chanroenicum. Biological and Pharmaceutical Bulletin. 2008;31(11):2063–7. pmid:18981574
  10. Liu S, Luo X, Li D, Zhang J, Qiu D, Liu W, et al. Tumor inhibition and improved immunity in mice treated with flavone from Cirsium japonicum DC. International Immunopharmacology. 2006;6(9):1387–93. pmid:16846832
  11. Jung HA, Abdul QA, Byun JS, Joung E-J, Gwon W-G, Lee M-S, et al. Protective effects of flavonoids isolated from Korean milk thistle Cirsium japonicum var. maackii (Maxim.) Matsum on tert-butyl hydroperoxide-induced hepatotoxicity in HepG2 cells. Journal of ethnopharmacology. 2017;209:62–72. pmid:28735729
  12. Han H-S, Shin J-S, Lee S-B, Park JC, Lee K-T. Cirsimarin, a flavone glucoside from the aerial part of Cirsium japonicum var. ussuriense (Regel) Kitam. ex Ohwi, suppresses the JAK/STAT and IRF-3 signaling pathway in LPS-stimulated RAW 264.7 macrophages. Chemico-biological interactions. 2018;293:38–47. pmid:30053449
  13. Do J-C, Jung K-Y, Son K-H. Isolation of pectolinarin from the aerial parts of Cirsium nipponicum. Korean Journal of Pharmacognosy. 1994;25(1):73–5.
  14. Lee S-O, Lee H-J, Yu M-H, Im H-G, Lee I-S. Total polyphenol contents and antioxidant activities of methanol extracts from vegetables produced in Ullung island. Korean Journal of Food Science and Technology. 2005;37(2):233–40.
  15. Bae Y-M. Genetic relationship of some Cirsium plants of Korea. Journal of Life Science. 2015;25(2):243–8.
  16. Barres L, Sanmartín I, Anderson CL, Susanna A, Buerki S, Galbany‐Casals M, et al. Reconstructing the evolution and biogeographic history of tribe Cardueae (Compositae). American Journal of Botany. 2013;100(5):867–82. pmid:23624927
  17. Ackerfield J, Susanna A, Funk V, Kelch D, Park DS, Thornhill AH, et al. A prickly puzzle: Generic delimitations in the Carduus‐Cirsium group (Compositae: Cardueae: Carduinae). Taxon. 2020;69(4):715–38.
  18. Francisco-Ortega J, Wang F-G, Wang Z-S, Xing F-W, Liu H, Xu H, et al. Endemic seed plant species from Hainan Island: a checklist. The Botanical Review. 2010;76(3):295–345.
  19. Oh S-H, Chen L, Kim S-H, Kim Y-D, Shin H. Phylogenetic relationship of Physocarpus insularis (Rosaceae) endemic on Ulleung Island: Implications for conservation biology. Journal of plant biology. 2010;53(1):94–105.
  20. Whitehead DR, Jones CE. Small islands and the equilibrium theory of insular biogeography. Evolution. 1969:171–9. pmid:28562965
  21. Pyšek P, Richardson DM. The biogeography of naturalization in alien plants. Journal of Biogeography. 2006;33(12):2040–50.
  22. Chung J-M, Shin J-K, Kim H-M. Diversity of vascular plants native to the Ulleungdo and Dokdo Islands in Korea. Journal of Asia-Pacific Biodiversity. 2020;13(4):701–8.
  23. Ge Y, Dong X, Wu B, Wang N, Chen D, Chen H, et al. Evolutionary analysis of six chloroplast genomes from three Persea americana ecological races: Insights into sequence divergences and phylogenetic relationships. PloS one. 2019;14(9):e0221827. pmid:31532782
  24. Wicke S, Schneeweiss GM, Depamphilis CW, Müller KF, Quandt D. The evolution of the plastid chromosome in land plants: gene content, gene order, gene function. Plant molecular biology. 2011;76(3):273–97. pmid:21424877
  25. Duchene D, Bromham L. Rates of molecular evolution and diversification in plants: chloroplast substitution rates correlate with species-richness in the Proteaceae. BMC evolutionary biology. 2013;13(1):1–11. pmid:23497266
  26. Smith DR. Mutation rates in plastid genomes: they are lower than you might think. Genome Biology and Evolution. 2015;7(5):1227–34. pmid:25869380
  27. Herrando-Moraira S, Calleja JA, Galbany-Casals M, Garcia-Jacas N, Liu J-Q, López-Alvarado J, et al. Nuclear and plastid DNA phylogeny of tribe Cardueae (Compositae) with Hyb-Seq data: A new subtribal classification and a temporal diversification framework. Molecular phylogenetics and evolution. 2019;137:313–32. pmid:31059792
  28. Xu L-S, Herrando Moraira S, Susanna de la Serna A, Galbany-Casals M, Chen Y-S. Phylogeny, origin and dispersal of Saussurea (Asteraceae) based on chloroplast genome data. Molecular Phylogenetics and Evolution. 2019;141:106613. Epub 2019/09/17. pmid:31525421
  29. Vu H-T, Tran N, Nguyen T-D, Vu Q-L, Bui M-H, Le M-T, et al. Complete chloroplast genome of Paphiopedilum delenatii and phylogenetic relationships among Orchidaceae. Plants (Basel). 2020;9(1):61. Epub 2020/01/08. pmid:31906501; PubMed Central PMCID: PMC7020410.
  30. Inglis PW, Pappas MCR, Resende LV, Grattapaglia D. Fast and inexpensive protocols for consistent extraction of high quality DNA and RNA from challenging plant and fungal samples for high-throughput SNP genotyping and sequencing applications. PLoS One. 2018;13(10):e0206085. Epub 2018/10/20. pmid:30335843; PubMed Central PMCID: PMC6193717.
  31. Andrews S. FastQC: a quality control tool for high throughput sequence data. Babraham Bioinformatics, Babraham Institute, Cambridge, United Kingdom; 2010.
  32. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–20. pmid:24695404
  33. Jin J-J, Yu W-B, Yang J-B, Song Y, Depamphilis CW, Yi T-S, et al. GetOrganelle: a fast and versatile toolkit for accurate de novo assembly of organelle genomes. Genome biology. 2020;21(1):1–31. pmid:32912315
  34. Tillich M, Lehwark P, Pellizzer T, Ulbricht-Jones ES, Fischer A, Bock R, et al. GeSeq–versatile and accurate annotation of organelle genomes. Nucleic acids research. 2017;45(W1):W6–W11.
  35. Qu X-J, Moore MJ, Li D-Z, Yi T-S. PGA: a software package for rapid, accurate, and flexible batch annotation of plastomes. Plant Methods. 2019;15(1):1–12. pmid:31139240
  36. Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic acids research. 1997;25(5):955–64. pmid:9023104
  37. Kent WJ. BLAT—the BLAST-like alignment tool. Genome research. 2002;12(4):656–64. pmid:11932250
  38. Lohse M, Drechsel O, Kahlau S, Bock R. OrganellarGenomeDRAW—a suite of tools for generating physical maps of plastid and mitochondrial genomes and visualizing expression data sets. Nucleic acids research. 2013;41(W1):W575–W81. pmid:23609545
  39. Holland RA, Kirschvink JL, Doak TG, Wikelski M. Bats use magnetite to detect the earth’s magnetic field. PLoS One. 2008;3(2):e1676. pmid:18301753
  40. Kandoth C, Ercal F, Frank RL, editors. A framework for automated enrichment of functionally significant inverted repeats in whole genomes. BMC bioinformatics; 2010: Springer.
  41. Amiryousefi A, Hyvönen J, Poczai P. IRscope: an online program to visualize the junction sites of chloroplast genomes. Bioinformatics. 2018;34(17):3030–1. pmid:29659705
  42. Sharp PM, Tuohy TM, Mosurski KR. Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes. Nucleic Acids Res. 1986;14(13):5125–43. Epub 1986/07/11. pmid:3526280; PubMed Central PMCID: PMC311530.
  43. Beier S, Thiel T, Münch T, Scholz U, Mascher M. MISA-web: a web server for microsatellite prediction. Bioinformatics. 2017;33(16):2583–5. pmid:28398459
  44. Kurtz S, Choudhuri JV, Ohlebusch E, Schleiermacher C, Stoye J, Giegerich R. REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic acids research. 2001;29(22):4633–42. pmid:11713313
  45. Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Molecular biology and evolution. 2013;30(4):772–80. pmid:23329690
  46. Rozas J, Ferrer-Mata A, Sánchez-DelBarrio JC, Guirao-Rico S, Librado P, Ramos-Onsins SE, et al. DnaSP 6: DNA sequence polymorphism analysis of large data sets. Molecular biology and evolution. 2017;34(12):3299–302. pmid:29029172
  47. Posada D, Crandall KA. Modeltest: testing the model of DNA substitution. Bioinformatics. 1998;14(9):817–8. pmid:9918953
  48. Swofford DL. Phylogenetic analysis using parsimony. 1998.
  49. Ronquist F, Teslenko M, Van Der Mark P, Ayres DL, Darling A, Höhna S, et al. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Systematic biology. 2012;61(3):539–42. pmid:22357727
  50. Nguyen L-T, Schmidt HA, Von Haeseler A, Minh BQ. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Molecular biology and evolution. 2015;32(1):268–74. pmid:25371430
  51. Galtier N, Lobry J. Relationships between genomic G+C content, RNA secondary structures, and optimal growth temperature in prokaryotes. Journal of molecular evolution. 1997;44(6):632–6. pmid:9169555
  52. Hurst LD, Merchant AR. High guanine–cytosine content is not an adaptation to high temperature: a comparative analysis amongst prokaryotes. Proceedings of the Royal Society of London Series B: Biological Sciences. 2001;268(1466):493–7.
  53. Walker JF, Jansen RK, Zanis MJ, Emery NC. Sources of inversion variation in the small single copy (SSC) region of chloroplast genomes. Am J Bot. 2015;102(11):1751–2. Epub 2015/11/08. pmid:26546126
  54. Wu C-S, Chaw S-M. Evolutionary stasis in cycad plastomes and the first case of plastome GC-biased gene conversion. Genome biology and evolution. 2015;7(7):2000–9. pmid:26116919
  55. Niu Z, Xue Q, Wang H, Xie X, Zhu S, Liu W, et al. Mutational biases and GC-biased gene conversion affect GC content in the plastomes of Dendrobium genus. International journal of molecular sciences. 2017;18(11):2307. pmid:29099062
  56. Nadeem MA, Nawaz MA, Shahid MQ, Doğan Y, Comertpay G, Yıldız M, et al. DNA molecular markers in plant breeding: current status and recent advancements in genomic selection and genome editing. Biotechnology and Biotechnological Equipment. 2018;32(2):261–85.
  57. Echt CS, DeVerno L, Anzidei M, Vendramin G. Chloroplast microsatellites reveal population genetic diversity in red pine, Pinus resinosa Ait. Molecular Ecology. 1998;7:307–16.
  58. Wu L, Nie L, Wang Q, Xu Z, Wang Y, He C, et al. Comparative and phylogenetic analyses of the chloroplast genomes of species of Paeoniaceae. Scientific Reports. 2021;11(1):1–16.
  59. Li B, Lin F, Huang P, Guo W, Zheng Y. Complete Chloroplast Genome Sequence of Decaisnea insignis: Genome Organization, Genomic Resources and Comparative Analysis. Sci Rep. 2017;7(1):10073. Epub 2017/09/01. pmid:28855603; PubMed Central PMCID: PMC5577308.
  60. Tan W, Gao H, Jiang W, Zhang H, Yu X, Liu E, et al. The complete chloroplast genome of Gleditsia sinensis and Gleditsia japonica: genome organization, comparative analysis, and development of taxon specific DNA mini-barcodes. Sci Rep. 2020;10(1):16309. Epub 2020/10/03. pmid:33005000; PubMed Central PMCID: PMC7529812.
  61. Zhou Z, Dang Y, Zhou M, Li L, Yu C-h, Fu J, et al. Codon usage is an important determinant of gene expression levels largely through its effects on transcription. Proceedings of the National Academy of Sciences. 2016;113(41):E6117–E25. pmid:27671647
  62. Bautista MAC, Zheng Y, Hu Z, Deng Y, Chen T. Comparative Analysis of Complete Chloroplast Genome Sequences of Wild and Cultivated Bougainvillea (Nyctaginaceae). Plants. 2020;9(12):1671. pmid:33260641
  63. Jung J, Do HDK, Hyun J, Kim C, Kim J-H. Comparative analysis and implications of the chloroplast genomes of three thistles (Carduus L., Asteraceae). PeerJ. 2021;9:e10687. pmid:33520461
  64. Henriquez CL, Ahmed I, Carlsen MM, Zuluaga A, Croat TB, McKain MR. Evolutionary dynamics of chloroplast genomes in subfamily Aroideae (Araceae). Genomics. 2020;112(3):2349–60. pmid:31945463
  65. Mehmood F, Shahzadi I, Waseem S, Mirza B, Ahmed I, Waheed MT. Chloroplast genome of Hibiscus rosa-sinensis (Malvaceae): comparative analyses and identification of mutational hotspots. Genomics. 2020;112(1):581–91. pmid:30998967
  66. Ratnasingham S, Hebert PD. BOLD: The Barcode of Life Data System (http://www.barcodinglife.org). Molecular ecology notes. 2007;7(3):355–64.
  67. Kim YK. Petrology of Ulreung volcanic island, Korea Part 1. Geology. The Journal of the Japanese Association of Mineralogists, Petrologists and Economic Geologists. 1985;80(4):128–35.
  68. Yang J, Pak J-H, Maki M, Kim S-C. Multiple origins and the population genetic structure of Rubus takesimensis (Rosaceae) on Ulleung Island: Implications for the genetic consequences of anagenetic speciation. PloS one. 2019;14(9):e0222707. pmid:31536553
  69. Oh S-H, Youm J-W, Kim Y-I, Kim Y-D. Phylogeny and evolution of endemic species on Ulleungdo Island, Korea: The case of Fagus multinervis (Fagaceae). Systematic Botany. 2016;41(3):617–25.
  70. Wallingford UCI. CABI, 2021. Invasive Species Compendium 2021. Available from: www.cabi.org/isc.
  71. Susanna A, Garcia-Jacas N. Cardueae (Carduoideae). Systematics, Evolution, and Biogeography of Compositae Vienna: IAPT. 2009:293–313.
  72. Sheldon J, Burrows F. The dispersal effectiveness of the achene–pappus units of selected Compositae in steady winds with convection. New Phytologist. 1973;72(3):665–75.
  73. Jung SY, Lee JW, Shin HT, Kim SJ, An JB, Heo TI, et al. Invasive Alien Plants in South Korea. In: Pocheon KNA, editor. 2017.
  74. Yu J-P, Jin S-D, Kim W-B, Kang J-H, Kim I-K, Kang T-H, et al. Characteristics of birds community in Ulleung Island, Korea. Journal of Asia-Pacific Biodiversity. 2013;6(1):175–87.
  75. Brochet AL, Guillemain M, Fritz H, Gauthier‐Clerc M, Green AJ. The role of migratory ducks in the long‐distance dispersal of native plants and the spread of exotic plants in Europe. Ecography. 2009;32(6):919–28.