Introduction

Ovarian cancer is a deadly disease with an overall 5-year survival of 49% [1]. High-grade serous carcinoma (HGSC) is the most common subtype of epithelial ovarian cancer, accounting for 70–80% of malignant ovarian neoplasms, and it is uniquely characterized by driver TP53 mutations in most cases [2,3,4]. Germline BRCA1 or BRCA2 pathogenic variants (BRCApv) increase the lifetime ovarian cancer risk from 1.6% to 45% and 20%, respectively [5,6,7], and identify a high-risk population that would most benefit from improvements in cancer risk assessment.

There is much interest in the use of next generation sequencing (NGS) to develop biomarkers for early cancer detection and cancer risk assessment using minimally invasive clinical samples (liquid biopsies) [8]. Cervical liquid-based cytology (LBC) samples, also known as Pap tests, are an attractive source, as they are minimally invasive and part of most patients' routine gynecologic healthcare. Unfortunately, the direct measurement of tumor-specific mutations from cells collected for LBC lacks sensitivity, even with high sequencing depth [9,10,11].

Advances in sequencing methods have more recently enabled the detection of pathogenic mutations in somatic cells with aging [12, 13]. TP53 mutations have been found to be clonally expanded in multiple human non-cancerous tissues, including blood, skin, esophagus, and endometrium [13,14,15]. Clonal hematopoiesis of indeterminate potential (CHIP) is one prominent example of this somatic mosaicism [16]. While somatic evolution is increasingly recognized as a fundamental part of normal aging, little is known about the association between somatic evolution and cancer risk [17]. This is particularly critical in the context of TP53 somatic evolution, given the prominent role of TP53 mutations in cancer. A main reason for this gap in knowledge is the difficulty of detecting low-frequency mutations in normal tissues and liquid biopsies, but this limitation can be overcome by using high-resolution ultra-deep duplex sequencing (DS) [18,19,20,21].

Preliminary studies using DS have revealed extensive TP53 clonal evolution in gynecological liquid biopsies and a potential association of these somatic mutations with ovarian cancer [10, 21, 22]. However, the contribution of TP53 mutant hematopoietic clones and the germline status of patients have not been properly considered. Here we perform a novel, comprehensive characterization of TP53 somatic mutations in blood and cervical LBC from individuals with and without HGSC and BRCA germline mutations. We confirm the lack of sensitivity of LBC samples for HGSC detection but provide evidence of increased TP53 clonal evolution in cervical LBC samples of BRCApv carriers with HGSC, which could potentially be explored for cancer risk assessment in this susceptible population.

Results

LBC and blood samples of patients with and without HGSC carry multiple pathogenic TP53 mutations, with only 5.4% of LBC mutations concordant with blood

This case-control study included a total of 70 individuals who underwent gynecological surgery at the University of Washington and had a cervical LBC and blood sample collected at the time of surgery. The study included two groups of patients: individuals with a pathogenic germline variant in BRCA1 or BRCA2 (BRCApv) and individuals without (BRCAwt) (Supplementary Table S1). Cancer cases (individuals with HGSC) and benign controls (individuals without HGSC) were selected to be age-matched (Table 1). DNA extracted from LBC and peripheral blood samples was analyzed for TP53 mutations using ultra-sensitive duplex sequencing to a target duplex depth of ~3000x. To ensure comparability of data across samples, 6 samples (5 LBC, 1 blood) with mean coding duplex depth higher than the 95% Confidence Interval (CI) were downsampled, and two LBC samples with mean coding duplex depth lower than the 95% CI were discarded from analysis (Methods). The final average coding duplex depth for blood was 2989x (1946–4136) and for LBC was 2883x (1808–4034). Duplex depth was comparable across patient groups for both LBC and blood samples (Supplementary Table S2).

Table 1 Clinical variables by patient group.

TP53 variants were identified in nearly all blood and LBC samples (Fig. 1). In individuals with HGSC with a known TP53 driver mutation (n = 23), the tumor-specific mutation was identified in 7 LBC samples (3 BRCAwt and 4 BRCApv), yielding a sensitivity of 30.4% (7/23), which is comparable to prior studies [9,10,11] (Supplementary Table S3). Tumor-specific mutations were not identified in any of the blood samples. However, both blood and LBC samples contained many other TP53 mutations, most of which were pathogenic as defined by the algorithms AlphaMissense [23] and Seshat [24] (Supplementary Methods). While LBC and blood samples were sequenced at comparable depths (Supplementary Table S2), LBC samples contained more TP53 mutations than blood (368 vs. 194) and a higher percentage were pathogenic (by AlphaMissense: 68% vs. 60%, Fisher’s exact test p = 0.063; by Seshat: 61% vs. 49%, Fisher’s exact test p = 0.007).

Fig. 1: Summary of clinical data and coding TP53 mutations found in liquid-based cytology (LBC) and blood leukocyte samples.

Patients in each group are listed by increasing age with cancer stage, body mass index (BMI), smoking history, and prior chemotherapy use indicated with colored squares. For each patient, all unique coding mutations found in LBC and blood samples are displayed with the number of mutant duplex reads indicated within. Mutations are color coded based on pathogenicity and clone size (large clones defined as ≥ 2 mutant duplex reads). Tumor-specific mutations were identified in 7 Pap samples from 23 patients with known TP53 tumor mutations (sensitivity of 30.4%) and are highlighted with an outlined box. Concordant mutations in blood and LBC samples are highlighted with an outlined circle and, for each group, the number of concordant mutations is indicated with a Venn diagram below the plot. BRCAwt: noncarriers of germline pathogenic variants in BRCA1/2. BRCApv: carriers of germline pathogenic variants in BRCA1/2.

In duplex sequencing, each duplex read corresponds to an original DNA molecule. While most mutations were observed in a single duplex read, about 10% of mutations in blood and LBC samples were represented by multiple duplex reads, indicating large clones (Fig. 1). TP53 mutations forming large clones were more likely to be pathogenic compared to the mutations identified in single duplex reads, in blood (AlphaMissense: 94% vs. 56%, p = 0.002; Seshat: 89% vs. 45%, p < 0.001) as well as LBC samples (AlphaMissense: 95% vs. 65%, p < 0.001; Seshat: 90% vs. 58%, p < 0.001). The enrichment of pathogenic mutations in large clones suggests that these clonal expansions are driven by positively selected mutations. Pathogenic large clones were more common in the blood and LBC samples of older individuals and in BRCApv carriers with HGSC, but the numbers were small and comparisons were not statistically significant (Supplementary Fig. S1).

Out of 368 mutations identified in LBC, only 20 (5.4%) were shared between the two sample types. Most shared mutations were pathogenic (95% by AlphaMissense, 85% by Seshat), 60% were located in hotspot codons, and 55% presented as large clones (Supplementary Table S4). These results suggest their probable origin as TP53 mutant clones in blood (clonal hematopoiesis) sampled in the cervical LBC, although some concordant events could also result from convergent evolution. The 20 shared mutations were found in 18 individuals (two had two shared mutations). Individuals with concordant mutations in blood and LBC samples were more likely to be older (44% of patients with concordance were older than 65 vs. 13% of those without concordance, p = 0.016) and were more likely to have had prior chemotherapy (33% of patients with concordance vs 6% without, p = 0.007). Overall, these results indicate that most pathogenic TP53 mutations in LBC and blood are unique but a small subset of mutations from blood can be identified in LBC, especially in older patients and those with prior chemotherapy.

TP53 mutations identified in blood and LBC samples resemble mutations found in cancer

TP53 mutations found in blood and LBC samples were then compared to TP53 mutations reported in the COSMIC database [25] (Fig. 2). The distributions of mutation type (Fig. 2A) and spectrum (Fig. 2B) resembled those found in COSMIC, especially for mutations identified in LBCs. Most mutations were missense, and the most frequent substitution was C > T, in agreement with our prior findings in gynecological samples [10, 20,21,22]. The type and spectrum of mutations was relatively similar across groups of patients for blood and LBC samples (Supplementary Fig. S2). When TP53 substitutions in blood and LBC samples were plotted along the coding region of the gene, we observed a pattern of mutation clusters very similar to the pattern in COSMIC (Fig. 2C, D), indicating cancer-like positive selection of specific TP53 mutations in blood and LBC samples.

Fig. 2: Comparison of TP53 mutations in blood and liquid-based cytology (LBC) samples with TP53 mutations found in cancers.

A Mutation type distribution. The number of mutations found per sample type is indicated on the x-axis. COSMIC mutations correspond to TP53 mutations identified in human cancer. B Mutation spectrum distribution. As in A, but including only substitutions. C Blood sample TP53 mutations codon location (top panel) compared to COSMIC (bottom panel). Y-axis indicates mutation counts. X-axis indicates codon numbers. Main p53 protein domains are color-coded. D LBC sample TP53 mutations codon location (top panel) compared to COSMIC (bottom panel). E Percentage of hotspot mutations in blood and LBC samples compared to the expected percentage if mutations were random (no selection). Hotspot mutations in COSMIC are shown for reference. Hotspot codons are defined as codons containing >1% of COSMIC substitutions (25 codons) and no selection represents the probability of a random mutation occurring in a hotspot codon. p-values correspond to Fisher’s exact tests. F As in E but showing the percentage of pathogenic mutations (as defined by AlphaMissense) in blood and LBC samples compared to the expected percentage with no selection and reported pathogenic mutations in COSMIC. p-values correspond to Fisher’s exact tests.

We further explored this finding by quantifying the percentage of substitutions in blood and LBC samples that occurred in TP53 hotspot codons (defined as codons containing >1% of COSMIC substitutions) (Supplementary Methods and Supplementary Table S5). While these hotspot codons only represent 6.6% of the coding region of the gene, more than a quarter of TP53 substitutions in blood and LBC occurred in those locations (Fig. 2E). This enrichment was highly significant (p < 0.001 for blood and LBC) and represented about half of the proportion of TP53 hotspot mutations found in cancers (55.8%). Similarly, the proportion of pathogenic mutations in blood and LBC was also significantly higher than what would be expected randomly, and about half of the proportion in cancer (Fig. 2F). Overall, these results indicate the presence of positively selected TP53 mutations in blood and LBC, resembling the composition of TP53 mutations found in cancer.
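The hotspot definition used here (codons holding more than 1% of reference substitutions) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the analysis code used in the study: the per-codon counts are invented toy values rather than COSMIC data, the function names `hotspot_codons` and `hotspot_fraction` are hypothetical, and the "no selection" expectation described in the Supplementary Methods is not reproduced.

```python
def hotspot_codons(reference_counts, threshold=0.01):
    """Return the codons that hold more than `threshold` of all reference substitutions."""
    total = sum(reference_counts.values())
    return {codon for codon, n in reference_counts.items() if n / total > threshold}


def hotspot_fraction(sample_codons, hotspots):
    """Fraction of a sample's substitutions that fall in hotspot codons."""
    if not sample_codons:
        return 0.0
    hits = sum(1 for codon in sample_codons if codon in hotspots)
    return hits / len(sample_codons)


# Toy reference (hypothetical, NOT COSMIC): one substitution in each of 393 codons,
# plus three heavily mutated codons.
reference = {codon: 1 for codon in range(1, 394)}
reference.update({175: 60, 248: 80, 273: 70})

hotspots = hotspot_codons(reference)

# Hypothetical codon positions of substitutions observed in one sample.
sample = [175, 248, 12, 300, 273, 88, 150, 42]
observed = hotspot_fraction(sample, hotspots)

print(sorted(hotspots), f"{observed:.1%} of substitutions in hotspot codons")
```

In the toy data the three heavily mutated codons each exceed the 1% threshold, and three of the eight sample substitutions fall in them. The observed fraction would then be compared with the expectation under no selection, as in Fig. 2E.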

LBC samples harbor higher frequency of TP53 mutations than blood, especially in BRCApv carriers with HGSC

To further investigate differences in TP53 mutations while adjusting for sequencing depth, we calculated TP53 mutation frequency (MF) as the number of TP53 mutations divided by the total number of nucleotides sequenced for each sample (Supplementary Table S6). Because we sequenced a considerable portion of non-coding DNA, non-coding TP53 MF was also calculated. Coding TP53 MF was significantly higher in LBC samples than in blood for all patient groups (Fig. 3). The largest differences were observed for BRCApv carriers with cancer, who had the highest levels of TP53 mutations in LBC samples. Interestingly, the frequency of non-coding mutations was not significantly different between blood and LBC samples except for BRCApv carriers with cancer. Non-coding MF measures mutagenesis, whereas coding MF measures the combined effect of mutagenesis and clonal expansion of functional mutations. Thus, these results indicate that LBC samples contain a higher frequency of TP53 clonal expansions than blood, which might reflect the extensive clonal evolution of the endometrium previously reported by others [13, 26, 27]. In BRCApv carriers with HGSC, however, additional factors might account for increased baseline mutagenesis in non-coding regions.

Fig. 3: TP53 mutation frequency comparison between blood (BLO) and liquid-based cytology (LBC) samples by group.

Coding and non-coding mutation frequencies were calculated by dividing the number of mutations by the number of duplex nucleotides sequenced in coding or non-coding regions, respectively. Each dot corresponds to one patient. Overlying box plots display the quartiles with whiskers extending up to 1.5x the interquartile range. p-values correspond to Wilcoxon signed rank tests and are color-coded in red if below 0.05. BRCAwt: noncarriers of germline pathogenic variants in BRCA1/2. BRCApv: carriers of germline pathogenic variants in BRCA1/2.

TP53 somatic mutations are associated with age, obesity, and smoking

We next explored whether coding TP53 MF and total TP53 MF (including coding and non-coding mutations) measured in blood and LBCs were associated with clinical variables (Table 2, Supplementary Fig. S3). As expected based on prior findings [12, 28], coding TP53 MF was significantly associated with increasing age in blood as well as in LBC samples. This association remained significant when including non-coding mutations in LBC samples but not in blood samples. CA-125, a HGSC serum biomarker, and stage of HGSC were not associated with coding or total TP53 MF in blood or LBC. Interestingly, patients with high BMI had increased levels of total TP53 MF in blood. This association was significant when BMI was used as a numerical variable or a categorical variable with cut-off of 30, which corresponds to the threshold for obesity in women. In LBC, coding and total TP53 MF were not associated with BMI but were associated with smoking and menopause (Table 2, Supplementary Fig. S3). We did not find significant associations between coding or total TP53 MF and prior chemotherapy exposure, but there were only 8 patients with prior chemotherapy.

Table 2 Clinical associations of coding and total TP53 mutation frequencies.

LBC samples of BRCApv carriers with HGSC show significant enrichment of TP53 clonal expansions by multiple metrics, independent of blood or tumor mutations present in the sample

Our study was specifically designed to investigate differences in TP53 clonal evolution in patients with and without HGSC taking into consideration their germline mutation status. Thus, we performed comparison testing separately for the BRCAwt group, which included patients aged 40 to 85, and the BRCApv group, which included patients aged 35 to 65 (patients P01 and P60 excluded as ages were out of range). In LBC samples, BRCApv carriers with HGSC had significantly higher coding TP53 MF and total TP53 MF than BRCApv carriers without cancer (p = 0.044 and p = 0.008, respectively). These associations were not observed in blood samples or for BRCAwt individuals (Fig. 4A). In addition, BRCApv carriers with HGSC also had significantly higher coding TP53 mutation burden (p = 0.022), which is a metric that considers the number of mutations as well as the size of the clones by counting the total number of duplex mutant reads in a sample (Methods).

Fig. 4: Comparison of TP53 mutation frequencies in blood and liquid-based cytology (LBC) samples of patients with and without cancer.

A Overall TP53 mutation frequency metrics. Plots display group comparisons for TP53 coding mutation frequency, TP53 total mutation frequency (includes coding and non-coding mutations), and TP53 coding mutation burden (includes total number of mutant reads in coding positions). B Pathogenic TP53 mutation frequency metrics. Plots display group comparisons for hotspot, AlphaMissense (AM) pathogenic, and Seshat pathogenic TP53 coding mutation frequencies. Each dot corresponds to one patient. BRCAwt patients include ages 40 to 85 and BRCApv patients include ages 35 to 65. Overlying box plots display the quartiles with whiskers extending up to 1.5x the interquartile range. p-values correspond to Mann-Whitney U tests and are color-coded in red if below 0.05. BRCAwt: noncarriers of germline pathogenic variants in BRCA1/2. BRCApv: carriers of germline pathogenic variants in BRCA1/2.

To further explore these associations, we calculated 3 additional metrics that take into consideration different features of TP53 mutations: hotspot TP53 MF, AlphaMissense (AM) pathogenic TP53 MF, and Seshat pathogenic TP53 MF (Methods). For the 3 metrics, BRCApv carriers with HGSC had significantly higher values than BRCApv carriers without cancer (Fig. 4B). These results indicate that for BRCApv carriers, the presence of HGSC is associated with increased TP53 somatic evolution as reflected by more TP53 mutations, larger clones, and higher pathogenicity in LBC samples. Interestingly, an increase of pathogenic TP53 MF was also observed in blood (p = 0.028 when measured by AM and p = 0.054 when measured by Seshat). For patients without germline BRCApv, there were no significant associations between cancer and pathogenic TP53 mutations measured in blood or LBC samples (Fig. 4B).

To test whether these differences were driven by TP53 mutant leukocyte clones present in LBC samples (Fig. 1), we removed concordant blood/LBC mutations for a sensitivity analysis. In LBC samples, total TP53 MF, coding TP53 mutation burden, and hotspot TP53 MF were still significantly higher in BRCApv carriers with cancer compared with those without (p < 0.05 in all three cases, Supplementary Table S7). We also tested whether these associations were exclusively driven by the tumor mutations identified in LBC. When tumor mutations were removed, the p-values increased, as expected, but remained significant or near significant for total TP53 MF (p = 0.013), coding TP53 mutation burden (p = 0.054), and hotspot TP53 MF (p = 0.056) (Supplementary Table S7). These results indicate that BRCApv carriers with cancer harbor an excess of TP53 mutant clones which are not the drivers of the patient’s cancer.

To determine whether germline BRCApv affected TP53 clonal evolution independently of cancer, somatic TP53 mutations were compared in BRCApv carriers and non-carriers without HGSC (Supplementary Fig. S4). Age was restricted to those 65 years or younger to reduce age-related confounding in the comparison testing. No significant differences in any of the six different TP53 mutation metrics were found in either blood or LBC samples between BRCApv carriers and non-carriers. This result indicates that germline BRCApv do not appear to increase TP53 clonal expansion in LBC samples in patients without cancer and therefore additional factors might contribute to the higher level of expansion observed in LBC samples when HGSC develops.

For BRCApv carriers, TP53 mutation frequency in LBC samples predicts ovarian cancer independently of age but sample size is limited for multiple covariate testing

Exploratory logistic regression models were built to test the use of TP53 mutations in blood and LBC samples as biomarkers for cancer risk (Supplementary Table S8). Sample size limited within-group modeling, especially in the patients with germline BRCApv; thus, cancer cases and non-cancer cases were initially combined regardless of mutational status. When considering all patients and adjusting only for age, we observed that increases in coding TP53 MF and total TP53 MF in LBC samples were associated with significant increases in the odds of HGSC (OR 1.88, 95% CI 1.05–3.35 and OR 2.19, 95% CI 1.10–4.36, respectively), but these effects became non-significant when adjusting for the remaining covariates (age, BMI, smoking, CA-125, and BRCApv status). When considering only patients with germline BRCApv, age-adjusted models again showed significant increases in the odds of HGSC for higher levels of coding TP53 MF and total TP53 MF (OR 3.71, 95% CI 1.14–12.14 and OR 4.97, 95% CI 1.13–21.92, respectively). Further adjustment with covariates was unfortunately limited due to the small number of cases.
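For readers less familiar with how odds ratios relate to logistic regression output, the following Python sketch converts a log-odds coefficient and its standard error into an odds ratio with a 95% confidence interval. It assumes the reported intervals are Wald-type (symmetric on the log-odds scale), which the text does not state. The function names are hypothetical, and the standard error is back-calculated from the reported interval rather than taken from the fitted models.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Convert a logistic-regression coefficient and its standard error
    into an odds ratio with a Wald-type confidence interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower, upper, z=1.96):
    """Standard error on the log-odds scale implied by a Wald interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Reported age-adjusted estimate for coding TP53 MF in LBC (all patients).
reported_or, reported_lo, reported_hi = 1.88, 1.05, 3.35

se = se_from_ci(reported_lo, reported_hi)          # ASSUMPTION: Wald-type interval
or_, lo, hi = odds_ratio_ci(math.log(reported_or), se)

print(f"implied SE {se:.3f}; OR {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

Under this assumption the reconstructed interval should agree with the reported one up to rounding. A Wald interval on the odds-ratio scale is symmetric in ratio terms, so the product of its limits equals the squared odds ratio.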

Discussion

In this study we have leveraged ultra-deep, high-resolution sequencing to provide a novel in-depth characterization of the somatic TP53 mutational landscape found in cervical LBC and blood samples of individuals with and without germline BRCApv and HGSC. We have confirmed prior results by us and others [9,10,11] that indicate that ovarian cancer mutations are found in cervical LBC samples with low sensitivity (30% in this study), which argues against its utility for ovarian cancer detection. However, we have demonstrated, in agreement with our prior study [10], that pathogenic TP53 mutations are abundant in cervical LBC and increase with age, providing a useful metric to study TP53 somatic evolution in the gynecological tract and its association with cancer risk. Ultra-deep sequencing of matching blood has revealed that TP53 clonal expansions in LBC are more frequent than in blood and independent of CHIP clones. In addition, we demonstrate a significant association between LBC TP53 mutations and ovarian cancer in BRCApv carriers, which is robust across different measures of clonal evolution and might have clinical value for cancer risk assessment in patients with genetic susceptibility to ovarian cancer.

Our findings fit into a growing body of literature examining the utility of novel biomarkers in liquid biopsies for ovarian cancer diagnosis, prognosis, and risk assessment. Autoantibodies, cell-free plasma DNA, microRNA, and DNA methylation signatures are all areas of active biomarker development [8, 29,30,31,32]. The use of LBC for early ovarian cancer detection has been explored by us and others with mixed results [9,10,11, 33, 34]. While Arildsen et al. reported 75% sensitivity to find the tumor mutation [33], Kinde et al., Wang et al. and Van Bommel et al. reported sensitivities of 41%, 33%, and 29%, respectively [9, 11, 34]. Our first study agreed with the limited sensitivity (38%) but indicated potential value in the analysis of TP53 clonal expansions [10]. Our current study confirms these findings and indicates that the association between TP53 clonal expansions in cervical LBC and ovarian cancer is restricted to BRCApv carriers. This is the population at highest risk of developing cancer and therefore the individuals who would most benefit from a clinical biomarker for risk stratification. In BRCApv carriers, we also find a significant increase of pathogenic TP53 mutations in blood in association with ovarian cancer. Combining LBC and blood sample analysis in future larger studies could potentially improve the personalized assessment of ovarian cancer risk in BRCApv carriers.

Importantly, TP53 mutations in cervical LBC were not associated with stage or prior chemotherapy but were associated with increasing age, smoking, and menopause, which are factors previously linked with clonal expansions [12, 13]. In addition, in blood, TP53 mutations were associated with age and obesity, which are predisposing factors for clonal hematopoiesis [35, 36]. In BRCApv patients, the association between TP53 mutations in LBC and cancer remained significant after adjustment for age but the sample size was too small to adjust for the rest of covariates. Future studies investigating the value of TP53 clonal expansions for cancer risk assessment should be powered to control for the effect of these covariates.

Somatic clonal evolution has been recently recognized as a prevalent biological event linked to aging and potentially predisposing to cancer [12, 13, 37]. Aside from the connection between CHIP and AML [36], however, little is known about whether mutations in cancer driver genes within otherwise normal cells are related to the development of cancer. This gap in knowledge is mainly due to the difficulty of measuring low-frequency mutations in normal tissue. Using ultra-sensitive duplex sequencing, we previously demonstrated the presence of pathogenic TP53 mutations in peritoneal fluid, cervical LBC, uterine lavage, and gynecological tissues of individuals with and without HGSC [10, 20,21,22]. However, individuals with HGSC had more pathogenic TP53 clones in LBC [10] and uterine lavage [22], consistent with this study and indicating that TP53 clonal expansions in the gynecological tract might be linked to cancer progression. We observed similar associations in the normal colon of patients with colon cancer or polyps [38], and others have demonstrated associations between clonal expansions and genetic risk factors (e.g. cancer predisposition mutations in ATM) and environmental cancer risk factors (e.g. smoking, chronic inflammation, low parity) [13, 37, 39], suggesting a link between clonal evolution and cancer predisposition.

Most TP53 mutant clones in LBC samples are not of hematopoietic origin. Some mutant cells could derive from p53 foci in the fallopian tube epithelium, which are known precursors of HGSC and might disseminate early in progression [40, 41]. p53 foci have been identified in as many as 27% of fallopian tubes removed at risk-reducing salpingo-oophorectomy [42]. In addition, p53 foci have been reported to be most abundant in BRCApv carriers with ovarian cancer [43], which agrees with our findings of increased TP53 MF in this group of patients. Other TP53 mutations in LBCs likely derive from endometrial epithelium, which is a tissue with abundant clonal expansions driven by TP53 mutations and other cancer driver genes [26, 27]. Interestingly, our study revealed a higher frequency of TP53 mutations in LBC than blood for all patient groups, about two-fold on average, which is consistent with the estimated mutation rate in endometrium being about double the rate in blood (in endometrium 29 (23–34) mutations per genome per year vs 14.2 (6.1–22.4) in blood) [13]. The small proportion of concordant clones found in both blood and cervical LBC samples is expected due to the presence of leukocytes and the fact that clonal hematopoiesis is detected in >90% of adults when using highly sensitive methods [36, 44]. Importantly, after removal of concordant blood/LBC mutations, pathogenic TP53 mutations were still increased in BRCApv carriers with cancer compared to individuals without cancer, indicating that clonal hematopoiesis is not a major contributor to TP53 clonal expansions in LBC samples. Similarly, the removal of the tumor mutation from LBC mutation counts weakened significance but did not completely remove the association between TP53 mutations and cancer for BRCApv carriers. Our data suggest that BRCApv carriers who develop HGSC may have an excess of pathogenic TP53 mutations in LBC samples. This could indicate increased clonal evolution in the reproductive tract of these patients, potentially offering insights for cancer risk assessment.

Our study has some limitations. It was designed as an exploratory case-control study and thus sample size and power in detecting subgroup differences is limited. While we tried to optimize age-matching, some patients were out of range and comparison testing required age restrictions. In addition, potential confounders of somatic clonal expansion including smoking, obesity, and prior chemotherapy can only be properly adjusted for with larger sample sizes. Furthermore, variable DNA quality and the use of sonication in the library preparation could create artificial mutational background that could dilute differences between groups.

In conclusion, ultra-deep sequencing has revealed an excess of TP53 clonal expansions in LBC samples from BRCApv carriers who developed ovarian cancer. LBC samples might reflect increased clonal change in the reproductive tract that corresponds to cancer risk in BRCApv carriers. About a third of BRCApv carriers will develop ovarian cancer in their lifetime, and these findings offer an avenue of investigation to better risk-stratify individuals with a minimally invasive sample. In addition, we provide further evidence of the link between TP53 somatic evolution and cancer, which might be relevant to other TP53-driven cancers.

Methods

Case selection

A case-control study was designed to include 20 patients without germline BRCApv and without HGSC, 20 patients without germline BRCApv and with HGSC, 20 patients with germline BRCApv without HGSC, and 10 patients with germline BRCApv diagnosed with HGSC (Supplementary Table S1), all of whom underwent gynecologic surgery at the University of Washington, with LBC and blood collected at the time of surgery. All patients had provided informed consent approved by the Human Subjects Division of the Institutional Review Board to be enrolled in the institutional tissue bank and participate in associated research studies. Cases and controls were selected for age matching within the groups of BRCApv germline carriers and non-carriers. Those with a history of cancer other than breast were excluded. Clinical information including exposure to chemotherapy, smoking, BMI, and preoperative CA-125 levels was recorded.

Sample processing and DNA extraction

Blood samples were collected perioperatively. LBC samples were collected in the operating room at the time of surgery using an endocervical cytobrush (Thinprep, Hologic, MA, USA) according to the manufacturer’s protocol. DNA was extracted from the blood buffy coat with a salting out method, including RBC lysis with subsequent proteinase K digestion carried out at 37˚C overnight. LBCs were centrifuged at 1000 g for 3 min to form a cell pellet and stored at −80˚C until extraction. The cell pellet was thawed and DNA extracted using the Qiagen DNeasy Blood & Tissue Kit (Qiagen, Valencia, CA, USA). Lysis and proteinase K digestion were carried out at 37˚C for two hours and the optional RNase step was followed. Extracted DNA was kept at −80˚C prior to library preparation.

Sequencing and code availability

Duplex sequencing, an ultra-accurate error-correction sequencing approach [18, 19], was used to deeply sequence coding and surrounding non-coding regions of TP53. Library preparation was completed using the TwinStrand Biosciences Duplex Sequencing Kit (Seattle, WA) following manufacturer protocols, with 200 ng of starting DNA sonicated to an average length of 300 bp and ligated to sequencing adapters that include double-stranded molecular barcodes for error correction. The coding region of TP53 was captured with 120 bp biotinylated probes (TP53 human panel v1.0, TwinStrand Biosciences) using two rounds of hybridization capture. Sequencing data was analyzed as previously described using Duplex Seq Pipeline v2.1.2 from https://github.com/Kennedy-Lab-UW/Duplex-Seq-Pipeline. For each DNA molecule, raw reads were grouped to produce the two complementary single-strand consensus sequences, which were then compared to produce highly accurate duplex reads. Duplex reads were aligned to the human genome reference hg38 and hard clipped by 40 bp at the 5′ end to remove potential false mutations originating from damaged ends. Variant calling was done with the samtools mpileup-based variant caller and VCF file outputs were then converted to MAF files using the Vcf2Maf script (https://github.com/mskcc/vcf2maf) with VEP version 104 and canonical TP53 transcript NM_000546.6 (ENST00000269305).
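The two-step consensus logic described above (single-strand consensus per strand, then comparison of the two strands) can be illustrated with a minimal Python sketch. It is not the Duplex Seq Pipeline implementation. It assumes reads are already grouped by molecular barcode and strand, are of equal length, and are in the same orientation. The thresholds `min_reads` and `min_agreement` and the function names are hypothetical choices made for this illustration.

```python
from collections import Counter


def single_strand_consensus(reads, min_reads=3, min_agreement=0.7):
    """Majority-vote consensus over equal-length reads from one strand family.
    Returns None if the family is too small; ambiguous positions become 'N'."""
    if len(reads) < min_reads:
        return None
    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_agreement else "N")
    return "".join(consensus)


def duplex_consensus(sscs_top, sscs_bottom):
    """Keep a base only where both strand consensuses agree; otherwise 'N'."""
    if sscs_top is None or sscs_bottom is None:
        return None
    return "".join(a if a == b else "N" for a, b in zip(sscs_top, sscs_bottom))


# Toy read families (hypothetical sequences).
top = ["ACGTAC", "ACGTAC", "ACTTAC", "ACGTAC"]   # one read has an error at the third base
bottom = ["ACGTAC", "ACGTAC", "ACGTAC"]
damaged_top = ["ACGTAT"] * 4                     # change present on one strand only

clean = duplex_consensus(single_strand_consensus(top), single_strand_consensus(bottom))
masked = duplex_consensus(single_strand_consensus(damaged_top), single_strand_consensus(bottom))
print(clean, masked)
```

In the toy data, the isolated error in one read is outvoted within its strand family, while the change present on only one strand is masked at the duplex step. A true mutation would be retained only if it appeared in both strand consensuses.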

Mutational analysis and calculation of mutation frequencies

For all samples in the study, MAF files were concatenated and analyzed with R version 4.3.0 using packages listed in Supplementary Table S9. Variant Allele Frequency (VAF) was calculated by dividing the number of duplex mutant reads by the number of total duplex reads covering the variant position. Variants were classified as missense, nonsense, silent, indels, and splice based on VEP annotations [45]. Pathogenicity was determined using the AlphaMissense algorithm [23] and Seshat algorithm [24]. Variant filters, SNP calling, pathogenicity assessment, COSMIC data analysis, and hotspot codon determination are described in Supplementary Methods. For each sample we calculated coding MF as the number of coding mutations divided by the total number of duplex coding nucleotides sequenced and non-coding MF as the number of non-coding mutations divided by the total number of duplex non-coding nucleotides sequenced. Total MF was calculated by including the number of coding and non-coding mutations divided by the total number of duplex nucleotides sequenced. We also calculated TP53 mutation burden, hotspot TP53 MF, AlphaMissense pathogenic TP53 MF, and Seshat pathogenic MF to capture different aspects of TP53 clonal evolution in the study samples (Supplementary Methods).
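The per-sample metrics defined above reduce to simple ratios, sketched below in Python. All numbers are invented for illustration, and the `Variant` record and function names are hypothetical. The mutant-read count shown for mutation burden is the raw total only. Any normalization, and the hotspot and pathogenic MF variants, are described in the Supplementary Methods and are not reproduced here.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    region: str        # "coding" or "noncoding"
    mutant_reads: int  # duplex reads carrying the variant
    depth: int         # total duplex reads covering the position


def vaf(v: Variant) -> float:
    """Variant allele frequency: mutant duplex reads / duplex depth at the position."""
    return v.mutant_reads / v.depth


def mutation_frequency(n_mutations: int, nucleotides: int) -> float:
    """Mutations per duplex nucleotide sequenced."""
    return n_mutations / nucleotides


# Hypothetical sample: three variants and made-up sequencing totals.
variants = [
    Variant("coding", 1, 3000),
    Variant("coding", 3, 2900),
    Variant("noncoding", 1, 3100),
]
coding_nt, noncoding_nt = 3_500_000, 6_000_000  # duplex nucleotides sequenced

coding = [v for v in variants if v.region == "coding"]
noncoding = [v for v in variants if v.region == "noncoding"]

coding_mf = mutation_frequency(len(coding), coding_nt)
noncoding_mf = mutation_frequency(len(noncoding), noncoding_nt)
total_mf = mutation_frequency(len(variants), coding_nt + noncoding_nt)

# Raw count of mutant duplex reads in coding positions (basis of mutation burden).
coding_mutant_reads = sum(v.mutant_reads for v in coding)
# Large clones: variants seen in at least 2 mutant duplex reads (Fig. 1 definition).
large_clones = [v for v in variants if v.mutant_reads >= 2]

print(f"coding MF {coding_mf:.2e}, non-coding MF {noncoding_mf:.2e}, total MF {total_mf:.2e}")
print(f"coding mutant reads {coding_mutant_reads}, large clones {len(large_clones)}")
```

Counting each unique variant once for MF but summing mutant reads for burden is what lets the two metrics separate the number of mutations from the size of the clones.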

Statistical analysis

Comparison of quantitative variables across groups was performed by Mann-Whitney U test, and correlations were tested with Spearman’s rank test. For paired comparisons between blood and LBC, the Wilcoxon signed-rank test with continuity correction was used. Associations between categorical variables were tested with Fisher’s exact test. Logistic regression was used to assess the association of cancer vs no cancer with TP53 mutations in blood and LBC. Covariates for the multivariable model were selected based on biological plausibility and included: age at surgery, smoking (ever vs never smoked), body mass index (BMI), plasma CA-125, and presence of a germline BRCA1 or BRCA2 pathogenic variant. All tests were two-sided at an α level (type 1 error rate) of 0.05. Statistical analyses and graphics were performed with Stata/SE 14.2 [46], SPSS version 28.0 [47], and R version 4.3.0 [48] (Supplementary Table S9).
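As an illustration of the categorical comparisons, a two-sided Fisher’s exact test for a 2x2 table can be written with the Python standard library alone. This sketch is not the code used in the study (which relied on Stata, SPSS, and R). The function name is hypothetical, the example table is a textbook toy case rather than study data, and the two-sided rule shown (summing all tables no more probable than the observed one) is one common convention.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].
    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Toy table (hypothetical counts, not study data): rows are groups,
# columns are counts with and without a feature.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(f"two-sided p = {p_value:.4f}")
```

For small tables such as this one, the exact test is preferred over a chi-square approximation because it makes no large-sample assumption.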