Extended Data Fig. 8: Function scores for missense variants outperform predictions from computational models. | Nature Genetics

From: Saturation genome editing maps the functional spectrum of pathogenic VHL alleles

a,b, ROC curves indicate the performance of different metrics at distinguishing disease-associated missense variants in VHL. The metrics evaluated were SGE function scores, REVEL scores (ref. 5), boostDM scores from the VHL-ccRCC model (ref. 8), EVE scores (ref. 7), VARITY R scores (ref. 9) and CADD scores (ref. 25). Missense SNVs were included if scored by all metrics (that is, those present in SGE data from p.M54 to p.A207). In a, n = 65 missense variants deemed ‘pathogenic’ in ClinVar were distinguished from n = 87 missense SNVs deemed neutral (as in Fig. 4g). In b, missense variants present in the gold-standard set of ccRCC-associated SNVs (n = 73) were classified against the same neutral set of variants as in a. c, Function scores for n = 953 missense SNVs are plotted versus scores from each computational predictor, colored by ClinVar status. d, Function scores were used to define two sets of unseen variants (that is, those absent from ClinVar, cBioPortal, population sequencing and VHLdb). Each metric was assessed on its ability to distinguish unseen missense SNVs with function scores below −0.479 (n = 19) from the set of missense SNVs with function scores closest to 0 (n = 100). e, Missense variants classified by SGE as LOF1/LOF2 or neutral were grouped by whether they were discordantly classified by 0, 1 to 2 or all 3 top variant effect predictors (VARITY, EVE and REVEL). Function scores, EVE scores, vertebrate phyloP scores and FoldX predictions are shown across groups (boxplot: center line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range; all points shown except n = 22 SNVs with FoldX scores greater than 12.0 in the right panel).
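
The comparisons in a, b and d can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The function names (`roc_auc`, `split_unseen`), the `higher_is_positive` flag and the toy inputs are assumptions introduced here for illustration. The area under a ROC curve is computed in its rank-based (Mann-Whitney) form, and the panel d sets are formed using the −0.479 threshold and the n = 100 closest-to-zero rule stated in the legend.

```python
from typing import Dict, Sequence, Set, Tuple


def roc_auc(pos: Sequence[float], neg: Sequence[float],
            higher_is_positive: bool = True) -> float:
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count as 0.5.

    Set higher_is_positive=False for metrics where a lower score indicates
    the positive class (assumed here for function scores, since the legend
    selects the low-scoring set with "below -0.479").
    """
    if not pos or not neg:
        raise ValueError("both classes need at least one score")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p == n:
                correct += 0.5
            elif (p > n) == higher_is_positive:
                correct += 1.0
    return correct / (len(pos) * len(neg))


def split_unseen(scores: Dict[str, float],
                 low_threshold: float = -0.479,
                 n_neutral: int = 100) -> Tuple[Set[str], Set[str]]:
    """Split unseen variants as described for panel d: those scoring below
    the threshold, and the n_neutral remaining variants closest to 0."""
    low = {v for v, s in scores.items() if s < low_threshold}
    rest = sorted((v for v in scores if v not in low),
                  key=lambda v: abs(scores[v]))
    return low, set(rest[:n_neutral])


if __name__ == "__main__":
    # Toy values only; these are not data from the study.
    print(roc_auc([0.9, 0.4], [0.5, 0.1]))
    print(split_unseen({"v1": -1.0, "v2": 0.02, "v3": -0.3}, n_neutral=1))
```

The pairwise form is quadratic in the number of variants, which is acceptable at the set sizes in the legend (tens to hundreds of SNVs); a sorted-rank implementation would be preferable for larger sets.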
