Introduction

Every year, over 35,000 people worldwide undergo donor nephrectomy to provide a kidney for transplantation to a recipient in need1. While living donor kidney transplantation (KT) constitutes one-third of all kidney transplants in the United States, the proportion of living donor KTs exceeds 60% in Asian countries, including South Korea2,3. Since living kidney donors are undergoing nephrectomy for reasons unrelated to their own health, it is crucial to ensure their long-term safety after the procedure4,5. Although kidney donors have a slightly increased risk of end-stage renal disease (ESRD) compared to matched healthy non-donors, the absolute risk remains very low6,7. Yet, due to the loss of half the nephrons after nephrectomy, kidney donation may lead to long-term deterioration of kidney function8,9.

To minimize risks for kidney donors, appropriate selection of the kidney for transplant through assessment of pre-donation glomerular filtration rate (GFR), differential renal function, and vascular anatomy is essential10,11,12,13,14. If renal function is suitable and there are no abnormalities in the kidney parenchyma and vascular structure, the kidney with lower function is typically selected for donation13,14. While nuclear renography has traditionally been used to measure differential renal function14,15, several studies have proposed methods for estimating split renal function based on kidney volume proportions determined by computerized tomography (CT) volumetry12,16,17,18. Moreover, a recent study suggested that CT volumetry is superior to nuclear renography in predicting postoperative renal function in living kidney donors19.

Although predicting the risk of ESRD for individual donor candidates is challenging, multiple demographic and laboratory data can be combined to estimate the long-term risk of ESRD after donation (www.transplantmodels.com/esrdrisk)20. However, few studies have focused on predicting individual ESRD risk after donation and estimating post-donation GFR according to the side of donation. Several factors, including race, sex, albuminuria, hypertension, smoking history, and diabetes, are known to contribute to renal dysfunction after kidney donation, but few studies have compared the relative risk based on the side of donation13,14. It has been reported that CT volumetry is superior to nuclear renography in predicting residual kidney function in living kidney donors19. In another study, the clinical characteristics of living donors and remnant kidney volume based on CT volumetry were analyzed to estimate the degree of compensation of the contralateral kidney after donation21; however, the number of study patients was small, and the prediction model was not validated in another cohort.

In this study, we aimed to develop and validate a prediction model for post-donation renal function using clinical variables from live kidney donors and the kidney volume measured by CT volumetry.

Results

Baseline characteristics

The baseline characteristics of kidney donors in the training and validation cohorts are presented in Table 1. Clinical variables, including the side of donation and remnant kidney volume, were compared between the two cohorts. In the training cohort, female donors (55.7% vs. 50.1%, p = 0.016) and donors with a recent smoking history (24.4% vs. 17.7%, p < 0.001) were more common than in the validation cohort. Additionally, the training cohort had higher mean values of systolic blood pressure (124.0 ± 15.4 vs. 116.3 ± 13.5 mmHg, p < 0.001), diastolic blood pressure (80.0 ± 10.1 vs. 69.0 ± 10.0 mmHg, p < 0.001), eGFR (103.6 ± 13.0 vs. 100.5 ± 12.7 mL/min/1.73 m2, p < 0.001), remnant kidney volume (173.0 ± 30.3 vs. 167.4 ± 32.3 mL, p < 0.001), donated kidney volume (168.5 ± 29.4 vs. 165.6 ± 31.3 mL, p = 0.033), and remnant kidney proportion (50.7 ± 2.2% vs. 50.3 ± 2.9%, p < 0.001). The left kidney was donated less commonly in the training cohort than in the validation cohort (55.4% vs. 65.5%, p < 0.001).

Table 1 Baseline characteristics of the training cohort and the validation cohort.

Univariable and multivariable analysis in the training cohort and prediction scoring model

Univariable logistic regression analysis revealed that the primary endpoint (decrease in eGFR to less than 60 mL/min/1.73 m2 at six months post-donation) was significantly associated with male sex (Odds ratio [OR] 2.19, 95% Confidence interval [CI] 1.66–2.92, p < 0.001), age at operation (OR 1.11, 95% CI 1.09–1.13, p < 0.001), BMI (OR 1.07, 95% CI 1.03–1.12, p = 0.001), systolic blood pressure (OR 1.02, 95% CI 1.01–1.03, p < 0.001), diastolic blood pressure (OR 1.02, 95% CI 1.01–1.03, p = 0.008), HbA1c (OR 2.79, 95% CI 1.81–4.32, p < 0.001), eGFR (OR 0.86, 95% CI 0.84–0.88, p < 0.001), hypertension (OR 2.60, 95% CI 1.80–3.72, p < 0.001), remnant kidney volume (OR 0.99, 95% CI 0.98–0.99, p < 0.001), donated kidney volume (OR 0.99, 95% CI 0.99–1.00, p < 0.001), and remnant kidney proportion (OR 0.87, 95% CI 0.82–0.93, p < 0.001) (Table 2). Multivariable logistic regression analysis showed that male sex, older age at operation, lower eGFR before donation, and smaller RKP were independently associated with the primary endpoint (Table 2).

Table 2 Univariable and multivariable logistic regression in the training cohort.

Based on the results of multivariable analysis, we used four variables (sex, age at operation, eGFR, and RKP) to stratify the risk of renal dysfunction after kidney donation; specifically, we divided the three continuous variables (age, eGFR, RKP) into five, four, and three categories, respectively (Table 3). We assigned 1 point for male sex, and the points for the other variables were assigned using the rounded quotient β(W − W_REF)/β_male. The risk score is obtained by summing the points for the four variables. The total number of donors according to the risk score in the training cohort is shown as blue bars on the histogram in Fig. 1A.
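The point assignment described above can be illustrated with a minimal sketch. The coefficients, the age categories, and the use of category midpoints as W are hypothetical placeholders chosen for illustration; they are not the values in Table 2 or Table 3. Python's built-in round resolves exact ties to the nearest even integer, and the text does not state how ties were handled in the study.

```python
def points(beta, w, w_ref, beta_male):
    """Points for one category: rounded quotient beta * (W - W_REF) / beta_male."""
    return round(beta * (w - w_ref) / beta_male)


# Hypothetical coefficients, for illustration only (not the values in Table 2).
BETA_MALE = 0.8   # log-odds for male sex; by construction this equals 1 point
BETA_AGE = 0.08   # log-odds per additional year of age

# Hypothetical age categories, each represented by a midpoint (assumption),
# with the youngest category used as the reference.
age_midpoints = [25, 35, 45, 55, 65]
age_points = [points(BETA_AGE, w, age_midpoints[0], BETA_MALE) for w in age_midpoints]

# A donor's risk score is the sum of the points over the four variables;
# here only sex and age are shown.
male_points = 1
example_partial_score = male_points + age_points[3]

print(age_points)
print(example_partial_score)
```

Dividing by the coefficient for male sex expresses every other effect in units of "one male-sex step", which is why male sex receives exactly 1 point.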

Table 3 Risk scores for event occurrence in living kidney donors.
Figure 1

The total number of donors according to the risk score (A) and the probability of occurrence of the primary endpoint for each score (B) in the training cohort.

The probability of occurrence of the primary endpoint was calculated for each score (Fig. 1B). The probability of occurrence of the primary endpoint increased steeply as the risk score increased. Specifically, while the predicted probability of the primary endpoint was less than 8% when the total score was less than 5, the probability increased to over 48% when the total score was more than 6 (Fig. 1B). The performance of this score-based prediction system using four variables was excellent in the training cohort (Fig. 2). The expected probabilities were similar to the observed probabilities (Fig. 2A), and the calibration according to the Hosmer–Lemeshow test showed considerable concordance (chi-squared statistic = 1.99, P value = 0.98) (Fig. 2B). The prediction model had good discriminative capacity, with a c-statistic value of 0.89 (95% CI 0.87–0.91) (Fig. 2C).
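The two performance measures reported here can be sketched in a few lines. This is an illustrative implementation on toy data, not the code used in the study; the function names and the input layout are assumptions. The c-statistic is computed by pairwise comparison, and the Hosmer–Lemeshow statistic by summing (observed − expected)² / (n·p·(1 − p)) over score groups. Converting the statistic into a P value requires the chi-squared distribution and is omitted.

```python
def c_statistic(scores, events):
    """Probability that a donor with the event has a higher score than a donor
    without it, over all such pairs; tied pairs count as one half."""
    pos = [s for s, e in zip(scores, events) if e == 1]
    neg = [s for s, e in zip(scores, events) if e == 0]
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def hosmer_lemeshow_statistic(groups):
    """groups: iterable of (n, observed_events, mean_predicted_probability)."""
    stat = 0.0
    for n, observed, p in groups:
        expected = n * p
        stat += (observed - expected) ** 2 / (expected * (1.0 - p))
    return stat


# Toy data (hypothetical): risk scores and whether the endpoint occurred.
toy_scores = [1, 2, 3, 4, 5, 6]
toy_events = [0, 0, 1, 0, 1, 1]
toy_c = c_statistic(toy_scores, toy_events)

# Toy calibration groups in which observed counts equal expected counts.
toy_hl = hosmer_lemeshow_statistic([(100, 5, 0.05), (100, 20, 0.20)])

print(toy_c, toy_hl)
```

A small Hosmer–Lemeshow statistic (and hence a large P value) indicates that observed and expected event counts agree, which is how the values quoted above should be read.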

Figure 2

The performance of the score-based prediction system in the training cohort (A–C) and the validation cohort (D–F). Observed and expected probabilities of the primary endpoint in the training cohort (A,D). The calibration according to the Hosmer–Lemeshow test (B,E). Receiver operating characteristic curves of the score-based prediction system to predict the primary endpoint (C,F).

Validation cohort

To assess the validity of the prediction model, the scoring system was applied to an external validation cohort. Although the observed probability was higher than the expected probability at the risk score of 9, there was a strong overall similarity between the observed and expected probabilities in the validation cohort (Fig. 2D). While the calibration was lower compared to the training cohort, the Hosmer–Lemeshow test still demonstrated good concordance (chi-squared statistic = 5.45, P value = 0.71) (Fig. 2E). Furthermore, the score-based prediction model performed well in terms of discrimination in the validation cohort as well (c-statistic 0.87, 95% CI 0.83–0.90) (Fig. 2F).

Discussion

In this study, we developed a simple prediction model comprising four variables (sex, age, eGFR, RKP) for renal deterioration after donor nephrectomy and validated the scoring system in an independent external cohort. Postoperative eGFR can be predicted by donor age, sex, preoperative eGFR, and RKP. Scores were assigned based on the degree to which each factor was related to the occurrence of the primary endpoint. Among the factors, preoperative eGFR and age had a greater influence on the primary endpoint. Unlike previous studies, RKP, determined by kidney volume from CT volumetry, was included as an adjustable factor to predict renal deterioration after donor nephrectomy. In other words, preoperative measurement of RKP by CT volumetry may assist surgeons in deciding which kidney is more suitable for donation in order to preserve as much renal function as possible after donor nephrectomy. We expect that this scoring system will be useful for estimating renal function after donor nephrectomy.

Several studies have estimated the risk of ESRD in kidney donors. Grams et al. suggested that the long-term risk of ESRD in kidney donor candidates can be estimated using multiple demographic and health characteristics20; however, their model is designed to predict the ESRD risk if a person does not donate a kidney. In other words, the model does not predict kidney function after donation, and a total of 10 factors were used for prediction, which cannot be adjusted unless the donor is changed. Okumura et al. suggested a compensation prediction score using four factors, including age, sex, history of hypertension, and the ratio of the remnant kidney volume to body weight21. They defined favorable compensation as post-donation eGFR at 1 year being more than 60% of the pre-donation eGFR. However, that study enrolled only 133 living donors at a single center and had no external validation. Rook et al. predicted post-donation renal function impairment, defined as a GFR of ≤ 60 mL/min/1.73 m2, using pre-donation eGFR, BMI, and age22; however, that study evaluated only 125 donors from a single center and likewise lacked external validation. Furthermore, they did not measure remnant kidney volume by CT volumetry.

It has already been reported that age, male sex, and lower preoperative eGFR are significantly associated with renal dysfunction and ESRD after donor nephrectomy21,22. However, these factors cannot be adjusted for the purpose of reducing the risk of ESRD. Therefore, we believe it is important that RKP was found as a significant factor in multivariable logistic regression analysis because RKP is the only adjustable factor for a live kidney donor candidate. The predicted probability of the primary endpoint increased steeply with a 1-point increase in the scoring system, especially when the total risk score was more than 5 (Fig. 1B). In other words, in donors with a higher total risk score, the RKP should be strictly preserved by selecting the smaller kidney for donation.

There are some limitations to this study. This was a retrospective observational study with possible selection biases and confounders. Nevertheless, the prediction model for post-donation renal function developed based on the cohort of 1628 patients showed robust results in external validation with 690 patients. Another limitation is that the scoring system was based on eGFR at 6 months post-donation rather than long-term clinical outcomes. However, eGFR at 6 months post-donation is known to be useful for predicting long-term renal function after kidney donation23. In addition, there might be some discrepancy in kidney volume measurement between the training and validation cohorts, which used different volumetry software. It should also be considered that we measured the total volume of each kidney rather than the volume of the cortex, which more directly reflects renal function.

In conclusion, we developed a simple scoring system comprising four variables (sex, age, eGFR, RKP) to predict renal function after living donor nephrectomy and validated its robustness in an independent external cohort. We expect that this model will be useful for estimating the risk of post-donation renal dysfunction and for determining the more appropriate side for donation to ensure the safety of kidney donors.

Methods

Study population and data sources

We conducted a multicenter retrospective cohort study involving adult patients (≥ 18 years old) who underwent donor nephrectomy between May 2005 and December 2019 at two tertiary referral centers (Asan Medical Center and Samsung Medical Center) in South Korea. Approval from the institutional review board (IRB) was obtained at each center (Approval numbers: Asan Medical Center IRB 2021-0465, Samsung Medical Center IRB 2021-07-013-001). Asan Medical Center IRB and Samsung Medical Center IRB waived written informed consent because of the retrospective and noninvasive nature of this study. The clinical and research activities being reported are consistent with the Principles of the Declaration of Istanbul as outlined in the 'Declaration of Istanbul on Organ Trafficking and Transplant Tourism'. A total of 4295 individuals underwent donor nephrectomy at the two centers during the study period, among whom we excluded the following donors: (1) donors who did not have dynamic CT kidney volumetry before donor nephrectomy (n = 994), (2) donors who did not have postoperative serum creatinine measured 4–8 months following donor nephrectomy (n = 781), (3) donors with one or more renal stones (n = 198), and (4) donors with complicated cysts (more than IIF category by the Bosniak classification24) (n = 4). Finally, the study included 1628 patients in the training cohort (Asan Medical Center) and 690 patients in the validation cohort (Samsung Medical Center).
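The cohort flow can be checked with simple arithmetic. The sketch below assumes the four exclusion counts are mutually exclusive (that is, applied sequentially so no donor is counted twice), which the text implies but does not state; the dictionary labels are shorthand introduced here.

```python
screened = 4295

# Exclusion counts as reported; labels are shorthand, not the authors' wording.
exclusions = {
    "no preoperative dynamic CT volumetry": 994,
    "no serum creatinine at 4-8 months": 781,
    "one or more renal stones": 198,
    "complicated cysts": 4,
}

included = screened - sum(exclusions.values())
training, validation = 1628, 690

print(included)  # 2318
assert included == training + validation
```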

The primary endpoint was a decrease in estimated GFR (eGFR) to less than 60 mL/min/1.73 m2 at 6 months post-donation. This endpoint was chosen based on previous reports in which chronic kidney disease was defined as kidney damage or a glomerular filtration rate < 60 mL/min/1.73 m2 for three months or more25. In addition, early post-donation renal function has been associated with the subsequent risk of ESRD in living kidney donors23. Specifically, eGFR measured 6 months after donation was independently associated with ESRD risk, even after adjusting for pre-donation characteristics.

We investigated several factors that are known to be associated with decreased eGFR in living kidney donors, including sex, age, body mass index (BMI), blood pressure, hemoglobin A1c, preoperative eGFR, hypertension medication, and smoking history20. Remnant kidney proportion (RKP) was defined as the proportional remnant kidney volume per total kidney volume, measured by dynamic CT kidney volumetry (RKP = remnant kidney volume (mL)/total kidney volume (mL)).
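As a worked example of the RKP definition, the sketch below takes total kidney volume to be the sum of the remnant and donated kidney volumes (an assumption consistent with the definition above). The inputs are the training-cohort mean volumes from the Results, used purely as example numbers; the function name is introduced here for illustration.

```python
def remnant_kidney_proportion(remnant_ml, donated_ml):
    """RKP = remnant kidney volume / total kidney volume (both in mL)."""
    total_ml = remnant_ml + donated_ml
    if total_ml <= 0:
        raise ValueError("total kidney volume must be positive")
    return remnant_ml / total_ml


# Example inputs: training-cohort mean remnant and donated volumes (mL).
rkp = remnant_kidney_proportion(173.0, 168.5)
rkp_percent = f"{100 * rkp:.1f}%"
print(rkp_percent)  # 50.7%
```

Because RKP depends on which kidney is left in place, swapping the two arguments gives the RKP that would result from donating the other side.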

Decision of donation side and postoperative management

The decision regarding which kidney to donate was made after considering factors such as vasculature, preoperative eGFR, split renal function assessed by renal scintigraphy (technetium-99m diethylenetriaminepentaacetic acid or Tc-99m dimercaptosuccinic acid), kidney volume measured by CT volumetry, and the presence of atypical or large renal cysts. If there was a considerable difference in the relative function determined by renal scintigraphy or kidney volume measured by CT volumetry between the two kidneys, the one with inferior function was selected for donation. A kidney with calcification or stenosis of the renal artery was also considered for donation. The left kidney was selected if there was no significant difference between the two kidneys.

Donors underwent hand-assisted laparoscopic surgery for nephrectomy and were discharged five days after surgery. Renal function was assessed using eGFR, obtained through the Chronic Kidney Disease Epidemiology Collaboration equation, at 1 week, 1 month, 3 months, 6 months, and 1 year after surgery.
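For readers who want to reproduce the eGFR calculation, the following is a sketch of the 2009 CKD-EPI creatinine equation. The text does not state which version of the CKD-EPI equation was used, so the choice of the 2009 form is an assumption, as are the function name and the use of serum creatinine in mg/dL.

```python
def ckd_epi_2009(scr_mg_dl, age_years, female, black=False):
    """eGFR (mL/min/1.73 m2) from the 2009 CKD-EPI creatinine equation."""
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = (
        141.0
        * (min(ratio, 1.0) ** alpha)
        * (max(ratio, 1.0) ** -1.209)
        * (0.993 ** age_years)
    )
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr


def meets_primary_endpoint(egfr_at_6_months):
    """Primary endpoint as defined in this study: eGFR below 60 mL/min/1.73 m2."""
    return egfr_at_6_months < 60.0


# Hypothetical donor: 50-year-old man, creatinine 0.9 mg/dL before and
# 1.4 mg/dL six months after donation.
pre = ckd_epi_2009(0.9, 50, female=False)
post = ckd_epi_2009(1.4, 50, female=False)
print(round(pre, 1), round(post, 1), meets_primary_endpoint(post))
```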

Kidney volume measurement

Preoperative CT scans were performed using a 16- or 64-multidetector CT scanner (LightSpeed 16 or Optima CT660, GE Healthcare; Somatom Sensation 16, Siemens Healthcare). The scanning protocol consisted of three phases: unenhanced phase, corticomedullary phase (30 s after contrast injection), and nephrographic phase (90 s after contrast injection). The scanning parameters were as follows: pitch, 1.5; tube voltage, 120 kV; tube current, 210–240 mA; and slice thickness, 3–5 mm.

Kidney volume for each patient was measured using the GE Advantage Windows Workstation (version 3.0; General Electric Medical Systems, Milwaukee, WI, USA) in the training cohort, and the Aquarius iNtuition Viewer version 4.4.13 (TeraRecon; Durham, NC, USA) in the validation cohort. Kidney length was measured using coronal sections, and kidney volume was determined from contiguous slices. In coronal section images with parenchymal enhancement, the region of interest was drawn around the kidney, and slices were reconstructed at 1-mm intervals to obtain a 3D volume-rendered image of the kidney. Volume was calculated at the workstation by multiplying the sum of the areas from each slice by the reconstruction interval.
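The volume calculation described above (sum of slice areas multiplied by the reconstruction interval) can be written out as a short sketch. The units (areas in mm², interval in mm) and the slice values are assumptions for illustration; the workstation software performs this step internally.

```python
def kidney_volume_ml(slice_areas_mm2, interval_mm=1.0):
    """Sum of per-slice kidney areas times the reconstruction interval.

    Areas are assumed to be in mm^2 and the interval in mm, so the product is
    in mm^3; dividing by 1000 converts to mL.
    """
    volume_mm3 = sum(slice_areas_mm2) * interval_mm
    return volume_mm3 / 1000.0


# Hypothetical kidney: 110 slices at 1-mm intervals, each with 1500 mm^2 of kidney.
example_areas = [1500.0] * 110
example_volume = kidney_volume_ml(example_areas, interval_mm=1.0)
print(example_volume)  # 165.0
```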

Statistical analysis

Statistical analysis was performed using R Statistics ver. 4.04. Continuous variables were compared using the t-test, while categorical variables were compared using the Chi-squared test or Fisher's exact test, as appropriate. Clinical variables for event prediction were entered into the univariable analysis. To identify factors independently related to an event, a bootstrap statistical technique was used. Resampling was performed 1000 times, and backward elimination was conducted. Only factors present in more than 50% of the 1000 samples were selected as final items. A multivariable analysis was performed with the final variables, and the coefficient of each variable was incorporated into a scoring system. This point-scoring algorithm was applied to patients in the validation cohort, and the risk score was derived by summing the points corresponding to each variable. The scoring system was out of 10, and the probability of event occurrence for each score was calculated. To evaluate the performance of the scoring system, we used the Hosmer and Lemeshow goodness-of-fit (GOF) test, receiver operating characteristic (ROC) curves, and calibration curves.
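The bootstrap selection rule (1000 resamples, keep factors retained in more than 50% of them) can be sketched as follows. The study was analyzed in R; this Python sketch is only an illustration of the resampling and inclusion-frequency logic. The per-sample selection step is passed in as a function, and the toy selector shown here is a deliberately simple stand-in, not backward elimination of a logistic model. All names and data are hypothetical.

```python
import random
from collections import Counter


def bootstrap_inclusion(rows, select, n_boot=1000, threshold=0.5, seed=0):
    """Resample rows with replacement n_boot times, run `select` on each
    resample, and keep variables chosen in more than `threshold` of them."""
    rng = random.Random(seed)
    counts = Counter()
    n = len(rows)
    for _ in range(n_boot):
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        counts.update(select(sample))
    freq = {name: c / n_boot for name, c in counts.items()}
    kept = sorted(name for name, f in freq.items() if f > threshold)
    return kept, freq


def toy_select(sample):
    """Stand-in for backward elimination (assumption): keep a variable when its
    mean differs by more than 0.5 between donors with and without the event."""
    events = [r for r in sample if r["event"] == 1]
    others = [r for r in sample if r["event"] == 0]
    chosen = set()
    if not events or not others:
        return chosen
    for name in ("x_signal", "x_noise"):
        mean_events = sum(r[name] for r in events) / len(events)
        mean_others = sum(r[name] for r in others) / len(others)
        if abs(mean_events - mean_others) > 0.5:
            chosen.add(name)
    return chosen


# Hypothetical data: x_signal tracks the event exactly, x_noise is unrelated.
toy_rows = [
    {"event": i % 2, "x_signal": float(i % 2), "x_noise": float((i // 2) % 2)}
    for i in range(40)
]
kept, freq = bootstrap_inclusion(toy_rows, toy_select)
print(kept)
```

In practice the `select` argument would fit the multivariable model on each resample and return the variables that survive elimination; the inclusion-frequency bookkeeping stays the same.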