Abstract, originally published in Epilepsia
Objective: To use clinically informed machine learning to derive prediction models for early and late premature death in epilepsy.
Methods: This was a population-based primary care observational cohort study. All patients meeting a case definition for incident epilepsy in the Health Improvement Network database for the years 2000-2012 inclusive were included. A modified Delphi process identified 30 potential risk factors. Outcomes were early (within 4 years of epilepsy diagnosis) and late (4 years or more from diagnosis) mortality. We used regularized logistic regression, support vector machines, Gaussian naive Bayes, and random forest classifiers to predict outcomes. We assessed model discrimination, calibration, and generalizability using the mean area under the receiver operating characteristic curve (AUC) from stratified fivefold cross-validation, the Brier score, and calibration curves, and we extracted measures of association where possible.
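The evaluation scheme described in the Methods can be illustrated with a minimal, standard-library-only sketch. It is not the study's code: the data are synthetic, the single feature, the sample size, and the 20% event rate are assumptions chosen for illustration, and all function names are hypothetical. Only one of the four classifier families (Gaussian naive Bayes) is shown, in a simplified one-feature form.

```python
import math
import random
from statistics import mean, pstdev

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return mean((p - y) ** 2 for y, p in zip(y_true, y_prob))

def roc_auc(y_true, y_prob):
    """AUC as the chance a random case outranks a random non-case (ties count half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_folds(y, k=5, seed=0):
    """Split indices into k folds, dealing each class out separately so that
    every fold keeps roughly the overall event rate."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def fit_gnb(x, y):
    """One-feature Gaussian naive Bayes: per-class mean, SD, and prior."""
    params = {}
    for label in (0, 1):
        xs = [xi for xi, yi in zip(x, y) if yi == label]
        params[label] = (mean(xs), pstdev(xs) or 1e-9, len(xs) / len(x))
    return params

def predict_gnb(params, x):
    """Posterior probability of class 1 for each value in x."""
    def weighted_lik(xi, label):
        mu, sd, prior = params[label]
        z = (xi - mu) / sd
        return prior * math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))
    return [weighted_lik(xi, 1) / (weighted_lik(xi, 0) + weighted_lik(xi, 1))
            for xi in x]

# Synthetic cohort (assumption): 1000 patients, about 20% events, one risk score.
rng = random.Random(42)
y = [1 if rng.random() < 0.2 else 0 for _ in range(1000)]
x = [rng.gauss(1.2 if yi else 0.0, 1.0) for yi in y]

aucs, briers = [], []
for test_idx in stratified_folds(y, k=5):
    held_out = set(test_idx)
    train_idx = [i for i in range(len(y)) if i not in held_out]
    model = fit_gnb([x[i] for i in train_idx], [y[i] for i in train_idx])
    probs = predict_gnb(model, [x[i] for i in test_idx])
    truth = [y[i] for i in test_idx]
    aucs.append(roc_auc(truth, probs))
    briers.append(brier_score(truth, probs))

mean_auc, mean_brier = mean(aucs), mean(briers)
print(f"mean AUC = {mean_auc:.2f}, mean Brier score = {mean_brier:.3f}")
```

Stratifying the folds matters when the outcome is uncommon, because an unstratified split can leave a fold with too few deaths for the AUC to be stable. In practice a library such as scikit-learn would normally replace these hand-written helpers.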
Results: We identified 10 499 presumed incident cases from 11 194 182 patients. All models performed comparably well following stratified fivefold cross-validation, with AUCs ranging from 0.73 to 0.81 and from 0.71 to 0.79 for early and late death, respectively. In addition to comorbid disease, social habits (alcoholism odds ratio [OR] for early death = 1.54, 95% confidence interval [CI] = 1.12-2.11 and OR for late death = 2.62, 95% CI = 1.66-4.16) and treatment patterns (OR for early death when no antiseizure medication [ASM] was prescribed at baseline = 1.33, 95% CI = 1.07-1.64 and OR for late death after receipt of enzyme-inducing ASM at baseline = 1.32, 95% CI = 1.04-1.66) were significantly associated with increased risk of premature death. Baseline ASM polytherapy (OR = 0.55, 95% CI = 0.36-0.85) was associated with reduced risk of early death.
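As a reading aid for the odds ratios above, the following sketch shows how an OR and its 95% CI relate to a logistic regression coefficient. It assumes a Wald-type interval (symmetric on the log-odds scale), which the abstract does not state; the function names are hypothetical, and the only figures used are the reported alcoholism estimate for early death (OR = 1.54, 95% CI = 1.12-2.11).

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval

def coef_from_or_ci(odds_ratio, lower, upper):
    """Recover the log-odds coefficient and its standard error,
    assuming the interval is a Wald interval (an assumption)."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    return beta, se

def or_ci_from_coef(beta, se):
    """Odds ratio and Wald 95% CI from a coefficient and its standard error."""
    return (math.exp(beta),
            math.exp(beta - Z_95 * se),
            math.exp(beta + Z_95 * se))

# Reported estimate for alcoholism and early death.
beta, se = coef_from_or_ci(1.54, 1.12, 2.11)
odds_ratio, lower, upper = or_ci_from_coef(beta, se)

# An interval that excludes 1 corresponds to significance at the 5% level.
excludes_one = lower > 1.0 or upper < 1.0
print(f"beta = {beta:.3f}, SE = {se:.3f}, "
      f"OR = {odds_ratio:.2f} ({lower:.2f}-{upper:.2f}), "
      f"CI excludes 1: {excludes_one}")
```

The round trip reproduces the reported bounds to two decimal places, which is consistent with, but does not prove, a Wald-type interval. The same check applied to the polytherapy estimate (OR = 0.55, 95% CI = 0.36-0.85) would show an interval lying entirely below 1.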
Significance: Clinically informed models using routine electronic medical records can be used to predict early and late mortality in epilepsy, with moderate to high accuracy and evidence of generalizability. Medical, social, and treatment-related risk factors, such as delayed ASM prescription and baseline prescription of enzyme-inducing ASMs, were important predictors.