Development and application of natural language processing methods to medical causes of death for public health purposes
Introduction - Physicians record medical causes of death on death certificates as free text, using a wide variety of expressions. Natural language processing (NLP) methods make it possible to analyze these data quickly. This article describes the approach taken to develop such methods and illustrates their use for public health alerts.
Methods - High-performing methods were identified through an international challenge. Participating teams received a dataset of free-text descriptions paired with ICD-10 codes (treated as the gold standard) to develop their ICD-10 code prediction tools, and the tools' performance was then evaluated independently on a test set. Selected methods were then used to classify free-text causes into groups relevant for reactive mortality surveillance.
Results - The best results were obtained with neural networks on the U.S. dataset and with rule-based methods on the French dataset. A hybrid method combining rules and support vector machine (SVM) classification produced better or comparable results on both datasets. Analysis of the temporal trends of four cause groupings for reactive mortality surveillance highlighted both expected events (epidemics) and unusual ones.
Discussion - The challenge experience and the application to alert-oriented surveillance demonstrate the value and performance of NLP methods in supporting the reactive use of mortality data for public health.
Author(s): Robert Aude, Baghdadi Yasmine, Zweigenbaum Pierre, Morgand Claire, Grouin Cyril, Lavergne Thomas, Névéol Aurélie, Fouillet Anne, Rey Grégoire
Publishing year: 2019
Pages: 603-609
Weekly Epidemiological Bulletin, 2019, n° 29-30, p. 603-609