INFORMATIK 2012 P-208, 1237-1251 (2012).

Information extraction from unstructured electronic health records and integration into a data warehouse

Georg Fette, Maximilian Ertl, Anja Wörner, Peter Kluegl, Stefan Störk and Frank Puppe


For epidemiological research, standard electronic health records may be regarded as a convenient way to obtain large amounts of medical data. Unfortunately, large parts of clinical reports are written as free text and cannot be used for statistical evaluation without appropriate preprocessing. Providing this preprocessing is one of the main tasks of medical language processing. Here we present an approach to extract information from medical texts and a workflow to integrate this information into a clinical data warehouse. Our information extraction technique is based on Conditional Random Fields and keyword matching with terminology-based disambiguation. Furthermore, we present an application of our data warehouse in a clinical study.

