A medication extraction framework for electronic health records
Abstract
This thesis addresses the problem of concept and relation extraction in medical documents. We present a medical concept and relation extraction system (medNERR) that incorporates hand-built rules and constrained conditional models.
We focus on two concept types (i.e., medications and medical conditions) and the pairwise administered-for relation between these two concept types. For medication extraction, we design a rule-based baseline, medNERRgreedy med, that identifies medications using the UMLS dictionary. We enhance medNERRgreedy med with information from topic models and additional corpus-derived heuristics, and show that the final medication extraction system outperforms the baseline and improves on state-of-the-art systems. For medical condition extraction, we design a Hidden Markov Model with conditional constraints. The conditional constraints encode world knowledge in a probabilistic model and help support model decisions. We approach relation extraction as a sequence labeling task, in which we label the context between the medications and the medical conditions that are involved in an administered-for relation. We use a Hidden Markov Model with conditional constraints to label the relation context.
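As a rough illustration of what a dictionary-based baseline of this kind might look like, the following minimal sketch performs greedy longest-match lookup over a token sequence. It is not the thesis implementation: the function name `greedy_dictionary_match`, the `max_len` parameter, and the three-entry `TOY_LEXICON` (standing in for UMLS) are all hypothetical, and the longest-match strategy itself is an assumption about what "greedy" means here.

```python
from typing import List, Set, Tuple


def greedy_dictionary_match(
    tokens: List[str],
    lexicon: Set[Tuple[str, ...]],
    max_len: int = 4,
) -> List[Tuple[int, int]]:
    """Return (start, end) token spans, end exclusive.

    At each position the longest lexicon entry that matches is taken,
    and scanning resumes after it (assumed greedy strategy).
    """
    spans = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate first, then shrink.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(t.lower() for t in tokens[i:i + n])
            if candidate in lexicon:
                spans.append((i, i + n))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return spans


# Hypothetical toy lexicon; a real system would load entries from UMLS.
TOY_LEXICON = {("aspirin",), ("insulin",), ("insulin", "glargine")}

tokens = "Patient started on insulin glargine and aspirin for chest pain".split()
spans = greedy_dictionary_match(tokens, TOY_LEXICON)
mentions = [" ".join(tokens[s:e]) for s, e in spans]
```

In this toy example the two-token entry is preferred over the single-token entry that it contains, which is the behavior that longest-match lookup is meant to provide.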
We show that the relation extraction system outperforms current state-of-the-art systems and that its main advantage comes from the incorporation of domain knowledge through conditional constraints. We compare our sequence labeling approach for relation extraction to a classification approach and show that our approach improves final system performance.
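The difference between the two framings of relation extraction can be sketched as follows. This is only an illustration under stated assumptions: the label names (`B-CTX`, `I-CTX`, `O`, `administered-for`, `no-relation`) and both helper functions are hypothetical and are not taken from the thesis. In the classification framing a candidate pair receives a single label, whereas in the sequence labeling framing each token of the context between the two concepts receives its own label.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def as_classification(related: bool) -> str:
    """Classification framing: one label per (medication, condition) pair."""
    return "administered-for" if related else "no-relation"


def as_sequence_labels(
    tokens: List[str], med: Span, cond: Span, related: bool
) -> List[str]:
    """Sequence labeling framing: one label per token.

    Tokens lying between the two concept spans are tagged as relation
    context when the pair is related (hypothetical B-/I- scheme).
    """
    labels = ["O"] * len(tokens)
    first, second = sorted([med, cond])
    if related:
        for k in range(first[1], second[0]):
            labels[k] = "B-CTX" if k == first[1] else "I-CTX"
    return labels


tokens = "She takes aspirin daily for chest pain".split()
med, cond = (2, 3), (5, 7)  # "aspirin" and "chest pain"

pair_label = as_classification(True)
token_labels = as_sequence_labels(tokens, med, cond, True)
```

A sequence model such as a Hidden Markov Model would be trained to predict the per-token labels, so the context words themselves ("daily for" in this toy sentence) become part of the decision.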