DKEC: Domain Knowledge Enhanced Multi-Label Classification for Diagnosis Prediction

University of Virginia
EMNLP 2024
Figure 1. Overview of previous works. Pretrained models rely on broad, uncurated knowledge from pre-training, often missing task-specific information and label relationships; they are also expensive to fine-tune and deploy on resource-constrained devices. Prior work that models hierarchical label relations helps downstream tasks, but typically overlooks the heterogeneous knowledge embedded in medical guidelines. Incorporating external domain knowledge can (i) enrich learning under few-shot supervision, mitigating data scarcity and limited model capacity, and (ii) act as training constraints via label relations to improve consistency and generalization.

Abstract

Multi-label text classification (MLTC) tasks in the medical domain often face the long-tail label distribution problem. Prior works have explored hierarchical label structures to find relevant information for few-shot classes, but have mostly neglected external knowledge from medical guidelines. This paper presents DKEC, Domain Knowledge Enhanced Classification for diagnosis prediction, with two innovations: (1) automated construction of heterogeneous knowledge graphs from external sources to capture semantic relations among diverse medical entities, and (2) incorporation of the heterogeneous knowledge graphs into few-shot classification using a label-wise attention mechanism. We construct DKEC using three online medical knowledge sources and evaluate it on a real-world Emergency Medical Services (EMS) dataset and a public electronic health record (EHR) dataset. Results show that DKEC outperforms state-of-the-art label-wise attention networks and transformer models of different sizes, particularly on the few-shot classes. More importantly, it helps smaller language models achieve performance comparable to that of large language models.
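The two ideas named in the abstract, label representations derived from a heterogeneous knowledge graph and label-wise attention over the input text, can be illustrated with a small toy sketch. This is not the authors' implementation: the graph contents, the entity names, the 3-dimensional vectors, and the helper functions `label_embedding`, `label_wise_attention`, and `predict` are all hypothetical. The single averaging step stands in for whatever graph encoder a real system would learn, and the final score omits the learned output weights and bias a trained classifier would have.

```python
import math

# Hypothetical toy heterogeneous graph: each diagnosis label is linked to
# neighbours of different types (symptoms, treatments). Illustrative only.
GRAPH = {
    "stroke": {"symptom": ["facial droop", "slurred speech"], "treatment": ["oxygen"]},
    "asthma": {"symptom": ["wheezing"], "treatment": ["albuterol", "oxygen"]},
}

# Hypothetical 3-d entity features; a real system would learn or encode these.
ENTITY_VECS = {
    "facial droop": [1.0, 0.0, 0.0],
    "slurred speech": [0.8, 0.2, 0.0],
    "wheezing": [0.0, 1.0, 0.0],
    "albuterol": [0.0, 0.9, 0.1],
    "oxygen": [0.1, 0.1, 0.8],
}


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_vectors(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def label_embedding(label):
    """One aggregation step: average within each relation type, then across types."""
    per_type = [
        mean_vectors([ENTITY_VECS[e] for e in entities])
        for entities in GRAPH[label].values()
    ]
    return mean_vectors(per_type)


def label_wise_attention(token_vecs, label_vec):
    """Attend over tokens with the label vector as the query.

    Returns the attention weights and the label-specific document vector.
    """
    weights = softmax([dot(t, label_vec) for t in token_vecs])
    doc_vec = [
        sum(w * t[d] for w, t in zip(weights, token_vecs))
        for d in range(len(label_vec))
    ]
    return weights, doc_vec


def predict(token_vecs):
    """Score every label independently (multi-label), using a sigmoid per label."""
    scores = {}
    for label in GRAPH:
        u = label_embedding(label)
        _, doc_vec = label_wise_attention(token_vecs, u)
        scores[label] = 1.0 / (1.0 + math.exp(-dot(doc_vec, u)))
    return scores


# Two hypothetical token vectors: one symptom-like, one near-neutral.
tokens = [[0.9, 0.1, 0.0], [0.0, 0.0, 0.1]]
scores = predict(tokens)
print(sorted(scores, key=scores.get, reverse=True))
```

Because each label builds its own document vector from its own attention weights, a rare label can still focus on the few tokens that match its graph neighbours, which is the intuition behind using external knowledge for few-shot classes.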

Approach

Results

BibTeX

@inproceedings{ge2024dkec,
  title     = {DKEC: Domain Knowledge Enhanced Multi-Label Classification for Diagnosis Prediction},
  author    = {Ge, Xueren and Satpathy, Abhishek and Williams, Ronald Dean and Stankovic, John and Alemzadeh, Homa},
  booktitle = {Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing},
  month     = nov,
  year      = {2024},
  address   = {Miami, Florida, USA},
  publisher = {Association for Computational Linguistics},
  url       = {https://aclanthology.org/2024.emnlp-main.712/},
  doi       = {10.18653/v1/2024.emnlp-main.712},
  pages     = {12798--12813}
}