Date of Award
5-2023
Document Type
Thesis
Degree Name
Master of Science
Degree Discipline
Electrical Engineering
Abstract
Mining clinical notes for relevant information has attracted considerable interest in Natural Language Processing (NLP). Medical documents contain language whose distribution differs from that of the general domain, and their vocabulary evolves over time. Recently, attention-based deep learning language models have become the state of the art in language modeling: they capture strong contextual representations of language and improve on classic clinical NLP tasks such as medication detection and medication classification.
This thesis considers the Harvard Medical School’s 2022 National Clinical NLP Challenges (n2c2), for which the Contextualized Medication Event Dataset (CMED) was provided. CMED is a dataset of unstructured Electronic Health Records (EHRs) together with annotations that contain task-relevant information about those EHRs. The goal of the challenge is to develop effective, data-driven solutions for extracting contextual information related to medications from EHRs. In this thesis, variations of Google’s attention-based BERT architecture are applied to this challenge, namely BERT Base, BioBERT, and two variations of Bio+Clinical BERT, which are pre-trained on general-domain, biomedical-domain, and clinical-domain corpora, respectively. They are used to perform named entity recognition (NER) for medication extraction and medical event detection. Pre-processing methods are developed to break down EHRs for compatibility with the BERT model on the NER task, and the BERT variations are fine-tuned on CMED for the n2c2 task. Performance is analyzed using a script that reconstructs medical terms from the evaluation portion of CMED and reports recall, precision, and F1-score. The results demonstrate that Bio+Clinical BERT outperforms BERT Base and BioBERT, as well as three of the top ten performers in the challenge.
Index terms: Bi-directional encoder representations from transformers, electronic health records, natural language processing, transformer
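As an illustration of the metrics named in the abstract, the following is a minimal sketch of entity-level precision, recall, and F1-score. It is not the evaluation script used in the thesis, which is not reproduced in this record. The sketch assumes strict matching, where a predicted entity counts as correct only if its span and label both equal a gold annotation. The `(start, end, label)` tuple layout, the function name `precision_recall_f1`, and the sample offsets are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Tuple

# Hypothetical entity representation: (start_offset, end_offset, label).
Entity = Tuple[int, int, str]


def precision_recall_f1(
    gold: Iterable[Entity], predicted: Iterable[Entity]
) -> Tuple[float, float, float]:
    """Compute strict-match precision, recall, and F1 over entity sets.

    Assumption: an entity is a true positive only when its offsets and
    label are identical to a gold annotation.
    """
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)

    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up offsets for three gold medication mentions in one note.
    gold = [(0, 7, "Drug"), (20, 29, "Drug"), (40, 47, "Drug")]
    # Two predictions match exactly; one has the wrong span.
    predicted = [(0, 7, "Drug"), (20, 29, "Drug"), (50, 55, "Drug")]
    p, r, f1 = precision_recall_f1(gold, predicted)
    print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Strict matching is only one possible design choice; a lenient variant that credits overlapping spans would report higher scores on the same predictions.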
Committee Chair/Advisor
Lijun Qian
Committee Member
Xishuang Dong
Committee Member
Xiangfang Li
Committee Member
Richard Wilkins
Publisher
Prairie View A&M University
Rights
© 2021 Prairie View A & M University. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Date of Digitization
2/8/2024
Contributing Institution
John B Coleman Library
City of Publication
Prairie View
MIME Type
Application/PDF
Recommended Citation
Quddoos, T. A. (2023). Performance Analysis Of Attention Based Deep Learning Models On Named Entity Recognition In Electronic Health Records. Retrieved from https://digitalcommons.pvamu.edu/pvamu-theses/1533