
Performance Analysis Of Attention Based Deep Learning Models On Named Entity Recognition In Electronic Health Records

Tariq Abdul Quddoos, Prairie View A&M University

Date of Award

5-2023

Document Type

Thesis

Degree Name

Master of Science

Department

Electrical Engineering

Abstract

Mining clinical notes for relevant information has attracted considerable interest in Natural Language Processing (NLP). Medical documents contain language whose distribution differs from that of the general domain, and their vocabulary evolves over time. Recently, attention-based deep learning language models have become the state of the art in language modeling. They capture strong, context-dependent representations of language and have improved performance on classic clinical NLP tasks such as medication detection and medication classification.

In this thesis, the 2022 edition of Harvard Medical School’s National NLP Clinical Challenges (n2c2) is considered, for which the Contextualized Medication Event Dataset (CMED) was provided. CMED is a dataset of unstructured Electronic Health Records (EHRs) with annotations that contain task-relevant information about the EHRs. The goal of the challenge is to develop effective data-driven methods for extracting contextual information related to medications from EHRs. In this thesis, variations of Google’s attention-based BERT architecture are applied to this challenge, namely BERT Base, pre-trained on general-domain corpora; BioBERT, pre-trained on biomedical corpora; and two variations of Bio+Clinical BERT, pre-trained on clinical corpora. These models are used to perform named entity recognition (NER) for medication extraction and medical event detection. Pre-processing methods have been developed to break EHRs down into inputs compatible with BERT on the NER task, and each BERT variant is fine-tuned on CMED for the n2c2 task. Performance is analyzed with a script that constructs medical terms from the evaluation portion of CMED and reports recall, precision, and F1-score. The results show that Bio+Clinical BERT outperforms BERT Base and BioBERT, as well as three of the top ten performers in the challenge.
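As a rough illustration of how term-level recall, precision, and F1-score can be computed for NER, the sketch below groups token-level BIO tags into entity spans and counts exact span matches. This is not the thesis's evaluation script: the BIO tagging scheme, the exact-match criterion, the `Drug` label, and the helper names `bio_to_spans` and `precision_recall_f1` are all assumptions made for illustration.

```python
from typing import List, Set, Tuple

# Hypothetical span representation: (start token index, end token index exclusive, label)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Group token-level BIO tags (assumed scheme) into entity spans."""
    spans: Set[Span] = set()
    start, label = None, None
    # A sentinel "O" at the end flushes an entity that runs to the last token.
    for i, tag in enumerate(tags + ["O"]):
        continues_entity = tag.startswith("I-") and tag[2:] == label
        if continues_entity:
            continue
        # Any other tag closes the currently open entity, if there is one.
        if start is not None:
            spans.add((start, i, label))
            start, label = None, None
        # B- starts a new entity; a stray I- is treated as a new entity start.
        if tag.startswith("B-") or tag.startswith("I-"):
            start, label = i, tag[2:]
    return spans


def precision_recall_f1(
    gold: List[List[str]], pred: List[List[str]]
) -> Tuple[float, float, float]:
    """Micro-averaged exact-match precision, recall, and F1 over entity spans."""
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g, p = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
        tp += len(g & p)  # predicted spans that exactly match a gold span
        fp += len(p - g)  # predicted spans with no exact gold match
        fn += len(g - p)  # gold spans the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [["B-Drug", "I-Drug", "O", "B-Drug"]]
    pred = [["B-Drug", "I-Drug", "O", "O"]]
    print(precision_recall_f1(gold, pred))
```

In this toy example the gold tags mark two `Drug` mentions and the prediction recovers only the first, giving precision 1.0, recall 0.5, and an F1-score of about 0.667. Exact span matching is a strict criterion; an evaluation that credits partial overlaps would generally report higher scores.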

Index terms: Bidirectional encoder representations from transformers, electronic health records, natural language processing, transformer

Committee Chair/Advisor

Lijun Qian

Committee Member

Xishuang Dong

Committee Member

Xiangfang Li

Committee Member

Richard Wilkins

Publisher

Prairie View A&M University

Rights

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Date of Digitization

2/8/2024

Contributing Institution

John B Coleman Library

City of Publication

Prairie View

MIME Type

Application/PDF

Recommended Citation

Quddoos, T. A. (2023). Performance Analysis Of Attention Based Deep Learning Models On Named Entity Recognition In Electronic Health Records. Retrieved from https://digitalcommons.pvamu.edu/pvamu-theses/1533
