Date of Award

8-2025

Document Type

Thesis

Degree Name

Master of Science

Degree Discipline

Computer Science

Abstract

Transposons play a pivotal role in genome evolution and contribute to genome expansion, and the discovery of novel transposons offers valuable insights into genetic function and its implications for health and disease. This study focused on identifying novel transposons within a large-scale genomic dataset, using unsupervised machine learning approaches to uncover hidden patterns and to detect elements that deviated from known transposon groups. To address the scale of the data and the resulting computational complexity, two complementary approaches were adopted: first, analyzing the entire dataset to capture the broad spectrum of transposon diversity; second, applying a strategic reduction and filtering method to produce a manageable dataset that enabled efficient identification of novel elements.

The density-based clustering algorithm Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) was applied to both the full and the filtered datasets because of its robust capacity to detect outliers. In this study, transposons that did not fit into any well-defined cluster were treated as outliers; because they deviate from the known groups, these outliers represent promising candidates for novel transposons. In addition to detecting novel transposons, the clustering patterns revealed meaningful phylogenetic relationships among transposon groups, shedding light on their evolutionary trajectories and biological interconnections. This integrated method significantly enhanced the detection of novel transposons, deepening the understanding of their impact on genomic architecture and their potential roles in human health. Ultimately, these findings offer a more nuanced view of genome dynamics and broaden the landscape of functional genomics research.
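The abstract does not describe the implementation, so the following is only a rough illustration of the density notion HDBSCAN relies on. The sketch computes each point's core distance (here, the distance to its k-th nearest other point) and the mutual reachability distance max(core(a), core(b), d(a, b)). It then applies a hypothetical threshold rule (flag points whose core distance exceeds `factor` times the median) to mark isolated points. This is a simplification, not HDBSCAN itself. The full algorithm builds a minimum spanning tree over mutual reachability distances, extracts stable clusters from the resulting hierarchy, and labels the leftover points as noise. The toy 2-D points and the names `flag_outliers` and `factor` are assumptions for illustration; they stand in for transposon feature vectors, whose construction the abstract does not specify.

```python
import math
from statistics import median

# Made-up 2-D points standing in for transposon feature vectors:
# two tight groups of four plus one isolated point (index 8).
toy_points = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 10), (10, 11), (11, 10), (11, 11),
              (5, 30)]

def core_distances(points, k):
    """Distance from each point to its k-th nearest other point."""
    cores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        cores.append(dists[k - 1])
    return cores

def mutual_reachability(points, cores, i, j):
    """Mutual reachability distance, as used by HDBSCAN, between points i and j."""
    return max(cores[i], cores[j], math.dist(points[i], points[j]))

def flag_outliers(points, k=2, factor=3.0):
    """Hypothetical rule: flag points whose core distance exceeds factor * median."""
    cores = core_distances(points, k)
    cutoff = factor * median(cores)
    return [i for i, c in enumerate(cores) if c > cutoff]

if __name__ == "__main__":
    print(flag_outliers(toy_points))  # [8]
```

In practice one would run a full implementation such as scikit-learn's `sklearn.cluster.HDBSCAN`, which assigns the label `-1` to noise points. Those noise points would then be the outlier candidates for novel transposons.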

Index Terms: Bioinformatics, Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) clustering, high-performance computing, transposons, unsupervised learning.

Committee Chair/Advisor

Noushin Ghaffari

Committee Member

Lin Li

Committee Member

Sherri S. Frizell

Committee Member

Md. Shuvo

Publisher

Prairie View A&M University

Rights

© 2021 Prairie View A & M University

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Date of Digitization

10/22/2025

Contributing Institution

J. B. Coleman Library

City of Publication

Prairie View

MIME Type

application/pdf
