Date of Award
5-2023
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Degree Discipline
Electrical Engineering
Abstract
Single-cell sequencing is a recently advanced revolutionary technology which enables researchers to obtain genomic, transcriptomic, or multi-omics information through gene expression analysis. It gives the advantage of analyzing highly heterogenous cell type information compared to traditional sequencing methods, which is gaining popularity in the biomedical area. Moreover, this analysis can help for early diagnosis and drug development of tumor cells, and cancer cell types. In the workflow of gene expression data profiling, identification of the cell types is an important task, but it faces many challenges like the curse of dimensionality, sparsity, batch effect, and overfitting. However, these challenges can be overcome by performing a feature selection technique which selects more relevant features by reducing feature dimensions. In this research work, recurrent neural network-based feature selection model is proposed to extract relevant features from high dimensional, and low sample size data. Moreover, a deep learning-based gene embedding model is also proposed to reduce data sparsity of single-cell data for cell type identification. The proposed frameworks have been implemented with different architectures of recurrent neural networks, and demonstrated via real-world micro-array datasets and single-cell RNA-seq data and observed that the proposed models perform better than other feature selection models. A semi-supervised model is also implemented using the same workflow of gene embedding concept since labeling data is very cumbersome, time consuming, and requires manual effort and expertise in the field. Therefore, different ratios of labeled data are used in the experiment to validate the concept. Experimental results show that the proposed semi-supervised approach represents very encouraging performance even though a limited number of labeled data is used via the gene embedding concept. In addition, graph attention based autoencoder model has also been studied to learn the latent features by incorporating prior knowledge with gene expression data for cell type classification.
Index Terms — Single-Cell Gene Expression Data, Gene Embedding, Semi-Supervised model, Incorporate Prior Knowledge, Gene-gene Interaction Network, Deep Learning, Graph Auto Encoder
Committee Chair/Advisor
Xiangfang Li
Committee Co-Chair:
Xishuang Dong
Committee Member
John Fuller
Committee Member
Lijun Qian
Committee Member
Lin Li
Publisher
Prairie View A&M University
Rights
© 2021 Prairie View A & M University
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Date of Digitization
6/09/2023
Contributing Institution
John B Coleman Library
City of Publication
Prairie View
MIME Type
Application/PDF
Recommended Citation
Chowdhury, S. (2023). Cell Type Classification Via Deep Learning On Single-Cell Gene Expression Data. Retrieved from https://digitalcommons.pvamu.edu/pvamu-dissertations/15