Date of Award

12-2025

Document Type

Thesis

Degree Name

Master of Science

Department

Computer Science

Abstract

The recent drive for a cleaner environment has led to an increase in greener modes of transportation, including the use of bicycles. This growth in cycling has been accompanied by an increase in bicycle-related crashes. According to the CDC, an estimated 1,150 cyclists were killed in 2023, with 120,000 sustaining non-fatal injuries. Transportation planners and agencies are under pressure to make cycling safer by providing dedicated bike lanes and improving existing infrastructure. To accomplish this, it is essential to understand which factors contribute to bicycle crash severity across different roadways.

Despite recent advancements in modeling bicycle crash severity, most studies rely on classical methods such as ordered probit, ordered logit, and logistic regression. Modern machine-learning techniques, including ensemble forests and boosting algorithms, handle nonlinearities and high-dimensional data more effectively, but still face challenges with class imbalance, even when employing resampling techniques such as SMOTE. While many studies fail to generalize to other regions, this study integrated multi-state data from Texas, Colorado, and California to enhance model transferability and explicitly included lane-type information extracted from national road-network data. Using spatial clustering and predictive modeling, the study addressed class imbalance through class-weighted XGBoost with threshold tuning and leveraged high-performance computing (Bridges2) for scalable training on large, multi-state datasets.
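The combination of class weighting and threshold tuning mentioned above can be illustrated with a minimal sketch. It assumes a binary framing (severe vs. non-severe), which the abstract does not state, and it does not call XGBoost itself: the helper names, the toy labels, and the toy scores are all hypothetical. In XGBoost, a negative-to-positive ratio of this kind is commonly passed as the `scale_pos_weight` parameter for binary objectives; whether the thesis did so is an assumption.

```python
from typing import List, Optional, Tuple


def positive_class_weight(labels: List[int]) -> float:
    """Negative-to-positive ratio, a common weight for the rare class (hypothetical helper)."""
    pos = sum(1 for y in labels if y == 1)
    if pos == 0:
        raise ValueError("no positive examples to weight")
    return (len(labels) - pos) / pos


def f1_at_threshold(labels: List[int], scores: List[float], threshold: float) -> float:
    """F1 score when every score >= threshold is predicted as the positive class."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def tune_threshold(
    labels: List[int],
    scores: List[float],
    candidates: Optional[List[float]] = None,
) -> Tuple[float, float]:
    """Return the candidate threshold with the highest F1, and that F1 value."""
    if candidates is None:
        # Assumed grid: 0.05, 0.10, ..., 0.95
        candidates = [i / 100 for i in range(5, 100, 5)]
    best = max(candidates, key=lambda t: f1_at_threshold(labels, scores, t))
    return best, f1_at_threshold(labels, scores, best)


# Toy validation data (illustrative only, not from the thesis): 3 severe crashes out of 10.
example_labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
example_scores = [0.05, 0.10, 0.12, 0.20, 0.22, 0.31, 0.37, 0.41, 0.47, 0.80]

if __name__ == "__main__":
    print("class weight:", positive_class_weight(example_labels))
    print("best threshold, F1:", tune_threshold(example_labels, example_scores))
```

On this toy data the default 0.5 cutoff misses two of the three severe cases, while a lower cutoff recovers them at the cost of one false positive. In practice the threshold would be tuned on a held-out validation split rather than on the test set.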

Among all evaluated models, XGBoost produced the strongest performance, with an accuracy of 86.7%, precision of 0.840, recall of 0.860, and an F1 score of 0.85, outperforming all other tested algorithms. Feature-importance analysis revealed that lighting conditions, roadway type, traffic exposure, and time of day were dominant predictors of severity. Hotspot analysis identified clusters of severe crashes along multilane arterials and poorly lit intersections. This study found that while bicycle infrastructure can reduce the likelihood of a crash, proximity to such infrastructure did not significantly affect crash severity. Instead, severity was primarily driven by roadway features and lighting conditions.
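As a consistency check on the reported metrics, the F1 score can be recomputed from the stated precision and recall. This assumes F1 is the harmonic mean of those two aggregate values; the abstract does not state the averaging scheme across severity classes.

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.840 \times 0.860}{0.840 + 0.860}
    = \frac{1.4448}{1.700}
    \approx 0.85
```

Under that assumption, the three reported values agree with one another.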

This research introduced a flexible, analytics-based framework that combined GIS-based mapping, computational learning methods, and advanced computing resources to support data-informed bicycle safety planning and policy development. The multi-state model may enable researchers and policymakers to generate region-specific predictions of crash severity and advance safer cycling infrastructure.

Index Terms—Bicycle crash, class imbalance, crash severity, feature selection, geographic information systems (GIS), high-performance computing (HPC), machine learning, model transferability, spatial analysis, transportation safety, extreme gradient boosting (XGBoost).

Committee Chair/Advisor

Noushin Ghaffari

Committee Member

Mehdi Azimi

Committee Member

Judy Perkins

Committee Member

Lin Li

Publisher

Prairie View A&M University

Rights

© 2021 Prairie View A & M University

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Date of Digitization

05/08/2026

Contributing Institution

John B Coleman Library

City of Publication

Prairie View

MIME Type

application/pdf

