Date of Award
12-2025
Document Type
Thesis
Degree Name
Master of Science
Department
Computer Science
Abstract
The recent drive for a cleaner environment has increased the use of greener modes of transportation, including bicycles. This trend has, in turn, led to more bicycle-related crashes. According to the CDC, an estimated 1,150 cyclists were killed in 2023, with 120,000 sustaining non-fatal injuries. Transportation planners and agencies are under pressure to make cycling safer by providing dedicated bike lanes and improving existing infrastructure. To accomplish this, it is essential to understand which factors contribute to bicycle crash severity across different roadways.
Despite recent advancements in modeling bicycle crash severity, most studies rely on classical methods such as ordered probit, ordered logit, and logistic regression. Modern machine-learning techniques, including ensemble forests and boosting algorithms, handle nonlinearities and high-dimensional data more effectively but still face challenges with class imbalance, even when employing resampling techniques such as SMOTE. Whereas many studies fail to generalize to other regions, this study integrated multi-state data from Texas, Colorado, and California to enhance model transferability and explicitly included lane-type information extracted from national road-network data. Using spatial clustering and predictive modeling, it tackled class imbalance through class-weighted XGBoost with threshold tuning and leveraged high-performance computing (Bridges2) for scalable training on large, multi-state datasets.
Among all evaluated models, XGBoost produced the strongest performance. It achieved 86.7% accuracy with closely balanced precision (0.840) and recall (0.860) and an F1 score of 0.85, outperforming all other tested algorithms. Feature-importance analysis revealed that lighting conditions, roadway type, traffic exposure, and time of day were the dominant predictors of severity. Hotspot analysis identified clusters of severe crashes along multilane arterials and at poorly lit intersections. This study found that while bicycle infrastructure can reduce the likelihood of a crash, proximity to it did not significantly affect crash severity. Instead, severity was primarily driven by roadway features and lighting conditions.
This research introduced a flexible, analytics-based framework that combined GIS-based mapping, computational learning methods, and advanced computing resources to support data-informed bicycle safety planning and policy development. The multi-state model may enable researchers and policymakers to generate region-specific predictions of crash severity and advance safer cycling infrastructure.
Index Terms—Bicycle crash, class imbalance, crash severity, feature selection, geographic information systems (GIS), high-performance computing (HPC), machine learning, model transferability, spatial analysis, transportation safety, extreme gradient boosting (XGBoost).
Committee Chair/Advisor
Noushin Ghaffari
Committee Member
Mehdi Azimi
Committee Member
Judy Perkins
Committee Member
Lin Li
Publisher
Prairie View A&M University
Rights
© 2021 Prairie View A & M University
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Date of Digitization
05/08/2026
Contributing Institution
John B Coleman Library
City of Publication
Prairie View
MIME Type
application/pdf
Recommended Citation
Ejikeme, P. (2025). Integrating GIS and Machine Learning to Uncover Spatial, Temporal, and Contributing Factors to Bicycle Crashes. Retrieved from https://digitalcommons.pvamu.edu/pvamu-theses/1671