Overfit AI

Unveiling pedestrian injury severity risk factors: A comparative analysis of machine learning and deep learning methods

Pedestrian safety is a critical global concern: pedestrians account for a notable share of all road traffic injuries and fatalities. The study investigates risk factors in vehicle–pedestrian collisions by systematically comparing two traditional machine learning models, Random Forest (RF) and Gradient-Boosted Trees (GBT), with a Deep Neural Network (DNN), and uses a unified framework to check how consistent their outputs are. Using detailed crash data from Rawalpindi, Pakistan, the research applies these models to identify predictors of pedestrian crash severity and to derive interpretable, data-driven insights for policy and intervention design.

Research Motivation and Problem Definition

Pedestrian-vehicle collisions represent one of the most severe categories of road traffic incidents due to their high likelihood of injury or fatality. While numerous studies have leveraged classic statistical or individual machine learning models to predict crash outcomes, few have specifically compared the consistency of predictive factors between machine learning and deep learning systems. This study fills that gap by systematically assessing whether major risk factors identified by different modeling paradigms remain stable across analytical frameworks, an aspect essential for building trustworthy policy-oriented models.

Data and Model Frameworks

Crash data from Rawalpindi’s urban road network provide the study’s empirical basis. The data cover diverse features, including time of crash, weather conditions, seasonal variability, roadway characteristics, and behavioral variables such as speeding or distractions. The research employs three predictive approaches: Random Forest, Gradient-Boosted Trees, and Deep Neural Networks. Each has distinct methodological strengths. RF and GBT are tree ensembles that capture nonlinear interactions and report variable importance, while DNNs learn complex, hierarchical representations of the input features.

The study also uses SHapley Additive exPlanations (SHAP) to interpret the models. SHAP assigns each feature a contribution score toward an individual prediction, which reveals the factors that most affect pedestrian injury severity and makes deep learning outputs more explainable and actionable.
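To make the idea of a per-feature contribution score concrete, here is a minimal sketch of exact Shapley values for a single prediction. It is not the paper's code: the toy scoring function, its three binary features (speeding, night, rain), and the all-zero baseline are hypothetical, chosen only so the arithmetic is easy to follow. Practical SHAP implementations use faster algorithms or approximations, because enumerating every feature subset becomes infeasible for realistic feature counts.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance `x` relative to `baseline`.

    Features in a coalition take their value from `x`; all other features
    are held at their `baseline` value.
    """
    n = len(x)
    features = list(range(n))

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in features]
        return predict(mixed)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i to this coalition.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_severity_score(features):
    """Hypothetical severity score; the coefficients are made up."""
    speeding, night, rain = features
    return 0.2 + 0.4 * speeding + 0.2 * night + 0.1 * rain + 0.1 * speeding * night


x = [1, 1, 0]          # speeding at night, no rain
baseline = [0, 0, 0]   # reference case: none of the factors present
phi = shapley_values(toy_severity_score, x, baseline)

for name, contribution in zip(["speeding", "night", "rain"], phi):
    print(f"{name}: {contribution:.3f}")
```

The contributions sum to the difference between the prediction for `x` and the prediction for the baseline, which is the additive property that lets SHAP scores be read as a breakdown of one prediction. The speeding-by-night interaction term is split evenly between the two features involved.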

Comparative Performance Analysis

Results show that the DNN model achieved the highest predictive accuracy (93.51%), which suggests it captures complex feature interactions well. When the key features are compared, however, the RF and DNN models pinpoint nearly identical core risk contributors: seasonal variation, weather conditions, speeding, distractions, crash type, and temporal factors. Despite their different architectures, the deep and ensemble learning approaches converge on similar explanatory factors, which increases confidence in the interpretation and its policy relevance.
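One simple way to quantify this kind of agreement is to compare the top-ranked features from each model. The sketch below is an assumption about how such a check could be done, not the procedure used in the paper, and the importance values are invented for illustration; only the feature themes come from the summary above.

```python
def top_k(importances, k):
    """Return the k feature names with the highest importance scores."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return ranked[:k]


def jaccard_overlap(a, b):
    """Share of features common to both lists (1.0 means identical sets)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


# Hypothetical importance scores, NOT values reported in the study.
rf_importance = {
    "season": 0.22, "speeding": 0.20, "weather": 0.18, "distraction": 0.12,
    "crash_type": 0.10, "time_of_day": 0.09, "road_type": 0.05, "lighting": 0.04,
}
dnn_importance = {
    "speeding": 0.25, "season": 0.19, "weather": 0.15, "crash_type": 0.13,
    "distraction": 0.11, "time_of_day": 0.08, "lighting": 0.05, "road_type": 0.04,
}

overlap_top6 = jaccard_overlap(top_k(rf_importance, 6), top_k(dnn_importance, 6))
overlap_top4 = jaccard_overlap(top_k(rf_importance, 4), top_k(dnn_importance, 4))
print(f"top-6 overlap: {overlap_top6:.2f}, top-4 overlap: {overlap_top4:.2f}")
```

With these made-up numbers, the two models agree completely on the top six features but only partly on the top four, which shows why the choice of cutoff matters when reporting consistency.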

The Gradient-Boosted Trees model, while slightly less accurate, still performed robustly, which supports its use for smaller datasets or computationally constrained scenarios. SHAP-based feature attribution showed that speeding and nighttime crashes carry especially high importance scores, in line with evidence from the international pedestrian safety literature.

Implications and Conclusions

The study’s findings have important implications for data-driven traffic safety policy, especially in low- and middle-income countries where crash datasets are often inconsistent or incomplete. The consistency of the risk factors identified by the RF and DNN models suggests that these insights are reliable and robust enough for practical use. Methodologically, the research contributes by applying SHAP interpretation to deep learning models, making complex black-box models more transparent and easier for practitioners to adopt.

In conclusion, the study demonstrates that using machine learning and deep learning side by side not only improves the accuracy of pedestrian crash severity prediction but also strengthens the interpretability of the results. The aligned findings across models provide a sound basis for designing affordable, targeted road safety measures, including seasonal speed limits, improved lighting, and pedestrian infrastructure upgrades, that together advance urban mobility safety goals.

You can read the paper here
