Vehicle and Pedestrian Detection Based on Improved YOLOv5s

Shun-Yong Zhou, Hao Zhu, Xue Liu, Ya-Lan Zeng, Si-Cheng Li, Yang-Ming Luo

Journal of Vibration Testing and System Dynamics

C. Steve Suh (editor), Pawel Olejnik (editor)

Journal of Vibration Testing and System Dynamics 8(2) (2024) 183--194 | DOI:10.5890/JVTSD.2024.06.003

Shun-Yong Zhou$^{1,2}$, Hao Zhu$^{1,2}$, Xue Liu$^{1,2}$, Ya-Lan Zeng$^{1,2}$, Si-Cheng Li$^{1,2}$, Yang-Ming Luo$^{1,2}$

$^1$ School of Automation and Information Engineering, Sichuan University of Science & Engineering, Yibin 644000, China

$^2$ Artificial Intelligence Key Laboratory of Sichuan Province, Yibin 644000, China

Abstract

An improved YOLOv5s algorithm for vehicle and pedestrian detection is proposed to address the diversity of vehicle and pedestrian targets in road traffic under complex environments. First, the GELU activation function is added to strengthen the network's nonlinear capability. Second, a hybrid pyramid pooling structure (HSPPF) is used in the backbone network to reduce the loss of feature-layer information. Finally, transfer learning and the EIoU loss function are incorporated to improve the model's accuracy and convergence speed. The experimental results show that the improved algorithm reliably detects targets such as vehicles and pedestrians. Its mAP is 94.0\%, 1.8\% higher than before the improvement, and its detection speed is 80.6 FPS. Compared with other algorithms, it offers higher detection accuracy and speed and is better suited to complex real-world traffic scenes.
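For readers unfamiliar with two of the components named in the abstract, the following is a minimal sketch of the GELU activation and the EIoU bounding-box loss as they are commonly defined in the literature. It is not the authors' implementation, which is not shown here. The function names `gelu` and `eiou_loss`, the corner-format boxes `(x1, y1, x2, y2)`, and the `eps` stabilizer are assumptions made for illustration only.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def eiou_loss(box_a, box_b, eps: float = 1e-9) -> float:
    """EIoU loss between two boxes given as (x1, y1, x2, y2).

    Illustrative sketch: 1 - IoU plus three penalties (center distance,
    width difference, height difference), each normalized by the
    smallest box that encloses both inputs.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    aw, ah = ax2 - ax1, ay2 - ay1
    bw, bh = bx2 - bx1, by2 - by1

    # Intersection over union.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    iou = inter / (union + eps)

    # Width and height of the smallest enclosing box.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)

    # Squared distance between box centers over the squared enclosing diagonal.
    dx = (ax1 + ax2) / 2.0 - (bx1 + bx2) / 2.0
    dy = (ay1 + ay2) / 2.0 - (by1 + by2) / 2.0
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)

    # Width and height gaps, each normalized by the enclosing box side.
    width_term = (aw - bw) ** 2 / (cw * cw + eps)
    height_term = (ah - bh) ** 2 / (ch * ch + eps)

    return 1.0 - iou + center_term + width_term + height_term
```

Because the width and height gaps are penalized directly, the loss still provides a useful signal when two boxes do not overlap and IoU is zero, which is one commonly cited reason for EIoU's faster convergence.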

Acknowledgments

The work presented in this paper was partially supported by the Program of Sichuan Provincial Department of Science and Technology (No. 2020YFSY0027), the Innovation Fund of Postgraduate, Sichuan University of Science \& Engineering (Nos. Y2022129 and Y2022163), and the Innovation and Entrepreneurship Program for College Students (No. S202210622033).
