Criteria and indicators of quality of information support for legal regulation of electronic document management in Ukraine

Author(s): Hileta I. V., Коминар Т. Н.
Collection number: № 2 (71)
Pages: 155–165

This article analyzes modern methods for optimizing computer vision models for deployment on edge devices with limited computational resources. It examines the key approaches to adapting YOLO (You Only Look Once) family models for edge platforms such as Orange Pi 5 and Raspberry Pi 5 using hardware neural network accelerators (NPU, TPU, VPU). Special attention is paid to model compression methods, including quantization (INT8, FP16), structured and unstructured pruning, knowledge distillation, and computational graph optimization. Inference delegation systems are analyzed, including TensorRT, RKNN Toolkit, and Edge TPU Delegate, which ensure efficient use of specialized accelerators for real-time object detection.
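As a concrete illustration of the INT8 quantization discussed above, the sketch below applies the affine (scale/zero-point) mapping that post-training quantization toolkits use per weight tensor; the tensor shape and random weights are assumptions chosen only for the example:

```python
import numpy as np

def quantize_int8(w):
    """Affine INT8 quantization: map the float range [min, max] onto [-128, 127]."""
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / 255.0
    zero_point = int(round(-128 - w_min / scale))
    q = np.clip(np.round(w / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float weights from the INT8 representation."""
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(64, 64)).astype(np.float32)  # a toy weight tensor
q, scale, zp = quantize_int8(w)
w_hat = dequantize(q, scale, zp)

print(w.nbytes / q.nbytes)  # 4.0 — INT8 stores each weight in 1 byte instead of 4
```

The per-weight round-trip error is bounded by roughly one quantization step (`scale`), which is why post-training INT8 quantization typically costs only a small fraction of detection accuracy.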

The study demonstrates that combining these optimization methods can reduce model size by up to 70% while keeping detection accuracy above 98% of the baseline level. A comprehensive optimization pipeline is proposed, covering every stage from dataset preparation to deployment on the target platform, taking into account energy consumption and computational power constraints. The pipeline includes data collection and annotation, augmentation, stratified sampling, baseline model training, accuracy evaluation, quantization (QAT/PTQ), export to intermediate formats (ONNX, TFLite), inference delegation through specialized toolkits (RKNN, TensorRT, Edge TPU Compiler), edge deployment testing, and performance profiling.
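The size-reduction figure above can be checked with back-of-the-envelope arithmetic. The sketch below combines unstructured magnitude pruning with 1-byte-per-weight INT8 storage; the layer shape, the 50% sparsity target, and the decision to ignore sparse-index overhead are all simplifying assumptions for the example:

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Unstructured pruning: zero out the given fraction of smallest-magnitude weights."""
    k = int(w.size * sparsity)
    threshold = np.partition(np.abs(w).ravel(), k)[k]  # k-th smallest |w|
    mask = np.abs(w) >= threshold
    return w * mask, mask

rng = np.random.default_rng(1)
w = rng.normal(size=(128, 128)).astype(np.float32)  # a toy convolution/linear layer
w_pruned, mask = magnitude_prune(w, 0.5)

dense_fp32_bytes = w.nbytes            # 4 bytes per weight, all weights kept
sparse_int8_bytes = int(mask.sum())    # 1 byte per surviving weight (indices ignored)
reduction = 1.0 - sparse_int8_bytes / dense_fp32_bytes
print(reduction)  # 0.875 — 50% pruning plus INT8 cuts storage by 87.5% here
```

With sparse-index overhead included the saving shrinks, but even moderate sparsity combined with INT8 comfortably clears the 70% reduction reported above.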

Comparative analysis of hardware accelerators reveals significant differences in performance, supported precision formats, and application areas. Google Coral Edge TPU provides up to 4 TOPS with mandatory INT8 quantization, making it optimal for IoT and smart home systems. NVIDIA Jetson Xavier NX delivers 21 TOPS with FP16/INT8 support, suitable for drones, robotics, and video surveillance. Hailo-8 offers 26 TOPS for detection tasks in smart cameras and UAVs. Intel Movidius Myriad X (1 TOPS) is designed for low-budget CV applications, while Kendryte K210 (0.3 TOPS) targets mini-drones and IoT sensors.
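To make the comparison above easy to query, the throughput figures can be collected into a small lookup table. The TOPS numbers come from the text; the precision formats for Hailo-8, Myriad X, and K210 are not stated there and are assumptions, as is the selector function, which is purely illustrative:

```python
# TOPS figures as given in the comparison; formats marked "assumption" are
# hypothetical additions, not stated in the source text.
ACCELERATORS = {
    "Google Coral Edge TPU":   {"tops": 4.0,  "formats": {"INT8"}},
    "NVIDIA Jetson Xavier NX": {"tops": 21.0, "formats": {"FP16", "INT8"}},
    "Hailo-8":                 {"tops": 26.0, "formats": {"INT8"}},  # assumption
    "Intel Movidius Myriad X": {"tops": 1.0,  "formats": {"FP16"}},  # assumption
    "Kendryte K210":           {"tops": 0.3,  "formats": {"INT8"}},  # assumption
}

def pick(min_tops, fmt):
    """Smallest accelerator (by TOPS) that meets the throughput and format needs."""
    fits = [(v["tops"], name) for name, v in ACCELERATORS.items()
            if v["tops"] >= min_tops and fmt in v["formats"]]
    return min(fits)[1] if fits else None

print(pick(5.0, "INT8"))  # NVIDIA Jetson Xavier NX
```

A real selection would also weigh power draw, toolchain maturity, and supported operator sets, which this sketch deliberately omits.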

The research findings create a foundation for practical implementation of object detectors in embedded systems, autonomous unmanned aerial vehicles, airspace video monitoring systems, and other edge-AI applications where performance, autonomy, and energy efficiency are critical. Future research directions include refining optimization pipelines, adapting models for diverse hardware platforms, and enhancing energy efficiency to enable high-performance and accurate detection solutions on edge devices.

Keywords: computer vision, YOLO, edge AI, neural network optimization, quantization, NPU/TPU, object detection, embedded systems, SBC, inference delegation.

doi: 10.32403/1998-6912-2025-2-71-140-154


  • 1. Shi, W., Cao, J., Zhang, Q., Li, Y., & Xu, L. (2016). Edge Computing: Vision and Challenges. IEEE Internet of Things Journal, 3(5), 637–646.
  • 2. Satyanarayanan, M. (2017). The Emergence of Edge Computing. Computer, 50(1), 30–39.
  • 3. Shi, W., & Dustdar, S. (2016). The Promise of Edge Computing. Computer, 49(5), 78–81.
  • 4. Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You Only Look Once: Unified, Real-Time Object Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 779–788.
  • 5. Jouppi, N. P., Young, C., Patil, N., et al. (2017). In-Datacenter Performance Analysis of a Tensor Processing Unit. Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA), 1–12.
  • 6. Bochkovskiy, A., Wang, C.-Y., & Liao, H.-Y. M. (2020). YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv:2004.10934. Retrieved from https://arxiv.org/abs/2004.10934.
  • 7. Wang, C.-Y., Bochkovskiy, A., & Liao, H.-Y. M. (2022). YOLOv7: Trainable bag-of-freebies and aggregation of reparameterized modules. arXiv:2207.02696. Retrieved from https://arxiv.org/abs/2207.02696.
  • 8. Jacob, B., Kligys, S., Chen, B., Zhu, M., Tang, M., Howard, A., Adam, H., & Kalenichenko, D. (2018). Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
  • 9. Han, S., Mao, H., & Dally, W. J. (2016). Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. International Conference on Learning Representations (ICLR). Retrieved from https://arxiv.org/abs/1510.00149.
  • 10. Ignatov, A., Timofte, R., et al. (2019). AI Benchmark: Running Deep Neural Networks on Android Smartphones. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
  • 11. Orange Pi. (2023). Orange Pi 5 User Manual — Rockchip RK3588 Specifications. Retrieved from http://www.orangepi.org/html/hardWare/computerAndMicrocontrollers/service-and-support/Orange-Pi-5.html.
  • 12. Google. (2023). Coral Edge TPU Technical Specifications and Performance Guide. Retrieved from https://coral.ai/docs/edgetpu/inference/.
  • 13. NVIDIA Corporation. (2024). Jetson Modules — Technical Specifications. Retrieved from https://developer.nvidia.com/embedded/jetson-modules.
  • 14. Intel Corporation. (2022). Intel Movidius VPU — Vision Processing Unit Architecture. Retrieved from https://www.intel.com/content/www/us/en/products/details/processors/movidius-vpu.html.
  • 15. Hailo Technologies. (2023). Hailo-8 AI Processor Datasheet. Retrieved from https://hailo.ai/products/hailo-8-ai-accelerator/.
  • 16. Krishnamoorthi, R. (2018). Quantizing deep convolutional networks for efficient inference: A whitepaper. arXiv:1806.08342. Retrieved from https://arxiv.org/abs/1806.08342.
  • 17. Molchanov, P., Tyree, S., Karras, T., Aila, T., & Kautz, J. (2017). Pruning Convolutional Neural Networks for Resource Efficient Inference. International Conference on Learning Representations (ICLR). Retrieved from https://arxiv.org/abs/1611.06440.
  • 18. Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the Knowledge in a Neural Network. arXiv:1503.02531. Retrieved from https://arxiv.org/abs/1503.02531.
  • 19. NVIDIA Corporation. (2024). TensorRT Developer Guide — Optimizing Inference Performance. Retrieved from https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/.
  • 20. Ultralytics. (2024). YOLOv8 Documentation — Export and Deployment. Retrieved from https://docs.ultralytics.com/modes/export/.
  • 21. Jani, M. (2023). Optimization of YOLOv5 for Embedded Platforms: Quantization, Pruning and Knowledge Distillation. International Journal of Computer Vision and Machine Learning, 12(3), 45–62.
  • 22. Alqahtani, F., Al-Makhadmeh, Z., & Tolba, A. (2024). Energy-Efficient Object Detection on Edge Devices: A Comparative Study of YOLO Variants on Raspberry Pi and NVIDIA Jetson. Sensors, 24(8), Article 2517.