Running YOLO on Edge Devices: Lessons from the Factory Floor
NVIDIA Jetson, INT8 quantization, and the latency budget you actually need for real-time inspection.
Running computer vision models on edge devices in a factory is a different game from running them in a cloud notebook. Here's what we learned deploying YOLO-based inspection systems on NVIDIA Jetson devices.
The latency budget
Our production line moves at 60 parts per minute, which gives us 1 second per part. From that we subtract camera capture (50 ms), image preprocessing (30 ms), network transfer (20 ms if using a separate compute node), and post-processing (50 ms), for 150 ms of fixed overhead. That leaves ~850 ms for inference. Comfortable for one model, tight for an ensemble.
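The budget arithmetic above can be written down as a small script, which makes it easy to re-run when the line speed or a stage timing changes. This is a minimal sketch: the stage timings are the ones quoted in the text, while the function and variable names are made up for illustration.

```python
# Stage timings in milliseconds, as quoted in the text.
PARTS_PER_MINUTE = 60
STAGE_MS = {
    "camera_capture": 50,
    "preprocessing": 30,
    "network_transfer": 20,  # only applies with a separate compute node
    "postprocessing": 50,
}


def inference_budget_ms(parts_per_minute, stage_ms):
    """Return the time left for inference per part, in milliseconds."""
    cycle_ms = 60_000 / parts_per_minute   # time available per part
    return cycle_ms - sum(stage_ms.values())


budget = inference_budget_ms(PARTS_PER_MINUTE, STAGE_MS)
print(f"inference budget: {budget:.0f} ms")
```

Doubling the line speed to 120 parts per minute with the same stage timings would shrink the budget to 350 ms, so the budget is worth recomputing whenever the line is retuned.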
Quantization: the free lunch
We went from FP32 (45 ms inference) to FP16 (25 ms) to INT8 (12 ms) with less than 1% accuracy loss, a speedup of nearly 4x. INT8 quantization on TensorRT is essentially free performance, provided the calibration dataset is representative: at least 500 images that cover all defect types and lighting conditions.
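One way to get a calibration set that covers every defect type and lighting condition is to sample round-robin across (defect, lighting) groups rather than picking images at random. The sketch below is an assumption about how that selection might look, not the procedure we used. The `(path, defect_type, lighting)` metadata format and the function name are hypothetical, and feeding the selected images to the TensorRT calibrator is left out.

```python
import random
from collections import defaultdict


def pick_calibration_set(images, target_size=500, seed=0):
    """Pick a calibration subset that covers every (defect, lighting) group.

    `images` is a list of (path, defect_type, lighting) tuples. This metadata
    format is an assumption made for the sketch.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for path, defect, lighting in images:
        groups[(defect, lighting)].append(path)
    for paths in groups.values():
        rng.shuffle(paths)

    # Round-robin over the groups so rare combinations are not crowded out
    # by the common ones.
    selected = []
    keys = sorted(groups)
    while len(selected) < target_size and any(groups[k] for k in keys):
        for key in keys:
            if groups[key] and len(selected) < target_size:
                selected.append(groups[key].pop())
    return selected
```

With this scheme, a defect type that has only a handful of images still ends up in the calibration set in full, while the common groups fill the remaining slots evenly.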
What actually fails in production
- Lighting changes. Sunlight through a factory window at 4pm creates shadows that didn't exist during training. Solution: controlled LED lighting with diffusers, plus data augmentation with brightness/contrast variations.
- Camera drift. Vibrations from the press line slowly shift the camera angle over weeks. Solution: a calibration check every shift using a reference pattern.
- New defect types. The model sees a defect it wasn't trained on and either ignores it or misclassifies it. Solution: an anomaly detection layer that flags anything the model is uncertain about.
- Temperature. Jetson devices throttle in hot factory environments. Solution: proper heatsinks, thermal monitoring, and a fallback to lower-res inference if temperature spikes.
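The temperature fallback in the last item can be sketched as a small policy object with hysteresis, so the station does not flip between resolutions when the temperature hovers around one threshold. Everything specific here is an assumption for illustration: the 80 °C and 70 °C thresholds, the 640 and 416 pixel input sizes, the class name, and the sysfs path, which is the usual Linux thermal zone location but should be checked on the actual device.

```python
from dataclasses import dataclass

# Illustrative thresholds, not measured values.
THROTTLE_C = 80.0   # drop to low-res inference at or above this temperature
RECOVER_C = 70.0    # return to full-res only after cooling below this


def read_temp_c(path="/sys/class/thermal/thermal_zone0/temp"):
    """Read a Linux thermal zone, which reports millidegrees Celsius."""
    with open(path) as f:
        return int(f.read().strip()) / 1000.0


@dataclass
class ResolutionPolicy:
    full_res: int = 640     # assumed normal input size
    low_res: int = 416      # assumed fallback input size
    degraded: bool = False

    def update(self, temp_c):
        """Return the inference input size for the current temperature."""
        if not self.degraded and temp_c >= THROTTLE_C:
            self.degraded = True
        elif self.degraded and temp_c < RECOVER_C:
            self.degraded = False
        return self.low_res if self.degraded else self.full_res
```

In an inspection loop, `policy.update(read_temp_c())` would be called before each frame or once every few seconds, and the returned size passed to the preprocessing step.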
Our edge stack
Each station runs YOLOv8 with TensorRT optimization on an NVIDIA Jetson Orin NX. We deploy with Docker containers, stream results over MQTT, and monitor all stations from a central dashboard.
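To make the MQTT result stream concrete, here is a sketch of what a per-part result message could look like. The topic layout and every payload field are hypothetical, chosen only to illustrate the idea, and the code uses only the standard library so it does not depend on a particular MQTT client.

```python
import json
import time


def build_result_message(station_id, part_id, detections, inference_ms, timestamp=None):
    """Return a (topic, payload) pair for one inspected part.

    The topic scheme and field names are assumptions for this sketch.
    """
    topic = f"factory/inspection/{station_id}/results"
    payload = {
        "station_id": station_id,
        "part_id": part_id,
        "timestamp": time.time() if timestamp is None else timestamp,
        "inference_ms": inference_ms,
        "verdict": "fail" if detections else "pass",
        "detections": detections,
    }
    return topic, json.dumps(payload, sort_keys=True)


topic, payload = build_result_message(
    "station-03",
    "P-0001",
    [{"label": "scratch", "confidence": 0.91}],
    inference_ms=12.0,
)
print(topic)
print(payload)
```

The resulting pair can be handed to whichever MQTT client the station uses. With paho-mqtt, for example, that would be a `client.publish(topic, payload)` call on a connected client.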
