I would drop the CUDA and Nvidia mess. You don't need a GPU for either training or running inference; a CPU should suffice.
I would start by running a pretrained network trained on the COCO dataset on every frame you get from the camera. So basically you're doing object detection on images.
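To make that concrete, here is a rough sketch of what such a POC loop could look like. It is only one way to do it and rests on assumptions not stated above: it uses OpenCV for the camera and a torchvision Faster R-CNN (MobileNet backbone) with COCO weights rather than any particular TensorFlow model, and the names `Detection`, `keep_birds`, `make_torchvision_detector` and `watch_camera` are made up for illustration. COCO has no pigeon class, so the sketch filters on the generic "bird" label.

```python
from typing import Callable, List, NamedTuple, Sequence, Tuple


class Detection(NamedTuple):
    """One detected object (hypothetical helper type for this sketch)."""
    label: str
    score: float
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def keep_birds(detections: Sequence[Detection], min_score: float = 0.5) -> List[Detection]:
    """Keep only confident 'bird' detections; COCO has no pigeon class."""
    return [d for d in detections if d.label == "bird" and d.score >= min_score]


def make_torchvision_detector() -> Callable:
    """Build a CPU detector from COCO-pretrained weights (assumes torchvision >= 0.13)."""
    import torch  # imported lazily so the helpers above work without it
    from torchvision.models.detection import (
        FasterRCNN_MobileNet_V3_Large_320_FPN_Weights,
        fasterrcnn_mobilenet_v3_large_320_fpn,
    )

    weights = FasterRCNN_MobileNet_V3_Large_320_FPN_Weights.DEFAULT
    model = fasterrcnn_mobilenet_v3_large_320_fpn(weights=weights).eval()
    names = weights.meta["categories"]  # label id -> COCO class name

    def detect(frame_bgr) -> List[Detection]:
        # OpenCV yields HxWx3 BGR uint8; the model expects 3xHxW RGB floats in [0, 1].
        rgb = frame_bgr[:, :, ::-1].copy()
        tensor = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0
        with torch.no_grad():
            out = model([tensor])[0]
        return [
            Detection(names[int(label)], float(score), tuple(box.tolist()))
            for label, score, box in zip(out["labels"], out["scores"], out["boxes"])
        ]

    return detect


def watch_camera(detect: Callable, camera_index: int = 0, min_score: float = 0.5) -> None:
    """Run the detector on every frame from the camera and report birds."""
    import cv2  # lazy import; requires opencv-python

    cap = cv2.VideoCapture(camera_index)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # camera unplugged or stream ended
                break
            birds = keep_birds(detect(frame), min_score)
            if birds:
                best = max(b.score for b in birds)
                print(f"{len(birds)} bird(s) in frame, best score {best:.2f}")
    finally:
        cap.release()
```

With both packages installed, `watch_camera(make_torchvision_detector())` would start the loop. The score threshold of 0.5 is an arbitrary starting point and probably needs tuning for a real camera.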
Once that POC works, you can gather about 1000 images of pigeons and apply transfer learning, again starting from the same pretrained COCO model.
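For the transfer learning step, a minimal sketch might look like the following. Again this assumes torchvision rather than TensorFlow, and it assumes the pigeon images have been annotated with bounding boxes, which detection fine-tuning needs. `split_train_val` and `build_pigeon_detector` are hypothetical helper names, and freezing the backbone is just one reasonable choice for CPU-only training, not a requirement.

```python
import random
from typing import List, Sequence, Tuple


def split_train_val(
    paths: Sequence[str], val_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Shuffle image paths deterministically and hold out a validation set."""
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def build_pigeon_detector(num_classes: int = 2):
    """Reuse COCO-pretrained weights and swap in a new head.

    num_classes=2 means background + pigeon (an assumption of this sketch).
    """
    # Lazy imports so the split helper works without torchvision installed.
    from torchvision.models.detection import fasterrcnn_mobilenet_v3_large_320_fpn
    from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

    model = fasterrcnn_mobilenet_v3_large_320_fpn(weights="DEFAULT")

    # Freeze the pretrained backbone so only the detection heads are trained,
    # which keeps CPU training time manageable.
    for param in model.backbone.parameters():
        param.requires_grad = False

    # Replace the 91-class COCO box predictor with one sized for our classes.
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model
```

The returned model can then be trained with an ordinary PyTorch loop over the training split, using the validation split to check that it is not just memorising the images.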
Thanks for the heads-up! That's probably a lot more viable for my situation. I scraped around 3000 images of pigeons from the internet; hopefully that's enough for a decent POC.
https://medium.com/object-detection-using-tensorflow-and-coc...