Real-Time Object Detection Using YOLOv7 on Google Colab
Introduction
YOLO (You Only Look Once) is a state-of-the-art object detection model, outperforming most of its rivals.
How does it work?
If you want to know how YOLO works in a few paragraphs, it goes like this:
First, the training data consists of images paired with bounding-box vectors (i.e., [Pc Bx By Bw Bh C1 C2 ....]), each vector representing a box around a known object in the image. An image with four known objects therefore has four such vectors, and one with seven objects has seven.
Since a neural network needs fixed-length vectors to train on, the key idea of the YOLO model is to divide the image into an NxN grid, where each grid cell produces B bounding boxes of its own (i.e., [Pc Bx By Bw Bh C1 C2 ....]). This gives us a fixed length and a fixed number of vectors. Translating that jargon into numbers: with a 4x4 grid where each cell produces one 1x7 vector, we get 16 (1x7) vectors in total. (Here 1x7 signifies that we're dealing with only two classes.)
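To make those shapes concrete, here's a minimal NumPy sketch of that fixed-size prediction tensor (the 4x4 grid, one box per cell, and two classes are toy numbers for illustration, not YOLOv7's actual head dimensions):

```python
import numpy as np

# Toy dimensions for illustration: a 4x4 grid, B = 1 box per cell, 2 classes
N, B, num_classes = 4, 1, 2
vec_len = 5 + num_classes  # [Pc, Bx, By, Bw, Bh] + one score per class = 7

# Every image, no matter how many objects it contains,
# maps to the same fixed-size target tensor.
predictions = np.zeros((N, N, B * vec_len))

# Flattened: 16 vectors of length 7, i.e. 16 (1x7) vectors in total
print(predictions.reshape(-1, vec_len).shape)  # (16, 7)
```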
Now that we know how YOLO's training works (grid-celled images against fixed-length vectors), let's talk briefly about prediction. As in training, a given image is divided into an NxN grid, and each cell produces B bounding boxes. The result is a clutter of overlapping boxes (since every cell produces B of them), which is resolved using intersection over union (IoU) and non-maximum suppression (NMS). Last but not least, we now have the predicted bounding-box vectors, which we can use to draw a box around each detected object.
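To see how that clutter gets resolved, here is a minimal, self-contained IoU and greedy NMS sketch in plain Python/NumPy; it illustrates the idea, not YOLOv7's actual implementation (real detectors also vectorize this and apply it per class):

```python
import numpy as np

def iou(box_a, box_b):
    # Boxes are [x1, y1, x2, y2]; IoU = intersection area / union area
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    # Greedy NMS: repeatedly keep the highest-scoring box
    # and drop every remaining box that overlaps it too much.
    order = list(np.argsort(scores)[::-1])
    keep = []
    while order:
        best = order[0]
        keep.append(best)
        order = [i for i in order[1:] if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # the second box overlaps the first and is suppressed
```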
Phew! I know the above takes some energy to digest, but if you ever need to explain YOLO to someone in a minute or so, those paragraphs should do the job.
Enough theory. Let's dive into the central purpose of this tutorial: performing object detection on any YouTube video using YOLOv7 by running the model on the GPU-powered Google Colab platform, just like the following.
Step 1: Open Google Colab and set the runtime type to GPU accelerator
That is, choose Runtime from the navbar, select Change runtime type, and set the hardware accelerator to GPU.
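To verify the GPU runtime actually took effect, you can run a quick check in a cell. This is just a sketch: it assumes that `nvidia-smi` is on the PATH whenever Colab attaches a GPU, which is how Colab's GPU runtimes behave. (Running `!nvidia-smi` directly in a cell works too.)

```python
import shutil

# nvidia-smi ships with the GPU driver, so finding it on PATH
# indicates a GPU runtime is attached
if shutil.which("nvidia-smi"):
    print("GPU runtime detected")
else:
    print("No GPU found - go to Runtime > Change runtime type > GPU")
```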
Step 2: Download the zip file through this Google Drive link, upload it to Colab, and then unzip it by running the following command
!unzip ONNX-YOLOV7-Object-Detection.zip
Step 3: Change the current working directory and install the prerequisites
import os
print(os.getcwd())
os.chdir("/content/ONNX-YOLOV7-Object-Detection")  # move into the unzipped repo
print(os.getcwd())
!pip install -r requirements.txt
!pip install youtube_dl
!pip install git+https://github.com/zizo-pro/pafy@b8976f22c19e4ab5515cacbfae0a3970370c102b  # pafy fork pinned to a specific commit
Step 4: Open the video_object_detection.py file and replace the YouTube link with the link of the video you want to test with YOLOv7.
import cv2
import pafy
from YOLOv7 import YOLOv7
videoUrl = 'https://youtu.be/nhyDDH-YHOc'  # replace with your own YouTube video link
videoPafy = pafy.new(videoUrl)
print(videoPafy.streams)  # list the available stream qualities
cap = cv2.VideoCapture(videoPafy.streams[-1].url)  # open the last (highest-quality) stream
start_time = 0  # skip the first {start_time} seconds
cap.set(cv2.CAP_PROP_POS_FRAMES, start_time * 30)  # assumes roughly 30 fps
out = cv2.VideoWriter('output.avi', cv2.VideoWriter_fourcc('M', 'J', 'P', 'G'), 30, (1280, 720))
...
Note: cv2.imshow() and cv2.namedWindow() don't work in Colab's notebook environment, so skip or comment out any code that tries to open a display window.
Step 5: Run the following command and wait for the results.
!python video_object_detection.py
[normal:mp4@640x360, normal:mp4@1280x720]
....
Step 5 takes between 5 and 40 minutes depending on the video's quality and duration, and sometimes even longer depending on your internet connection and Google Colab's GPU traffic. For faster results, use videos under 4 minutes or so.
Once execution finishes, you will find a file named "output.avi" in the current working directory. Download the file to view the final output.
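If the Files sidebar is slow, you can also trigger the download from a cell using Colab's own helper. A small sketch: the `google.colab` module exists only inside Colab, hence the guard below.

```python
try:
    # Colab-only module: pushes the file to your browser as a download
    from google.colab import files
    files.download("output.avi")
except ImportError:
    print("Not running inside Google Colab")
```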
Credits
Official YOLOv7 research paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
Source code for YOLOv7
Hugging Face demo using Gradio
Python scripts performing object detection using the YOLOv7 model in ONNX.
ONNX Models of YOLOv7
That's the conclusion; I hope you enjoyed it. If you're stuck at any point, please comment below (if comments aren't showing up, refresh the page to clear any glitches).