Authors: Dokyun Kim & Dominic Salmieri
Using OpenCV and a Neato, we will recreate the Red Light Green Light game from the Netflix show Squid Game. The Neato will periodically rotate towards the players, and whoever is still moving gets eliminated from the game.
This program uses 2 popular computer vision algorithms (YOLO and SORT) to track people's movement.
Tracking algorithms usually consist of 4 main steps.
- Identification
- Extract features
- Calculate distance
- Match pairs
Step 1 is done using YOLO (You Only Look Once) while steps 2, 3 and 4 will be done using SORT (Simple Online Realtime Tracking). An explanation of each algorithm will be provided in the sections below.
Unlike other detection methods such as HOG (Histogram of Gradients), RCNN, or CNN, YOLO significantly outperforms them in speed. YOLO v1, released in 2016, processed 45 frames per second on a Titan X GPU. YOLO locates and classifies an object at the same time in a one-step process, hence the name 'You Only Look Once.'
The description below is based on the structure of YOLO v1. This program uses the latest YOLO v8, but the governing concepts behind them are similar.
YOLO divides a given image into a S x S grid, represented with the yellow lines in the image below. The red boxes are objects identified by the algorithm.
Fig 1: YOLO example
Each grid cell is represented as a multidimensional vector. The first 5 values are
Compared to neural networks of RCNN-type algorithms, YOLO's neural network is much simpler. As shown below, YOLO's neural network consists of 24 convolutional layers, 4 max-pooling layers, and 2 fully-connected layers.
Fig 2: YOLO's neural network structure
Let
Then, the IOU (Intersection Over Union) of
Fig 3: Example distance matrix calculated using
With the distance matrix, SORT now applies the Hungarian Algorithm to find the best-matching pairs. The Hungarian Algorithm is an optimization algorithm that assigns 'tasks' to 'workers' to minimize the 'cost.' For example, when assigning tasks (a-c) to workers (1-3) given the cost matrix below (Fig 4), the pairs 1-c, 2-a, 3-b would minimize the cost.
Fig 4: Example of the Hungarian Algorithm, the highlighted cells represent the pairings that minimize the cost.
With this in mind, finding best-matching pairs using the distance matrix (Fig 3) becomes an optimization problem for assigning boxes from
Fig 5: Result of applying the Hungarian Algorithm to the distance matrix. The best-matching pairs are 1-b, 2-d, 3-a, and 4-c. Since column d was added for the sake of making the matrix square, the 2-d pairing gets thrown out.
Once SORT finishes all the steps described above, it moves onto the next frame after postprocessing. During postprocessing, the
Fig 6: Result of applying SORT to an image. Each person is assigned to unique IDs. This ID will be used to determine which person moved.
Identifying people and tracking their bounding boxes works well to detect some movement, but if a player moves within their bounding box, the size and shape of the bounding box may not change. To pick up these in-bounding box movements, we used an image difference algorithm to calculate how much the pixels in each bounding box changed between two sequential frames of the video.
To start, the absolute difference between two images is found:
Fig 7: The cv.absdiff()
between two consecutive frames of video
Then, the image is converted to grayscale if it isn't already. The reason this is done after the difference opperation is that it captures more of the difference.
Fig 8: The cv.cvtColor(_, cv.COLOR_BGR2GRAY)
of the difference of the frames
Once the image is grayscale, it is threshholded, making it a binary image composed of white pixels that changed and black pixels that did not.
Fig 9: The cv.threshold(_, threshold, 255, cv.THRESH_BINARY)
of the grayscale difference using the given threshhold
Finally, the image is opened to remove noise, which is an errosion followed by a dilation. This operation will remove small differences that are likely noise without significantly affecting the area of larger differences.
Fig 10: The cv.morphologyEx(_, cv.MORPH_OPEN, kernel)
of the threshholded difference using the given kernel
Once we have the final difference, we find the amount of changed pixels in each player's bounding box. If this sum is greater than a certain number, then the player is considered to have moved.
Adding Game Features (neato_integration branch)
In a real game of Red Light Green Light, there is one "traffic cop" who turns towards the players in random time intervals. When a player is caught moving while the traffic cop is facing the players, they are eliminated from the game. If a player manages to tap the traffic cop's shoulders without getting eliminated, game is finished.
A Neato will be the traffic cop for this project, rotating towards/away from players periodically and searching for movement. Shown below is a state diagram for various states the Neato will be in throughout the game.
graph TD;
F([init])
A([wait])
B([turn_to])
C([scan_and_elim])
D([turn_away])
E([game_end])
F -->|initialize| A
A -->|5 seconds| B
B -->|finish turn| C
C -->|random time| D
D -->|finish turn| A
A-->|bumper triggered| E
B-->|bumper triggered| E
C-->|bumper triggered| E
D-->|bumper triggered| E
Fig 11: State Diagram of Neato
random_wait
state is when the Neato is waiting for a randomly selected time period before turning towards the players.turn_to
state is when the Neato is turning towards the players.scan_and_elim
is when the Neato is scanning for movement and elimininating moving players.turn_away
is when the Neato's scanning period (5 seconds) has passed and is turning away from the players.game_end
can be entered at any point in the loop when the Neato's bump switch is triggered. This represents the shoulder tapping, indicating that the game is over.
To integrate key game features into the Neato, various topics need to be published and subscribed to. In our case, there are 2 subscriber nodes and 1 publisher node.
graph TD;
A([Node 'run_neato_game'])
B([subscriber nodes])
C([publisher nodes])
D([odometry])
E([image])
F([cmd_vel])
A -->|creates| B
A -->|creates| C
B -->|subscribes| D
B -->|subscribes| E
C -->|publishes| F
Fig 12: Node Diagram showing publishers and subscribers used in the program.
The odometry
subscriber ensures the Neato is turning the correct amount every iteration. The image
subcriber collects frames that are used for movement detection. The cmd_vel
publisher ensures that the Neato turns at a desired angular velocity.
- Limitations of SORT
- ID assignment fails if person enters/leaves the frame
- Computer vision is a difficult topic
- Had to read multiple research papers
- The concepts were difficult to grasp first time
- Tuning the movement detection
- Too sensitive or too lenient; hard to find the middleground
- Learned how to use GitHub branches
- Prevented merge errors
- Built a solid understanding of the concepts behind YOLO and SORT.
- If we had more time, we would have tried to implement the following:
- Neato turning to the person who moved before eliminating
- Reduce noise in images for better motion detection
- Explore other methods for detecting movement
Bewley, Alex, et al. "Simple online and realtime tracking." 2016 IEEE international conference on image processing (ICIP). IEEE, 2016.
Oh, Il-Seok. Pattern Recognition, Computer Vision, and Machine Learning. Hanbit Academy, 2023.
Redmon, Joseph, et al. "You only look once: Unified, real-time object detection." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.