GestureBot and Feature Detection
GestureBot was my fall A.I. project. Mostly I wanted to research computer vision recognition of gestures. I didn’t really expect this to be a resounding success; people have been working on this problem for decades without coming upon a fully effective solution.
I dug around in the source code for Intel’s OpenCV, an open source C library for computer vision (it also has Python interfaces, though they are less well documented). It also contains machine learning algorithms and GUI prototyping functionality. What I focused on was its object and face detection system, which uses Haar-like feature detection. Haar-like feature detection is a trained machine-learning approach: it is given labeled positive and negative examples of what it’s looking for, and over several rounds of boosting it picks the features that best separate the two sets, giving more weight each round to the examples it got wrong. The result is a cascade of simple decision-tree classifiers, which is then checked against unseen test images. As for how it guesses the classification, it slides a window over the image, breaks it into smaller adjacent rectangles, and compares the summed brightness of one rectangle against its neighbor, which captures local contrast patterns such as edges and lines (loosely analogous to how some neurons in the visual system respond to edges).
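To make the rectangle idea concrete, here is a minimal sketch of a single two-rectangle Haar-like feature computed with a summed-area table (integral image). This is not OpenCV's code; the class name `HaarFeatureSketch` and its methods are hypothetical, and the input is assumed to be a grayscale image stored as an `int[][]` of brightness values.

```java
// Minimal sketch of one two-rectangle Haar-like feature on a grayscale image.
// All names here are illustrative; this is not OpenCV's API.
class HaarFeatureSketch {

    // Builds a summed-area table with one extra row and column:
    // sat[y][x] is the sum of all pixels strictly above and to the left of (x, y).
    static long[][] integralImage(int[][] gray) {
        int h = gray.length;
        int w = gray[0].length;
        long[][] sat = new long[h + 1][w + 1];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                sat[y + 1][x + 1] = gray[y][x] + sat[y][x + 1] + sat[y + 1][x] - sat[y][x];
            }
        }
        return sat;
    }

    // Sum of the pixels in the rectangle with top-left corner (x, y),
    // width w and height h, using four table lookups.
    static long rectSum(long[][] sat, int x, int y, int w, int h) {
        return sat[y + h][x + w] - sat[y][x + w] - sat[y + h][x] + sat[y][x];
    }

    // Two-rectangle "edge" feature: brightness of the left half minus the right half.
    // A large absolute value suggests a vertical edge inside the window.
    static long edgeFeature(long[][] sat, int x, int y, int w, int h) {
        int half = w / 2;
        long left = rectSum(sat, x, y, half, h);
        long right = rectSum(sat, x + half, y, half, h);
        return left - right;
    }

    public static void main(String[] args) {
        // A 4x4 patch that is bright on the left and dark on the right.
        int[][] patch = {
            {255, 255, 0, 0},
            {255, 255, 0, 0},
            {255, 255, 0, 0},
            {255, 255, 0, 0}
        };
        long[][] sat = integralImage(patch);
        System.out.println(edgeFeature(sat, 0, 0, 4, 4));
    }
}
```

A trained detector evaluates many such features at many positions and scales, and the summed-area table is what keeps each rectangle sum down to four lookups.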
But, despite its great reputation for feature classification, Haar-like feature detection is a poor fit for a gesture-controlled robot. Each individual gesture needs to go through its own training process, which means thousands of images and lots of training time.
So I looked at one of the simplest classification algorithms: k-nearest neighbors. This takes the latest image, compares it to a database of labeled images, and assigns it whichever label is most common among the k database images deemed “closest.” It’s simple, and its running time grows linearly with the size of the database.
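As a rough illustration of the idea (not the code I actually ran, which depended on the Robot API), here is a minimal k-nearest neighbors sketch. It assumes every image has already been flattened into an equal-length `int[]` of pixel values and that “closest” means smallest squared Euclidean distance; the class name `KnnSketch` and the gesture labels are made up for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Minimal k-nearest neighbors sketch over flattened pixel vectors.
// Names and the distance measure are assumptions for illustration.
class KnnSketch {

    // Squared Euclidean distance between two equal-length pixel vectors.
    static long squaredDistance(int[] a, int[] b) {
        long sum = 0;
        for (int i = 0; i < a.length; i++) {
            long d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Returns the most common label among the k examples closest to the query.
    // Ties go to the label that reaches the winning count first, nearest examples first.
    static String classify(int[][] examples, String[] labels, int[] query, int k) {
        int n = examples.length;
        long[] dist = new long[n];
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            dist[i] = squaredDistance(examples[i], query); // one linear scan of the database
            order[i] = i;
        }
        // Sorting is the simplest way to find the k nearest; it adds a log factor
        // on top of the linear scan, which a partial selection would avoid.
        Arrays.sort(order, (p, q) -> Long.compare(dist[p], dist[q]));

        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (int i = 0; i < Math.min(k, n); i++) {
            String label = labels[order[i]];
            int count = votes.merge(label, 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Tiny two-pixel "images" standing in for real photographs.
        int[][] examples = {{0, 0}, {1, 1}, {10, 10}, {11, 11}, {0, 1}};
        String[] labels = {"fist", "fist", "open", "open", "fist"};
        System.out.println(classify(examples, labels, new int[] {1, 0}, 3));
    }
}
```

Because the distance is taken over raw pixels, every background pixel counts just as much as a hand pixel.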
It is also easily affected by lighting variations and backgrounds. As hand gestures take up a relatively small part of a photograph, the backgrounds in particular impacted how well the system worked.
The most interesting thing I found was that if I removed the backgrounds and loaded the database with images that had undergone spatial and/or color transformations, it performed worse than with the raw images: its classifications were more or less random.
I will say that programming k-nearest neighbors was fairly simple. In Java (which I used because of the robot I was working with), it took perhaps 50-100 lines to classify an image. The code relies on features built into our Robot API, so using it would require understanding that API first (plus having the correct drivers).