Monday, November 30, 2009

Hand Posture and Gesture Recognition

Human-computer interaction is currently an active field in computer science, with Microsoft working on Project Natal and universities developing virtual reality systems. One of the most basic problems in HCI is hand posture and hand gesture recognition. While these two terms are often used interchangeably, they are fundamentally different: a hand posture is a static model of the hand, while a gesture is dynamic and involves a change in pose. We use both naturally in conversation to convey ideas and emotions.

There are two ways to implement such a system: data-glove based and vision based. A data-glove system gives the most accurate tracking results, but requires the most hardware to implement. In addition, most gloves are clunky, which can limit how natural the system feels. The second method, the vision-based approach, uses a simple webcam to capture an image of the hand and extract pose data from it. This approach is the cheapest and most lightweight, but the results can be unreliable. Part of the problem is that most vision-based systems work in only two dimensions. Classifying gestures from 2D data is much less accurate than using 3D coordinates, but that raises the problem of how to determine depth from a single image. Multiple cameras can be used, but current stereo matching algorithms are not efficient enough to run in real time. The solution is to take advantage of a variety of monocular depth cues, such as shadows, occlusion, and scale change, to estimate the most likely depth.
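To make the vision-based idea concrete, here is a minimal sketch of the very first step such a system might take: finding hand-colored pixels in a webcam frame. This is not the post's actual pipeline, just an illustration, assuming Python with numpy; the YCbCr threshold ranges (Cb in [77, 127], Cr in [133, 173]) are a commonly cited rule of thumb for skin color, not tuned values.

```python
import numpy as np

def skin_mask(rgb):
    """Classify pixels as skin using a simple YCbCr threshold.

    rgb: H x W x 3 uint8 array. Returns an H x W boolean mask.
    """
    rgb = rgb.astype(np.float32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # RGB -> chroma channels of YCbCr (ITU-R BT.601 coefficients)
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # Keep pixels whose chroma falls inside the skin-tone box
    return (cb >= 77) & (cb <= 127) & (cr >= 133) & (cr <= 173)

# A synthetic 2x2 "frame": two skin-toned pixels, one blue, one white.
img = np.array([[[200, 140, 120], [0, 0, 255]],
                [[210, 150, 130], [255, 255, 255]]], dtype=np.uint8)
print(skin_mask(img))  # True for the skin-toned pixels only
```

A real system would follow this with noise removal and contour extraction before any pose classification, and a fixed threshold box is exactly the kind of step that makes vision-based results unreliable under changing lighting.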

Another problem with determining hand poses is getting the system to function in real time. A non-realtime application destroys the interactivity and complicates the gestures. This problem can be solved by taking advantage of multi-threading technology. Intel's newest processors, including the Core 2 Duo and the Core i7, have the ability to perform parallel processing (executing multiple instructions at the same time). In fact, the Core i7 can run 8 hardware threads simultaneously (four cores, each with Hyper-Threading). Since this is a relatively new technology, most programs don't take advantage of it. Utilizing it can speed a program up by as much as 8 times.

When detecting hand poses, images must be passed through different filters, some of which are independent of each other. By running a different filter on each thread, you can cut detection time. This means that instead of waiting for 16 image operations to complete one after another, you only have to wait the time of 2.
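A minimal sketch of this filter-per-thread idea, assuming Python with numpy (the two filters here, a box blur and a horizontal gradient, are hypothetical examples of independent image operations, not the filters the post has in mind):

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def blur(img):
    # Crude box blur via shifted averages (illustrative filter #1).
    return (img + np.roll(img, 1, axis=0) + np.roll(img, 1, axis=1)) / 3.0

def edges(img):
    # Horizontal gradient magnitude (illustrative filter #2).
    return np.abs(np.diff(img, axis=1))

img = np.random.rand(64, 64)

# The two filters don't depend on each other's output,
# so they can be submitted to separate threads and run concurrently.
with ThreadPoolExecutor(max_workers=2) as pool:
    blur_future = pool.submit(blur, img)
    edge_future = pool.submit(edges, img)
    blurred, edged = blur_future.result(), edge_future.result()

print(blurred.shape, edged.shape)
```

Note that this only helps for filters that are truly independent; filters that consume another filter's output still have to run in sequence, which is why 16 operations collapse to the time of 2 rather than the time of 1.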
