
Core research directions

Animate Vision



How do people and machines recognize and classify the motion of living things? People are remarkably good at this, even with sparse or noisy signals. This research direction comprises human perceptual experiments, models, simulations and automated recognition devices built around motion detectors tuned according to naturalistic biomechanical constraints. Projects also explore models of motion recognition in humans and machines, including motion tracking, eye tracking, local and global motion coding, motion and attention, learning, and motion in texture fields and natural scenes.
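One classic building block for motion detectors of the kind described above is the Reichardt-style correlation detector, which signals direction by correlating a sensor's output with a delayed copy from its neighbor. The sketch below is illustrative only; the function name, delay, and toy signals are assumptions, not the actual models used in these projects.

```python
# Minimal sketch of a Reichardt-style correlation motion detector, a
# standard building block in models of biological motion sensing.
# Names, the delay parameter, and the toy signals are illustrative.

def reichardt_response(signal, delay=1):
    """Correlate each sample with a delayed copy from the neighboring
    sensor; the mirror-symmetric pair is subtracted, so the output is
    signed by the direction of motion."""
    left, right = signal  # two adjacent "photoreceptor" time series
    total = 0.0
    for t in range(delay, len(left)):
        total += left[t - delay] * right[t]   # rightward correlation
        total -= right[t - delay] * left[t]   # leftward correlation
    return total

# A bright blob moving rightward hits the left sensor first, then the right.
left = [0, 1, 0, 0, 0]
right = [0, 0, 1, 0, 0]
print(reichardt_response((left, right)))   # positive: rightward motion
```

Swapping the two input sequences reverses the sign of the response, which is the property that makes the detector directional.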

(Image: human vs. non-human recognition)

Perception in 3-D Environments

(Image: 3-D view of a footprint from above)

A classical problem in perception is how to perceive 3-dimensional objects and scenes from 2-dimensional images. Projects highlight the mechanisms for integrating different cues – stereo, motion, shading, texture and occlusion – to obtain coherent representations of 2- and 3-dimensional shape. Models focus on fundamental questions about how shape is represented in humans, what kinds of image transformations are needed to optimize shape recognition in machines, and what kinds of computer-generated shapes and objects are most compatible with human perceptual function.
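A standard model for the cue-integration problem just described is reliability-weighted (maximum-likelihood) combination: each cue's estimate is weighted by its inverse variance. The sketch below is a generic version of that idea with invented numbers; it is not a claim about the specific models these projects use.

```python
# Minimal sketch of reliability-weighted cue integration (maximum-
# likelihood combination of independent Gaussian cues). The cue values
# and variances below are invented for illustration.

def integrate_cues(estimates, variances):
    """Fuse independent Gaussian cue estimates, weighting each cue by
    its inverse variance (its reliability)."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    fused = sum(w * e for w, e in zip(weights, estimates)) / total
    fused_variance = 1.0 / total  # fusion never reduces reliability
    return fused, fused_variance

# Stereo says the surface slant is 30 deg (reliable, variance 1);
# texture says 40 deg (noisy, variance 4).
slant, var = integrate_cues([30.0, 40.0], [1.0, 4.0])
print(slant, var)  # the fused slant lies nearer the more reliable cue
```

The fused variance is always smaller than the best single cue's variance, which is why combining even a noisy cue with a reliable one is worthwhile.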

Scanning, Search, and Attention

The perceptual representation of a scene is built from sequences of individual views because it is impossible to perceive or remember everything in a single glance. Projects explore how we use eye movements and perceptual attention to gather information from the world and plan effective patterns of action in realistic tasks. Models focus on mechanisms of perceptual attention and memory and their interaction with eye movements and organized plans of action. State-of-the-art computer image transformations are also studied to find out how to make important aspects of a scene noticeable and readily apprehended by active human observers.
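A common way to model a sequence of glances is a greedy fixation loop over a saliency map with inhibition of return: look at the most salient location, suppress it, and repeat. The sketch below uses a toy grid and invented names; it stands in for, rather than reproduces, the attention models studied here.

```python
# Minimal sketch of fixation selection with inhibition of return:
# fixate the most salient cell, suppress it, repeat. The saliency map
# is a toy grid, not the output of a real saliency model.

def scanpath(saliency, n_fixations):
    sal = [row[:] for row in saliency]  # copy so we can suppress in place
    fixations = []
    for _ in range(n_fixations):
        r, c = max(((i, j) for i in range(len(sal))
                    for j in range(len(sal[0]))),
                   key=lambda rc: sal[rc[0]][rc[1]])
        fixations.append((r, c))
        sal[r][c] = float("-inf")  # inhibition of return
    return fixations

toy_map = [[0.1, 0.9, 0.2],
           [0.3, 0.5, 0.8],
           [0.4, 0.1, 0.6]]
print(scanpath(toy_map, 3))  # visits cells in descending order of salience
```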

Visual Language

(Image: agitated vs. relaxed)

These projects are developing systems to interpret American Sign Language (ASL) based on motion-tracking algorithms and stochastic learning procedures. Models must not only recognize hand signals, but must also incorporate gestures, facial expressions, syntactic rules, and contextual cues to resolve ambiguities and cope with high levels of noise. Techniques employed are drawn from computer vision, statistics, computational learning, linguistics, psycholinguistics, and perceptual recognition of patterns and motion.
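Hidden Markov models are a typical instance of the stochastic learning procedures mentioned above: quantized motion observations are decoded into a most-likely sequence of hidden states with the Viterbi algorithm. Every state, symbol, and probability in this sketch is invented for illustration; a real ASL system would use far richer features and models.

```python
# A toy hidden Markov model over quantized hand-motion observations,
# decoded with the Viterbi algorithm. States, symbols, and all
# probabilities are invented for illustration.
import math

states = ["rest", "signing"]
start = {"rest": 0.8, "signing": 0.2}
trans = {"rest": {"rest": 0.7, "signing": 0.3},
         "signing": {"rest": 0.2, "signing": 0.8}}
emit = {"rest": {"still": 0.9, "moving": 0.1},
        "signing": {"still": 0.2, "moving": 0.8}}

def viterbi(obs):
    """Return the most probable hidden-state path for the observations."""
    # Log-probabilities avoid numerical underflow on long sequences.
    v = [{s: math.log(start[s]) + math.log(emit[s][obs[0]])
          for s in states}]
    back = []
    for o in obs[1:]:
        col, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda p: v[-1][p] + math.log(trans[p][s]))
            col[s] = v[-1][prev] + math.log(trans[prev][s]) + math.log(emit[s][o])
            ptr[s] = prev
        v.append(col)
        back.append(ptr)
    path = [max(states, key=lambda s: v[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

print(viterbi(["still", "moving", "moving", "still"]))
```

The decoder labels the burst of motion in the middle of the sequence as "signing" and the still frames at either end as "rest", which is the disambiguation-from-context behavior the text describes, in miniature.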

Visual Communication & Visual Interface Design
(Images: motion estimation; talk with hands)
Often the most effective computer-generated images are not literal depictions of a scene, but rather, meaningful abstractions that convey essential information in a way that is readily understood. Projects underway are creating usable images – “visual explanations” – whose structure and design are compatible with the way the human perceptual system perceives and extracts meaning from a scene over space and time. Examples are found in the generation of assembly instructions, line drawings of 3-dimensional objects, and artistic rendering of natural scenes. These efforts depend on incorporating models of human perceptual analysis into the rules governing the design.
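One very reduced version of the line-drawing idea is to keep only the pixels where intensity changes sharply, discarding everything else as visually inessential. The sketch below does this with a simple gradient threshold on a toy bitmap; the function name and threshold are assumptions, and real artistic-rendering pipelines are far more sophisticated.

```python
# Minimal sketch of line-drawing extraction: keep only pixels where
# intensity changes sharply (forward-difference gradient threshold).
# The function name, threshold, and toy bitmap are illustrative.

def outline(image, threshold=0.5):
    h, w = len(image), len(image[0])
    edges = [[0] * w for _ in range(h)]
    for r in range(h - 1):
        for c in range(w - 1):
            gx = image[r][c + 1] - image[r][c]  # horizontal change
            gy = image[r + 1][c] - image[r][c]  # vertical change
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                edges[r][c] = 1
    return edges

# A bright square on a dark background reduces to its boundary.
img = [[1.0 if 1 <= r <= 3 and 1 <= c <= 3 else 0.0 for c in range(5)]
       for r in range(5)]
for row in outline(img):
    print("".join("#" if e else "." for e in row))
```

The interior of the square disappears and only the contour survives, which is the sense in which a line drawing is an abstraction rather than a literal depiction.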