Selective Visual Attention for Robust Visual Tracking

Long-duration tracking of general targets is challenging for computer vision: in practice, the target's visual appearance may vary widely, and unconstrained environments may be cluttered and distracting. Yet tracking has never been a challenge for the human visual system. Psychological and cognitive findings indicate that human perception is attentional and selective, and that both early attentional selection, which may be innate, and late attentional selection, which may be learned, are necessary for human visual tracking.

We propose a new visual tracking approach that reflects some aspects of spatial selective attention, and present a novel attentional visual tracking (AVT) algorithm. In AVT, the early selection process extracts a pool of attentional regions (ARs), defined as salient image regions with good localization properties. The late selection process then dynamically identifies a subset of discriminative attentional regions (D-ARs) through on-the-fly discriminative learning on historical data. The computationally demanding matching of the AR pool is performed efficiently using the idea behind locality-sensitive hashing (LSH). The proposed AVT algorithm is general, robust and computationally efficient, as shown in extensive experiments on a large variety of real-world video.
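To illustrate how hashing can speed up matching against a pool of regions, here is a minimal sketch of locality-sensitive hashing. It is not the paper's implementation: it assumes each AR is summarized by a fixed-length descriptor vector, and it uses random-hyperplane (sign) hashing as a stand-in for whichever LSH scheme the authors chose. The class name `HyperplaneLSH`, the function `squared_distance`, the parameters `n_bits` and `n_tables`, and the random descriptors are all hypothetical.

```python
import random
from collections import defaultdict


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class HyperplaneLSH:
    """Hypothetical LSH index: several hash tables keyed by sign patterns."""

    def __init__(self, dim, n_bits=8, n_tables=4, seed=0):
        rng = random.Random(seed)
        # One set of n_bits random hyperplanes per table.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
            for _ in range(n_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.vectors = {}

    def _key(self, planes, vec):
        # Each bit records which side of a hyperplane the vector falls on.
        return tuple(
            sum(p * v for p, v in zip(plane, vec)) >= 0.0 for plane in planes
        )

    def add(self, region_id, vec):
        self.vectors[region_id] = vec
        for planes, table in zip(self.planes, self.tables):
            table[self._key(planes, vec)].append(region_id)

    def query(self, vec):
        # Only regions sharing a bucket with the query are compared exactly.
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self._key(planes, vec), []))
        if not candidates:
            return None
        return min(
            candidates,
            key=lambda rid: squared_distance(self.vectors[rid], vec),
        )


# Assumed toy data: 50 regions with random 16-dimensional descriptors.
rng = random.Random(1)
pool = {rid: [rng.uniform(-1, 1) for _ in range(16)] for rid in range(50)}

index = HyperplaneLSH(dim=16)
for rid, vec in pool.items():
    index.add(rid, vec)

# Querying with a stored descriptor hashes to the same buckets it was added to.
match = index.query(pool[7])
```

The design trade-off in such a scheme is between `n_bits` and `n_tables`. More bits per key shrink each bucket, so fewer exact comparisons are needed. More tables raise the chance that a true near neighbor shares at least one bucket with the query.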

Demo Sequences (Click an image to play the video. If the video does not play, please install the DivX video codec.)


Marathon	Cowboy	Bicyclist

Marathon 2	Walker	Bicyclist 2

Tanker	Zebra 1	Walker 2

Another Tanker	Zebra 2	Face

Publication:

Ming Yang, Junsong Yuan and Ying Wu, "Spatial Selection for Attentional Visual Tracking", in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR'07), Minneapolis, MN, June 2007 [PDF]
