RGB-D Kernel Descriptors

RGB-D kernel descriptors achieve state-of-the-art results on many types of recognition tasks; a linear SVM trained on top of them is sufficient for good accuracy (see the sketch after the results table below). Please check our demo code in the package. If you use our package, please cite the following papers. If you have any questions or suggestions, please contact us.

Dataset                      Accuracy (Kernel Descriptors)   Data Source
RGB-D Object (Category)      86.5%                           http://www.cs.washington.edu/rgbd-dataset
RGB-D Object (Instance)      93.0%                           http://www.cs.washington.edu/rgbd-dataset
Bird200 (CUB-200)            26.2%                           http://vision.caltech.edu/visipedia/CUB-200.html
Bird200-2011 (CUB-200-2011)  43.0%                           http://www.vision.caltech.edu/visipedia/CUB-200-2011.html
CIFAR-10                     80.0%                           http://www.cs.toronto.edu/~kriz/cifar.html
Caltech-101                  77.5%                           http://vision.stanford.edu/resources_links.html
Extended Yale Face B         99.4%                           http://vision.ucsd.edu/~leekc/ExtYaleDatabase/ExtYaleB.html
USPS                         97.6%                           http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets
Scene-15                     87.8%                           http://www.cs.unc.edu/~lazebnik
UIUC-Sports                  89.1%                           http://vision.stanford.edu/lijiali/event_dataset
Flickr Material              54.0%                           http://people.csail.mit.edu/celiu/CVPR2010/FMD
Stanford Background          82.9%                           http://dags.stanford.edu/projects/scenedataset.html
NYU Depth Dataset            76.1%                           http://cs.nyu.edu/~silberman/site/?page_id=27
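
The accuracies above are produced by training a linear SVM on image-level kernel descriptor features. As a minimal sketch of only that classification step, assuming the features have already been extracted with the demo code and saved to files (the file names, the .npy format, and the use of scikit-learn are illustrative assumptions, not part of our package):

    import numpy as np
    from sklearn.svm import LinearSVC
    from sklearn.metrics import accuracy_score

    # Hypothetical files holding precomputed image-level kernel descriptor
    # features (one row per image) and the corresponding class labels.
    X_train = np.load("kdes_train_features.npy")
    y_train = np.load("train_labels.npy")
    X_test = np.load("kdes_test_features.npy")
    y_test = np.load("test_labels.npy")

    # A plain linear SVM is enough on top of kernel descriptor features.
    clf = LinearSVC(C=1.0)
    clf.fit(X_train, y_train)
    print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))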

Code

Datasets

Demos

Publications

Learning Algorithms for Recognition

  • [1] Liefeng Bo, Xiaofeng Ren and Dieter Fox, Depth Kernel Descriptors for Object Recognition, in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), September 2011. [PDF] [BIB]

  • [2] Liefeng Bo, Kevin Lai, Xiaofeng Ren and Dieter Fox, Object Recognition with Hierarchical Kernel Descriptors, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2011. [PDF] [BIB]

  • [3] Liefeng Bo, Xiaofeng Ren and Dieter Fox, Kernel Descriptors for Visual Recognition, in Advances in Neural Information Processing Systems (NIPS), December 2010. [PDF] [Spotlight] [Video] [BIB]

Fine-Grained Recognition, Scene Labeling and Material Recognition

  • Shulin Yang, Liefeng Bo, Jue Wang and Linda Shapiro, Unsupervised Template Learning for Fine-Grained Object Recognition, in Advances in Neural Information Processing Systems (NIPS), December 2012. [PDF] [BIB]

  • Xiaofeng Ren, Liefeng Bo and Dieter Fox, RGB-(D) Scene Labeling: Features and Algorithms, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2012. [PDF] [BIB]

  • Diane Hu, Liefeng Bo and Xiaofeng Ren, Toward Robust Material Recognition for Everyday Objects, in British Machine Vision Conference (BMVC), September 2011. [PDF] [BIB]

What are RGB-D kernel descriptors?

The core of building recognition systems is to extract expressive features from high-dimensional structured data such as images, videos, audio, depth maps, and 3D point clouds. Kernel descriptors aim to discover such features using machine learning methodology. The standard approach to object recognition is to compute pixel attributes in small windows around (a subset of) pixels. For example, the gradient orientation and magnitude attributes in SIFT, one of the most successful features in computer vision, are computed from 5x5 image windows. A key question for object recognition is then how to measure the similarity of image patches based on the attributes of the pixels within them, because this similarity measure is what a classifier such as a support vector machine (SVM) operates on. Techniques based on histogram features, such as SIFT or HOG, discretize the individual pixel attribute values into bins and then compute a histogram over the discrete attribute values within a patch. The similarity between two patches can then be computed from their histograms. Unfortunately, the binning restricts the similarity measure and introduces quantization errors, which limit the recognition accuracy.
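
To make the binning step concrete, here is a hypothetical NumPy sketch (not code from our package) that quantizes gradient orientations into a fixed number of bins, builds a magnitude-weighted histogram per patch, and scores two patches by the dot product of their histograms; the 8 bins and 16x16 patches are arbitrary choices, and this hard quantization is precisely what kernel descriptors avoid:

    import numpy as np

    def gradient_attributes(patch):
        # Per-pixel gradient magnitude and orientation of a grayscale patch.
        gy, gx = np.gradient(patch.astype(float))
        magnitude = np.hypot(gx, gy)
        orientation = np.arctan2(gy, gx) % (2 * np.pi)
        return magnitude, orientation

    def orientation_histogram(patch, n_bins=8):
        # Discretize orientations into bins, accumulate magnitude-weighted counts.
        magnitude, orientation = gradient_attributes(patch)
        bins = np.floor(orientation / (2 * np.pi) * n_bins).astype(int) % n_bins
        hist = np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=n_bins)
        return hist / (np.linalg.norm(hist) + 1e-8)

    # Histogram-based patch similarity: the hard assignment to bins is the
    # source of the quantization error discussed above.
    patch_a, patch_b = np.random.rand(16, 16), np.random.rand(16, 16)
    similarity = float(orientation_histogram(patch_a) @ orientation_histogram(patch_b))
    print(similarity)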

We highlight the kernel view of SIFT, HOG, and bag of visual words, and show that histogram features are a special, rather restricted case of efficient match kernels. This novel insight allows us to design a family of kernel descriptors. Kernel descriptors avoid the need for pixel attribute discretization and are able to generate rich patch-level features from different types of pixel attributes. Here, the similarity between two patches is based on a kernel function, called a match kernel, that averages over the continuous similarities between all pairs of pixel attributes in the two patches. Match kernels are extremely flexible and make it easy to incorporate domain knowledge, since the similarity measure between pixel attributes can be any positive definite kernel, such as the popular Gaussian kernel. While match kernels provide a natural similarity measure for image patches, evaluating them can be computationally expensive, in particular for large image patches. To compute kernel descriptors, one has to move to the feature space underlying the kernel function. Unfortunately, the dimensionality of these feature vectors is high, even infinite if, for instance, a Gaussian kernel is used. Thus, for computational efficiency and for representational convenience, we reduce the dimensionality by projecting the high/infinite-dimensional feature vectors onto a set of finite basis vectors using kernel principal component analysis. This procedure approximates the original match kernels very well, as shown in [3].
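
The following sketch is an independent NumPy illustration of a gradient match kernel of this form, not the implementation in our package: each pixel contributes a normalized gradient magnitude (its weight), an orientation encoded as a point on the unit circle (so no binning is needed), and a normalized position, and the patch similarity sums Gaussian similarities over all pixel pairs. The Gaussian widths and patch size are illustrative assumptions.

    import numpy as np

    def gaussian_kernel(x, y, gamma):
        # Gaussian (RBF) kernel matrix between rows of x and rows of y.
        d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    def gradient_match_kernel(patch_p, patch_q, gamma_o=5.0, gamma_p=3.0):
        # Weighted sum of continuous similarities over all pixel pairs.
        feats = []
        for patch in (patch_p, patch_q):
            gy, gx = np.gradient(patch.astype(float))
            mag = np.hypot(gx, gy).ravel()
            mag = mag / (np.linalg.norm(mag) + 1e-8)               # normalized magnitudes
            theta = np.arctan2(gy, gx).ravel()
            orient = np.stack([np.sin(theta), np.cos(theta)], 1)   # orientation on the unit circle
            h, w = patch.shape
            ys, xs = np.mgrid[0:h, 0:w]
            pos = np.stack([ys.ravel() / h, xs.ravel() / w], 1)    # normalized pixel positions
            feats.append((mag, orient, pos))
        (m_p, o_p, z_p), (m_q, o_q, z_q) = feats
        k = gaussian_kernel(o_p, o_q, gamma_o) * gaussian_kernel(z_p, z_q, gamma_p)
        return float(m_p @ k @ m_q)

    patch_a, patch_b = np.random.rand(16, 16), np.random.rand(16, 16)
    print(gradient_match_kernel(patch_a, patch_b))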

To summarize, extracting kernel descriptors involves the following steps: (1) define pixel attributes; (2) design match kernels that measure the similarities of image patches based on these pixel attributes; (3) compute low-dimensional approximations of the match kernels. While the third step is done automatically by learning low-dimensional representations from the defined kernels, the first two steps are where the approach is adapted to a specific scenario. Thus, kernel descriptors provide a unified and principled framework for extracting rich features from sensor data. We have developed eight types of kernel descriptors [1, 2, 3] for RGB-D images, a relatively complete feature set that captures rich cues for robust object recognition. Our kernel descriptors achieve state-of-the-art results on many benchmarks: USPS, Extended Yale Face B, Scene-15, Caltech-101, CIFAR-10, CIFAR-10-ImageNet, and the RGB-D Object Dataset. More importantly, our kernel descriptors have exhibited very robust performance in several real-world recognition systems: an autonomous chess-playing manipulator robot, an RGB-D camera based everyday object recognition system, and the object-aware situated interactive system (OASIS). The OASIS Lego demo was shown live at the 2011 Consumer Electronics Show.
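
To make step (3) concrete, here is a simplified sketch of the projection idea, assuming generic two-dimensional pixel attributes and a basis sampled on a uniform grid; the released descriptors use richer attribute sets (e.g. gradient orientation, color, shape, position) and larger basis sets, so this is an illustration under those simplifying assumptions rather than the implementation in our package.

    import numpy as np

    def kernel(x, y, gamma=4.0):
        # Gaussian kernel matrix between rows of x and rows of y.
        d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    def learn_projection(basis, n_components=10):
        # Kernel PCA on the basis set: scaled eigenvectors of K_bb define the
        # projection onto a finite-dimensional approximation of the feature space.
        k_bb = kernel(basis, basis)
        evals, evecs = np.linalg.eigh(k_bb)
        order = np.argsort(evals)[::-1][:n_components]
        return evecs[:, order] / np.sqrt(np.maximum(evals[order], 1e-8))

    def kernel_descriptor(attrs, weights, basis, projection):
        # Finite-dimensional patch feature: evaluate the match kernel between the
        # weighted pixel attributes and the basis, then apply the learned projection.
        k_pb = weights @ kernel(attrs, basis)   # sum_z w(z) * k(attr(z), basis_i)
        return k_pb @ projection

    # Hypothetical setup: attributes such as (sin, cos) of gradient orientation.
    grid = np.linspace(-1.0, 1.0, 5)
    basis = np.array([[a, b] for a in grid for b in grid])   # 25 basis vectors
    projection = learn_projection(basis, n_components=10)

    attrs = np.random.randn(256, 2)        # one attribute vector per pixel of a patch
    weights = np.full(256, 1.0 / 256)      # e.g. normalized gradient magnitudes
    descriptor = kernel_descriptor(attrs, weights, basis, projection)
    print(descriptor.shape)                # (10,) low-dimensional patch feature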