CVPR 2006 notes
Day 1 (Saturday, June 17, 2006)
Attended "Using the Graphics Processing Unit for Computer Vision" by E. Scott Larsen (http://www.cs.unc.edu/~larsene/)
The basic idea is that the GPU on typical graphics cards (e.g. nVidia, ATI) has some very powerful computing resources that can be used for computer vision algorithms. For example, the FFT and even general-purpose convolution can be sped up by a factor of 40-60. This could be _very_ useful for real-time CV work.
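For reference, here is a rough CPU-only sketch of the two operations mentioned above: direct convolution, and the same convolution computed through the FFT. It is plain Python rather than GPU code, and it is not taken from Scott's tutorial materials. The function names (fft, convolve_direct, convolve_fft) are made up for illustration.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized). len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def convolve_direct(signal, kernel):
    """Full 1D convolution, O(N*M) multiply-adds."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def convolve_fft(signal, kernel):
    """Same result via the convolution theorem: multiply the spectra."""
    n_out = len(signal) + len(kernel) - 1
    size = 1
    while size < n_out:  # zero-pad to a power of two
        size *= 2
    a = fft([complex(v) for v in signal] + [0j] * (size - len(signal)))
    b = fft([complex(v) for v in kernel] + [0j] * (size - len(kernel)))
    product = [p * q for p, q in zip(a, b)]
    result = fft(product, inverse=True)
    # The inverse transform above is unnormalized, so divide by size here.
    return [v.real / size for v in result[:n_out]]


direct = convolve_direct([1, 2, 3, 4], [1, 0, -1])
fast = convolve_fft([1, 2, 3, 4], [1, 0, -1])
print(direct)
print([round(v, 6) for v in fast])
```

Both the inner loops of the direct version and the butterfly stages of the FFT are highly data-parallel, which is presumably why they map so well onto graphics hardware.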
Scott provided slides and code examples, which I'm mirroring here (http://people.csail.mit.edu/albert/reading/larsene-gpu-tutorial.zip) for now.
Some links: http://openvidia.sourceforge.net http://gpgpu.org
Spent the rest of the day in the "Beyond Patches" workshop. Pietro Perona gave the same high-level talk he gave here at CSAIL last year.
The idea of "Beyond Patches" was that computer vision has, in recent years, focused a lot of attention on local patch detectors and descriptors, and it is now time to move beyond them and incorporate other types of feature detectors and descriptors. A prominent trend seemed to be returning to some sort of component-based approach where the geometry and structure of multiple patches are considered together to aid in recognition.
Day 2 (Sunday, June 18)
Attended the "25 Years of RANSAC" workshop in the morning. Unfortunately missed the talk by Robert Bolles. Phil Torr gave a great tutorial on RANSAC, introducing the basics behind it, some of its limitations, and its variations. There were some very impressive videos of RANSAC being used for real-time structure from motion and for low-drift egomotion (i.e. drift on the order of meters after tens of minutes).
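As a reminder of the basics, here is a minimal sketch of plain RANSAC, fitting a 2D line to points contaminated with outliers. This is the textbook loop (sample a minimal set, fit, count inliers, keep the best), not anything specific from the tutorial. The function names, the iteration count, and the inlier threshold are arbitrary choices for illustration.

```python
import random


def fit_line(p, q):
    """Line y = m*x + c through two points (assumes different x values)."""
    (x1, y1), (x2, y2) = p, q
    m = (y2 - y1) / (x2 - x1)
    return m, y1 - m * x1


def ransac_line(points, n_iters=200, threshold=0.5, seed=0):
    """Return the (m, c) model with the most inliers, plus those inliers."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(n_iters):
        # 1. Draw a minimal sample: two points define a line.
        p, q = rng.sample(points, 2)
        if p[0] == q[0]:
            continue  # vertical line, not representable as y = m*x + c
        # 2. Fit a candidate model to the sample.
        m, c = fit_line(p, q)
        # 3. Count points whose vertical residual is within the threshold.
        inliers = [(x, y) for x, y in points if abs(y - (m * x + c)) <= threshold]
        # 4. Keep the candidate with the largest consensus set.
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (m, c), inliers
    return best_model, best_inliers


# Synthetic data: 20 points on y = 2x + 1, plus 4 gross outliers.
points = [(float(x), 2.0 * x + 1.0) for x in range(20)]
points += [(3.0, 40.0), (7.0, -15.0), (12.0, 60.0), (15.0, 0.0)]

model, inliers = ransac_line(points)
print(model, len(inliers))
```

In practice the final model is usually re-estimated from all of the inliers (e.g. by least squares) rather than taken directly from the winning two-point sample.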
Skipped out to go to Andrew Blake's invited talk on "Interactive image editing - powered by Computer Vision" where he described his work with MSR. Fairly high-level, but very interesting.
Attended the rest of the RANSAC workshop, but could not follow much of the remaining talks. The speakers assumed familiarity with a number of subjects that I just didn't know (M-estimators, kernel density estimators, maybe a couple of others). It's possible that these subjects were covered in the second part of the tutorial, which I missed because of Andrew Blake's talk.
Wei Zhang had an interesting talk on his paper "Ensemble method for robust motion estimation" where they used higher order statistics to reliably detect outliers in the correspondence matching problem. Should read that paper.
Day 3 (Monday, June 19)
Official start of CVPR.
Papers I plan on reading first:
Andreas Opelt, Axel Pinz, and Andrew Zisserman - "Incremental Learning of Object Detectors Using a Visual Shape Alphabet". The idea was similar to recent object recognition papers that build visual vocabularies of local features, but instead of using blobs (e.g. SIFT), they used edge segments to represent the shape/contour of an object. Train with AdaBoost, and shazam!
Ethan Eade and Tom Drummond - "Scalable Monocular SLAM". An attempt to do real-time vision-based monocular SLAM. Builds off of Andrew Davison's work, but instead of using an EKF, they tried it with FastSLAM and were able to scale to around 400-500 simultaneously tracked features.
Kristy Sim and Richard Hartley - "Removing Outliers Using the L∞ Norm". I didn't attend this talk, but plan on reading the paper. From what I understand, the basic idea is that they've gotten the L∞ norm to work much better than the L2 norm for SFM, and show here how to detect and avoid outliers, which the L∞ norm is highly susceptible to.
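To see why the L∞ norm is so sensitive to outliers, here is a toy 1D sketch: estimating a single value from repeated measurements. Minimizing the L2 cost gives the mean, while minimizing the L∞ cost (the largest residual) gives the midrange, which a single bad measurement drags much further. The removal loop (drop the measurements with the largest residual, then re-solve) is an assumption made for illustration, not a summary of the paper's algorithm. All names and the tolerance are hypothetical.

```python
def l2_estimate(values):
    """Minimizer of the sum of squared residuals: the mean."""
    return sum(values) / len(values)


def linf_estimate(values):
    """Minimizer of the maximum absolute residual: the midrange."""
    return (min(values) + max(values)) / 2.0


def linf_with_removal(values, tolerance):
    """Repeatedly drop the worst-fitting measurements and re-solve."""
    remaining = list(values)
    while len(remaining) > 2:
        x = linf_estimate(remaining)
        worst = max(abs(v - x) for v in remaining)
        if worst <= tolerance:
            break  # every remaining measurement fits well enough
        # Drop everything that attains the maximum residual (up to rounding).
        kept = [v for v in remaining if abs(v - x) < worst - 1e-9]
        if not kept:
            break  # all residuals tie, nothing sensible left to remove
        remaining = kept
    return linf_estimate(remaining), remaining


# Five measurements near 10, plus one gross outlier at 30.
data = [10.0, 10.1, 9.9, 10.2, 9.8, 30.0]

print(l2_estimate(data))    # pulled toward the outlier
print(linf_estimate(data))  # pulled much further
estimate, kept = linf_with_removal(data, tolerance=0.5)
print(estimate, kept)
```

Note that in this toy version each round also throws away a good measurement (the smallest one ties with the outlier for the largest residual), so removing outliers this way costs some inliers too.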
Steven Seitz, Brian Curless, James Diebel, Daniel Scharstein, and Richard Szeliski - "A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms". A bunch of dudes at UW GRAIL, MSR, and other places decided it's high time for a standardized data set with which to evaluate and compare stereo algorithms. This talk described the creation of the data set and the results of running a number of algorithms on it. Very interesting if you're into stereo.
Assaf Zomet and Shree Nayar - "Lensless Imaging with a Controllable Aperture". Not applicable to what I'm doing, but super interesting. The idea is to remove the lens from a digital camera, and then use two stacked LCD panels as a controllable aperture. Since LCD pixels are normally black and become transparent when current is applied, you can stick the LCDs in front of a CCD and do fancy things like using parts of the CCD for image processing or imaging disjoint parts of the scene.
Day 4
Ce Liu, William Freeman, Richard Szeliski, Sing Bing Kang - "Noise Estimation from a Single Image". I'm pretty sure I've seen Bill give this talk before, somewhere. Pretty cool. The idea is to learn a prior model of CCD pixel noise, and then estimate the noise model for an arbitrary image. The method oversegments the image into piecewise smooth regions.
Qifa Ke and Takeo Kanade - "Uncertainty Models in Quasiconvex Optimization for Geometric Reconstruction". Structure from Motion using the L∞ norm.
Kristy Sim and Richard Hartley - "Recovering Camera Motion Using L∞ Minimization". Very similar technique to Ke and Kanade. Looks like Sim and Hartley had a somewhat more complex model, but the basic idea seems to be almost the same. Talked to Kristy, and she agreed that their work was very similar, and that they were mutually unaware of each other's work.
Mark Pupilli and Andrew Calway - "Real-Time Visual SLAM with Resilience to Erratic Motion". ???
