Progress: January 2006 - March 2006
Monday, Mar 20, 2006
A. Levin and R. Szeliski. Visual odometry and map correlation (http://www.research.microsoft.com/vision/visionbasedmodeling/publications/Levin-CVPR04.pdf). In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'2004), volume I, pages 611-618, Washington, DC, June 2004
Saturday, Mar 11, 2006
Experiences with an interactive museum tour-guide robot (http://www-2.cs.cmu.edu/~thrun/papers/thrun.tourguide.pdf)
Handling real-world motion planning: a hospital transport robot (http://ieeexplore.ieee.org/iel1/37/3441/00120445.pdf%3Farnumber%3D120445)
Thursday, Mar 9, 2006
Sam's Laser FAQ (http://repairfaq.ece.drexel.edu/sam/laserdio.htm) - amazingly detailed site on lasers. The link goes directly to the section on diode lasers.
Thursday, Mar 2, 2006
Explored options for making use of existing software libraries in my work.
- OpenCV (http://sourceforge.net/projects/opencvlibrary/) - mature and comprehensive library for doing low-level vision. Spent a few days wrestling with the Python bindings with moderate success, but they aren't mature enough yet, so I'm not using OpenCV for now. Originally, I wanted the optical flow implementations, but will use other implementations instead (see next).
- KLT tracker (http://www.ces.clemson.edu/~stb/klt/) - public domain KLT implementation by Stan Birchfield. Created SWIG wrappers for this and am now using both the tracker and the included feature detector (a corner detector that thresholds the smaller eigenvalue of the local gradient matrix) to do sparse optical flow.
- FAST (http://mi.eng.cam.ac.uk/~er258/work/fast.html) - Ed Rosten's simple feature detector (really a point/corner detector). Wrote Python wrappers around this and may use it for feature detection instead of the KLT package's corner detector.
Spent some time learning how to use SWIG (http://www.swig.org/) to create Python wrappers around C libraries. Very powerful, but also fairly complex. Figuring this out took a few days, but now I can create Python wrappers very quickly. SWIG has trouble with complex data structures and pointers, which can make wrapping some code quite messy. Along the way, I also found a bug in SWIG and filed a bug report (https://sourceforge.net/tracker/?func=detail&atid=101645&aid=1439898&group_id=1645).
Created some videos showing the results of running FAST and KLT on one data set.
http://people.csail.mit.edu/albert/ladypack/media/KLT-tracking-33x-1.avi
http://people.csail.mit.edu/albert/ladypack/media/FAST-features-33x-1.avi
TODO
- project IMU's rotational prediction onto video just for visual inspection.
- implement some form of optimizer to estimate rotational and translational components from the spherical optical flow field. Seth suggests using EM. Can use IMU for initial estimate of rotational component.
- hand label a data set with constraints.
- create a trajectory file with constraints to send to Ed's SLAM solver.
- once that works, get SpeechBuilder up and running for real speech parsing.
- read Ed's paper (http://groups.csail.mit.edu/rvsn/content/eolson/graphoptim/)
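For the optimizer item in the TODO list above, it may help to write down the flow model first. The following is a minimal sketch under stated assumptions, not the planned implementation: the scene is static, `omega` and `t` are the camera's angular and linear velocity expressed in the camera frame, `p` is a unit bearing vector on the sphere, and `depth` is the distance to the point along that bearing. All function names are hypothetical. With those conventions the flow on the sphere is `-omega x p - (t - (p.t) p) / depth`, so subtracting the rotational part predicted by the IMU leaves only the translational part.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def rotational_flow(p, omega):
    """Flow at unit bearing p caused by camera rotation alone: -omega x p."""
    return cross(p, omega)  # p x omega equals -(omega x p)


def translational_flow(p, t, depth):
    """Flow at unit bearing p caused by camera translation alone.

    Only the component of t tangent to the sphere at p is visible,
    and it is scaled by the inverse of the (assumed known) depth.
    """
    k = dot(p, t)
    return tuple(-(ti - k * pi) / depth for ti, pi in zip(t, p))


def derotate(p, flow, omega_imu):
    """Subtract the rotational flow predicted from an IMU angular velocity."""
    rot = rotational_flow(p, omega_imu)
    return tuple(f - r for f, r in zip(flow, rot))


# Example: a point straight ahead, camera yawing and sliding sideways.
p = (0.0, 0.0, 1.0)
omega = (0.0, 0.1, 0.0)   # rad/s, assumed value
t = (1.0, 0.0, 0.0)       # m/s, assumed value
total = tuple(a + b for a, b in zip(rotational_flow(p, omega),
                                    translational_flow(p, t, 2.0)))
residual = derotate(p, total, omega)
```

If the IMU estimate of `omega` is good, `residual` should lie in the plane spanned by `p` and `t` for every tracked feature, which is the kind of constraint an optimizer (EM or otherwise) could exploit to recover the translation direction.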
Thursday, Feb 16, 2006
Finally got around to profiling the orientation readings of one of the XSens MTi IMUs. Walked around the 3rd floor, pointing it in a constant direction and writing down measurements. The carpeting on the floor has a repeating square pattern that I used to align the MTi for each reading. You can see from the figure below that the readings can vary dramatically over just a few feet, which makes the magnetometer close to useless for getting an accurate heading unless we have a calibrated magnetic field map (a bit unreasonable).
Tuesday, Feb 14, 2006
Met with Charlie Kemp last week to discuss the possibility of collaborating.
Seth put together a basic plan to get something working and produce something possibly publishable. The idea is to use vision+IMU to get a rough trajectory estimate. Then use the laser pointer and SLS software to mark features. Then plug into Ed Olson's SLAM optimizer to get a better trajectory/state estimate. Then register to a CAD map of 33x for visualization.
Current roadmap
- pick an egomotion-from-vision technique and implement it
- integrate this with IMU sensor data
- locate laser dot in video stream
- set up a SpeechBuilder framework with a simple vocabulary.
- run Galaxy on the data stream to extract landmark instances. At these instances, locate the laser dot and create a new feature in SLAM. Combine this with the egomotion estimate
- send all of this to Ed's solver.
- register with known map of area.
Somewhere in this is the initialization procedure (where is the starting point?)
Monday, Feb 6, 2006
For now, I'm focusing on the problem of locating the laser dot in the video stream. It looks like it will be quite difficult, and I'm not entirely sure how to proceed. There are several pieces of information that will be useful in this respect:
- the laser dot is a bright, bright red.
- the laser dot is usually a small circular shape of some sort
- The MTi provides an orientation measurement of where the laser range finder is pointing
- the laser range finder provides a distance measurement of how far the dot is from the laser range finder.
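The first two cues (bright red, small and roughly round) could be tested with something like the sketch below. It is only an illustration under assumptions: the image is a plain list of rows of 8-bit `(r, g, b)` tuples, the thresholds `min_sat`, `min_val`, `hue_tol` and `max_size` are made-up values that would need tuning on real Ladybug frames, and the function names are hypothetical.

```python
import colorsys


def is_laser_red(r, g, b, min_sat=0.8, min_val=0.8, hue_tol=0.05):
    """Return True if an 8-bit RGB pixel is a saturated, bright red."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    # Red sits at hue 0, so it wraps around both ends of the [0, 1) range.
    near_red = h <= hue_tol or h >= 1.0 - hue_tol
    return near_red and s >= min_sat and v >= min_val


def find_dot_candidate(image, max_size=5):
    """Return the (row, col) centroid of the red pixels, or None.

    None is returned when there are no red pixels or when the red pixels
    do not all fit inside a max_size x max_size box (i.e. not one small dot).
    """
    hits = [(y, x)
            for y, row in enumerate(image)
            for x, px in enumerate(row)
            if is_laser_red(*px)]
    if not hits:
        return None
    ys = [y for y, _ in hits]
    xs = [x for _, x in hits]
    if max(ys) - min(ys) >= max_size or max(xs) - min(xs) >= max_size:
        return None
    return (sum(ys) / len(ys), sum(xs) / len(xs))


# Synthetic 8x8 gray frame with a 2x2 bright red dot.
frame = [[(90, 90, 90) for _ in range(8)] for _ in range(8)]
for y, x in [(3, 4), (3, 5), (4, 4), (4, 5)]:
    frame[y][x] = (255, 20, 20)
candidate = find_dot_candidate(frame)
```

A test this simple accepts anything small, red and bright, so it cannot tell the laser dot apart from other red objects in the scene on its own.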
There are a few ways to try locating the laser dot
- convert to HSV and segment by color. look for little red circles.
- this doesn't work very well. Other objects also appear bright red and round, especially blinky LEDs on power strips and other devices. The carpeted floor in 33x is also bright red.
- estimate actual position of laser range finder with respect to ladybug. Combine this with orientation data and distance measurement to refine the laser dot search space.
- if all the estimates are accurate, then might not even need to look for the laser dot.
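The geometric option above amounts to a single vector sum. A minimal sketch, assuming the range finder's position and orientation are already expressed in the Ladybug's frame, that the beam leaves the range finder along its local x axis, and that all names here are hypothetical:

```python
import math


def mat_vec(R, v):
    """Multiply a 3x3 rotation matrix (nested sequences) by a 3-vector."""
    return tuple(sum(R[i][j] * v[j] for j in range(3)) for i in range(3))


def predict_dot(lrf_position, lrf_rotation, distance, beam_axis=(1.0, 0.0, 0.0)):
    """Predicted laser dot position in the camera frame.

    lrf_position: range finder origin in the camera frame (meters).
    lrf_rotation: rotation taking range finder axes to camera axes.
    distance:     range reported by the range finder (meters).
    beam_axis:    beam direction in the range finder's own frame (assumed).
    """
    direction = mat_vec(lrf_rotation, beam_axis)
    return tuple(p + distance * d for p, d in zip(lrf_position, direction))


def bearing(point):
    """Unit vector from the camera origin toward a 3D point."""
    n = math.sqrt(sum(c * c for c in point))
    return tuple(c / n for c in point)


def angle_between(u, v):
    """Angle in radians between two unit vectors."""
    c = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v))))
    return math.acos(c)


# Range finder held 0.5 m to the side of and 0.3 m below the camera,
# yawed 90 degrees, reporting a 4 m range (all assumed values).
yaw_90 = ((0, -1, 0), (1, 0, 0), (0, 0, 1))
dot_xyz = predict_dot((0.0, -0.5, -0.3), yaw_90, 4.0)
dot_bearing = bearing(dot_xyz)
```

The search space can then be restricted to pixels whose bearing satisfies `angle_between(pixel_bearing, dot_bearing) < tol`, where `tol` reflects how much the position and orientation estimates are trusted.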
Estimating laser range finder's position seems like a reasonable thing to do. But how to do it?
- dumb guess. Assume the user's arm and hand are rigidly fixed. Maybe this is 'good enough'?
- search for laser range finder in the video stream.
- train a SIFT detector on an image of the laser range finder and use it to localize the disto. The SIFT scale information can then be used to infer the distance of the laser range finder from the camera.
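The scale-to-distance step could look roughly like this. It is a sketch that assumes a pinhole-like camera (apparent size inversely proportional to distance, lens distortion ignored) and a training image taken at a known distance; the function names and the idea of taking a median over matches are illustrative choices, not something already decided.

```python
import statistics


def distance_from_scale(ref_distance, ref_scale, observed_scale):
    """Estimate distance from the ratio of SIFT keypoint scales.

    ref_distance:   distance at which the training image was taken (meters).
    ref_scale:      keypoint scale in the training image.
    observed_scale: scale of the matched keypoint in the current frame.
    """
    if observed_scale <= 0:
        raise ValueError("observed_scale must be positive")
    return ref_distance * ref_scale / observed_scale


def distance_from_matches(ref_distance, scale_pairs):
    """Median distance estimate over (ref_scale, observed_scale) pairs.

    The median is used so that a few bad matches do not dominate.
    """
    return statistics.median(
        distance_from_scale(ref_distance, ref, obs) for ref, obs in scale_pairs
    )


# Training image at 0.5 m; matched keypoints now appear at about half the scale.
estimate = distance_from_matches(0.5, [(4.0, 2.0), (6.0, 3.0), (8.0, 1.0)])
```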
Seems like SIFT is a useful thing to have around, and will probably implement it eventually for other stuff, so maybe just implement it now and try it out?
Wednesday, Feb 1, 2006
Notes for the upcoming presentation to the Robotics faculty on Friday.
Machine Understanding of Narrated Tours
This project has several unrelated major technical hurdles. To approach the problem, we are going to simplify numerous steps, and use existing tools when possible.
- Speech recognition problem
- not going to touch this at all. Instead, outsource to SLS (using Galaxy or something) and have a very simple, limited grammar
- Mapping problem
- initially, start off with known maps of the environment, using CAD models of Stata and the research space. Maybe do map building (e.g. SFM) later on
- Localization
- cheat as much as possible to begin with. Possibly use fiducials placed throughout the environment. Combine with camera to get precise location and pose estimate.
- What is the user gesturing at?
- use laser pointer and IMU to start with. Later on, maybe do vision-based pose estimation and then combine with epipolar constraint to determine exactly what's being pointed at (assumes user is pointing).
- object recognition
- rotflmao
- scene/place recognition
- ???
Monday, Jan 30, 2006
DARPA dude came to visit last Friday. Seth wanted me to demo my stuff, but I had made advance plans to go on the GSC ski trip to Sunday River. Instead, I made a short video for Seth to show.
http://people.csail.mit.edu/albert/ladypack/media/20060126-ladypack-darpa.avi (18 MB XViD encoded)
Some screenshots:
Thursday, January 19, 2006
TODO
- localize laser pointer dot in images
- pose estimation of laser pointer
- mockup video for DARPA
- profile MTi orientation data for different points in stata
Friday, January 6, 2006
Learned some OpenGL and played around with the XSens MTi sensors to get a feel for how bad the drift is. Made a couple of videos.
http://people.csail.mit.edu/albert/ladypack/media/lp1-20060104-190915.mti-xvid.avi
http://people.csail.mit.edu/albert/ladypack/media/lp1-20060104-202759.mti-xvid.avi
The first video shows integrated IMU data from a run where the IMU didn't move at all for 10 seconds. At the end, there are a few meters of drift, which doesn't seem too bad.
The second video also shows integrated IMU data, this time from a run where I held the IMU in my hand and rotated it roughly along each of the three major axes. At the end (~14 seconds), there are also a few meters of drift.
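As a rough sanity check on those numbers: if the drift came purely from a constant residual acceleration `b`, double integration would give a position error of `0.5 * b * t^2`. The sketch below integrates an assumed bias of 0.05 m/s^2 (a made-up value, not a measured one) for 10 seconds and compares against the closed form; function names are hypothetical.

```python
def integrate_bias(bias, duration, dt=0.01):
    """Double-integrate a constant acceleration bias with simple Euler steps."""
    velocity = 0.0
    position = 0.0
    for _ in range(int(round(duration / dt))):
        velocity += bias * dt
        position += velocity * dt
    return position


def closed_form_drift(bias, duration):
    """Position error from a constant acceleration bias: 0.5 * b * t^2."""
    return 0.5 * bias * duration ** 2


numeric = integrate_bias(0.05, 10.0)      # about 2.5 m
analytic = closed_form_drift(0.05, 10.0)  # 2.5 m
```

Under that assumption, a few meters of drift after 10 seconds corresponds to a residual acceleration on the order of a few hundredths of a m/s^2, and the error grows quadratically with time.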




