Intro to VXL
------------

VXL is a big, powerful collection of libraries for image processing,
computer vision, and math. The naming convention for each library is
V-something-L. Today we'll start to look at two of them:

    vnl : Numerics Library (vectors and matrices, doing math on them)
    vil : Core image library (loading/saving images, image processing)

- Documentation is online.
- The high-level documentation is somewhat less than adequate. But check
  out "core : VXL overview documentation".
- To figure out how to perform particular operations on
  vectors/matrices/images you'll probably have to dig into the class
  hierarchy and class declarations.
- If you know the class/method name, Google is a good tool for finding
  documentation!

Basic matrix and vector operations
----------------------------------

Using these is pretty easy (and is often modelled on Matlab). For example,
this declares a 3x4 matrix of double:

    #include <vnl/vnl_matrix.h>

    int main()
    {
      vnl_matrix<double> P(3,4);
      return 0;
    }

Operators are overloaded as expected, so if we have another 3x4 matrix Q,
we can add the two like this:

    vnl_matrix<double> R = P + Q;

The vnl_vector is equally straightforward. Here we make a 4-element vector
of doubles, premultiply it by P, and print the result:

    vnl_vector<double> X(4);
    std::cerr << P*X;

Note that:

- vnl_matrix and vnl_vector are templated, i.e. they are classes
  parameterized by another type.
- If you haven't seen templates before, don't worry. While it's slightly
  tricky to write/debug a templated class properly, it's easy to use one
  that's been well written. The following types come "instantiated"
  already, so best to stick to these:

      vnl_matrix<int>
      vnl_matrix<float>
      vnl_matrix<double>

- order of indices is row first, then column
- indices are zero-based!

Extended example (each statement illustrates one operation; the listing is
not meant to run top to bottom as a single program):

    vnl_matrix<double> A(3,3);       // 3x3 matrix, elements not initialized
    vnl_matrix<double> B(3,3, 1.0);  // 3x3 matrix, filled with ones.
    vnl_matrix<double> R(3,4);       // Rectangular matrix

    std::cerr << "A is " << A.rows() << 'x' << A.columns() << std::endl
              << "A has a total of " << A.size() << " elements" << std::endl;

    A(0,0) = 2.0;       // Set top-left component of A.
    A(3,3) = 0.0;       // *** Error, (3,3) is outside the range of A.
    A.set_size(3,4);    // Change size of A, invalidating elements.

    R.update(A, 0, 1);  // Copy A into R, starting at (0,1): last 3 cols
    R.set_column(0, B.get_column(0));  // Copy 1st col of B into R
    std::cerr << R.extract(3,3, 0,1)      // Print last 3 cols
              << R.get_n_columns(1, 3);   // Ditto

    A.fill(0.0);           // Set all elements of A to 0.0
    A.fill_diagonal(1.0);  // Set diagonal elements to 1.0
    A.set_identity();      // Set A to identity matrix

    R = R.transpose();      // Make transposed copy, assign to R
    R.inplace_transpose();  // Transpose R without copying.

    A.flipud();             // Reverse order of rows of A
    A.fliplr();             // Reverse columns
    A.normalize_rows();     // Divide each row by its 2-norm
    A.scale_row(0, 2.0);    // Multiply row 0 by 2

    vnl_matrix<double> C = B + 0.1 * A;  // Arithmetic
    C += 2.3;
    vnl_matrix<double> Csqrt = C.apply(sqrt);  // Square root all elements
    element_product(Csqrt, Csqrt);  // Should be equal to C, modulo roundoff

    std::cerr << A.fro_norm()    // Print Frobenius norm: sqrt of the sum
                                 // of squares of elements
              << A.min_value();  // Print minimum element

    if (A.is_zero(1e-8))
      std::cerr << "Each element of A is within 1e-8 of zero\n";
    if (A.is_identity(1e-8))
      std::cerr << "(A - I) is_zero to 1e-8\n";

- The most frequently asked question about vnl_matrix is "where is the
  inverse method?" The inverse is actually not defined as a method, because
  there are too many ways of forming it, each with different tradeoffs. The
  simple answer is that you can use the vnl_matrix_inverse class to compute
  an inverse object:

      #include <iostream>
      #include <vnl/vnl_matrix.h>
      #include <vnl/algo/vnl_matrix_inverse.h>

      int main()
      {
        vnl_matrix<double> A(3,3);
        A.set_identity();                // stand-in for any invertible matrix
        vnl_matrix<double> B(3,3, 1.0);
        std::cerr << vnl_matrix_inverse<double>(A) * B;
        return 0;
      }

- the vnl_matrix_inverse class is derived from vnl_svd (not from the
  regular matrix class, although the result can be used like a matrix), and
  you initialize it with the matrix you want to invert!
- this works using SVD (the "cadillac" of matrix decompositions,
  numerically very well-behaved but a bit slower). VXL also provides
  alternative algorithms/ways of doing this.

Working with images
-------------------

Several related classes within the VIL library provide similar
functionality. For this course, we'll use the class

    vil_image_view

so make sure when reading the docs you're actually looking at this class!

Loading an image

- VXL supports .ppm, .jpg, .gif, and other common image formats. The same
  code will automatically handle images from any of these formats.
- To load an image from disk:

      #include <vxl_config.h>           // vxl_byte
      #include <vil/vil_image_view.h>
      #include <vil/vil_load.h>
      #include <vil/vil_convert.h>

      int main()
      {
        vil_image_view<vxl_byte> img3;
        img3 = vil_convert_to_n_planes(3,   // convert image to a 3-band image
                 vil_convert_cast(vxl_byte(), // force image to have byte
                                              // [0..255] pixels
                   vil_load("test.ppm")));
      }

- This is somewhat tricky:
  * Note the declaration of the image is templated. The standard type for
    loading is <vxl_byte> or (equivalently) <unsigned char>. To manipulate
    images later, we may need to store larger integer values or even
    floating point data.
  * The call to vil_load opens the file on disk and generates a temporary
    view of the image inside the file.
  * The call to vil_convert_cast causes the data to be converted to a
    particular data type. Since the image data already has one byte per
    channel, it doesn't do much in this case. But we can use this call to
    convert the data to floating point, for example, immediately upon
    opening the image.
  * Finally, the call to vil_convert_to_n_planes is used so that the data
    is stored in 3 planes. So if the image is grayscale, this will convert
    the image to RGB with 3 identical planes.

Note that:

- order of indices is first x (left-to-right), then y (top-to-bottom)
- this is different from the convention for matrices!
- indices are zero-based!

Accessing/writing image pixels

Simple thresholding example:

    // what does this code do?
    unsigned char THRESH = 200;
    for (unsigned j = 0; j < img3.nj(); ++j)
      for (unsigned i = 0; i < img3.ni(); ++i)
        if (img3(i,j,0) < THRESH &&
            img3(i,j,1) < THRESH &&
            img3(i,j,2) < THRESH)
        {
          img3(i,j,0) = 0;
          img3(i,j,1) = 0;
          img3(i,j,2) = 0;
        }

- img3.ni() returns the number of pixels in the x direction (width)
- img3.nj() returns the number of pixels in the y direction (height)
- img3.nplanes() returns the number of "planes" in the image
  the RGB image above has 3 planes
- but in general an image can have an arbitrary number of planes (e.g.,
  4 planes for RGBA, 2 planes for x and y derivatives, or even a movie
  indexed by time t)

Forcing an RGB interpretation

- VIL also supports colour images with a single plane. In such images,
  each entry is a structure with 3 components corresponding to the R, G,
  and B values for a pixel.
- This can sometimes make your code more legible and intuitive.

      // declaration of a single-plane RGB image, with one byte per colour
      // channel, templates of templates!
      vil_image_view<vil_rgb<vxl_byte> > imgRGB;
      // note the space in "> >" above, so as not to confuse the compiler
      // with the ">>" input operator

      imgRGB = vil_convert_to_component_order( img3 );  // convert image to
                                                        // RGB format

- Given this kind of single-plane RGB image, we can modify the previous
  example by replacing

      img3(i,j,0) with imgRGB(i,j).r
      img3(i,j,1) with imgRGB(i,j).g
      img3(i,j,2) with imgRGB(i,j).b

Saving the image

    // pretty simple
    vil_save(img3, "thresh.ppm");

- The choice of file format is determined automatically from the extension
  of the filename.
- Note: any images that will be saved to disk MUST be converted to
  <vxl_byte> or <unsigned char>, as the save function will not store images
  of different data types. This means that if you have floating point
  data, or integers outside of [0,255], you must re-scale the data
  appropriately so that it lies within 0-255 and then convert it to the
  appropriate single-byte storage type.
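
The re-scaling step from the note on saving can be sketched in plain C++.
This is only a minimal sketch and not VXL code: rescale_to_bytes is a
hypothetical helper that works on a flat std::vector<float> instead of a
vil_image_view, and it assumes that a linear min/max stretch is the mapping
you want. (vil_convert.h also offers vil_convert_stretch_range, which should
do a similar job directly on image views; check its documentation before
relying on it.)

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of VXL): linearly map arbitrary
// floating-point samples onto the byte range [0,255], so that the
// smallest input becomes 0 and the largest becomes 255.
std::vector<unsigned char> rescale_to_bytes(const std::vector<float>& src)
{
  std::vector<unsigned char> dst(src.size(), 0);
  if (src.empty())
    return dst;

  const auto mm = std::minmax_element(src.begin(), src.end());
  const float lo = *mm.first;
  const float range = *mm.second - lo;
  if (range <= 0.0f)
    return dst;  // constant input: leave every output at 0

  for (std::size_t k = 0; k < src.size(); ++k) {
    const float scaled = 255.0f * (src[k] - lo) / range;  // now in [0,255]
    dst[k] = static_cast<unsigned char>(scaled + 0.5f);   // round to nearest
  }
  return dst;
}
```

With a real image you would apply the same arithmetic per pixel (looping
over i, j, and plane as in the thresholding example) and write the result
into a byte image before calling vil_save.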