A 10-minute overview of statistical machine learning

Machine learning in 10 minutes!

What is machine learning?

Machine learning refers to a class of computational and statistical methods for estimating dependencies in data and using the estimated dependencies to make predictions. For instance, in computer vision research, the data may be the sizes of objects (measured in pixels) and the corresponding class labels (e.g. 'person' or 'car'); the dependencies are the probability distributions of size given a particular class of object; and the predictions involve calculating the probability that a new, unknown object of a given size s is a person or a car. The underlying idea is that there is some dependency between the label and the feature value (size, in the above example), and that the exact nature of this dependency can be 'learnt' by observing a number of examples.
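
As a rough illustration of the person/car example, the sketch below applies Bayes' rule to turn per-class size distributions into the probability of each label given a size. The Gaussian shape of the distributions, their means and standard deviations, the equal priors, and all function names are assumptions made up for this illustration; they are not taken from any real data set.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

# Hypothetical class-conditional size distributions in pixels: (mean, std).
SIZE_MODELS = {"person": (60.0, 15.0), "car": (150.0, 40.0)}
# Hypothetical prior probability of each class.
PRIORS = {"person": 0.5, "car": 0.5}

def class_posteriors(size):
    """Return P(label | size) for every label, using Bayes' rule."""
    joint = {
        label: PRIORS[label] * gaussian_pdf(size, *SIZE_MODELS[label])
        for label in SIZE_MODELS
    }
    total = sum(joint.values())
    return {label: p / total for label, p in joint.items()}

def classify(size):
    """Return the label with the highest posterior probability."""
    posteriors = class_posteriors(size)
    return max(posteriors, key=posteriors.get)
```

Calling class_posteriors(s) for a new size s gives the two probabilities discussed above, and classify(s) picks the more probable label. In a real system the distribution parameters would be learnt from labelled examples rather than written down by hand.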

Models and Algorithms in Learning

There are many models for describing dependencies between data. There are also many algorithms for learning the parameters of a chosen model. Neural networks are one example of a model used in learning. A support vector machine (SVM) is another type of model. Graphical models are a general class of probabilistic models. Maximum likelihood estimation by gradient descent is an example of a learning algorithm.

Parameter Estimation

Many learning problems are solved by estimating some model parameters from data (for example, estimating the parameters of the probability distributions of object sizes). To estimate parameter values, a cost function over parameter values is chosen, and the parameter value that minimises the cost is taken as the solution of the estimation problem. For instance, when choosing parameter values for a Gaussian model given some data samples, the maximum likelihood estimates of the mean and covariance are the ones that maximise the likelihood, i.e. the probability P of all the data having been generated by a Gaussian distribution with those parameters. In this case, the cost function is simply the negative of P (in practice, usually the negative of its logarithm, which has the same minimiser).
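
The Gaussian case can be sketched in a few lines for one-dimensional data. The code below uses the negative log-likelihood as the cost (it has the same minimiser as the negative of P and is numerically better behaved) together with the well-known closed-form maximum likelihood estimates. The sample values and function names are made up for illustration.

```python
import math

def negative_log_likelihood(data, mean, var):
    """Cost function: negative log-likelihood of the data under N(mean, var)."""
    n = len(data)
    squared_error = sum((x - mean) ** 2 for x in data)
    return 0.5 * n * math.log(2.0 * math.pi * var) + squared_error / (2.0 * var)

def fit_gaussian_ml(data):
    """Closed-form maximum likelihood estimates of the mean and variance."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n  # divides by n, not n - 1
    return mean, var

sizes = [52.0, 61.0, 58.0, 70.0, 64.0]  # made-up object sizes in pixels
mean_hat, var_hat = fit_gaussian_ml(sizes)
```

Evaluating negative_log_likelihood at (mean_hat, var_hat) and at any other parameter values shows that the estimate has the lower cost. Note that the maximum likelihood variance divides by n, so it differs from the unbiased sample variance, which divides by n - 1.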

Since the optimisation problems involved in estimation can rarely be solved in closed form, many techniques (such as gradient descent and simulated annealing) are used to search for extrema. Genetic algorithms (as far as I understand) are one such family of search techniques that is well suited for high-dimensional problems (i.e. when the feature space has many, many dimensions, as opposed to the single dimension, size, in the example given above).
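
To give a flavour of such a search, here is a minimal gradient descent sketch that fits a one-dimensional Gaussian by minimising the average negative log-likelihood. This particular problem does have a closed-form solution, which makes it a convenient test case: the search should end up at the same answer. The data, starting point, step size, and number of steps are all arbitrary choices for this illustration.

```python
import math

def fit_gaussian_gd(data, lr=0.05, steps=5000):
    """Minimise the average negative log-likelihood by gradient descent.

    The variance is parameterised as exp(log_var) so that it stays positive.
    """
    n = len(data)
    data_mean = sum(data) / n
    mean, log_var = 0.0, 0.0  # arbitrary starting point
    for _ in range(steps):
        var = math.exp(log_var)
        mean_sq_dev = sum((x - mean) ** 2 for x in data) / n
        # Gradients of 0.5*log(2*pi) + 0.5*log_var + mean_sq_dev / (2*var)
        grad_mean = (mean - data_mean) / var
        grad_log_var = 0.5 - 0.5 * mean_sq_dev / var
        mean -= lr * grad_mean
        log_var -= lr * grad_log_var
    return mean, math.exp(log_var)

samples = [1.2, 0.8, 1.9, 1.4, 0.7]  # made-up data
mean_gd, var_gd = fit_gaussian_gd(samples)
```

The step size here is tuned to this small, well-scaled data set; a fixed step that works for one problem can diverge on another, which is one reason practical optimisers adapt it.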

Non-parametric Methods

Not all approaches to machine learning involve parametric models. Non-parametric methods, such as Gaussian processes, are also widely used. These methods typically require more data, but can model a wider range of data dependencies than parametric models, whose form is fixed in advance.
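
A short way to see the difference is a kernel density estimate, sketched below. This is a different non-parametric method from the Gaussian processes mentioned above, chosen here only because it fits in a few lines: instead of fitting a fixed number of parameters, it places a small Gaussian bump on every training sample. The samples and the bandwidth are made up for illustration.

```python
import math

def kernel_density(x, samples, bandwidth):
    """Gaussian kernel density estimate at x: one small bump per sample."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)
    bumps = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
    return bumps / norm

# Made-up sizes with two clusters, which a single Gaussian fits poorly.
sizes = [48.0, 52.0, 55.0, 60.0, 63.0, 140.0, 150.0, 155.0, 162.0]
BANDWIDTH = 8.0  # hypothetical smoothing width, in pixels

density_small = kernel_density(55.0, sizes, BANDWIDTH)   # inside first cluster
density_gap = kernel_density(100.0, sizes, BANDWIDTH)    # between clusters
density_large = kernel_density(150.0, sizes, BANDWIDTH)  # inside second cluster
```

The estimate is high inside each cluster and close to zero in the gap between them, a shape that a single Gaussian with one mean and one variance cannot represent. The price is that every training sample must be kept and visited at prediction time, and the bandwidth still has to be chosen.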
