Machine learning refers to a class of computational and
statistical methods for estimating dependencies between data and using
the estimated dependencies to make predictions. For instance, in computer vision research, the data may be sizes of objects (measured in pixels) and the
corresponding class-labels (i.e. `person', or `car'), the dependencies
are the probability distributions of sizes given a particular
class of objects, and the predictions involve calculating the
probability that a new, unknown object, of given size `s', is a person
or a car. The underlying idea is that there is some dependency between
the label and the feature value (size, in the above example), and the
exact nature of the dependency can be `learnt' by observing a number of
examples.
Models and Algorithms in Learning
There are many
models for describing dependencies between data.
There are also many algorithms for learning
the parameters of a chosen model. Neural networks are one example of a model used in learning.
A support vector machine (SVM) is another type of model.
Graphical models are a general class of probabilistic models.
Maximum likelihood estimation by gradient descent is an example of a learning algorithm.
Parameter Estimation
Many learning
problems are solved by estimating some model parameters from data (for example, estimating
the parameters of the probability distributions of object sizes).
To estimate parameter values, a cost function over parameter values is chosen,
and the parameter value that minimises the cost is the solution of the estimation problem.
For instance, when choosing parameter values for a Gaussian model given some data samples,
the estimated mean and covariance are the ones that maximise the probability P of having generated
all the data from a Gaussian distribution. In this case, the cost function is simply the
negative of the probability P.
Since the optimisation problems involved in estimation can rarely
be solved in closed form, there are many techniques (such as gradient
descent and simulated annealing) to search for extrema. Genetic
algorithms (as far as I understand) are one such family of search
techniques that is well suited for high dimensional problems (i.e. when
the feature space has many, many dimensions, as opposed to the single
dimension---size---in the example given above).
Non-parametric Methods
Not all approaches to machine learning involve parametric models.
Non-parametric methods, such as Gaussian processes, are also quite popular. These methods typically require more data,
but can model a wider range of data dependencies than parametric models.