In relational learning or inductive logic programming, one typically induces a set of rules (or clauses). The resulting rule-set then defines a disjunctive hypothesis, since an instance is classified as positive if it satisfies the conditions of one of the rules. On the other hand, a probabilistic model defines a joint probability distribution over a class variable and a set of "attributes" or "features", and the type of model constrains the joint probability distributions that can be represented.
A straightforward but powerful idea to integrate
these two approaches is to interpret the clauses or rules as
"attributes" over which a joint probability distribution can
be defined. Using Naive Bayes as the probabilistic model,
this translates into the statement that
    clauses are conditionally independent given the class.
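A minimal Python sketch of this idea, under stated assumptions (all names here are hypothetical, not from the original): each clause is treated as a boolean attribute of an instance, and classes are scored with Naive Bayes, i.e. assuming the clause features are conditionally independent given the class.

```python
import math
from collections import defaultdict

def train_naive_bayes(covers, examples, labels):
    """covers: list of predicates, one per clause; covers[i](x) is True
    iff clause i covers example x. Clauses act as boolean attributes."""
    classes = set(labels)
    prior = {c: labels.count(c) / len(labels) for c in classes}
    # P(clause_i = True | class), with Laplace smoothing
    cond = defaultdict(dict)
    for c in classes:
        xs = [x for x, y in zip(examples, labels) if y == c]
        for i, clause in enumerate(covers):
            hits = sum(clause(x) for x in xs)
            cond[c][i] = (hits + 1) / (len(xs) + 2)
    return prior, cond

def predict(covers, prior, cond, x):
    # Naive Bayes: argmax over log P(class) + sum_i log P(clause_i(x) | class)
    def score(c):
        s = math.log(prior[c])
        for i, clause in enumerate(covers):
            p = cond[c][i]
            s += math.log(p if clause(x) else 1 - p)
        return s
    return max(prior, key=score)
```

For instance, with a single clause "x is even" as the only attribute and integer examples, the model reduces to a one-feature Bernoulli Naive Bayes classifier.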
For more details, we recommend that you read
nFOIL essentially performs a covering approach similar to Quinlan's FOIL, in which features (in the form of clauses) are learned one after the other, until adding further features yields no improvement. The clauses are combined with Naive Bayes, and the search heuristic is based on class-conditional likelihood.
More precisely, the main difference between FOIL and nFOIL is that nFOIL does not follow a separate-and-conquer approach. In FOIL, separate-and-conquer is possible because the final model is taken as the disjunction of the clauses (each clause covering a certain subset of the examples), which has two consequences:
    max_{c'} l(E, C ∪ {c'})
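The greedy selection step above can be sketched as follows; this is an illustrative Python sketch, not the authors' exact algorithm, and the helper names are hypothetical. Here score(C) stands for the class-conditional log-likelihood l(E, C) of the training examples E under the Naive Bayes model built from the clause set C.

```python
def learn_nfoil_style(candidates, score, max_clauses=10, tol=1e-6):
    """Greedy clause selection in the spirit of nFOIL.
    candidates: list of candidate clauses.
    score(C): class-conditional log-likelihood l(E, C) of the
    examples under the Naive Bayes model built from clause set C."""
    C = []
    best = score(C)
    while len(C) < max_clauses:
        # pick c' maximizing l(E, C ∪ {c'}) over the remaining candidates
        pool = [c for c in candidates if c not in C]
        if not pool:
            break
        c_best = max(pool, key=lambda c: score(C + [c]))
        new = score(C + [c_best])
        if new <= best + tol:  # stop when adding a clause no longer helps
            break
        C.append(c_best)
        best = new
    return C
```

Note that, unlike separate-and-conquer, covered examples are never removed: every candidate is always evaluated against the full example set, because each clause contributes a feature to the joint Naive Bayes model rather than explaining examples away.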