To illustrate the (n)FOIL setting, consider the mutagenesis data set of Srinivasan et al. The task is to predict the mutagenicity of a set of molecular compounds; we use only the atom and bond structure information. The data set is divided into two subsets: a regression-friendly (r.f.) set with 188 entries (125 positives, 63 negatives) and a regression-unfriendly (r.u.) set with 42 entries (13 positives, 29 negatives).
Consider now the background theory:
atel(mol1, a0, c) bond(mol1, a0, a10, 7)
atel(mol1, a1, n) bond(mol1, a1, a3, 7)
atel(mol1, a2, o) bond(mol1, a3, a4, 2)
...
and the example muta(mol1). It is covered by the following hypothesis:
muta(X) :- atel(X, A, c), atel(X, B, o), bond(X, A, B, 7)
muta(X) :- atel(X, A, fl), bond(X, A, B, 2)
This kind of hypothesis can be found by FOIL-like algorithms. In nFOIL,
the covers relation, i.e., the way clauses and models are evaluated, is changed
from the deterministic one used in FOIL to a probabilistic one. Instead of asking
whether an example e is entailed by H ∪ B (where B denotes the background
knowledge), we compute the probability that the example is covered, i.e.,
P(e | H, B).
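The deterministic covers test can be sketched in a few lines of Python. This is a minimal illustration, not FOIL's actual implementation: the molecule `toy1` and its facts are hypothetical (the real background theory for mol1 is elided above), and the helper names `is_var`, `unify`, `succeeds` and `covers` are assumptions introduced here. Upper-case strings stand for logical variables.

```python
# Hypothetical toy facts; "toy1" is NOT taken from the mutagenesis data.
FACTS = [
    ("atel", "toy1", "t0", "c"),
    ("atel", "toy1", "t1", "o"),
    ("atel", "toy1", "t2", "n"),
    ("bond", "toy1", "t0", "t1", 7),
    ("bond", "toy1", "t1", "t2", 2),
]

# Bodies of the two muta/1 clauses from the hypothesis in the text.
BODY_1 = [("atel", "X", "A", "c"), ("atel", "X", "B", "o"),
          ("bond", "X", "A", "B", 7)]
BODY_2 = [("atel", "X", "A", "fl"), ("bond", "X", "A", "B", 2)]


def is_var(term):
    """Treat strings starting with an upper-case letter as variables."""
    return isinstance(term, str) and term[:1].isupper()


def unify(literal, fact, subst):
    """Extend subst so that literal matches the ground fact, or return None."""
    if literal[0] != fact[0] or len(literal) != len(fact):
        return None
    new = dict(subst)
    for term, value in zip(literal[1:], fact[1:]):
        if is_var(term):
            if new.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None      # constant mismatch
    return new


def succeeds(body, subst):
    """Backtracking search: can every literal in body be matched by a fact?"""
    if not body:
        return True
    for fact in FACTS:
        extended = unify(body[0], fact, subst)
        if extended is not None and succeeds(body[1:], extended):
            return True
    return False


def covers(example_id):
    """Deterministic covers test: at least one clause body succeeds."""
    return any(succeeds(body, {"X": example_id}) for body in (BODY_1, BODY_2))


a1 = succeeds(BODY_1, {"X": "toy1"})  # first clause body
a2 = succeeds(BODY_2, {"X": "toy1"})  # second clause body
covered = covers("toy1")
print(a1, a2, covered)
```

On these toy facts the first body succeeds and the second fails, so the example counts as covered. nFOIL keeps the two truth values as attributes rather than collapsing them into a single yes/no answer.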
Reconsider the mutagenicity example, and assume
that the hypothesis is as sketched before. Then the attributes a1 and a2 are
the clause bodies
atel(X, A, c), atel(X, B, o), bond(X, A, B, 7) and
atel(X, A, fl), bond(X, A, B, 2),
and the target predicate c, the binary class variable, is muta(X). Now assume that
the probability distributions P(c) and P(ai | c) encoded in the model
are
P(c = t) = 0.6
P(a1 = t|c = t) = 0.7
P(a1 = t|c = f) = 0.4
P(a2 = t|c = t) = 0.5
P(a2 = t|c = f) = 0.1
Summing out the class variable c yields
P(a1 = t, a2 = t) = 0.226
P(a1 = t, a2 = f) = 0.354
P(a1 = f, a2 = t) = 0.114
P(a1 = f, a2 = f) = 0.306
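The marginalization can be checked numerically. The sketch below assumes the naive Bayes factorization P(a1, a2, c) = P(a1 | c) P(a2 | c) P(c) with the parameters given in the text; the names `bernoulli`, `joint` and `marginals` are illustrative and not part of nFOIL.

```python
from itertools import product

P_C_TRUE = 0.6
P_C = {True: P_C_TRUE, False: 1.0 - P_C_TRUE}   # P(c)
P_A1_TRUE = {True: 0.7, False: 0.4}             # P(a1 = t | c)
P_A2_TRUE = {True: 0.5, False: 0.1}             # P(a2 = t | c)


def bernoulli(p_true, value):
    """Probability of a Boolean value given the probability of 'true'."""
    return p_true if value else 1.0 - p_true


def joint(a1, a2, c):
    """Naive Bayes joint P(a1, a2, c) = P(a1 | c) * P(a2 | c) * P(c)."""
    return bernoulli(P_A1_TRUE[c], a1) * bernoulli(P_A2_TRUE[c], a2) * P_C[c]


# Sum the joint over both values of the class variable c.
marginals = {
    (a1, a2): joint(a1, a2, True) + joint(a1, a2, False)
    for a1, a2 in product([True, False], repeat=2)
}

for (a1, a2), p in marginals.items():
    print(f"P(a1={a1}, a2={a2}) = {p:.3f}")
```

The four printed values agree with the table (0.226, 0.354, 0.114, 0.306) and sum to 1.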
where t (f) denotes true (false). For the positively labeled
example muta(mol1), i.e., under the substitution {X/mol1}, a1 succeeds
and a2 fails, so c = t, a1 = t, a2 = f. With P(a2 = f | c = t) = 1 - 0.5 = 0.5,
P(e | H, B) = [ P(a1 = t | c = t) * P(a2 = f | c = t) * P(c = t) ] / P(a1 = t, a2 = f)
= [ 0.7 * 0.5 * 0.6 ] / 0.354
≈ 0.593
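The same computation can be written as a small function that applies Bayes' rule for any truth assignment to the two attributes. This is a sketch under the parameters stated in the text; the function names `likelihood` and `posterior_positive` are hypothetical.

```python
def likelihood(p_true, observed):
    """P(attribute = observed | class), given P(attribute = t | class)."""
    return p_true if observed else 1.0 - p_true


def posterior_positive(a1, a2):
    """P(c = t | a1, a2) under the naive Bayes model from the text."""
    # Unnormalized scores P(a1 | c) * P(a2 | c) * P(c) for both classes.
    pos = likelihood(0.7, a1) * likelihood(0.5, a2) * 0.6
    neg = likelihood(0.4, a1) * likelihood(0.1, a2) * 0.4
    # The denominator pos + neg is the marginal P(a1, a2).
    return pos / (pos + neg)


p = posterior_positive(a1=True, a2=False)
print(round(p, 3))
```

For a1 = t and a2 = f this prints 0.593, i.e., 0.21 / 0.354, so under this model the example is more likely positive than negative.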