Bayesian Logic (BLOG) User Manual

This manual gives a brief explanation of how to use the BLOG inference engine. It assumes that you already understand the BLOG language itself, which is described in several publications. There is also a syntax reference excerpted from Chapter 4 of Brian Milch's Ph.D. dissertation.

Basic Usage

To run the inference engine, use the runblog script in the top-level blog-<version> directory. The script takes a sequence of BLOG files as arguments. For example:

./runblog examples/balls/poisson-prior-noisy.mblog examples/balls/all-same.eblog examples/balls/num-balls.qblog

BLOG files can contain a mixture of model statements, evidence assertions, and queries. However, it is often convenient to separate these things into model files (which conventionally have the .mblog suffix), evidence files (with the .eblog suffix), and query files (with the .qblog suffix). In the examples directory, you will also see files with the generic .blog suffix, indicating that they contain all three kinds of statements.

Evidence

The file examples/balls/all-same.eblog contains some simple evidence assertions. More precisely, these are value evidence statements, which have the form:

obs term = value;

Here term can be an arbitrary logical term (without free variables). The value must be a constant symbol: it can be a built-in constant symbol such as a numeral or "true"; a non-random constant symbol defined by the user, such as Green; or a random constant symbol (that is, a random function that takes no arguments).

The inference engine also supports symbol evidence statements, which introduce new symbols to stand for objects that might not have been referred to by any term in the language. Such statements also assert that the symbols refer to distinct objects and exhaust the specified set. For instance:

obs {Blip b: Time(b) = 8} = {B1, B2, B3};

There are not yet any examples of this in the examples directory.

Queries

We can begin by looking at examples/burglary.qblog, which contains the very simple query:

query Burglary;

Here the expression being queried is just a random constant. However, a query can be an arbitrary logical term (again, without free variables) or even a sentence, such as the equality sentence in examples/balls/id-uncert-noisy.blog:

query (BallDrawn(Draw2) = BallDrawn(Draw3));

In fact, the inference engine even allows you to query the values of expressions that are not first-order sentences, but rather second-order expressions involving sets. For instance, the query in examples/balls/num-balls.qblog is:

query #{Ball b};

This expression denotes the size of the set of all objects b of type Ball. It is also possible to query the sizes of more interesting sets, such as:

query #{Ball b: Color(b) = Green};

In fact, we don't even need to query the size of the set; we can query the set itself:

query {Ball b: Color(b) = Green};

In general, the expression after the query keyword can be any ArgSpec, which is an expression that evaluates to some value in every possible world. These expressions are called ArgSpecs because they can also be used as arguments to conditional probability distributions (CPDs). The inference engine simply computes a histogram of the values that the ArgSpec takes on in the sampled worlds; the entities counted in this histogram can be integers, user-defined objects, sets, and so on. The only ArgSpecs treated specially are Boolean ones (sentences), for which the inference engine reports only the fraction of worlds where the ArgSpec is true.
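
The histogram computation described above can be sketched in a few lines of Python. This is only an illustration, not the engine's actual code: the function names are hypothetical, and the per-world weights are an assumption that fits a weighted sampler (an unweighted sampler would give every world weight 1).

```python
from collections import defaultdict

def summarize_query(samples):
    """Normalize weighted query values into a histogram.

    samples: iterable of (value, weight) pairs, one per sampled world.
    Values must be hashable, so set-valued queries use frozenset here.
    """
    totals = defaultdict(float)
    for value, weight in samples:
        totals[value] += weight
    norm = sum(totals.values())
    if norm == 0:
        raise ValueError("all sampled worlds have zero weight")
    return {value: w / norm for value, w in totals.items()}

def probability_true(samples):
    """For a Boolean query, report only the weighted fraction of true worlds."""
    return summarize_query(samples).get(True, 0.0)

# Hypothetical samples for a set-valued query such as
# {Ball b: Color(b) = Green}; the ball names are made up.
set_samples = [
    (frozenset({"Ball1"}), 1.0),
    (frozenset({"Ball1", "Ball2"}), 1.0),
    (frozenset({"Ball1"}), 2.0),
]
set_hist = summarize_query(set_samples)

# Hypothetical samples for a Boolean query.
bool_samples = [(True, 0.5), (False, 1.5), (True, 1.0)]
bool_fraction = probability_true(bool_samples)
```

Using frozenset keeps set-valued results hashable, so the same histogram code serves integers, objects, and sets alike.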

Specifying the Inference Engine Class and Parameters

By default, the BLOG inference engine uses the blog.LWSampler class, which does likelihood weighting. You can use the --sampler command line option (e.g., --sampler=blog.LWSampler) to specify an alternative sampler. The following samplers are included in the current version:

- blog.RejectionSampler: constructs a possible world by starting with random variables (RVs) that have no parents, and then instantiating additional RVs after their parents are instantiated. This is the algorithm described in [Milch et al., IJCAI 2005]. Its instantiation process is not guided by the evidence or queries. When all the evidence and query variables have finally been instantiated, the RejectionSampler checks whether the evidence is satisfied, and rejects the world if not.
- blog.LWSampler: constructs a possible world by backward chaining from the query and evidence variables. The RVs are still instantiated in a context-specific topological order, but the choice of which RVs to instantiate is guided by the evidence and queries. Also, instead of sampling values for evidence variables (and then rejecting the world if these values don't match the asserted evidence) the LWSampler assigns all evidence variables their asserted values. It weights each world by the product of the probabilities of the evidence variables taking these values given their parents.
- blog.MHSampler: simulates a Markov chain over possible worlds. At each step, it uses a proposal distribution to propose the next world, and then chooses whether to accept this proposal or stay at the current world. This choice is based on the acceptance ratio for the proposed move. If you use the MHSampler, you can specify a proposal distribution class with the --proposer flag. The proposers currently available are:
  - blog.GenericProposer: this is the default proposer. It chooses an instantiated variable uniformly at random, and resamples a value for this variable given its parents. Since this proposal distribution doesn't look at the variable's children at all, it is even less effective than Gibbs sampling. On the other hand, it is completely general, since it just requires sampling from a variable's CPD (general Gibbs sampling for variables with infinite domains is non-trivial, but we're working on it).
  - blog.UrnBallsSplitMerge: this is a special-purpose proposer for urn-and-balls models. It will not work on any other models. This proposer is included to illustrate how a modeller can hand-craft a proposal distribution for a particular task. Even with a hand-crafted proposer, the general-purpose MHSampler can still compute the acceptance ratio, although the proposer needs to compute the ratio of proposal probabilities (q(x|x') / q(x'|x)). There is also another proposer class called blog.UrnBallsSplitMergeNoIds, but there is probably no reason to use it.
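
To make the difference between rejection sampling and likelihood weighting concrete, here is a minimal Python sketch on a two-variable model. It is not the BLOG implementation: the model, its probabilities, and the function names are all hypothetical, and the sketch omits the backward chaining that the LWSampler uses to decide which variables to instantiate.

```python
import random

# Hypothetical model: Burglary has no parents, Alarm depends on Burglary.
P_BURGLARY = 0.1
P_ALARM = {True: 0.9, False: 0.05}  # P(Alarm = true | Burglary)

# Exact posterior P(Burglary = true | Alarm = true), for comparison.
EXACT = (P_BURGLARY * P_ALARM[True]) / (
    P_BURGLARY * P_ALARM[True] + (1 - P_BURGLARY) * P_ALARM[False]
)

def rejection_estimate(n, rng):
    """Sample every variable, then discard worlds that contradict the evidence."""
    accepted = 0
    hits = 0
    for _ in range(n):
        burglary = rng.random() < P_BURGLARY
        alarm = rng.random() < P_ALARM[burglary]
        if not alarm:  # evidence Alarm = true is violated: reject the world
            continue
        accepted += 1
        hits += burglary
    return hits / accepted if accepted else float("nan")

def likelihood_weighting_estimate(n, rng):
    """Fix the evidence variable to its observed value and weight the world."""
    total_weight = 0.0
    hit_weight = 0.0
    for _ in range(n):
        burglary = rng.random() < P_BURGLARY
        weight = P_ALARM[burglary]  # P(Alarm = true | sampled parent value)
        total_weight += weight
        if burglary:
            hit_weight += weight
    return hit_weight / total_weight
```

In this sketch the rejection sampler throws away every world where the alarm did not go off, while likelihood weighting keeps all n worlds and lets the weights account for the evidence.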

For all these samplers, the number of samples can be controlled with the --num_samples (or -n) command line option. For information about other command line options, see the documentation for the Main class.

Random versus Reproducible Output

By default, the BLOG inference engine uses the same seed for its pseudorandom number generator every time it is run. This makes its behavior reproducible, which vastly simplifies debugging. To use a seed based on the clock time, so that each run draws a different sequence of samples, use the --randomize (or -r) command line option.
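
The effect of a fixed seed versus a clock-based one can be illustrated with Python's standard random module. This is an analogy only: FIXED_SEED and make_rng are hypothetical names, and the seed value is not the one BLOG uses.

```python
import random
import time

FIXED_SEED = 12345  # hypothetical value, chosen only for this illustration

def make_rng(randomize=False):
    """Return a generator seeded from the clock if randomize is set,
    and from a fixed constant otherwise."""
    seed = time.time_ns() if randomize else FIXED_SEED
    return random.Random(seed)

def draw(rng, n=5):
    """Draw n pseudorandom numbers from the given generator."""
    return [rng.random() for _ in range(n)]

# Two default generators produce identical sequences, run after run.
first_run = draw(make_rng())
second_run = draw(make_rng())
```

With the fixed seed, any run that misbehaves can be replayed exactly; with the clock-based seed, successive runs will in general see different samples.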

Specifying Parfactors

As of version 0.3, BLOG supports the specification of parfactors in model files. (For more information about parfactors and lifted inference, see the C-FOVE paper.) For example, the BLOG file examples/competing-workshops.blog contains the parfactor statement:

parfactor Workshop W, Person X . 
    MultiArrayPotential[[0.2, 0.8, 0.8, 0.8]] 
        (hot(W), attends(X));

The general form of a parfactor statement is:

parfactor ⟨type₁⟩ ⟨var₁⟩, …, ⟨type_k⟩ ⟨var_k⟩ : ⟨constraint⟩ . ⟨potential-spec⟩ (⟨term₁⟩, …, ⟨term_n⟩);

The list ⟨type₁⟩ ⟨var₁⟩, …, ⟨type_k⟩ ⟨var_k⟩ specifies the logical variables that the parfactor is quantifying over, along with their types. (To create a ground factor that does not quantify over any variables, use the keyword factor instead of parfactor.) These variables are optionally followed by a colon and a constraint, which is a BLOG formula. This formula must be a conjunction of inequalities containing only constant symbols and the parfactor's logical variables.

The ⟨potential-spec⟩ element specifies the potential to use for the parfactor. It has the form:

⟨potential-type⟩ [⟨param₁⟩, …, ⟨param_m⟩]

The ⟨potential-type⟩ element specifies the type of potential; currently, the only supported type is MultiArrayPotential. A potential of the given type will be constructed with parameters ⟨param₁⟩, …, ⟨param_m⟩. MultiArrayPotential expects a single parameter, a row vector with one entry for each assignment of values to the terms in the parfactor. For instance, in our example above, we had:

MultiArrayPotential[[0.2, 0.8, 0.8, 0.8]]

The outer set of square brackets delimits the list of parameters; the inner brackets delimit the weight vector itself. The mapping from value assignments to indices in the vector is lexicographic, with the last dimension changing fastest. The possible values for each dimension are ordered as in the guaranteed statement that introduced them in the BLOG file; for Boolean values, true comes before false.
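
The lexicographic mapping described above can be sketched as follows. The helper names flat_index and potential are hypothetical and are not part of BLOG; the weights and the term order (hot(W), attends(X)) come from the example parfactor.

```python
def flat_index(assignment, domains):
    """Map one value per term to an index into the weight vector.

    The ordering is lexicographic, with the last term changing fastest.
    domains holds, for each term, its values in declaration order.
    """
    if len(assignment) != len(domains):
        raise ValueError("need exactly one value per term")
    index = 0
    for value, domain in zip(assignment, domains):
        index = index * len(domain) + domain.index(value)
    return index

BOOLEAN = [True, False]  # for Boolean values, true comes before false

# Weight vector and terms from the example parfactor.
weights = [0.2, 0.8, 0.8, 0.8]
domains = [BOOLEAN, BOOLEAN]  # hot(W), attends(X)

def potential(hot, attends):
    """Look up the weight for one assignment to (hot(W), attends(X))."""
    return weights[flat_index((hot, attends), domains)]
```

Under this mapping, the weight 0.2 in the example applies when hot(W) and attends(X) are both true, and 0.8 applies to the other three assignments.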

The list ⟨term₁⟩, …, ⟨term_n⟩ specifies the terms that the parfactor applies to. Each term must be either a function application or a counting term. The function applications must be non-nested: that is, their arguments must be logical variables or constant symbols, not other function applications. Nested terms are static errors; models which contain them will not compile.

A counting term is expressed in BLOG with the syntax:

#(⟨type⟩ ⟨var⟩ : ⟨constraint⟩)[⟨term⟩]

Here, ⟨type⟩ ⟨var⟩ specifies the logical variable to be counted over, and its type. The constraint is optional; like the constraint on a parfactor, it is a conjunction of inequalities. The ⟨term⟩ element is the term whose values are being counted; again, the inference code assumes the term is non-nested.

On histogram ordering: As mentioned above, value assignments map lexicographically onto indices in MultiArrayPotential parameter lists. Since counting terms do not have declarations in the form of guaranteed statements, their order deserves special mention. Histograms are ordered so that all items are initially in the first bucket; they gradually "trickle down" to subsequent buckets. Buckets within a histogram are in lexicographic order. For example, when counting over a Boolean formula F(X), where the logical variable X ranges over a domain of size n (i.e., there are n ground symbols of X's type), histogram (n, 0) (i.e., all n values true) comes first, followed by (n-1, 1), etc.
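
For the Boolean case, the ordering can be enumerated with a short Python sketch. The function names are hypothetical, and the sketch covers only two-bucket (true, false) histograms, the case spelled out above.

```python
def boolean_histograms(n):
    """List the (num_true, num_false) histograms for a domain of size n,
    starting with all items in the first (true) bucket."""
    return [(n - k, k) for k in range(n + 1)]

def histogram_index(num_true, n):
    """Position of the histogram (num_true, n - num_true) in that ordering."""
    if not 0 <= num_true <= n:
        raise ValueError("num_true must lie between 0 and n")
    return n - num_true

# With n = 3 the ordering runs from (3, 0) down to (0, 3).
ordering = boolean_histograms(3)
```

A counting term over a Boolean formula with domain size n therefore has n + 1 possible values, each of which corresponds to one position along that term's dimension of the weight vector.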

Important note: In order for a counting term to have a well-defined type, its constraint must be in normal form with respect to the constraint on the parfactor where it occurs. That is, if the counting variable x is constrained to be unequal to another variable y, then the excluded set for y in the parfactor's constraint must include all the other terms in x's excluded set. For more on normal form constraints, see the C-FOVE paper.

Invoking the VE and FOVE Engines

BLOG now includes exact algorithms based on variable elimination that are applicable to models with known objects (that is, without number statements). First, BLOG includes the standard variable elimination algorithm. To run it, use:

runblog ⟨file.blog⟩ -e ve.VarElimEngine

This will use the model, evidence, and queries in ⟨file.blog⟩, and print out the posterior distribution for each query variable, as well as total inference time, in nanoseconds. Query variables must be ground; BLOG does not support queries over logical variables.

BLOG also supports lifted variable elimination, which computes the same results as ground variable elimination but is potentially much faster. Again, this algorithm is only applicable to models without unknown objects. To run it, use:

runblog ⟨file.blog⟩ -e fove.LiftedVarElim

This will print query results and the inference time in nanoseconds.

With both kinds of variable elimination, you can include the -v flag to run in verbose mode, which prints out the sequence of operations performed.
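
If you drive these engines from a script, the command lines above can be assembled programmatically. The sketch below is a hypothetical Python helper, not part of BLOG. It uses only the flags shown in this section; the ./runblog path and the placement of -v at the end of the command are assumptions.

```python
def ve_command(blog_file, lifted=False, verbose=False, runblog="./runblog"):
    """Build the argument list for a variable elimination run.

    lifted selects fove.LiftedVarElim instead of ve.VarElimEngine;
    verbose appends the -v flag (its position is an assumption).
    """
    engine = "fove.LiftedVarElim" if lifted else "ve.VarElimEngine"
    command = [runblog, blog_file, "-e", engine]
    if verbose:
        command.append("-v")
    return command

# Example: lifted inference on the parfactor example file, in verbose mode.
command = ve_command("examples/competing-workshops.blog", lifted=True, verbose=True)
```

The resulting list can be passed to subprocess.run from the top-level blog-<version> directory.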