To run the inference engine, use the runblog script in the top-level blog-<version> directory. The script takes a sequence of BLOG files as arguments. For example:
./runblog examples/balls/poisson-prior-noisy.mblog examples/balls/all-same.eblog examples/balls/num-balls.qblog
BLOG files can contain a mixture of model statements, evidence assertions, and queries. However, it is often convenient to separate these things into model files (which conventionally have the .mblog
suffix), evidence files (with the .eblog
suffix), and query files (with the .qblog
suffix). In the examples
directory, you will also see files with the generic .blog
suffix, indicating that they contain all three kinds of statements.
examples/balls/all-same.eblog
contains some simple evidence assertions. More precisely, these are value evidence statements, which have the form:

obs term = value;

Here term can be an arbitrary logical term (without free variables). The value must be a constant symbol: it can be a built-in constant symbol such as a numeral or "true"; a non-random constant symbol defined by the user, such as Green; or a random constant symbol (that is, a random function that takes no arguments).
The inference engine also supports symbol evidence statements, which introduce new symbols to stand for objects that might not have been referred to by any term in the language. Such statements also assert that the symbols refer to distinct objects and exhaust the specified set. For instance:
obs {Blip b: Time(b) = 8} = {B1, B2, B3};
There are not yet any examples of this in the examples
directory.
For an example of a query file, see examples/burglary.qblog, which contains the very simple query:
query Burglary;
Here the expression being queried is just a random constant. However, a query can be an arbitrary logical term (again, without free variables) or even a sentence, such as the equality sentence in examples/balls/id-uncert-noisy.blog
:
query (BallDrawn(Draw2) = BallDrawn(Draw3));
In fact, the inference engine even allows you to query the values of expressions that are not first-order sentences, but rather second-order expressions involving sets. For instance, the query in examples/balls/num-balls.qblog
is:
query #{Ball b};
This expression denotes the size of the set of all objects b
of type Ball
. It is also possible to query the sizes of more interesting sets, such as:
query #{Ball b: Color(b) = Green};
In fact, we don't even need to query the size of the set; we can query the set itself:
query {Ball b: Color(b) = Green};
In general, the expression after the query keyword can be any ArgSpec, which is an expression that evaluates to some value in every possible world. These expressions are called ArgSpecs because they can also be used as arguments to conditional probability distributions (CPDs). The inference engine just computes a histogram of the values that the ArgSpec takes on in the worlds that are sampled; the entities being counted in this histogram can be integers, user-defined objects, sets, etc. The only ArgSpecs that are treated specially are Boolean ones (sentences), for which the inference engine just reports the fraction of worlds where the ArgSpec is true.
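As a rough illustration of this histogram computation, here is a Python sketch (not the engine's actual code; `sampled_worlds` and `arg_spec` are made-up stand-ins for the engine's internal representations):

```python
from collections import Counter

def query_histogram(sampled_worlds, arg_spec):
    """Tally the value of an ArgSpec across sampled worlds.

    sampled_worlds is a list of (world, weight) pairs, and
    arg_spec is a function evaluating the queried expression
    in a world; both are illustrative stand-ins.
    """
    hist = Counter()
    total = 0.0
    for world, weight in sampled_worlds:
        hist[arg_spec(world)] += weight
        total += weight
    # Normalize to a distribution over the observed values.
    return {value: w / total for value, w in hist.items()}

# For a Boolean ArgSpec, the histogram reduces to the weighted
# fraction of worlds where the expression is true.
worlds = [({"Burglary": True}, 0.25), ({"Burglary": False}, 0.75)]
print(query_histogram(worlds, lambda w: w["Burglary"]))
```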
By default, the inference engine uses the blog.LWSampler class, which does likelihood weighting. You can use the --sampler command line option (e.g., --sampler=blog.LWSampler) to specify an alternative sampler. The following samplers are included in the current version:
blog.RejectionSampler
: constructs a possible world by starting with random variables (RVs) that have no parents, and then instantiating additional RVs after their parents are instantiated. This is the algorithm described in [Milch et al., IJCAI 2005]. Its instantiation process is not guided by the evidence or queries. When all the evidence and query variables have finally been instantiated, the RejectionSampler checks whether the evidence is satisfied, and rejects the world if not.
blog.LWSampler
: constructs a possible world by backward chaining from the query and evidence variables. The RVs are still instantiated in a context-specific topological order, but the choice of which RVs to instantiate is guided by the evidence and queries. Also, instead of sampling values for evidence variables (and then rejecting the world if these values don't match the asserted evidence) the LWSampler assigns all evidence variables their asserted values. It weights each world by the product of the probabilities of the evidence variables taking these values given their parents.
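The contrast between these two samplers can be sketched on a toy model (a hedged Python illustration; the model, its numbers, and the function names are invented, and this is not the engine's code):

```python
import random

# Toy two-variable model: Burglary ~ Bernoulli(0.1), and
# P(Alarm = true | Burglary) = 0.9, else 0.05.  Evidence: Alarm = true.
P_BURGLARY = 0.1
def p_alarm(burglary):
    return 0.9 if burglary else 0.05

def rejection_sample(n, rng):
    """Instantiate full worlds, then reject those violating the evidence."""
    kept = []
    for _ in range(n):
        b = rng.random() < P_BURGLARY
        a = rng.random() < p_alarm(b)
        if a:                      # evidence check after instantiation
            kept.append(b)
    return sum(kept) / len(kept)

def likelihood_weighting(n, rng):
    """Fix evidence variables and weight worlds by their likelihood."""
    num = den = 0.0
    for _ in range(n):
        b = rng.random() < P_BURGLARY
        w = p_alarm(b)             # weight = P(Alarm = true | parents)
        num += w * b
        den += w
    return num / den

rng = random.Random(0)
# Both estimates approach the exact posterior 0.09 / 0.135 = 2/3.
print(rejection_sample(100000, rng), likelihood_weighting(100000, rng))
```

Note that the rejection sampler discards roughly 86% of its worlds here, while likelihood weighting uses every sample, at the cost of unequal weights.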
blog.MHSampler
: simulates a Markov chain over possible worlds. At each step, it uses a proposal distribution to propose the next world, and then chooses whether to accept this proposal or stay at the current world. This choice is based on the acceptance ratio for the proposed move. If you use the MHSampler, you can specify a proposal distribution class with the --proposer
flag. The proposers currently available are:
blog.GenericProposer
: this is the default proposer. It chooses an instantiated variable uniformly at random, and resamples a value for this variable given its parents. Since this proposal distribution doesn't look at the variable's children at all, it is even less effective than Gibbs sampling. On the other hand, it is completely general, since it just requires sampling from a variable's CPD (general Gibbs sampling for variables with infinite domains is non-trivial, but we're working on it).
blog.UrnBallsSplitMerge
: this is a special-purpose proposer for urn-and-balls models. It will not work on any other models. This proposer is included to illustrate how a modeller can hand-craft a proposal distribution for a particular task. Even with a hand-crafted proposer, the general-purpose MHSampler can still compute the acceptance ratio, although the proposer needs to compute the ratio of proposal probabilities (q(x|x') / q(x'|x)). There is also another proposer class called blog.UrnBallsSplitMergeNoIds
, but there is probably no reason to use it.
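The Metropolis-Hastings acceptance test with a proposer that resamples a variable given only its parents (ignoring its children, as described above) can be sketched on a toy model (an invented Python illustration, not the engine's code):

```python
import random

# Toy model: Burglary ~ Bernoulli(0.1), evidence Alarm = true with
# P(Alarm = true | Burglary) = 0.9, else 0.05.  (Illustrative only.)
P_BURGLARY = 0.1
def p_alarm(burglary):
    return 0.9 if burglary else 0.05

def mh_posterior(n, rng):
    """MH where the proposer resamples Burglary from its prior.

    Because the proposal ignores the child (Alarm), the proposal
    probabilities cancel against the prior terms, and the acceptance
    ratio reduces to the likelihood ratio of the evidence.
    """
    b = rng.random() < P_BURGLARY      # initial world
    hits = 0
    for _ in range(n):
        b_new = rng.random() < P_BURGLARY
        # acceptance ratio = [p(x') q(x|x')] / [p(x) q(x'|x)]
        #                  = p_alarm(b_new) / p_alarm(b)
        if rng.random() < min(1.0, p_alarm(b_new) / p_alarm(b)):
            b = b_new
        hits += b
    return hits / n

# The fraction of steps with Burglary = true approaches the
# exact posterior 2/3.
print(mh_posterior(200000, random.Random(0)))
```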
You can control the number of samples with the --num_samples (or -n) command line option. For information about other command line options, see the documentation for the Main class.
To randomize the random seed, use the --randomize (or -r) command line option.
examples/competing-workshops.blog
contains the parfactor statement:

parfactor Workshop W, Person X . MultiArrayPotential[[0.2, 0.8, 0.8, 0.8]] (hot(W), attends(X));

The general form of a parfactor statement is:
parfactor 〈type1〉 〈var1〉, …, 〈typek〉 〈vark〉 : 〈constraint〉 . 〈potential-spec〉 (〈term1〉, …, 〈termn〉);

The list 〈type1〉 〈var1〉, …, 〈typek〉 〈vark〉 specifies the logical variables that the parfactor is quantifying over, along with their types. (To create a ground factor that does not quantify over any variables, use the keyword
factor
instead
of parfactor
). These variables are optionally followed by a
colon and a constraint, which is a BLOG formula. This formula must be a
conjunction of inequalities containing only constant symbols and the
parfactor's logical variables.
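For instance, a parfactor with a constraint might look like the following (a hypothetical fragment that instantiates the general form above; the type and function names are invented):

```
parfactor Person X, Person Y : X != Y . MultiArrayPotential[[0.2, 0.8, 0.8, 0.8]] (knows(X, Y), knows(Y, X));
```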
The 〈potential-spec〉 element specifies the potential to use for the parfactor. It has the form:
〈potential-type〉 [〈param1〉, …, 〈paramm〉]

The 〈potential-type〉 element specifies the type of potential; currently, the only supported type is
MultiArrayPotential
. A potential of the given type will be
constructed with parameters 〈param1〉, …,
〈paramm〉. MultiArrayPotential
expects a single parameter, a row vector with one entry for each assignment
of values to the terms in the parfactor. For instance, in our example
above, we had
MultiArrayPotential[[0.2, 0.8, 0.8, 0.8]]
The outer set of square brackets delimits the list of parameters; the inner
brackets delimit the weight vector itself. The mapping from value
assignments to indices in the vector is lexicographic, with the last
dimension changing fastest. The possible values for each dimension are
ordered as in the guaranteed
statement that introduced them in
the BLOG file; for Boolean values, true
comes
before false
.
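The index computation implied by this ordering can be sketched as follows (an illustrative Python helper, not part of BLOG):

```python
from itertools import product

def multi_array_index(value_lists, assignment):
    """Map an assignment of values to the parfactor's terms onto an
    index in the MultiArrayPotential weight vector: lexicographic
    order, with the last dimension changing fastest.
    """
    index = 0
    for values, value in zip(value_lists, assignment):
        index = index * len(values) + values.index(value)
    return index

# For Boolean terms, true comes before false.  So for the example
# MultiArrayPotential[[0.2, 0.8, 0.8, 0.8]] over (hot(W), attends(X)):
booleans = [True, False]
weights = [0.2, 0.8, 0.8, 0.8]
table = {assignment: weights[multi_array_index([booleans, booleans], assignment)]
         for assignment in product(booleans, repeat=2)}
print(table[(True, True)])   # 0.2: both terms true is the first entry
```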
The list 〈term1〉, …, 〈termn〉 specifies the terms that the parfactor applies to. Each term must be either a function application or a counting term. The function applications must be non-nested: that is, their arguments must be logical variables or constant symbols, not other function applications. Nested terms are static errors; models which contain them will not compile.
A counting term is expressed in BLOG with the syntax:
#(〈type〉 〈var〉 : 〈constraint〉)[〈term〉]

Here, 〈type〉 〈var〉 specifies the logical variable to be counted over, and its type. The constraint is optional; like the constraint on a parfactor, it is a conjunction of inequalities. The 〈term〉 element is the term whose values are being counted; again, the inference code assumes the term is non-nested.
On histogram ordering: As mentioned above, value
assignments map lexicographically onto indices
in MultiArrayPotential
parameter lists. Since counting terms
do not have declarations in the form of guaranteed
statements,
their order deserves special mention. Histograms are ordered so that all
items are initially in the first bucket; they gradually "trickle down" to
subsequent buckets. Buckets within a histogram are in
lexicographic order. For example, when counting over a Boolean
formula F(X)
, where the logical variable X
has
domain size n (i.e., there are n ground symbols of
type X
), histogram (n, 0)
(i.e., all n
values true
) comes first, followed by (n-1, 1), etc.
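This "trickle down" ordering can be sketched as (illustrative Python, not the engine's code):

```python
def histogram_order(n):
    """Enumerate counting-term histograms over a Boolean formula whose
    counted variable has domain size n, in the order described above:
    all items start in the first bucket and trickle down, giving
    (n, 0), (n-1, 1), ..., (0, n).
    """
    return [(n - k, k) for k in range(n + 1)]

print(histogram_order(3))   # [(3, 0), (2, 1), (1, 2), (0, 3)]
```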
Important note: In order for a counting term to have a well-defined type, its constraint must be in normal form with respect to the constraint on the parfactor where it occurs. That is, if the counting variable x is constrained to be unequal to another variable y, then the excluded set for y in the parfactor's constraint must include all the other terms in x's excluded set. For more on normal form constraints, see the C-FOVE paper.
To run exact inference by variable elimination (applicable only to models without unknown objects), use:

runblog 〈file.blog〉 -e ve.VarElimEngine
This will use the model, evidence, and queries in
〈file.blog〉, and print out the posterior distribution
for each query variable, as well as total inference time, in nanoseconds.
Query variables must be ground; BLOG does not support queries over logical
variables.
BLOG also performs lifted variable elimination; this is equivalent to ground variable elimination but potentially much faster. Again, this algorithm is only applicable to models without unknown objects. To run it, use:
runblog 〈file.blog〉 -e fove.LiftedVarElim
This will print query results and the inference time in nanoseconds.
With both kinds of variable elimination, you can include the -v
flag to run in verbose mode, which prints out the sequence of operations performed.