1) In general, you want to use MC-SAT as it is much faster and can
handle deterministic dependencies better than Gibbs.
2) The option -numStepsEveryMCSat is a leftover that we have removed
from the newest version. Originally, it was implemented to speed up
MC-SAT, but it actually slows it down. As we updated MC-SAT, we stopped
maintaining this option, and it appears to be broken (hence its
removal).
3) On the bloodtype domain, the differing results with different
orderings and numbers of query atoms were caused by a bug in the open-/
closed-world assumptions that Alchemy was making. We've fixed this bug
in the newest release and I have verified that the results stay the same
when the query atoms are reordered. Thanks for the bug report :) Also, if
you want to compute P(bloodtypeA(Linus) | mchromA(Uwe), pchromA(Uwe)),
then you need to add mchromA(Uwe) and pchromA(Uwe) to bloodtype.db and
-ow mchromA,pchromA to the command line (telling Alchemy that all other
groundings of these predicates are unknown).
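For example, assuming the model file is named bloodtype.mln and the
output file bloodtype.result (adjust the names to your setup), the full
command would look something like:

  infer -i bloodtype.mln -e bloodtype.db -r bloodtype.result -q bloodtypeA -ow mchromA,pchromA

Here -q bloodtypeA requests the marginals of all groundings of
bloodtypeA, which includes bloodtypeA(Linus).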
4) On the university domain, the same bug from (3) was causing faulty
results. I've verified the results from the new version and they are
similar to Primula's except for advisedBy(Gail,Glen). It appears you are
running inference in Alchemy on an empty database; is this intended? Is
this what you are doing for the other systems? It wasn't clear to me
from the description.
5) We will have an implementation of exact inference in the near future
for handling small problems such as these. We have been focusing on
larger scale problems where exact inference is not feasible.
6) Multi-value encoding is achieved with the ! operator (e.g.,
HasColor(object, !color) declares that HasColor(o, c_i) is true for
exactly one c_i and HasColor(o, c_j) is false for all j != i, where c_i
and c_j are of type color).
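For concreteness, with a hypothetical type color = {Red, Green, Blue},
the declaration above allows exactly three states per object o:

  HasColor(o, Red)=true,  HasColor(o, Green)=false, HasColor(o, Blue)=false
  HasColor(o, Red)=false, HasColor(o, Green)=true,  HasColor(o, Blue)=false
  HasColor(o, Red)=false, HasColor(o, Green)=false, HasColor(o, Blue)=true

All other combinations (no color, or more than one color) get zero
probability.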
7) The definition of noisy OR from Pearl (1988) is essentially the
following:
For each X_i, define an auxiliary variable U_i. If X_i = 0, then U_i = 0
with certainty. If X_i = 1, U_i = 0 with probability p_i, and U_i = 1
with probability 1 - p_i. Y is then the deterministic OR of the U_i's.
In other words, each "cause" X_i fails independently with probability
p_i to produce the "effect" Y. (p_i is often called the failure
probability of X_i.) The resulting distribution for Y is: P(Y=0|X) =
\prod_{True X_i's} p_i, P(Y=1|X) = 1 - \prod_{True X_i's} p_i.
To implement this in an MLN, we need the following features and weights:
Y <=> U_1 v ... v U_n with infinite weight.
For each (X_i,U_i), a set of features capturing P(U_i|X_i):
!X_i ^ U_i with minus infinite weight.
X_i ^ !U_i with weight log(p_i).
X_i ^ U_i with weight log(1-p_i).
Now P(Y|X) = \sum_U P(Y,U|X), where X is (X_1, ..., X_n), similarly for
U, and the sum is over all states of U. A bit of algebra shows that
P(Y=0|X) = \prod_{True X_i's} p_i and P(Y=1|X) = 1 - \prod_{True X_i's}
p_i, as desired (see the small sanity check sketched below).
Sometimes a "leak node" is included, allowing Y to be true even if all
the X's are false, and representing all the "causes" that we don't know
about. This just corresponds to adding an extra X_0 that is always set
to true.
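If it helps, here is a small Python sketch (not Alchemy code; the
variable names and numbers are just illustrative) that enumerates the
states of U under the features above and checks that the marginal of Y
comes out as the noisy-OR distribution:

  from itertools import product
  from math import log, exp, isclose

  def noisy_or_prob_y1(x, p):
      # x[i] in {0,1} are the causes X_i; p[i] is the failure probability p_i.
      z = {0: 0.0, 1: 0.0}  # total weight of worlds with Y=0 and Y=1
      for u in product((0, 1), repeat=len(x)):
          # Hard feature !X_i ^ U_i (minus infinite weight): X_i=0 forces U_i=0.
          if any(xi == 0 and ui == 1 for xi, ui in zip(x, u)):
              continue
          # Soft features: X_i ^ !U_i has weight log(p_i), X_i ^ U_i has weight log(1-p_i).
          w = sum(log(p[i]) if u[i] == 0 else log(1.0 - p[i])
                  for i in range(len(x)) if x[i] == 1)
          # Hard feature Y <=> U_1 v ... v U_n: Y is the deterministic OR of the U_i's.
          y = int(any(u))
          z[y] += exp(w)
      return z[1] / (z[0] + z[1])

  x = (1, 0, 1)        # X_1 and X_3 are true causes, X_2 is false
  p = (0.2, 0.5, 0.3)  # failure probabilities
  assert isclose(noisy_or_prob_y1(x, p), 1.0 - 0.2 * 0.3)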
8) We are taking a look at the last two challenge problems and will
suggest reasonable encodings (or if you already have models, we can take
a look at them).
Also, in bloodtype.db, I think founder(Wilhelm) should be
founder(Willhelm) (it should be the same constant as in the father
predicate). I also noticed that inference on the bloodtype example took
much longer than I would expect. I looked at what MC-SAT was doing in
each sample, and the underlying SAT problem appears to be hard: either a
solution was found very quickly (in fewer than 1000 flips) or not at all
(even after stopping at 1 million flips). I would suggest setting
-mwsMaxSteps 1000 (capping the flips per sample at 1000); it speeds
things up without significantly affecting the results.
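Concretely, with the same assumed file names as in (3):

  infer -i bloodtype.mln -e bloodtype.db -r bloodtype.result -q bloodtypeA -mwsMaxSteps 1000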