The performance characteristics of different algorithms

When we introduced conditioning we pointed out that the rejection sampling and enumeration (or mathematical) definitions are equivalent—we could take either one as the definition of how Infer should behave with condition. There are many different ways to compute the same distribution, it is thus useful to separately think about the distributions we are building (including conditional distributions) and how we will compute them. Indeed, in the last few chapters we have explored the dynamics of inference without worrying about the details of inference algorithms. The efficiency characteristics of different implementations of Infer can be very different, however, and this is important both practically and for motivating cognitive hypotheses at the level of algorithms (or psychological processes).

The “guess and check” method of rejection sampling (implemented in method:"rejection") is conceptually useful but is often not efficient: even if we are sure that our model can satisfy the condition, it will often take a very large number of samples to find computations that do so. To see this, try making the baserate probability of A, B, and C lower in this example:

var baserate = 0.1

var model = function(){
var A = flip(baserate)
var B = flip(baserate)
var C = flip(baserate)
condition(A+B+C >= 2)
return A
}

viz(Infer({method: 'rejection', samples: 100}, model))


Even for this simple program, lowering the baserate by just one order of magnitude, to $0.01$, will make rejection sampling impractical.

Another option that we’ve seen before is to enumerate all of the possible executions of the model, using the rules of probability to calculate the conditional distribution:

var baserate = 0.1

var model = function(){
var A = flip(baserate)
var B = flip(baserate)
var C = flip(baserate)
condition(A+B+C >= 2)
return A
}

viz(Infer({method: 'enumerate'}, model))


Notice that the time it takes for this program to run doesn’t depend on the baserate. Unfortunately it does depend critically on the number of random choices in an execution history: the number of possible histories that must be considered grows exponentially in the number of random choices. To see this try adding more random choices to the sum (following the pattern of A). The dependence on size of the execution space renders enumeration impractical for many models. In addition, enumeration isn’t feasible at all when the model contains a continuous distribution (because there are uncountably many value that would need to be enumerated).

There are many other algorithms and techniques for probabilistic inference; many are implemented as methods for Infer in WebPPL. For instance, Markov chain Monte Carlo inference approximates the posterior distribution via a random walk. (By default the ‘method:”MCMC”’ yields the Metropolis Hastings version of MCMC).

var baserate = 0.1

var model = function(){
var A = flip(baserate)
var B = flip(baserate)
var C = flip(baserate)
condition(A+B+C >= 2)
return A
}

viz(Infer({method: 'MCMC', lag: 100}, model))


See what happens in the above inference as you lower the baserate. Unlike rejection sampling, inference will not slow down appreciably (but results will become less stable). Unlike enumeration, inference should also not slow down exponentially as the size of the state space is increased. This is an example of the kind of tradeoffs that are common between different inference algorithms.

There are a number of other inference methods available in WebPPL. These include sequential Monte Carlo and variational inference. As with rejection sampling, enumeration, and MCMC, their performance characteristics can vary depending on details of the model.

The landscape of inference algorithms

This section is waiting for a refresh. In the meantime, see PPAML Summer School 2016: Approximate Inference Algorithms.

Process-level cognitive modeling

As we noted in an earlier chapter, there is an interesting parallel between the Infer abstraction, which separates model specification from inference method, and the idea of levels of analysis in cognitive science @Marr1982. For most of this book we are interested in the computational level of describing what people know about the world and what inferences that knowledge licenses. That is, we treat the model argument to infer as the scientific hypothesis, and the options (including ‘method’) argument as a engineering detail needed to derive predictions. We can make a great deal of progress with this level of abstraction.

The algorithmic level goes further, attempting to describe the process by which people draw these inferences, and taking the options to Infer as part of the hypotheses. While Infer specifies an ideal, different methods for inference will approximate this ideal better or worse in different cases; they will also do so with different time and space tradeoffs. Is it reasonable to interpret the inference algorithms that we borrow from statistics as psychological hypotheses at the algorithmic level? Which algorithm does the brain use for inference? Could it be MCMC? Enumeration?

If we take the algorithms for inference as psychological hypotheses, then the approximation and resource-usage characteristics of the algorithms will be the signature phenomena of interest.

Next chapter: Learning as conditional inference