I attended DARPA's Probabilistic Programming for Advancing Machine Learning (PPAML) summer school. The aim of probabilistic programming languages (PPLs) is to abstract the act of Bayesian inference into modular engines, so that switching from, say, Hamiltonian Monte Carlo to a particle filter requires changing exactly one string. If you can write a model in a PPL, you get inference for free [1].
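As a toy illustration of the "change one string" idea (this is not any real PPL's API; the `infer` function and engine names below are made up), here is a model of a coin's bias with two interchangeable inference engines, selected by a string:

```python
import math
import random

# Toy model: uniform prior on a coin's bias, 7 heads observed in 10 flips.
HEADS, FLIPS = 7, 10

def log_likelihood(bias):
    return HEADS * math.log(bias) + (FLIPS - HEADS) * math.log(1 - bias)

def infer(method, n=20000, seed=0):
    """Posterior mean of the bias; `method` selects the inference engine."""
    rng = random.Random(seed)
    if method == "importance":
        # Weight prior draws by the likelihood of the observed data.
        draws = [rng.uniform(1e-6, 1 - 1e-6) for _ in range(n)]
        weights = [math.exp(log_likelihood(b)) for b in draws]
        return sum(b * w for b, w in zip(draws, weights)) / sum(weights)
    if method == "rejection":
        # Keep prior draws whose simulated flips exactly match the observed data.
        kept = []
        for _ in range(n):
            b = rng.random()
            if sum(rng.random() < b for _ in range(FLIPS)) == HEADS:
                kept.append(b)
        return sum(kept) / len(kept)
    raise ValueError(f"unknown engine: {method}")
```

Both `infer("importance")` and `infer("rejection")` approximate the same posterior (analytically Beta(8, 4), mean 2/3); only the string changed.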
The purported aim is to allow machine learning code that today requires 1000-10,000 lines of code to be written in 10-100 lines.
This works brilliantly for one-shot learning. Let's say you are trying to teach a computer to recognize a handwritten character after a single example. First, you build a generative model that follows how letters are constructed: the hand touches paper and makes a primitive shape (line, curve, loop, etc.), and multiple such primitives are strung together to form a character. For each example character, infer the most likely sequence of such primitives. When a new classification is requested, take samples from the generative model for each known character, calculate the difference in pixel values, and do this hundreds of thousands of times. You can now construct an accurate marginal probability for each known character while needing only a single example.
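A heavily simplified sketch of that recipe in Python (the 3x3 "characters," the noise rate, and the pixel-agreement kernel are all invented for illustration; the real work uses stroke primitives, not bitmaps):

```python
import math
import random

NOISE = 0.1          # probability a pixel flips when the character is drawn
TEMPLATES = {        # one training example per character, as 3x3 bit grids
    "L": (1, 0, 0, 1, 0, 0, 1, 1, 1),
    "T": (1, 1, 1, 0, 1, 0, 0, 1, 0),
}

def render(template, rng):
    """Generative model: draw the character, flipping each pixel with prob NOISE."""
    return tuple(px ^ (rng.random() < NOISE) for px in template)

def classify(image, n=5000, rng=None):
    """Score each known character by comparing `image` against many samples."""
    rng = rng or random.Random(0)
    scores = {}
    for char, tmpl in TEMPLATES.items():
        # Average a pixel-difference kernel over n samples from the model.
        total = 0.0
        for _ in range(n):
            sample = render(tmpl, rng)
            diff = sum(a != b for a, b in zip(image, sample))
            total += math.exp(-diff)
        scores[char] = total / n
    z = sum(scores.values())
    return {char: s / z for char, s in scores.items()}  # marginal probabilities
```

A query image that is an "L" with one corrupted pixel still gets most of the probability mass, despite each character having only one training example.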
Very interesting. I haven't read much about the topic (yet), so excuse my pre-morning-coffee naive comment. I know very little about languages like Stan, but what's the reasoning behind baking these "inference for free" ideas into the language? It sounds like something for a library. Are there distinct advantages to having a new language for these things?
My very naive approach would be using something like the pipelines in Phoenix (pipe_through :hamiltonian)
I suppose the reasoning is the same as for Prolog?
Stan (http://mc-stan.org/documentation/) is arguably the most advanced language. It's especially pushing the bounds of automatic variational inference, for the scenario where your model does not have a nice conjugate form that would be amenable to Gibbs sampling. It hasn't quite reached what I would call production quality, but some of the best people in the world of computational Bayesian methods (e.g., Michael Betancourt, most of David Blei's lab) are working on it.
Yes, Stan is awesome! The main difference between something like Stan and "next-gen" languages like Anglican and WebPPL is that the latter place basically no restrictions on where you can use a distribution: nested inference, probabilistic recursion, etc. are all possible. For certain classes of problems this leads to greatly enhanced expressiveness. On the flip side, Stan is more production-ready right now.
A major downside of Stan is its lack of support for discrete priors. This isn't really advertised very well, but it is more of a problem than it might initially sound. Its type handling can also get a little frustrating at times. Overall, I highly recommend it, but it does have its downsides, and there's some room for alternatives or improvement.
PyMC3 (http://pymc-devs.github.io/pymc3/) has all the powerful samplers that Stan has (i.e. NUTS) as well as support for discrete priors. It allows you to specify your models in pure Python, supports matrix algebra, and also has variational inference which allows for large-ish scale machine learning (e.g. here is a blog post where I train a Bayesian Neural Net on MNIST http://twiecki.github.io/blog/2016/07/05/bayesian-deep-learn...). Under the hood, it uses theano for JIT compilation to C or GPU.
Anglican for high performance (Clojure host, strong multithreading support) and WebPPL for ease of integration (JavaScript client side or server side). Both are implemented using continuation passing style trampolines.
You don't need to know what that is to use the languages :). In essence, functions always take one extra argument, a "continuation," which receives the value that would otherwise have been returned. A trampoline is the loop that repeatedly invokes these continuation thunks so the call stack never grows. It's typically a compilation technique rather than a programming style.
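A minimal Python sketch of the idea (not taken from either language's implementation): factorial in continuation-passing style, where every step returns a thunk and a trampoline loop drives the computation, so even very deep recursion uses constant stack.

```python
def factorial_cps(n, cont):
    """CPS factorial: instead of returning, pass the result to `cont`.
    Every step returns a zero-argument thunk for the trampoline to call,
    so neither the descent nor the unwinding grows the Python stack."""
    if n == 0:
        return lambda: cont(1)
    return lambda: factorial_cps(n - 1, lambda acc: lambda: cont(n * acc))

def trampoline(thunk):
    """Repeatedly invoke thunks until a final (non-callable) value appears."""
    result = thunk
    while callable(result):
        result = result()
    return result
```

`trampoline(factorial_cps(2000, lambda x: x))` runs fine even though naive recursion to depth 2000 would exceed Python's default recursion limit.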
As much as I love the concept of probabilistic programming, the name makes me cringe every time I hear it. It's verbose, doesn't abbreviate well, and borders on being a tongue twister. More substantially, the name suggests that the act of programming itself is probabilistic (a real but distinct phenomenon :) )
Worse, rather than highlighting the key value proposition (automated inference), the name suggests a focus on the development of probabilistic programs. Perhaps as a result, many introductions to probabilistic programming concern themselves almost exclusively with the comparatively trivial task of expressing models as algorithms, while blissfully ignoring the thorny problem of inference.
I suggest talking about “automated inference” instead. It rolls off the tongue much more easily, and, more importantly, gets to the heart of the matter.
I don't think that's much of a difference. Doesn't probabilistic programming allow for a wider range of applications than automated inference/induction? To me, a language like Prolog would fit such a description better.
"Design and Implementation of Probabilistic Programming Languages" http://dippl.org is pretty dank too. You implement your own probabilistic language in JS.
I know nothing about probabilistic programming, but wouldn't it be easier (and more applicable) to have a library with the relevant functionality and a bunch of interfaces for Go/C/Python/... than to implement a new language? An honest question.
I developed a PPL within C#, which I guess has a more familiar syntax for a lot of people. Basically, PPLs allow you to treat entire probabilistic models as primitives, and combine them as easily as we add integers. Let's say we have two models describing the distribution of university grades in America and India, and we want to combine them into a single model, where you have a 25% chance of being an American, and a 75% chance of being an Indian:
    var combined = from isAmerican in Bernoulli(0.25)
                   from grade in isAmerican ? americanGrades : indianGrades
                   select grade;
This is now a new model that can be composed further with other models. Building models like this feels very expressive.
The inference method is also completely decoupled from the model specification process, allowing us to switch to, say, Sequential Monte Carlo with a one-line change.
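A rough Python analogue of the C# mixture above (the grade distributions and their parameters are made up; models here are just sampler functions, composed by ordinary function composition):

```python
import random

def bernoulli(p):
    return lambda rng: rng.random() < p

def american_grades(rng):
    # Stand-in model: GPA-style grades (invented parameters).
    return rng.gauss(3.0, 0.5)

def indian_grades(rng):
    # Stand-in model: percentage-style grades (invented parameters).
    return rng.gauss(70.0, 10.0)

def combined(rng):
    """Mixture: 25% chance American, 75% chance Indian."""
    is_american = bernoulli(0.25)(rng)
    return american_grades(rng) if is_american else indian_grades(rng)

# `combined` is itself a model and can be composed further, like the C# query.
rng = random.Random(0)
samples = [combined(rng) for _ in range(10000)]
```

The mixture's mean should sit near 0.25 * 3.0 + 0.75 * 70.0 = 53.25, which a quick Monte Carlo check confirms.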
As a researcher that is currently working on implementing and improving existing PPLs, I'm very curious: what is your specific use-case that you have in mind for recursive queries?
Honestly, I'd like to be able to try out some of the old social-reasoning demos from forestdb. It's not really a high-priority thing; it just bugs me that there were working languages for doing this (Church) and now there's basically just Anglican.
If you're interested in social reasoning and nested inference models, there's some examples of how they're implemented in WebPPL here: http://agentmodels.org/chapters/7-multi-agent.html. I think many of the reasoning examples from forestdb that are currently written in Church could be translated into WebPPL in a fairly straightforward manner using the same structure as the agentmodels examples.
I never liked this approach to recursion (as in Google's joke search suggestion for "recursion"), because recursion is a form of progressive iteration, not reiteration.
I'm not sure I know what "isomorphic" means in this context.
For sure, (tail) recursion places a pointer to a function on the stack while a loop is a conditional jump. They're not the same thing. I think the similarity between n-1 and --n is a red herring, here.
In my humble opinion something like this will be at the root of the solution to general AI.
Deep learning may have the edge currently for being a bit more mathematically tractable and much easier to massively parallelize, but this seems to me like a more fundamental foundation for AI (there remain issues to be solved with it though).
These languages are used to describe models and run probabilistic simulations of them, but they can also be used to describe other programming languages probabilistically. This means the potential to go up one level of abstraction, to a probabilistic program that writes other programs to model and predict the world.
Here's a description of one of my failed attempts at this:
General AI needs structured prediction. Yes, graphical models can do that (HMM, CRF, etc.), but inference starts to get slow and special-case implementations are required for different domains. [1]
I see no way for someone to get automatic inference and training for [1] with a probabilistic programming language.
Given the new DeepMind paper on discovering shortest-path algorithms, it's quite clear that structured prediction assisted by deep networks works quite well (this has been demonstrated by a vast array of work), while graphical models represented by probabilistic programming languages are far from being that successful.
I don't think it has to be either or. For example, you can implement a Deep Net in a probabilistic programming framework (http://twiecki.github.io/blog/2016/07/05/bayesian-deep-learn...) like PyMC3. The inference here is not using sampling but rather ADVI (http://pymc-devs.github.io/pymc3/api.html#advi) which is almost as general but much much faster, and can be run on sub-samples of the data (similar to stochastic gradient descent used in deep learning). Once we bridge these two domains we can get the best of both worlds, like a deep net HMM.
>this stuff existed for decades and it hit a wall.
>sampling is slow.
Right. Sampling is slow. That's why automated variational inference is a very active field of research these days: instead of approximating the posterior by sampling, you recast inference as an optimization problem, where gradient descent on a variational objective (the ELBO, a lower bound on the marginal likelihood) yields an approximation to the posterior across the parameter space.
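A deliberately tiny illustration of "inference as optimization" (the target posterior, the variational family, and all parameters are invented; real ADVI uses stochastic gradients of the ELBO, whereas this toy targets a Gaussian posterior where the KL objective is available in closed form):

```python
import math

def kl_gaussians(m, s, mu, sigma):
    """KL( N(m, s) || N(mu, sigma) ), closed form."""
    return math.log(sigma / s) + (s**2 + (m - mu)**2) / (2 * sigma**2) - 0.5

def fit_variational(mu=2.0, sigma=0.5, steps=2000, lr=0.05):
    """Fit a Gaussian q(m, s) to the posterior N(mu, sigma) by gradient
    descent on the KL divergence: no sampling anywhere in the loop."""
    m, t = 0.0, 0.0                      # t = log of the variational std dev
    for _ in range(steps):
        s = math.exp(t)
        grad_m = (m - mu) / sigma**2     # d KL / d m
        grad_t = s**2 / sigma**2 - 1.0   # d KL / d t
        m -= lr * grad_m
        t -= lr * grad_t
    return m, math.exp(t)
```

After optimization the variational parameters recover the target posterior's mean and standard deviation, which is the whole point: the answer comes out of an optimizer, not a sampler.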
All the work on training deep neural networks has made our hardware and software very efficient at solving optimization problems.
If this site is talking about what I think it is, then "regular" programming languages are a special case of this where the probability distribution of each value is the degenerate case: a point mass. On that basis alone, this will be useful for anything regular programming is useful for.
In addition, this makes modeling uncertainty much easier (in the sense of "closer at hand"). That may allow for new ways of dealing with user input. Instead of saying "is this email address valid or invalid", we can start asking questions like "Is the probability of this email being valid larger than 99%? Then we'll accept it. Is it larger than 95%? Then we'll ask the user to confirm it. Is it less than 95%? Then we'll tell the user it's incorrect and have them retype it."
These are things we normally don't care to model because it would require lots of additional machinery. With that machinery built into the programming language, it is much easier to reach for, potentially with a better user experience to boot.
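The threshold logic above is trivial to write down once the probability is at hand (the cutoffs are the ones from the comment; the `p_valid` input is assumed to come from whatever model estimates validity):

```python
def email_decision(p_valid):
    """Map an estimated probability that an email address is valid
    to a UI action, using the thresholds described above."""
    if p_valid > 0.99:
        return "accept"          # confident enough to proceed silently
    if p_valid > 0.95:
        return "confirm"         # plausible, but ask the user to double-check
    return "retype"              # likely wrong; have the user re-enter it
```

The hard part, of course, is the model that produces `p_valid`, which is exactly the machinery a PPL would make "closer at hand."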
The first probabilistic language I came across was PRISM [1] (I made a post about it earlier today). It's a probabilistic _logic_ programming language so you're basically declaring your Prolog facts with probabilities attached and then run your program as a simulation, drawing samples from a distribution over variable bindings.
I see it as having a database of facts about the world with attached probabilities that tell you which view over (or perhaps version of) the world is the most likely.
And then you can do EM search for optimal parameters. Learning, right? It's all built into the language, and you don't need to hand-craft task-specific versions depending on your domain (like Baum-Welch, Inside-Outside, etc.).
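To make the EM idea concrete outside any particular language, here is the classic two-coin toy problem in plain Python (the data and initial guesses are the standard textbook ones, not anything from PRISM): each session of 10 flips used one of two coins of unknown bias, and EM recovers both biases without ever being told which coin was which.

```python
HEADS = [5, 9, 8, 4, 7]   # heads out of 10 flips in each of five sessions
FLIPS = 10

def em_two_coins(theta_a=0.6, theta_b=0.5, iters=50):
    """EM for a mixture of two biased coins (each session uses one coin)."""
    for _ in range(iters):
        # E-step: soft-assign each session to coin A or B via the likelihoods.
        tot_a_h = tot_a_t = tot_b_h = tot_b_t = 0.0
        for h in HEADS:
            like_a = theta_a**h * (1 - theta_a)**(FLIPS - h)
            like_b = theta_b**h * (1 - theta_b)**(FLIPS - h)
            w = like_a / (like_a + like_b)      # responsibility of coin A
            tot_a_h += w * h
            tot_a_t += w * (FLIPS - h)
            tot_b_h += (1 - w) * h
            tot_b_t += (1 - w) * (FLIPS - h)
        # M-step: re-estimate each coin's bias from its soft counts.
        theta_a = tot_a_h / (tot_a_h + tot_a_t)
        theta_b = tot_b_h / (tot_b_h + tot_b_t)
    return theta_a, theta_b
```

The point of a probabilistic logic language is that this loop is what you get for free: you state the facts and their probabilities, and the generic EM machinery replaces hand-rolled routines like this one.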
Also, it's a probabilistic Prolog: it's Turing complete and gives you all the expressive power of first-order predicate calculus. With probabilities. And learning of parameters from data.
Languages like this go way, way beyond ad-hoc implementations of inference over graphical models, to giving us a new vocabulary to express reasoning over vast sets of data.
> graphical models typically serve as coarse, high-level descriptions, eliding critical aspects such as fine-grained independence, abstraction and recursion.
Maybe you're not impressed with graphical models because your experience with them matches the points above. By "writing code that generates a sample from the joint distribution," you can achieve much better modelling and control than with the normal methods of producing graphical models.
Plenty of things! The real world is often best modeled by a probabilistic model; you can rarely state facts in definite terms. The probability the bus arrives on time may be 70%. The probability that a particular host is up may be 92%. The probability of needing a reboot within the next week is 2%.
The probability that the user switched off Wifi on their machine is 4%, and the probability that their router is having trouble is 3%. These numbers can be based on actual measurements. What do we tell the user when they have problems connecting? Instead of just saying "it's either this or that" we can run the numbers, and perhaps in aggregate there's an overwhelming probability it's a particular event, in which case we suggest that first.
There are so many cases where we don't actually know for certain all the parameters involved, but the conventional approach is still to round the probability either up to 100% or down to 0%. Simply because that's easier in conventional programming languages. As a result, you might not see these events as having probability distributions, but they do.
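Running the numbers from the connectivity example is just a normalization (the cause probabilities are the ones quoted above plus one invented extra; in practice they would come from measurements):

```python
# Hypothetical marginal probabilities of each cause; "ISP outage" is invented
# here to round out the list. In practice these come from measurements.
CAUSES = {"wifi switched off": 0.04, "router trouble": 0.03, "ISP outage": 0.01}

def ranked_suggestions(causes=CAUSES):
    """Given that the user has a connectivity problem, renormalize over the
    candidate causes and rank them, most probable first."""
    z = sum(causes.values())
    posterior = {c: p / z for c, p in causes.items()}
    return sorted(posterior.items(), key=lambda kv: -kv[1])
```

With these numbers, "wifi switched off" carries half the renormalized mass, so it's the first thing to suggest rather than an unordered "it's either this or that."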
The point is not efficiency, but eloquence, I guess. From the website:
>> However, many of the most innovative and useful probabilistic models published by the AI, machine learning, and statistics community far outstrip the representational capacity of graphical models and associated inference techniques.
>> PROBABILISTIC PROGRAMMING LANGUAGES aim to close this representational gap, unifying general purpose programming with probabilistic modeling;
I see it as giving the tools to the community to describe their models and automate inference over them in a unified manner that can be communicated more easily, and in a way that is better understood by all.
It's not just about making inference faster. Most of the PGM libraries/DSLs I know of are kinda clunky and inflexible. This is all about answering questions about how to make it easier and more powerful, not only faster.
Unless there is scope for optimization, the representations should by necessity be essentially the same, making it essentially a parsing problem.
It's still not entirely clear to me how important this work is, though heavyweights like J. Tenenbaum and others continue to work on it.
The page says that many models can't be subsumed under PGMs, and yes, that is true for things like PCFGs and other recursive things (martingales, stochastic processes, ...).
However, the things that PPLs are known for, like the inverse graphics work, are really PGMs. It's entirely possible that what I'm asking is akin to questioning the significance of deep neural networks and assorted frameworks in contrast to the chain rule; but considering that this is about more than getting X% on ImageNet, I think it is a reasonable question to ask. Is it about the representation or the implementation?
Powerful stuff!
[1] Efficiency not guaranteed... (yet)
Edit: Here are a few curated resources: http://webppl.org/ http://www.robots.ox.ac.uk/~fwood/anglican/ http://mc-stan.org/ http://dippl.org/ https://probmods.org/