I attended DARPA's Probabilistic Programming for Advancing Machine Learning (PPAML) summer school. The aim of probabilistic programming languages (PPLs) is to abstract the act of Bayesian inference into modular engines, so that switching from, say, Hamiltonian Monte Carlo to a particle filter requires changing exactly one string. If you can write a model in a PPL, you get inference for free [1].
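As a toy illustration of the "change one string" idea (this is not any real PPL's API; the `infer` function and engine names below are made up), here is a model of a coin's bias with two interchangeable inference engines, selected by a string:

```python
import math
import random

# Toy model: uniform prior on a coin's bias, 7 heads observed in 10 flips.
HEADS, FLIPS = 7, 10

def log_likelihood(bias):
    return HEADS * math.log(bias) + (FLIPS - HEADS) * math.log(1 - bias)

def infer(method, n=20000, seed=0):
    """Posterior mean of the bias; `method` selects the inference engine."""
    rng = random.Random(seed)
    if method == "importance":
        # Weight prior draws by the likelihood of the observed data.
        draws = [rng.uniform(1e-6, 1 - 1e-6) for _ in range(n)]
        weights = [math.exp(log_likelihood(b)) for b in draws]
        return sum(b * w for b, w in zip(draws, weights)) / sum(weights)
    if method == "rejection":
        # Keep prior draws whose simulated flips exactly match the observed data.
        kept = []
        for _ in range(n):
            b = rng.random()
            if sum(rng.random() < b for _ in range(FLIPS)) == HEADS:
                kept.append(b)
        return sum(kept) / len(kept)
    raise ValueError(f"unknown engine: {method}")
```

Both `infer("importance")` and `infer("rejection")` approximate the same posterior (analytically Beta(8, 4), mean 2/3); only the string changed.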
The purported aim is to allow machine learning code that today requires 1000-10,000 lines of code to be written in 10-100 lines.
This works brilliantly for one-shot learning. Let's say you are trying to teach a computer to recognize a handwritten character after a single example. First, you build a generative model that follows how letters are constructed: the hand touches paper and makes a primitive shape (line, curve, loop, etc.), and multiple such primitives are strung together to form a character. For each example character, infer the most likely sequence of such primitives. When a new classification is requested, take samples from the generative model for each known character, calculate the difference in pixel values, and do this hundreds of thousands of times. You can now construct an accurate marginal probability for each known character while needing only a single example.
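A heavily simplified sketch of that recipe in Python (the 3x3 "characters," the noise rate, and the pixel-agreement kernel are all invented for illustration; the real work uses stroke primitives, not bitmaps):

```python
import math
import random

NOISE = 0.1          # probability a pixel flips when the character is drawn
TEMPLATES = {        # one training example per character, as 3x3 bit grids
    "L": (1, 0, 0, 1, 0, 0, 1, 1, 1),
    "T": (1, 1, 1, 0, 1, 0, 0, 1, 0),
}

def render(template, rng):
    """Generative model: draw the character, flipping each pixel with prob NOISE."""
    return tuple(px ^ (rng.random() < NOISE) for px in template)

def classify(image, n=5000, rng=None):
    """Score each known character by comparing `image` against many samples."""
    rng = rng or random.Random(0)
    scores = {}
    for char, tmpl in TEMPLATES.items():
        # Average a pixel-difference kernel over n samples from the model.
        total = 0.0
        for _ in range(n):
            sample = render(tmpl, rng)
            diff = sum(a != b for a, b in zip(image, sample))
            total += math.exp(-diff)
        scores[char] = total / n
    z = sum(scores.values())
    return {char: s / z for char, s in scores.items()}  # marginal probabilities
```

A query image that is an "L" with one corrupted pixel still gets most of the probability mass, despite each character having only one training example.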
Very interesting. I haven't read much about the topic (yet), so excuse my pre-morning-coffee naive comment. I know very little about languages like Stan, but what's the reasoning behind baking these "inference for free" ideas into the language? It sounds like something for a library. Are there distinct advantages to having a new language for these things?
My very naive approach would be using something like the pipelines in Phoenix (pipe_through :hamiltonian)
I suppose the reasoning is the same as for Prolog?
Stan (http://mc-stan.org/documentation/) is arguably the most advanced language. It's especially pushing the bounds of automatic variational inference, for the scenario where your model does not have a nice conjugate form that would be amenable to Gibbs sampling. It hasn't quite reached what I would call production quality, but some of the best people in the world of computational Bayesian methods (e.g., Michael Betancourt, most of David Blei's lab) are working on it.
Yes, Stan is awesome! The main difference between something like Stan and "next-gen" languages like Anglican and WebPPL is that the latter place basically no restrictions on where you can use a distribution: nested inference, probabilistic recursion, etc. are all possible. For certain classes of problems this leads to greatly enhanced expressiveness. On the flip side, Stan is more production-ready right now.
A major downside of Stan is its lack of support for discrete priors. This isn't really advertised very well, but it is more of a problem than it might initially sound. Its type handling can also get a little frustrating at times. Overall, I highly recommend it, but it does have its downsides, and there's some room for alternatives or improvement.
PyMC3 (http://pymc-devs.github.io/pymc3/) has all the powerful samplers that Stan has (i.e. NUTS) as well as support for discrete priors. It allows you to specify your models in pure Python, supports matrix algebra, and also has variational inference which allows for large-ish scale machine learning (e.g. here is a blog post where I train a Bayesian Neural Net on MNIST http://twiecki.github.io/blog/2016/07/05/bayesian-deep-learn...). Under the hood, it uses theano for JIT compilation to C or GPU.
Anglican for high performance (Clojure host, strong multithreading support) and WebPPL for ease of integration (JavaScript client side or server side). Both are implemented using continuation passing style trampolines.
You don't need to know what that is to use the languages :). In essence, functions always take one extra argument, a "continuation," which receives the value that would otherwise have been returned. A trampoline is the loop that repeatedly invokes these continuation thunks so the call stack never grows. It's typically a compilation technique rather than a programming style.
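A minimal Python sketch of the idea (not taken from either language's implementation): factorial in continuation-passing style, where every step returns a thunk and a trampoline loop drives the computation, so even very deep recursion uses constant stack.

```python
def factorial_cps(n, cont):
    """CPS factorial: instead of returning, pass the result to `cont`.
    Every step returns a zero-argument thunk for the trampoline to call,
    so neither the descent nor the unwinding grows the Python stack."""
    if n == 0:
        return lambda: cont(1)
    return lambda: factorial_cps(n - 1, lambda acc: lambda: cont(n * acc))

def trampoline(thunk):
    """Repeatedly invoke thunks until a final (non-callable) value appears."""
    result = thunk
    while callable(result):
        result = result()
    return result
```

`trampoline(factorial_cps(2000, lambda x: x))` runs fine even though naive recursion to depth 2000 would exceed Python's default recursion limit.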
As much as I love the concept of probabilistic programming, the name makes me cringe every time I hear it. It's verbose, doesn't abbreviate well, and borders on being a tongue twister. More substantially, the name suggests that the act of programming itself is probabilistic (a real but distinct phenomenon :) )
Worse, rather than highlighting the key value proposition (automated inference), the name suggests a focus on the development of probabilistic programs. Perhaps as a result, many introductions to probabilistic programming concern themselves almost exclusively with the comparatively trivial task of expressing models as algorithms, while blissfully ignoring the thorny problem of inference.
I suggest talking about “automated inference” instead. It rolls off the tongue much more easily, and, more importantly, gets to the heart of the matter.
I don't think that's much of a difference. Doesn't probabilistic programming allow for a wider range of applications than automated inference/induction? To me, a language like Prolog would fit such a description better.
"Design and Implementation of Probabilistic Programming Languages" http://dippl.org is pretty dank too. You implement your own probabilistic language in JS.
I know nothing about probabilistic programming, but wouldn't it be easier (and more applicable) to have a library with the relevant functionality and a bunch of interfaces for Go/C/Python/... than to implement a new language? An honest question.
I developed a PPL within C#, which I guess has a more familiar syntax for a lot of people. Basically, PPLs allow you to treat entire probabilistic models as primitives, and combine them as easily as we add integers. Let's say we have two models describing the distribution of university grades in America and India, and we want to combine them into a single model, where you have a 25% chance of being an American, and a 75% chance of being an Indian:
    var combined = from isAmerican in Bernoulli(0.25)
                   from grade in isAmerican ? americanGrades : indianGrades
                   select grade;
This is now a new model that can be composed further with other models. Building models like this feels very expressive.
The inference method is also completely decoupled from the model specification process, allowing us to switch to, say, Sequential Monte Carlo with a one-line change.
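A rough Python analogue of the C# mixture above (the grade distributions and their parameters are made up; models here are just sampler functions, composed by ordinary function composition):

```python
import random

def bernoulli(p):
    return lambda rng: rng.random() < p

def american_grades(rng):
    # Stand-in model: GPA-style grades (invented parameters).
    return rng.gauss(3.0, 0.5)

def indian_grades(rng):
    # Stand-in model: percentage-style grades (invented parameters).
    return rng.gauss(70.0, 10.0)

def combined(rng):
    """Mixture: 25% chance American, 75% chance Indian."""
    is_american = bernoulli(0.25)(rng)
    return american_grades(rng) if is_american else indian_grades(rng)

# `combined` is itself a model and can be composed further, like the C# query.
rng = random.Random(0)
samples = [combined(rng) for _ in range(10000)]
```

The mixture's mean should sit near 0.25 * 3.0 + 0.75 * 70.0 = 53.25, which a quick Monte Carlo check confirms.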
As a researcher that is currently working on implementing and improving existing PPLs, I'm very curious: what is your specific use-case that you have in mind for recursive queries?
Honestly, I'd like to be able to try out some of the old social-reasoning demos from forestdb. It's not really a high-priority thing; it just bugs me that there were working languages for doing this (Church) and now there's basically just Anglican.
If you're interested in social reasoning and nested inference models, there's some examples of how they're implemented in WebPPL here: http://agentmodels.org/chapters/7-multi-agent.html. I think many of the reasoning examples from forestdb that are currently written in Church could be translated into WebPPL in a fairly straightforward manner using the same structure as the agentmodels examples.
I never liked this approach to recursion (as in Google's joke search suggestion for "recursion"), because recursion is a form of progressive iteration, not reiteration.
I'm not sure I know what "isomorphic" means in this context.
For sure, (tail) recursion places a pointer to a function on the stack while a loop is a conditional jump. They're not the same thing. I think the similarity between n-1 and --n is a red herring, here.
In my humble opinion something like this will be at the root of the solution to general AI.
Deep learning may have the edge currently for being a bit more mathematically tractable and much easier to massively parallelize, but this seems to me like a more fundamental foundation for AI (there remain issues to be solved with it though).
These languages are used to describe models and run probabilistic simulations of them, but they can also be used to describe other programming languages probabilistically. This means the potential to go up one level of abstraction, to a probabilistic program that writes other programs to model and predict the world.
Here's a description of one of my failed attempts at this:
General AI needs structured prediction. Yes, graphical models can do that (HMM, CRF, etc.), but inference starts to get slow and special-case implementations are required for different domains. [1]
I see no way for someone to get automatic inference and training for [1] with a probabilistic programming language.
Given the new DeepMind paper on discovering shortest-path algorithms, it's quite clear that structured prediction assisted by deep networks works quite well (this has been demonstrated by a vast array of work), while graphical models represented by probabilistic programming languages are far from being that successful.
I don't think it has to be either or. For example, you can implement a Deep Net in a probabilistic programming framework (http://twiecki.github.io/blog/2016/07/05/bayesian-deep-learn...) like PyMC3. The inference here is not using sampling but rather ADVI (http://pymc-devs.github.io/pymc3/api.html#advi) which is almost as general but much much faster, and can be run on sub-samples of the data (similar to stochastic gradient descent used in deep learning). Once we bridge these two domains we can get the best of both worlds, like a deep net HMM.
>this stuff existed for decades and it hit a wall.
>sampling is slow.
Right. Sampling is slow. That's why automated variational inference is a very active field of research these days: instead of approximating the posterior by sampling, you recast inference as an optimization problem, where gradient descent on a variational objective (the ELBO, a lower bound on the marginal likelihood) yields an approximation to the posterior across the parameter space.
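A deliberately tiny illustration of "inference as optimization" (the target posterior, the variational family, and all parameters are invented; real ADVI uses stochastic gradients of the ELBO, whereas this toy targets a Gaussian posterior where the KL objective is available in closed form):

```python
import math

def kl_gaussians(m, s, mu, sigma):
    """KL( N(m, s) || N(mu, sigma) ), closed form."""
    return math.log(sigma / s) + (s**2 + (m - mu)**2) / (2 * sigma**2) - 0.5

def fit_variational(mu=2.0, sigma=0.5, steps=2000, lr=0.05):
    """Fit a Gaussian q(m, s) to the posterior N(mu, sigma) by gradient
    descent on the KL divergence: no sampling anywhere in the loop."""
    m, t = 0.0, 0.0                      # t = log of the variational std dev
    for _ in range(steps):
        s = math.exp(t)
        grad_m = (m - mu) / sigma**2     # d KL / d m
        grad_t = s**2 / sigma**2 - 1.0   # d KL / d t
        m -= lr * grad_m
        t -= lr * grad_t
    return m, math.exp(t)
```

After optimization the variational parameters recover the target posterior's mean and standard deviation, which is the whole point: the answer comes out of an optimizer, not a sampler.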
All the work on training deep neural networks has made our hardware and software very efficient at solving optimization problems.
If this site is talking about what I think it is, then "regular" programming languages are a special case of this where the probability distribution of each value is the degenerate case: a point mass. On that basis alone, this will be useful for anything regular programming is useful for.
In addition, this makes modeling uncertainty much easier (in the sense of "closer at hand"). That may allow for new ways of dealing with user input. Instead of saying "is this email address valid or invalid", we can start asking questions like "Is the probability of this email being valid larger than 99%? Then we'll accept it. Is it larger than 95%? Then we'll ask the user to confirm it. Is it less than 95%? Then we'll tell the user it's incorrect and have them retype it."
These are things we normally don't care to model because it would require lots of additional machinery. With that machinery built into the programming language, it is much easier to reach for, potentially with a better user experience to boot.
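The threshold logic above is trivial to write down once the probability is at hand (the cutoffs are the ones from the comment; the `p_valid` input is assumed to come from whatever model estimates validity):

```python
def email_decision(p_valid):
    """Map an estimated probability that an email address is valid
    to a UI action, using the thresholds described above."""
    if p_valid > 0.99:
        return "accept"          # confident enough to proceed silently
    if p_valid > 0.95:
        return "confirm"         # plausible, but ask the user to double-check
    return "retype"              # likely wrong; have the user re-enter it
```

The hard part, of course, is the model that produces `p_valid`, which is exactly the machinery a PPL would make "closer at hand."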
The first probabilistic language I came across was PRISM [1] (I made a post about it earlier today). It's a probabilistic _logic_ programming language so you're basically declaring your Prolog facts with probabilities attached and then run your program as a simulation, drawing samples from a distribution over variable bindings.
I see it as having a database of facts about the world with attached probabilities that tell you which view over (or perhaps version of) the world is the most likely.
And then you can do EM search for optimal parameters. Learning, right? It's all built into the language, and you don't need to hand-craft task-specific versions depending on your domain (like Baum-Welch, Inside-Outside, etc.).
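To make the EM idea concrete outside any particular language, here is the classic two-coin toy problem in plain Python (the data and initial guesses are the standard textbook ones, not anything from PRISM): each session of 10 flips used one of two coins of unknown bias, and EM recovers both biases without ever being told which coin was which.

```python
HEADS = [5, 9, 8, 4, 7]   # heads out of 10 flips in each of five sessions
FLIPS = 10

def em_two_coins(theta_a=0.6, theta_b=0.5, iters=50):
    """EM for a mixture of two biased coins (each session uses one coin)."""
    for _ in range(iters):
        # E-step: soft-assign each session to coin A or B via the likelihoods.
        tot_a_h = tot_a_t = tot_b_h = tot_b_t = 0.0
        for h in HEADS:
            like_a = theta_a**h * (1 - theta_a)**(FLIPS - h)
            like_b = theta_b**h * (1 - theta_b)**(FLIPS - h)
            w = like_a / (like_a + like_b)      # responsibility of coin A
            tot_a_h += w * h
            tot_a_t += w * (FLIPS - h)
            tot_b_h += (1 - w) * h
            tot_b_t += (1 - w) * (FLIPS - h)
        # M-step: re-estimate each coin's bias from its soft counts.
        theta_a = tot_a_h / (tot_a_h + tot_a_t)
        theta_b = tot_b_h / (tot_b_h + tot_b_t)
    return theta_a, theta_b
```

The point of a probabilistic logic language is that this loop is what you get for free: you state the facts and their probabilities, and the generic EM machinery replaces hand-rolled routines like this one.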
Also, it's a probabilistic Prolog: it's Turing complete and gives you all the expressive power of first-order predicate calculus. With probabilities. And learning of parameters from data.
Languages like this go way, way beyond ad-hoc implementations of inference over graphical models, to giving us a new vocabulary to express reasoning over vast sets of data.
> graphical models typically serve as coarse, high-level descriptions, eliding critical aspects such as fine-grained independence, abstraction and recursion.
Maybe you're not impressed with graphical models because your experience with them matches the points above. By "writing code that generates a sample from the joint distribution," you can achieve much better modelling and control than with the normal methods of producing graphical models.
Plenty of things! The real world is often best modeled by a probabilistic model; you can rarely state facts in definite terms. The probability the bus arrives on time may be 70%. The probability that a particular host is up may be 92%. The probability of needing a reboot within the next week is 2%.
The probability that the user switched off Wifi on their machine is 4%, and the probability that their router is having trouble is 3%. These numbers can be based on actual measurements. What do we tell the user when they have problems connecting? Instead of just saying "it's either this or that" we can run the numbers, and perhaps in aggregate there's an overwhelming probability it's a particular event, in which case we suggest that first.
There are so many cases where we don't actually know for certain all the parameters involved, but the conventional approach is still to round the probability either up to 100% or down to 0%. Simply because that's easier in conventional programming languages. As a result, you might not see these events as having probability distributions, but they do.
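Running the numbers from the connectivity example is just a normalization (the cause probabilities are the ones quoted above plus one invented extra; in practice they would come from measurements):

```python
# Hypothetical marginal probabilities of each cause; "ISP outage" is invented
# here to round out the list. In practice these come from measurements.
CAUSES = {"wifi switched off": 0.04, "router trouble": 0.03, "ISP outage": 0.01}

def ranked_suggestions(causes=CAUSES):
    """Given that the user has a connectivity problem, renormalize over the
    candidate causes and rank them, most probable first."""
    z = sum(causes.values())
    posterior = {c: p / z for c, p in causes.items()}
    return sorted(posterior.items(), key=lambda kv: -kv[1])
```

With these numbers, "wifi switched off" carries half the renormalized mass, so it's the first thing to suggest rather than an unordered "it's either this or that."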
The point is not efficiency, but eloquence, I guess. From the website:
>> However, many of the most innovative and useful probabilistic models published by the AI, machine learning, and statistics community far outstrip the representational capacity of graphical models and associated inference techniques.
>> PROBABILISTIC PROGRAMMING LANGUAGES aim to close this representational gap, unifying general purpose programming with probabilistic modeling;
I see it as giving the tools to the community to describe their models and automate inference over them in a unified manner that can be communicated more easily, and in a way that is better understood by all.
It's not just about making inference faster. Most of the PGM libraries/DSLs I know of are kinda clunky and inflexible. This is all about answering questions about how to make it easier and more powerful, not only faster.
Unless there is scope for optimization, the representations should by necessity be essentially the same, making it essentially a parsing problem.
It's still not entirely clear to me how important this work is, though heavyweights like J. Tenenbaum and others continue to work on it.
The page says that many models can't be subsumed under PGMs, and yes, that is true for things like PCFGs and other recursive things (martingales, stochastic processes, ...).
However, the things that PPLs are known for, like the inverse graphics work, are really PGMs. It's entirely possible that what I'm asking is akin to questioning the significance of deep neural networks and assorted frameworks in contrast to the chain rule; but considering that this is about more than getting X% on ImageNet, I think it is a reasonable question to ask. Is it about the representation or the implementation?
Powerful stuff!
[1] Efficiency not guaranteed... (yet)
Edit: Here are a few curated resources: http://webppl.org/ http://www.robots.ox.ac.uk/~fwood/anglican/ http://mc-stan.org/ http://dippl.org/ https://probmods.org/