Hacker News

Can someone in the know please comment on how this is useful?

I mean, yes, this is neat, but inference engines already work declaratively with factor graphs and potentials.

Are there techniques exploiting algebra/logic that help make inference faster in practice?



In my humble opinion something like this will be at the root of the solution to general AI.

Deep learning may have the edge currently for being a bit more mathematically tractable and much easier to massively parallelize, but this seems to me like a more fundamental foundation for AI (there remain issues to be solved with it though).

These languages are used to describe models and run probabilistic simulations of them, but they can also be used to describe other programming languages probabilistically. This means the potential to go up one level of abstraction, to a probabilistic program that writes other programs to model and predict the world.

Here's a description of one of my failed attempts at this:

https://www.quora.com/What-deep-learning-ideas-have-you-trie...


This stuff has existed for decades and it hit a wall.

Sampling is slow.

General AI needs structured prediction. Yes, graphical models can do that (HMMs, CRFs, etc.), but inference starts to get slow and special-case implementations are required for different domains. [1]

I see no way for someone to get automatic inference and training for [1] with a probabilistic programming language.

Given the new DeepMind paper on discovering shortest-path algorithms, it's quite clear that structured prediction assisted by deep networks works quite well (this has been demonstrated by a vast array of work), and graphical models represented by probabilistic programming languages are far from being that successful.

[1]: http://www.philkr.net/home/densecrf


I don't think it has to be either/or. For example, you can implement a deep net in a probabilistic programming framework (http://twiecki.github.io/blog/2016/07/05/bayesian-deep-learn...) like PyMC3. The inference here is not using sampling but rather ADVI (http://pymc-devs.github.io/pymc3/api.html#advi), which is almost as general but much, much faster, and can be run on sub-samples of the data (similar to the stochastic gradient descent used in deep learning). Once we bridge these two domains we can get the best of both worlds, like a deep net HMM.


>this stuff existed for decades and it hit a wall.

>sampling is slow.

Right. Sampling is slow. That's why automated variational inference is such an active field of research these days: instead of approximating the posterior by sampling, you turn inference into an optimization problem, fitting a tractable family of distributions by maximizing a lower bound (the ELBO) on the marginal likelihood via gradient ascent.
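A minimal sketch of that idea, using a conjugate Normal model where the ELBO gradients happen to have closed form, so we can run deterministic gradient ascent with no sampling at all (the data, prior, and step size here are invented for illustration):

```python
# Model: x_i ~ Normal(theta, 1), prior theta ~ Normal(0, 1).
# Variational family: q(theta) = Normal(m, s^2).
x = [2.1, 1.9, 2.0, 2.2, 1.8]
n, sx = len(x), sum(x)

m, s = 0.0, 1.0   # variational parameters
lr = 0.01
for _ in range(2000):
    grad_m = (sx - n * m) - m            # d ELBO / d m
    grad_s = -(n + 1.0) * s + 1.0 / s    # d ELBO / d s
    m += lr * grad_m
    s += lr * grad_s

# Exact posterior for comparison: Normal(sx / (n + 1), 1 / (n + 1)).
print(round(m, 3), round(s, 3))   # ≈ 1.667, 0.408
```

In the conjugate case the optimum matches the exact posterior; in general models the gradients are estimated stochastically (e.g. by the reparameterization trick), which is what makes "automated" VI possible.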

All the work on training deep neural networks has made our hardware and software very efficient at solving optimization problems.


If this site is talking about what I think it is, then "regular" programming languages are a special case of this, where the probability distribution of each value is degenerate (a point mass on a single value).

Based on that alone, this will be useful for anything regular programming is useful for.

In addition, this makes modeling uncertainty much easier (in the sense of "closer at hand"). That may allow for new ways of dealing with user input. Instead of saying "is this email address valid or invalid", we can start asking questions like "Is the probability of this email being valid larger than 99%? Then we'll accept it. Is it larger than 95%? Then we'll ask the user to confirm it. Is it less than 95%? Then we'll tell the user it's incorrect and have them retype it."

These are things we normally don't care to model because it would require lots of additional machinery. With that machinery built into the programming language, it is much easier to reach for, potentially with a better user experience to boot.
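The thresholding described above is easy to sketch; `p_valid` here is a hypothetical function that returns the probability an address is valid:

```python
def handle_email(p_valid: float) -> str:
    # Branch on the probability that the address is valid,
    # rather than on a hard valid/invalid boolean.
    if p_valid > 0.99:
        return "accept"
    elif p_valid > 0.95:
        return "confirm"   # ask the user to double-check
    else:
        return "retype"    # treat as incorrect

print(handle_email(0.999))  # accept
```

The point of a probabilistic language is that `p_valid` itself would come from inference over a model, not a hand-tuned heuristic.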


Yes, this is precisely what is called probabilistic inference.

I was trying to find out what more it can do other than parsing a DSL into a graphical model.


The first probabilistic language I came across was PRISM [1] (I made a post about it earlier today). It's a probabilistic _logic_ programming language so you're basically declaring your Prolog facts with probabilities attached and then run your program as a simulation, drawing samples from a distribution over variable bindings.

I see it as having a database of facts about the world with attached probabilities that tell you which view over (or perhaps version of) the world is the most likely.

And then you can do EM search for optimal parameters. Learning, right? It's all built into the language, and you don't need to hand-craft task-specific versions depending on your domain (like Baum-Welch, Inside-Outside, etc.).

Also, it's a probabilistic Prolog: it's Turing complete and gives you all the expressive power of first-order predicate calculus. With probabilities. And learning of parameters from data.

Languages like this go way, way beyond ad-hoc implementations of inference over graphical models, to giving us a new vocabulary to express reasoning over vast sets of data.
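A toy illustration (in Python, not PRISM) of inference over probabilistic facts, by exhaustively enumerating possible worlds; the facts and probabilities are made up:

```python
from itertools import product

# Independent probabilistic facts: name -> probability of being true.
facts = {"rain": 0.3, "sprinkler": 0.5}

def query(prob_true):
    """P(grass is wet), where grass is wet iff rain or sprinkler."""
    total = 0.0
    for world in product([True, False], repeat=len(prob_true)):
        assignment = dict(zip(prob_true, world))
        # Probability of this particular world.
        p = 1.0
        for name, value in assignment.items():
            p *= prob_true[name] if value else 1 - prob_true[name]
        if assignment["rain"] or assignment["sprinkler"]:
            total += p
    return total

print(query(facts))   # ≈ 0.65, i.e. 1 - 0.7 * 0.5
```

Real systems like PRISM avoid this exponential enumeration by sharing structure between explanations, but the semantics is essentially this distribution over worlds.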

__________

[1] http://rjida.meijo-u.ac.jp/prism/


> graphical models typically serve as coarse, high-level descriptions, eliding critical aspects such as fine-grained independence, abstraction and recursion.

Maybe you're not impressed with graphical models because your experience with them matches the points above.

By "writing code that generates a sample from the joint distribution", you can achieve much better modelling and control than with the usual methods of specifying graphical models.
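A minimal example of the idea: the joint distribution is defined implicitly by an ordinary program that draws samples, and a conditional query is answered by forward sampling with rejection (the model and probabilities are invented for illustration):

```python
import random

def sample_world(rng):
    """One draw from the joint distribution over (cloudy, rain, wet)."""
    cloudy = rng.random() < 0.5
    rain = rng.random() < (0.8 if cloudy else 0.1)
    wet = rain or rng.random() < 0.2   # sprinkler, etc.
    return cloudy, rain, wet

# Estimate P(rain | wet) by keeping only samples where wet is true.
rng = random.Random(0)
rain_and_wet = wet_total = 0
for _ in range(100_000):
    cloudy, rain, wet = sample_world(rng)
    if wet:
        wet_total += 1
        rain_and_wet += rain
print(rain_and_wet / wet_total)   # ≈ 0.45 / 0.56 ≈ 0.80
```

Any control flow the host language allows (recursion, loops, data structures) can appear inside `sample_world`, which is exactly what plain graphical-model notation struggles to express.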


Plenty of things! The real world is often best modeled by a probabilistic model; you can rarely state facts in definite terms. The probability the bus arrives on time may be 70%. The probability that a particular host is up may be 92%. The probability of needing a reboot within the next week is 2%.

The probability that the user switched off Wifi on their machine is 4%, and the probability that their router is having trouble is 3%. These numbers can be based on actual measurements. What do we tell the user when they have problems connecting? Instead of just saying "it's either this or that" we can run the numbers, and perhaps in aggregate there's an overwhelming probability it's a particular event, in which case we suggest that first.

There are so many cases where we don't actually know for certain all the parameters involved, but the conventional approach is still to round the probability either up to 100% or down to 0%. Simply because that's easier in conventional programming languages. As a result, you might not see these events as having probability distributions, but they do.
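The connectivity-diagnosis scenario above is a one-liner with Bayes' rule; the priors and likelihoods below are invented for illustration (and the fault list isn't exhaustive, so this only ranks the listed hypotheses):

```python
# Prior probabilities of each fault (from hypothetical measurements).
priors = {"wifi_off": 0.04, "router_trouble": 0.03, "isp_outage": 0.01}
# Assumed P(user reports "can't connect" | fault).
likelihood = {"wifi_off": 0.99, "router_trouble": 0.9, "isp_outage": 0.95}

# Unnormalized posteriors given the report, then normalize.
post = {f: priors[f] * likelihood[f] for f in priors}
z = sum(post.values())
post = {f: p / z for f, p in post.items()}

# Suggest the most likely cause first.
best = max(post, key=post.get)
print(best, round(post[best], 3))
```

A probabilistic language makes this kind of calculation the default rather than something you bolt on afterwards.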



Hey, come on, give an explanation. Don't just flip people the link, innit.


Wouldn't you then be able to type in a correct email address that gets assessed at only a 0.94 probability of being correct?


The point is not efficiency but expressiveness, I guess. From the website:

>> However, many of the most innovative and useful probabilistic models published by the AI, machine learning, and statistics community far outstrip the representational capacity of graphical models and associated inference techniques.

>> PROBABILISTIC PROGRAMMING LANGUAGES aim to close this representational gap, unifying general purpose programming with probabilistic modeling;

I see it as giving the tools to the community to describe their models and automate inference over them in a unified manner that can be communicated more easily, and in a way that is better understood by all.


It's not just about making inference faster. Most of the PGM libraries/DSLs I know of are kinda clunky and inflexible. This is all about answering questions about how to make it easier and more powerful, not only faster.


Unless there is scope for optimization, the representations must, by necessity, be essentially the same, making this essentially a parsing problem.

The research listed on the page (http://probabilistic-programming.org/research/) appears to fall along the following lines:

Stochastic processes-ish:

- Inference techniques for handling recursion.

PGM-related work:

- Parallelization

- Optimizations for MCMC based on structure

(Old school) theoretical CS:

- Formalisms reminiscent of formal language theory.

It's still not entirely clear to me how important this work is, though heavyweights like J. Tenenbaum and others continue to work on it.

The page says that many models can't be subsumed under PGMs, and yes, that is true for things like PCFGs and other recursive things (martingales, stochastic processes, ...).

However, the things PPLs are known for, like the inverse-graphics work, are really PGMs. It's entirely possible that what I'm asking is akin to questioning the significance of deep neural networks and their frameworks in contrast to the chain rule; but considering this is about more than getting X% on ImageNet, I think it is a reasonable question to ask. Is it about the representation or the implementation?


- It appears there are indeed things like dataflow analysis / SMT that can achieve better performance than current techniques.

- Inference on loops seems to be something that can be handled as well (dynamically?).



