It is Alan Turing Institute workshop season, and there was a recent, interesting workshop on data science for data-rich sports. Of particular interest was the statistics of elite athletes. There are real statistical difficulties in working with individuals who are, by definition, statistical outliers. In the end it seems the Olympic teams tend to rely on a mix of individual longitudinal data and models built from other elite athletes in the same sport. For example, what is the effect of different forms of training on an individual athlete? How should we choose the order for cycling teams? There is no doubt this is an area where it is vital to be careful about conclusions, and humble about the capability (or otherwise) of the statistical methods that are developed.
We are organising a second Edinburgh Deep Learning Workshop on June 9 2015. The first was very well attended, and this time round we have a number of invited speakers, including Rich Caruana, Neil Lawrence and Phil Blunsom. One thing we would love to explore further is understanding deep learning methods as models rather than just black-box devices. Altogether there will be many interesting talks and discussions on both methods and applications.
Bayesian Decision Theory has long been an important part of Bayesian reasoning. Rather than stopping at inference, which is a descriptive result, it provides a basis for action, which is a prescriptive result. Yet expected utility theory, which has been the dominant formalism for decision theory, has always seemed on shakier ground than the inferential process that sits behind it. Much of this relates to questioning the use of utilities to capture preference relations. Utilities are peculiar in a number of ways: they are a subjective concept, with no unit of measurement, and they are not commensurate: one person's utility does not relate to another's. There is nothing wrong with this, but it does emphasise that utilities are really a proxy – a proxy for capturing preference relations, and not just any preference relations, but ones that continue to hold, under rationality assumptions, when expectations are taken over utilities. That they do is down to the power of the von Neumann-Morgenstern utility theorem. And it is very powerful. It is the power of that axiomatic approach and theorem that gives expected utility the place it has today.
However, that doesn't mean others have not tried to work with other formalisms. In finance, risk measures are commonly used, partly for their mathematical convenience: risk measures have a unit of measurement (e.g. money), and hence risks can be compared across players. This means some games defined using risk measures take the form of potential games, and potential games are much easier to solve.
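To make the comparability point concrete, here is a minimal sketch of one common risk measure, expected shortfall (the average of the worst tail of losses). The portfolios and loss figures are entirely made up for illustration; the point is only that the outputs share a unit (money) and so can be compared directly, unlike subjective utilities.

```python
def expected_shortfall(losses, alpha=0.95):
    """Average of the worst (1 - alpha) fraction of losses (a coherent risk measure)."""
    losses = sorted(losses)
    k = max(1, int(round(len(losses) * (1 - alpha))))
    tail = losses[-k:]  # the k largest losses
    return sum(tail) / k

# Two hypothetical portfolios with losses in the same unit (money),
# so their risks are directly comparable -- unlike subjective utilities.
portfolio_a = [1, 2, 2, 3, 3, 4, 5, 20]   # occasional large loss
portfolio_b = [4, 4, 5, 5, 5, 6, 6, 7]    # steadier losses

print(expected_shortfall(portfolio_a, alpha=0.875))  # worst 1/8 of outcomes: 20.0
print(expected_shortfall(portfolio_b, alpha=0.875))  # 7.0
```

Because both answers are in the same currency, "portfolio A is riskier than portfolio B in the tail" is a statement anyone can check, with no interpersonal utility comparison needed.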
The problem with utilities comes from knowing what form they should take. And for real-valued quantities, issues turn up. To prevent the St Petersburg paradox, utilities need to be concave. To prevent the most general forms of the exchange paradox (or two-envelope problem), utilities need to be bounded.
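The St Petersburg case is easy to see numerically. In the classic game a fair coin is flipped until the first head; a head on flip n pays 2^n. Each term of the expected payout is exactly 1, so the partial sums grow without bound, while under a concave utility such as log the series converges. A small sketch of the partial sums:

```python
import math

# St Petersburg game: flip a fair coin until the first head;
# a head on flip n pays 2**n.
# Expected payout = sum over n of (1/2)**n * 2**n = 1 + 1 + 1 + ... diverges,
# but under the concave utility u(x) = log(x) the expectation converges.

def expected_payout_partial(n_terms):
    return sum((0.5 ** n) * (2 ** n) for n in range(1, n_terms + 1))

def expected_log_utility_partial(n_terms):
    return sum((0.5 ** n) * math.log(2 ** n) for n in range(1, n_terms + 1))

print(expected_payout_partial(50))       # 50.0 -- grows linearly in the cutoff
print(expected_log_utility_partial(50))  # ~1.386, i.e. approaching 2*log(2)
```

(Concavity alone is not enough for the fully general versions of these paradoxes: a generalised St Petersburg game with payouts growing faster than the utility flattens is why boundedness ends up being required.)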
Recently Ole Peters has been commenting on the need (or rather the lack of need) for non-linear utilities. His view is that our reliance on utilities comes from focusing on an isolated action, rather than the full temporal decision process. By thinking about growth rates in decision processes, we no longer need to posit subjective non-linear utilities. Instead, what we view as a utility is really our model for how our actions affect growth rates. It is an interesting observation, and perhaps grounds for making utilities a little less solipsistic than they currently are.
In teaching probabilistic modelling in previous years, I took a fairly traditional (and chronological) route. I first motivated and taught Bayesian networks and d-separation, then undirected graphical models and u-separation. But then we need to do inference, and so it is useful to convert everything to factor graphs. It works, but as I taught it, it sounded like lots of different rules for different types of graphs, and it took some effort to maintain a unified feel to the whole subject.
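One of those d-separation rules that always needs care is the collider: in a network A → C ← B, the parents are marginally independent but become dependent once we condition on the child (explaining away). A tiny enumeration check, with all conditional probability numbers made up for illustration:

```python
from itertools import product

# Tiny Bayesian network A -> C <- B over binary variables (a collider).
# d-separation: A and B are independent marginally, but dependent given C.
# All probability values below are made up for illustration.

p_a = {0: 0.7, 1: 0.3}
p_b = {0: 0.6, 1: 0.4}
# P(C=1 | A, B): C is likely when either parent is on.
p_c1 = {(0, 0): 0.1, (0, 1): 0.8, (1, 0): 0.8, (1, 1): 0.95}

def joint(a, b, c):
    pc = p_c1[(a, b)] if c == 1 else 1 - p_c1[(a, b)]
    return p_a[a] * p_b[b] * pc

# Marginal independence: P(A=1, B=1) == P(A=1) * P(B=1).
p_ab = sum(joint(1, 1, c) for c in (0, 1))
print(abs(p_ab - p_a[1] * p_b[1]) < 1e-12)  # True

def cond(a_val, b_val=None):
    """P(A=a_val | C=1), optionally also conditioning on B=b_val."""
    num = sum(joint(a, b, 1) for a, b in product((0, 1), (0, 1))
              if a == a_val and (b_val is None or b == b_val))
    den = sum(joint(a, b, 1) for a, b in product((0, 1), (0, 1))
              if b_val is None or b == b_val)
    return num / den

print(cond(1))     # P(A=1 | C=1)        ~0.492
print(cond(1, 1))  # P(A=1 | B=1, C=1)   ~0.337: B=1 explains C away
```

It is exactly this asymmetry (conditioning creating dependence rather than removing it) that the separation rules for undirected graphs and factor graphs do not exhibit, which is part of why unifying the three stories takes effort.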
This year, I decided to start with factor graphs, since we would need them for inference anyway, and they are the most general of the three forms. I could teach one separation rule, and then introduce directed and undirected models as special cases. Did it work? In some ways, yes. Inference could be introduced early and fairly seamlessly. But in other ways it did not quite work.
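The appeal of starting with factor graphs is how little machinery sum-product inference needs. A minimal sketch on a two-variable factor graph, [f1] — X1 — [f12] — X2, with binary variables and made-up factor values:

```python
# Sum-product on a minimal factor graph: [f1] -- X1 -- [f12] -- X2.
# f1 is a unary factor on X1, f12 a pairwise factor; variables are binary.
# Factor values are made up for illustration.

f1 = [0.6, 0.4]        # f1(x1)
f12 = [[0.9, 0.1],     # f12(x1, x2)
       [0.2, 0.8]]

# Variable-to-factor message: product of X1's other incoming messages --
# here just the message from f1, i.e. f1 itself.
msg_x1_to_f12 = f1

# Factor-to-variable message: multiply in the incoming message, sum out x1.
msg_f12_to_x2 = [sum(f12[x1][x2] * msg_x1_to_f12[x1] for x1 in (0, 1))
                 for x2 in (0, 1)]

# The (unnormalised) marginal of X2 is the product of its incoming messages.
z = sum(msg_f12_to_x2)
marginal_x2 = [m / z for m in msg_f12_to_x2]
print(marginal_x2)  # [0.62, 0.38]
```

Two message types and one product rule for marginals cover any tree-structured model, which is why fronting factor graphs makes the inference lectures land early.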
The reason is simple. For directed graphs to really be a special case of factor graphs, we have to introduce directed edges into the factors (Frey, UAI 2003). This is a fine idea, and it nicely augments the factor graph representation. But it does complicate things. Specifically, it becomes possible to draw mixed or directed graphs that look like valid factor graphs but cannot correspond to any real distribution (extra factors are required to account for the normalisation of the conditional distributions). The rules for ensuring closure are not simple. Suddenly, what seemed like a neat way of introducing graphical models to a new audience has become fairly complicated to explain…
The representational approaches for graphical models are even more varied than this. Structural equation models are another representational form, which can be understood as representing distributions as the fixed points of a deterministic dynamical system under noise injection. Bayesian networks can be represented as structural equation models, but structural equation models extend Bayesian networks to directed cyclic models. However, structural equation models fall foul of this interpretation when the underlying dynamical system has no unique fixed point.
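The fixed-point reading is easy to demonstrate on the smallest cyclic case. Here is a sketch of a two-variable linear SEM with a cycle (coefficients and noise are made up for illustration): for each noise draw, iterating the equations converges to a unique fixed point whenever |a·b| < 1, and that map from noise to fixed point is what defines the distribution.

```python
import random

# A minimal cyclic structural equation model:
#   x1 = a * x2 + e1
#   x2 = b * x1 + e2
# For fixed noise (e1, e2) this is a deterministic dynamical system; when
# |a * b| < 1 it has a unique fixed point, so the SEM defines a distribution
# over (x1, x2) by mapping each noise draw to that fixed point.
# Coefficients and noise scales are made up for illustration.

a, b = 0.5, 0.4

def fixed_point(e1, e2, iters=200):
    """Iterate the structural equations to their fixed point."""
    x1 = x2 = 0.0
    for _ in range(iters):
        x1 = a * x2 + e1
        x2 = b * x1 + e2
    return x1, x2

def closed_form(e1, e2):
    """Solve the fixed-point equations directly: x1 * (1 - a*b) = e1 + a*e2."""
    x1 = (e1 + a * e2) / (1 - a * b)
    return x1, b * x1 + e2

rng = random.Random(0)
e1, e2 = rng.gauss(0, 1), rng.gauss(0, 1)
x_iter = fixed_point(e1, e2)
x_exact = closed_form(e1, e2)
print(max(abs(u - v) for u, v in zip(x_iter, x_exact)))  # ~0: iteration converged
```

With |a·b| ≥ 1 the iteration diverges and there is no fixed point to map to, which is exactly the case where the SEM-as-dynamical-system interpretation breaks down.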
Other powerful additions (e.g. Gates) can also be made to our models. So are we doomed to explaining many different forms of probabilistic graphical model, and the intimate connections between them? Or is there any hope for a simple but versatile Grand Unified Graphical Model? Many researchers have tried. Perhaps someone has already succeeded. Or perhaps any unification would really end up being a synthesis that is simply as complicated as the union of the parts. All I know is that if anyone does come up with something, I am going to have to rewrite my lectures yet again…
Frey, B. J. (2003). Extending Factor Graphs so as to Unify Directed and Undirected Graphical Models. Proceedings of UAI.
I very much enjoyed the short project on the game of Go that Chris Clark and I worked on over the summer. And it was fun to see Martin Mueller do some analysis of the play of the convolutional network.
As it is, we have lots of ideas for how to take this project forward. At the same time, the Toronto/DeepMind collaboration will have their own ideas. It will be interesting to see whether the two groups come up with different or similar things.