# Why probabilistic modelling is more than playing with random dice

### I am often asked to explain what probabilistic modelling is and why it matters. In one sense the name is unfortunate, because the word “probable” suggests the technique isn’t exact.

One common criticism of the technique is that it is only used when something is “unknown”. People think of the random simulations used in online gambling and feel uneasy about applying them in science, believing that hard data and calculations are being replaced with random events. On top of this, probability theory is one of those love-it-or-hate-it subjects in school mathematics, so the prejudice is somewhat understandable.

Probabilistic modelling, or Monte Carlo simulation, can be precisely what you need when a lot of data is known about a system. Calculations in science involve putting together numbers that represent physical quantities. Before scaring anyone off with abstract statistical theory, let’s take a concrete example. Suppose we want to estimate the vitamin A intake in France from the consumption of liver (vitamin A has both a Recommended Daily Amount (RDA) and an upper limit on intake, because it is toxic at large doses). To calculate the intake of vitamin A from eating liver, you need to know two things: the size of the portion of liver and the level of vitamin A inside it. Multiply the two together and you have the intake. But what is the size of a typical portion of liver? And what is the typical concentration of vitamin A in a portion of liver?

Ideally, we would measure the size of the liver portion consumed and the level of vitamin A inside it for every piece of liver eaten by every person in France. Not very practical, I’m sure you’ll agree. In practice, what typically happens is that one study is performed on liver portion sizes and another on vitamin A levels in liver. The two studies give us two sets of data, each describing how much its physical quantity varies. So how do we use them together?

We could certainly take the averages, but that doesn’t make full use of the data. For people who consume smaller portions of liver with lower levels of vitamin A, we will overestimate their intake. For people who consume larger portions of liver with higher levels of vitamin A, we will underestimate it. So what do we do? Use all the data. How do we use it all? Probabilistic modelling! This lets you assess the full variability in the intake, because you keep all the variation in the inputs and calculate all the variation in the outputs. You don’t get one answer, but that’s because in reality there isn’t one. You can, of course, use statistics to summarise your answer, but the point is that we now have a much better handle on all the variation in it.
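To make the idea concrete, here is a minimal Python sketch of the liver example. Everything specific in it is an assumption for illustration: the two data lists are invented rather than taken from any real study, the function name `simulate_intakes` is hypothetical, and portion size and vitamin A level are assumed to vary independently of each other. Each simulated meal draws one observed portion size and one observed concentration at random and multiplies them.

```python
import random
import statistics

# Hypothetical study data, invented purely for illustration.
portion_sizes_g = [50, 80, 100, 120, 150, 200]      # liver portion sizes (grams)
vitamin_a_ug_per_g = [40, 90, 130, 180, 250, 400]   # vitamin A levels (micrograms per gram)


def simulate_intakes(portions, concentrations, n_draws=10_000, seed=0):
    """Simulate vitamin A intakes (micrograms) for n_draws liver meals.

    Each draw picks one observed portion size and one observed
    concentration at random and multiplies them. This assumes the two
    quantities are independent, which may not hold in practice.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        rng.choice(portions) * rng.choice(concentrations)
        for _ in range(n_draws)
    ]


intakes = simulate_intakes(portion_sizes_g, vitamin_a_ug_per_g)

# The "averages only" calculation gives a single number...
average_only = statistics.mean(portion_sizes_g) * statistics.mean(vitamin_a_ug_per_g)

# ...whereas the simulation gives a whole distribution we can summarise.
cuts = statistics.quantiles(intakes, n=20)  # 19 cut points: 5%, 10%, ..., 95%
p5, median, p95 = cuts[0], cuts[9], cuts[18]

print(f"Averages only:   {average_only:.0f} micrograms")
print(f"5th percentile:  {p5:.0f} micrograms")
print(f"Median:          {median:.0f} micrograms")
print(f"95th percentile: {p95:.0f} micrograms")
```

The single figure from the averages sits somewhere in the middle of the simulated range, while the percentiles show how far individual meals can fall either side of it. With real study data you would swap in the measured values, and you might draw from fitted distributions rather than resampling the raw observations.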

It is also true that probabilistic modelling can be used where information is not fully coherent. It can be used to quantify expert judgement or to model approximate estimates of data, but that’s another discussion. The point is that all of this is an effort to reflect reality accurately. Everything in the real world is intrinsically variable, and probabilistic modelling is the tool of choice for capturing that. I often lament the fact that it wasn’t called “variability modelling” instead. That would have made things a lot clearer!