Foundations of prediction markets

Deep dive into philosophical and mathematical foundations

This section is more mathematical and may be skipped by most readers. It defines the goal of a prediction market as estimating a quantity introduced here, and is therefore required for a deep understanding of why Contro's prediction markets are better than previous approaches, in particular order-book based markets.

Consider an affair X in the future that will yield one of two outcomes, A or B. The goal of a binary option prediction market is to find out the probability of A versus B happening in the future. In mathematical terms, X is called a Bernoulli random variable and is described by a single probability of A happening,

P(X\!\!= \!\!A) = 1 - P(X\!\!=\!\!B),

where we use the fact that the probabilities of the two outcomes must sum to one, so that the probability of B is one minus the probability of A.

Interestingly, all (binary) affairs in principle admit such a probability. However, it may be very difficult to know its value. Often, this would require not only a very complicated model description of the situation at hand but also knowledge of certain data that might be hard to obtain.

To illustrate the point, let us start with a simple example. Consider the game of tossing a coin 13 times, together with the two outcomes A = "heads wins" and B = "tails wins". Here, winning means the coin has shown the respective side more often than the other side (in other words, whichever side first reaches a score of 7 wins). At any moment, given the current number of heads and tails, we can compute the probability of A winning exactly. For example, at the beginning, before any throw, and assuming the coin is fair, it is 50%. If heads comes up on the first throw, it is 61.3%; if it comes up two more times, it is 82.8%. If tails is then thrown 3 times in a row so that the score is 3:3, it is back at 50%; and so on (for very mathematical and curious readers: the closed-form expression of the probability involves the hypergeometric function and is not pretty enough to print here).
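The numbers in this example can be reproduced with a short calculation. The following is a minimal sketch, assuming a fair coin by default and an odd total number of tosses; the function name `prob_heads_wins` and its parameters are chosen here for illustration only. Instead of the closed-form expression, it simply sums the binomial probabilities of all ways in which heads can still collect the throws it needs.

```python
from math import comb


def prob_heads_wins(heads, tails, total=13, p_head=0.5):
    """Probability that heads ends up with the majority of `total` tosses,
    given the current score. Assumes `total` is odd, so no draw is possible."""
    needed = total // 2 + 1 - heads      # heads still required (7 of 13 in total)
    remaining = total - heads - tails    # tosses still to come
    if needed <= 0:
        return 1.0                       # heads has already won
    if needed > remaining:
        return 0.0                       # heads can no longer catch up
    # Sum the binomial probabilities of getting at least `needed` heads
    # in the `remaining` tosses.
    return sum(
        comb(remaining, k) * p_head**k * (1 - p_head) ** (remaining - k)
        for k in range(needed, remaining + 1)
    )


# Scores (heads, tails) from the example in the text.
for score in [(0, 0), (1, 0), (3, 0), (3, 3)]:
    print(score, round(100 * prob_heads_wins(*score), 1))
# (0, 0) 50.0
# (1, 0) 61.3
# (3, 0) 82.8
# (3, 3) 50.0
```

Passing a different `p_head` shows how strongly the result depends on the assumption that the coin is fair.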

We can already tell from this example that a typical situation is that new data comes in and changes the probability as time passes. Let us extend our notation to capture this idea. We make the data D a function of time,

D = D(t).

Furthermore, in mathematics, there is the idea of conditioning a random variable X on information, usually denoted using a vertical bar. We consider all information to be included in D(t) and write

P_A(t) = P(X\!\!= \!\!A|D(t))

for the current probability of A happening given the current data at time t.

Live prediction markets are, unlike e.g. sports books, supposed to estimate $P_A(t).$

A real-world tennis match, for example, is similar in spirit, being a game with two possible winners, yet it is of course more complicated. It is harder to find a good model of the players, the venue and other influencing factors that can compute the probability, and it is harder to collect all useful information to feed into this model (which would, for example, include the mental and physiological state of the players). Furthermore, there are many sources of randomness. (Physicists would call this noise. It can ultimately come from quantum mechanics or from chaos, in the sense that the smallest disturbances around us may be amplified.) However, simple facts typically still emerge out of messy details. For example, the clean thermodynamic laws (such as the relation of pressure, temperature and volume of a simple gas) emerge out of the untraceable dynamics of countless molecules colliding with the walls of the enclosure, and it does not matter at all in practice what exactly these molecules are doing. Because averaging over many small disturbances often produces something tractable, there is a whole branch of physics devoted to the extraction of simple laws in ever more complicated situations.

Does a true probability of A winning a tennis match still exist, as we have claimed before? In principle at least, one may follow an approach that is common in science and push all uncertainty (randomness and even uncertainty about the modelling) into parameters that are unknown and themselves random. In such an analysis, one has to make assumptions about how likely different values of these parameters are, called "prior knowledge". Once this is decided, all this uncertainty does, when taken fully into account, is to modify the precise value of the true probability. To be fully honest, though, it may just not be realistic to know which prior is the right one. This can be handled by repeating the analysis for different choices and checking how much the results change. This will generate a set of values for the probability we are after. If anything can be said at all, one would expect these values to be clustered around a value, with the truth lying somewhere in this cluster.
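To make the idea of repeating the analysis for different priors concrete, here is a minimal sketch that returns to the 13-toss coin game but no longer assumes the coin is fair. The modelling choice is an assumption made for illustration and does not come from the text above: the unknown probability of heads is given a Beta prior, which leads to a beta-binomial distribution for the remaining tosses. The function names and the three example priors are likewise hypothetical.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def prob_heads_wins_with_prior(heads, tails, prior_a, prior_b, total=13):
    """Probability that heads wins the majority of `total` tosses when the
    coin's bias is unknown and has a Beta(prior_a, prior_b) prior (assumed)."""
    post_a = prior_a + heads             # posterior parameters after the
    post_b = prior_b + tails             # tosses observed so far
    remaining = total - heads - tails
    needed = max(total // 2 + 1 - heads, 0)
    prob = 0.0
    for k in range(needed, remaining + 1):
        # Beta-binomial probability of exactly k heads in the remaining tosses.
        log_ratio = log_beta(post_a + k, post_b + remaining - k) - log_beta(post_a, post_b)
        prob += comb(remaining, k) * exp(log_ratio)
    return prob


# Three illustrative priors, from "know nothing" to "almost surely fair".
priors = {"uniform": (1, 1), "mildly fair": (5, 5), "strongly fair": (50, 50)}
for name, (a, b) in priors.items():
    p = prob_heads_wins_with_prior(3, 0, a, b)
    print(f"{name}: {100 * p:.1f}%")
```

At a score of 3:0, each prior yields a somewhat different probability that heads wins. The spread of these values is exactly the kind of uncertainty in the probability itself that the text describes.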

In conclusion, we can say that it is generally possible (but potentially quite difficult) to obtain precise estimates of the true probability of A happening as a function of time. To this end, all relevant information should be taken into account. The difficulties in modelling may, however, imply that in practice one can only obtain a range of possible answers. This has the effect of introducing an uncertainty in the probability itself.

Contro is the first prediction market that allows traders to not only communicate their predictions in terms of a probability, but also their uncertainty.

