QI philosophy

I think of the blog Quantifying Information as an expression of my curiosity about the complexity and uncertainty of the world. When we try to make sense of real-world phenomena under uncertainty, we often do not get new information about the quantity of interest directly. Instead, we only get new information about quantities that are related to it. Hence, it is up to us to take this related information and draw the right conclusions from it. This process is what I call quantifying information, and in doing so I try to stick to the following five pillars:

1 quantification

In my opinion, the best way to eliminate subjectivity as far as possible is to translate information into quantified expressions. Loosely speaking, it is much easier and more precise to compare two days by their measured temperatures than by whether their temperatures have been described as “nice” or “hot”.

2 data

It is very natural for human beings to learn from past actions and experiences. Each experience gives us additional knowledge and additional insight into the interrelations of the world. However, in some situations individual people are only able to gather a very limited number of experiences and observations. And these small “samples” in turn can cause distorted views of the world. For example, take the diagnosis of a very rare tropical disease. Let’s say doctor A has never had a patient suffering from this disease. He will therefore tend to underestimate its probability of occurrence, or – worst case – will not even be aware that the disease exists. Doctor B, however, has already encountered two patients suffering from the disease, so he will usually end up testing future patients for it disproportionately often. Only by sharing and exchanging their data about historical occurrences of the disease can both doctors reduce these biases. Well, forestalling the outcry of your inner data privacy activist: of course, “with great power comes great responsibility” (to use the words of Uncle Ben). Nevertheless: with large data comes great power!
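The two doctors can be put into numbers with a minimal sketch. The patient counts below are purely hypothetical, chosen only for illustration, and the helper `observed_rate` is not taken from any library.

```python
from fractions import Fraction


def observed_rate(cases, patients):
    """Relative frequency of the disease among the patients seen."""
    return Fraction(cases, patients)


# Hypothetical experience of each doctor: (disease cases, patients seen).
doctor_a = (0, 2000)
doctor_b = (2, 1500)

rate_a = observed_rate(*doctor_a)
rate_b = observed_rate(*doctor_b)

# Pooling the data: add up cases and patients before dividing.
pooled_rate = observed_rate(doctor_a[0] + doctor_b[0],
                            doctor_a[1] + doctor_b[1])

print("doctor A alone:", rate_a)
print("doctor B alone:", rate_b)
print("pooled data:   ", pooled_rate)
```

With these made-up numbers the pooled estimate lies between the two individual views: doctor A's estimate of zero moves up, and doctor B's estimate moves down.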

3 models

The more data, the better. So far, so good. But what should we do in situations where only limited data is at hand? Well, there is a basic substitution that we can always engage in: replacing data with assumptions. This is a dangerous step that we basically have to face in every situation, although to varying extents. For example, assuming that each side of a fair die occurs with probability 1/6 seems a much more innocent assumption than claiming that people are always rational and interested exclusively in their own well-being. And relying on wrong assumptions will – to some degree – always result in misinterpretations of your data. That being said, in many situations it will still be the best – or only – way to draw conclusions about the real world. There is a simple logic behind that: the further the probability of occurrence of an event deviates from the “fair” case of 50%, the more data we need to pin down the rarer outcome with the same relative precision. Hence, for very unlikely events, it is much more efficient to derive the probability from a more easily estimated event with the help of assumptions, instead of estimating it directly. For example, assume that we want to determine the probability of observing a “6” in all of 100 throws of a die. This can easily be derived from the probability of getting a “6” in a single throw, by adding the assumption that the throws are independent: $\left( \frac{1}{6} \right)^{100} \approx 1.5 \cdot 10^{-78}$. It would require an enormous amount of sampling from a die to pin down such a small probability directly! To conclude, you can think of models as abstractions of the real world that try to find the right balance between data and assumptions.
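The die calculation can be checked in a few lines. The sketch below also gives a rough feeling for how much data a direct estimate would need. It assumes that "precision" means a standard error of 10% of the probability itself, and the helper `trials_needed` is my own illustration, not a standard function.

```python
from fractions import Fraction

# Probability of a "6" on one throw of a fair die (assumption from the text).
p_six = Fraction(1, 6)
n_throws = 100

# Independence assumption: the probabilities of the single throws multiply.
p_all_sixes = p_six ** n_throws


def trials_needed(p, rel_error):
    """Rough number of independent trials needed so that the standard error
    of the observed relative frequency equals rel_error * p."""
    # Standard error of a relative frequency: sqrt(p * (1 - p) / n).
    # Setting it equal to rel_error * p and solving for n gives:
    return (1 - p) / (p * rel_error ** 2)


print(f"P(100 sixes in a row) = {float(p_all_sixes):.2e}")
print(f"trials needed for p = 1/2: {trials_needed(0.5, 0.1):.0f}")
print(f"trials needed for p = 1/6: {trials_needed(1 / 6, 0.1):.0f}")
print(f"trials needed for 100 sixes: {trials_needed(float(p_all_sixes), 0.1):.2e}")
```

Under this criterion, about 100 trials suffice for a fair coin and about 500 for a single “6”, while estimating the 100-sixes event directly would take on the order of 10^79 repetitions of the whole 100-throw experiment.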

4 updating

Now to the fun part: predictions! Given a stochastic quantity, the most information that one can have is its distribution: which outcomes occur with which probability. Applied to a fair coin, this means that the best possible information is knowing that both sides come up with equal probability. Trying to predict the outcome of a single coin toss upfront is nonsense, as the event is completely random. However, do not make the mistake of thinking that we know nothing just because we cannot predict the exact outcome. Knowing the correct probability of each possible outcome (heads, tails) is in fact a lot! Just think about betting against a counterpart who thinks that heads comes up with a probability of 70%. In the long run, you would definitely rip him off on this one! So the point is not necessarily to predict the exact realization of something, but to predict its distribution as well as possible. And this turns out to be anything but a static task: our knowledge about the distribution of a quantity changes with the amount of information that we have.

As an example, let’s say we are interested in the body heights of people. Imagine we line up all people in the world, ordered by their height. This shows us the distribution of body height: some observations will be around 60 cm, which are babies in their first months. A large number of observations will be between 160 cm and 200 cm, which will be mostly adults. And very few people will exceed 220 cm. Now if you randomly pick 10,000 people from the world population, and you repeat this procedure, your relative frequencies will basically stay the same: some babies, many adults, and still only a few people above 220 cm. Now think about what happens if we randomly pick just one person. The same probabilities apply here as well: most likely, the person will be between 160 cm and 200 cm, although he or she could also be a baby, so that there is still a chance of a body height of less than 100 cm, and so forth. Hence, given no additional information, our belief about the randomly picked person will match the overall distribution of body heights of the world population, which is called the unconditional distribution.

Let’s now include some additional information. Let’s assume we know that the randomly chosen person is female. This allows us to update our original belief about the distribution. As women are on average shorter than men, we can decrease the probability of extremely large body heights, while we increase the probability of moderate heights. Or, resulting in an even more dramatic update, let’s assume that we are told that the randomly chosen female is 1 year old. This immediately excludes all heights above a certain value, while we are still not able to tell whether we have a rather large or a rather small baby. Again, we are able to update the distribution, although we still cannot come up with an exact point prediction of the body height. So to speak, we do not know the height of the randomly chosen person until we are told. And as long as we are only told other, not perfectly related quantities, like gender, age, race and so forth, we are only able to update the distribution of height. This will generally not resolve the uncertainty about height completely, but it will update it.
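The height example can be sketched on a toy population. All counts below are invented for illustration and are not real demographic data, and the function `height_distribution` is a hypothetical helper: conditioning simply means keeping only the people who match what we have been told and recomputing the relative frequencies.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical toy population: (sex, age group, height bin) -> number of people.
population = {
    ("female", "infant", "<100 cm"): 5,
    ("male", "infant", "<100 cm"): 5,
    ("female", "adult", "100-170 cm"): 30,
    ("female", "adult", ">170 cm"): 15,
    ("male", "adult", "100-170 cm"): 15,
    ("male", "adult", ">170 cm"): 30,
}


def height_distribution(population, **known):
    """Distribution of the height bin among all people matching `known`,
    e.g. sex="female" or age="infant". No arguments: unconditional."""
    totals = defaultdict(int)
    for (sex, age, height), count in population.items():
        person = {"sex": sex, "age": age}
        if all(person[key] == value for key, value in known.items()):
            totals[height] += count
    n_matching = sum(totals.values())
    if n_matching == 0:
        raise ValueError("nobody in the population matches this information")
    return {height: Fraction(count, n_matching)
            for height, count in totals.items()}


unconditional = height_distribution(population)
given_female = height_distribution(population, sex="female")
given_female_infant = height_distribution(population, sex="female", age="infant")

for label, dist in [("no information", unconditional),
                    ("female", given_female),
                    ("female infant", given_female_infant)]:
    print(label, {height: str(prob) for height, prob in dist.items()})
```

In this toy population, learning “female” shifts probability from the tallest bin to the middle one, and learning “female infant” rules out everything above 100 cm, yet even then we only know a height bin, not the exact height.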

You can think of updating as getting additional slices of information about your target variable (in the example: height), or, as Sherlock Holmes would probably put it, getting additional pieces of the puzzle. The task is to find out what additional information the pieces really contain, and to reason forward correctly in order to draw the right conclusions about the target variable. And that’s exactly where you will need to be the next Sherlock Holmes!

As it turns out, there are situations where updating is rather straightforward, at least qualitatively. You do not need to be the next Einstein to comprehend relationships with well-behaved patterns: “the more …, the more …”, “the less …, the more …”, etc. However, as soon as the updating involves certain non-linear characteristics, the task becomes tricky, and sometimes even highly counter-intuitive. For one of the most famous examples in this area, just take a look at the Monty Hall problem.
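For readers who do not know the Monty Hall problem, here is a small sketch that enumerates every case exactly. It assumes the standard rules, which are not spelled out in the text above: the car is placed uniformly at random behind one of three doors, the player picks a door uniformly at random, and the host always opens a different door hiding a goat, choosing at random if he has two options.

```python
from fractions import Fraction
from itertools import product


def monty_hall_win_probabilities():
    """Exact win probabilities for staying and for switching,
    under the standard rules stated above."""
    doors = (0, 1, 2)
    stay_wins = Fraction(0)
    switch_wins = Fraction(0)
    for car, pick in product(doors, repeat=2):
        # Doors the host may open: not the player's pick, not the car.
        openable = [d for d in doors if d != pick and d != car]
        for opened in openable:
            # 1/9 for each (car, pick) pair, split over the host's options.
            prob = Fraction(1, 9) / len(openable)
            # The only door that is neither picked nor opened.
            switched = next(d for d in doors if d not in (pick, opened))
            if pick == car:
                stay_wins += prob
            if switched == car:
                switch_wins += prob
    return stay_wins, switch_wins


stay, switch = monty_hall_win_probabilities()
print("P(win | stay)   =", stay)
print("P(win | switch) =", switch)
```

The host's opened door is a piece of related information: it does not tell us where the car is, but it updates the distribution, so that switching wins with probability 2/3 and staying only with 1/3.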

5 visualization

A picture is worth a thousand words – what else is there to say?!
