# Real world quantitative risk management: estimation risk and model risk

Often, when we learn about a certain quantitative model, we solely focus on deriving quantities that are implied by the model. For example, given a model of the policy liabilities of an insurance company, usually the only issue is deriving the risk implied by the model. Implicitly, however, this is nothing else than answering the question: “Given that the model is correct, what can we deduce from it?” One crucial point in quantitative modeling, however, is that in reality our model will never be completely correct!

This should not mean that quantitative models are useless. As George E. P. Box has put it: “Essentially, all models are wrong, but some are useful.” Deducing properties from wrong models, like for example value-at-risk, may still help us to get a reasonably good notion of the risks involved. In our measurement of risk, however, we need to take into account one additional component of risk: the risk that our model is wrong, which would make any results deduced from the model wrong as well. This risk arises from the fact that we do not have full information about the true underlying model. Ultimately, we can distinguish between three different levels of information.

First, we could know the true underlying model, and hence the true source of randomness. This way, exact realizations from the model are still random, but we know the true probabilities, and hence can perfectly estimate the risk involved. This is how the world sometimes appears to be in textbooks, when we only focus on the derivation of properties from the model, without ever reflecting about whether the model really is true. In this setting, risk assessment simply means the computation of certain quantities from a given model.

On the next level, with slightly less information, we could still know the true structure of the underlying model, while the exact parameters of the model remain unobserved. Hence, the parameters need to be estimated from data, and the exact probabilities implied by the true underlying model will not be known. This way, we will not be able to manage risks exactly the way we would like to, as we are forced to act on estimated probabilities instead of true ones. In turn, this additionally introduces estimation risk into the risk assessment. The actual size of estimation risk depends on the quality of the data, and in principle it could be eliminated through an infinite sample size.

And, finally, there is the level of information that we encounter in reality: neither the structure of the model nor the parameters are known. Here, large sample sizes might help us to detect the deficiencies of our model. Still, however, we will never be able to retrieve the true data generating process, so that we have to face an additional source of risk: model risk.

Hence, quantitative risk management is much more than just calculating some quantities from a given model. It is also about finding realistic models, assessing the uncertainty associated with unknown parameters, and validating the model structure with data. We now want to clarify the consequences of risk management under incomplete information about the underlying probabilities. To this end, we will look at an exemplary risk management application, where we gradually loosen the assumptions towards a more realistic setting. The general setup is that we want to quantify the risk of an insurance company under the following simplifying assumptions:

- a fixed number of policy holders
- a deterministic claim size
- given premium payments


## 1 Risk under known loss event probabilities

In this first part, we want to assume a setting where we perfectly know all true underlying probabilities. Therefore, we additionally assume

- a known loss event probability
- independent loss events

### 1.1 Deriving risk of insolvency

We start by simply deriving risk in a setting with a given capital level. Thereby, we assume that the insurance company has no assets besides the premium payments that it receives from its insurance business. In this setting, we now want to calculate the probability of insolvency.

To be at least somewhat realistic, we choose a large number of policy holders, such that calculations “by hand” are no longer feasible. Hence, all computations will have to be done with statistical software, and we will make use of the free and open-source software R here.

As we have a setting of independent policy holders with equal loss event probability $p$, the overall number of loss events $K$ follows a Binomial distribution. To better understand the implications of the number of policy holders in this setting, we first plot the domain of possible outcomes for varying numbers of policy holders. Additionally, we also include lines for the 0.2 and 99.8 percent quantiles, and for the expected number of loss events.

```r
library(ggplot2)
rm(list=ls())

## specify number of persons
pers <- seq(10, 750, 20)

## specify probability
p <- 0.1

## expected values
expVals <- pers*p

## upper and lower quantiles
alphaVal <- 0.002
upQuants <- qbinom(1-alphaVal, pers, prob=p)
lowQuants <- qbinom(alphaVal, pers, prob=p)

## get values for plot
increasingPersons <- data.frame(upperBound = pers,
                                expectation = expVals,
                                upperQuantile = upQuants,
                                lowerQuantile = lowQuants,
                                lowerBound = rep(0, length(pers)))

pp1 <- ggplot(increasingPersons) +
    geom_line(aes(x=pers, y=expVals), color="red") +
    geom_line(aes(x=pers, y=upQuants), color="blue") +
    geom_line(aes(x=pers, y=lowQuants), color="blue") +
    geom_line(aes(x=pers, y=lowerBound), linetype="dashed") +
    geom_line(aes(x=pers, y=pers), linetype="dashed") +
    xlab("number of persons") +
    ylab("domain and quantiles of loss events")
pp1
```

As we can see in the figure, with increasing number of policy holders, the variation of the distribution decreases relative to the domain size. In our setting, we will assume 1000 persons. Hence, the bulk of the wealth distribution will already center rather close to its expected value.
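The shrinking relative variation can also be quantified directly. As a small side calculation (not part of the original example, with sample sizes chosen purely for illustration), we can compute the width of the 0.2 to 99.8 percent quantile band relative to the domain size:

```r
## width of the 0.2-99.8 percent quantile band, relative to the
## domain size n (side calculation; sample sizes chosen for illustration)
p <- 0.1
nVals <- c(10, 100, 1000, 10000)
relWidth <- (qbinom(0.998, nVals, p) - qbinom(0.002, nVals, p)) / nVals
round(relWidth, 3)
```

The relative width shrinks steadily with the number of policy holders, reflecting that the Binomial standard deviation grows only with the square root of the sample size, while the domain grows linearly.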

Let’s now lay down all the settings of the example, and determine a realistic value for premium payments. In order to get a first impression of the problem set that we have created, we will then plot the associated end of period wealth. As already mentioned, we will treat the insurance company as if it was founded just now, so that the only asset existing besides the premium payments of the underwriting business is the capital provided by the owners. Hence, the end of period wealth consists of profits (losses) from policy holders, plus the initially contributed capital:

```r
######################
## set up workspace ##
######################

library(ggplot2)

## source multiplot function
source("rlib/multiplot.R")

######################################
## determine probability of default ##
######################################

## specify settings
nPersons <- 1000
claimSize <- 20
p <- 0.1
capital <- 200

######---------------------------------------------------------------
## come up with realistic premium payments
######---------------------------------------------------------------

## determine expected loss per person
expExpenditures <- p*claimSize

## determine premium payment slightly above
premiums <- expExpenditures*1.05

######---------------------------------------------------------------
## determine wealth distribution
######---------------------------------------------------------------

## get possible expenditures, sorted from worst to best case
possibleNumberOccurrences <- rev(0:nPersons)
possibleLosses <- possibleNumberOccurrences*claimSize

## get possible profits
profits <- nPersons*premiums - possibleLosses

## get possible wealth outcomes
wealth <- profits + capital

## get associated probabilities for losses
binomProbs <- dbinom(possibleNumberOccurrences, size = nPersons, prob = p)

## wealth distribution
wealthDistr <- data.frame(xVals = wealth, pVals = binomProbs)

## cdf of wealth distribution
wealthCdf <- data.frame(xVals = wealth, pVals = cumsum(binomProbs))

######---------------------------------------------------------------
## plot wealth distribution
######---------------------------------------------------------------

pp1 <- ggplot(wealthDistr) +
    geom_bar(aes(x=xVals, y=pVals), width=claimSize, stat="identity") +
    xlab("wealth") + ylab("probability") + labs(title = "pdf")
pp2 <- ggplot(wealthCdf) +
    geom_step(aes(x=xVals, y=pVals)) +
    xlab("wealth") + ylab("probability") + labs(title = "cdf")

## zoom
pp3 <- pp1 + scale_x_continuous(limits = c(-500, 1500))
pp4 <- pp2 + scale_x_continuous(limits = c(-500, 1500))
multiplot(pp1, pp2, pp3, pp4, cols = 2)
```

In theory, of course, all values within the domain could occur: even wealth outcomes as small as -17700 are possible. The probability of these extreme events, however, is hardly visible even under large magnification. This will be shown in the next plot, where we zoom in on outcomes with negative wealth, and additionally rescale the probability axis.

```r
######---------------------------------------------------------------
## plot outcomes with negative wealth and rescaled probability axis
######---------------------------------------------------------------

## get outcomes with negative wealth
insolvencyOutcomes <- wealthDistr[wealthDistr$xVals < 0, ]

## plot outcomes
pp1 <- ggplot(insolvencyOutcomes) +
    geom_step(aes(x=xVals, y=pVals)) +
    xlab("wealth") + ylab("probability") + labs(title = "pdf")

## rescale probabilities
rescaledProbs <- insolvencyOutcomes
nthRoot <- 20
rescaledProbs$pVals <- rescaledProbs$pVals^(1/nthRoot)

## handle axis breaks
oldBreaks <- seq(0, 0.004, 0.001)
newBreaks <- oldBreaks^(1/nthRoot)

pp2 <- ggplot(rescaledProbs) +
    geom_step(aes(x=xVals, y=pVals)) +
    xlab("wealth") + ylab("probability") +
    scale_y_continuous(breaks = newBreaks, labels = paste(oldBreaks)) +
    labs(title="wealth distribution with rescaled probabilities")
multiplot(pp1, pp2, cols = 2)
```
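Just how unlikely these extreme outcomes are can also be checked numerically. As a side calculation with the settings from above (1000 policy holders, loss probability 0.1), the probability of the worst case underflows to zero in double precision, so we look at it on the log scale:

```r
## log probability of the worst case: all 1000 policy holders file a
## claim (the probability itself underflows to zero in double precision)
dbinom(1000, size = 1000, prob = 0.1, log = TRUE)

## even twice the expected number of loss events is essentially impossible
pbinom(199, size = 1000, prob = 0.1, lower.tail = FALSE)
```

Outcomes near the lower end of the domain thus carry probabilities far below anything a plot could resolve.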

Now that all settings have been outlined, our first target is to measure the risk of insolvency, i.e. the probability of default

$$PD = P(W < 0),$$

where $W$ denotes the end of period wealth.

Hence, we need to sum up the probabilities of all wealth outcomes with negative values. Note that our definition of insolvency does not include the outcome of zero wealth; insolvency only starts with strictly negative values. Visually, the probability of default is easily obtained, given that we have the cumulative distribution function $F_{W}$ of wealth $W$. Due to

$$PD = P(W < 0) = \lim_{x \uparrow 0} F_{W}(x),$$

we simply need to evaluate the cumulative distribution function at a value infinitesimally smaller than zero:

$$PD = F_{W}(0^{-}).$$

The exact computation of the risk of insolvency is straightforward, too. Once we have determined the distribution of the discrete variable $W$, we simply need to add up all the probabilities of outcomes with negative wealth. Decomposed into smaller steps, this amounts to

- identifying all negative outcomes
- extracting the associated probabilities
- summing up the probabilities

```r
######---------------------------------------------------------------
## plot cumulative distribution function
######---------------------------------------------------------------

## get probability of default
pd <- sum(wealthDistr$pVals[wealthDistr$xVals < 0])

## get cumulative distribution function
cumWealthDistr <- data.frame(xVals = wealthDistr$xVals,
                             pVals = cumsum(wealthDistr$pVals))

## get data for auxiliary lines: for simplicity, auxiliary lines start
## in zero, instead of infinitesimally to the left
hLine <- data.frame(xVals = c(-400, 0), yVals = c(pd, pd))
vLine <- data.frame(xVals = c(0, 0), yVals = c(0, pd))

pp3 <- ggplot(cumWealthDistr) +
    geom_step(aes(x=xVals, y=pVals)) +
    scale_x_continuous(limits = c(-400, 25)) +
    scale_y_continuous(limits = c(0, 0.2)) +
    geom_line(aes(x=xVals, y=yVals), data=hLine,
              color="blue", linetype = "longdash") +
    geom_line(aes(x=xVals, y=yVals), data=vLine,
              color="blue", linetype = "longdash") +
    xlab("wealth") + ylab("cumulated probability") +
    labs(title="determining probability of losses")
pp3

(pd)
```

As a result, we get a probability of default of

[1] 0.05345008
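As a cross-check (not part of the original derivation), the same value follows in closed form from the Binomial cumulative distribution function: with premium income of 2100, capital of 200, and a claim size of 20, wealth turns negative exactly when 116 or more loss events occur.

```r
## wealth = 1000*2.1 + 200 - 20*K is negative exactly for K >= 116
pbinom(115, size = 1000, prob = 0.1, lower.tail = FALSE)
## [1] 0.05345008
```

Both routes must agree, since summing the Binomial point probabilities over all outcomes with negative wealth is precisely the upper tail of the Binomial cdf.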

### 1.2 Adjusting risk: value-at-risk

So far, we simply took a pre-set capital level as given, and only bothered with measuring the associated risk. Now, we want to make this setup a bit more realistic, and allow ourselves to modify the company’s capital structure. This way, we can change the default probability to match our own risk appetite. As a target, we set a limit for the default probability of 2 percent. Keep in mind, however, that we still assume that the true source of randomness completely matches our model, such that all probabilities are perfectly known.

To determine the associated capital level, we need to relate the problem to the actual underlying source of risk: the stochastic number of loss events $K$ of the policy holders. With $n$ policy holders, premium payments $\pi$ per person, claim size $s$ and capital $C$, the requirement of a default probability of at most 2 percent reads

$$P(n\pi - sK + C < 0) \le 0.02 \quad\Leftrightarrow\quad P\!\left(K \le \frac{n\pi + C}{s}\right) \ge 0.98.$$

Through application of the inverse cumulative distribution function $F_{K}^{-1}$ of the loss events, this can be solved for $C$:

$$C \ge F_{K}^{-1}(0.98)\, s - n\pi.$$

As capital is costly, we are interested in the smallest possible value that still provides the demanded security level. Hence, the inequality becomes an equality:

$$C = F_{K}^{-1}(0.98)\, s - n\pi.$$

The capital level that we get this way is also known as value-at-risk. It measures the level of capital required to keep the probability of default below a given level. You can also think of it as a threshold that is not exceeded by the loss distribution with a given probability.

```r
## specify confidence level
alpha <- 0.98

## get quantile for loss event distribution
quantLossEvents <- qbinom(alpha, size = nPersons, prob = p)

## get value-at-risk
(valRisk <- quantLossEvents*claimSize - nPersons*premiums)
```

[1] 300

## 2 Estimation risk

In our effort to replicate the intricacies of real world risk management, we will now further increase the uncertainty involved in our exemplary setting. So far, the only uncertainty was about the actual realization of the number of loss events $K$, which was unknown. Still, precise probabilistic statements could be derived, since the distribution of $K$ was perfectly known: it followed a Binomial distribution with known parameter $p$.

Now, we will drop this assumption, such that the probability $p$ of individual loss events is not known anymore. This way, we will be forced to first come up with an estimate of $p$, which brings additional imprecision into the measurement of risk. As we will see, the more data we have at our hands, the smaller this imprecision will be.

For a Bernoulli distributed random variable, there is a quite intuitive and reasonable way of estimating $p$: dividing the number of occurrences by the number of repetitions. For example, to determine the probability of heads for a coin, simply toss the coin several times, and record the number of heads. Then, if you get 544 heads per 1000 tosses, your estimate of $p$ would be

$$\hat{p} = \frac{544}{1000} = 0.544.$$

More generally, if $n$ denotes the number of repetitions, and $k$ the number of occurrences, your strategy to derive an estimate for the unknown value of $p$ would be to calculate

$$\hat{p} = \frac{k}{n}.$$

Such a strategy to come up with an estimate is called an estimator, and it will be denoted by $\hat{p}$ in the following. It is important to note that the actual estimate you get using some specific estimator always depends on the sample that you apply it to. Hence, if your sample consists of 474 times heads, your estimate would be

$$\hat{p} = \frac{474}{1000} = 0.474,$$

while for 512 times heads you get a value of

$$\hat{p} = \frac{512}{1000} = 0.512.$$

For any given true value of $p$, sampling involves randomness, and multiple samples drawn will generally differ. Therefore, for any given estimator, the estimate itself is a random variable. Only once the sample is drawn is the estimated value completely determined.

Luckily, the randomness in sampling is not completely arbitrary, but again follows the rules of probability theory. This way, it is generally possible to derive the distribution of any estimator for a given underlying value of $p$ and a given sample size. In some cases, the distribution of an estimator can even be calculated analytically. In general, however, it is sufficient to conduct a small simulation study in order to get an impression of the properties of an estimator.

Accordingly, we will now take a look at the properties of the estimator $\hat{p}$. Thereby, we assume $p = 0.1$, but pretend as if we didn’t know it, and use the estimator to come up with a guess for $p$.

Simulating two different samples, we get two different estimated values $\hat{p}$.

```r
## determine sample size
nEvents <- 2000

## determine probability
p <- 0.1

## define estimator function
applyEstimator <- function(simVals){
    pHat <- sum(simVals)/length(simVals)
}

## set seed to achieve replicability
set.seed(126)

## simulate values
simVals <- rbinom(nEvents, 1, p)

## calculate estimate
pHat <- applyEstimator(simVals)

## simulate again
simVals <- rbinom(nEvents, 1, p)

## calculate estimate
pHat[2] <- applyEstimator(simVals)

## display both values
(pHat)
```

[1] 0.096 0.109

As the true value used for the simulation was $p = 0.1$, both estimated values are rather close to the true value. Due to the randomness of the underlying samples, however, they are neither exactly equal to the true value, nor equal to each other.

Still, we have inspected the estimator for only two different samples. What we rather want to know is how close the estimator gets on average, or what could happen in the worst case. Ideally, we would like to have the complete distribution of $\hat{p}$.

This distribution can be approximated through repeatedly evaluating the estimator for differently drawn samples. Therefore, we now repeat the experiment 10000 times, instead of only twice. For each drawn sample, we derive the estimate, and afterwards visualize the distribution of all estimated values.

```r
library(ggplot2)

## define number of repetitions
nReps <- 10000

## preallocate vector to save estimated values
pHat <- numeric(nReps)

## in each experiment draw sample and evaluate estimator
for (ii in 1:nReps){
    ## draw samples
    simVals <- rbinom(nEvents, 1, p)
    ## calculate estimate
    pHat[ii] <- applyEstimator(simVals)
}

## transform to data frame
pHatDistr <- data.frame(pHat)

## plot estimated values
pp1 <- ggplot(pHatDistr) +
    geom_density(aes(x=pHat)) +
    geom_vline(xintercept = p, color="red") +
    scale_x_continuous(limits=c(0, 1)) +
    xlab(expression(hat(p))) + ylab("frequency")
pp1
```

From the figure we can already see that the majority of estimated values lies between approximately 0.09 and 0.11. For some unfavorable samples, however, we still get estimated values that deviate from the true value by more than 0.01.
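This visual impression can be cross-checked against the exact sampling distribution (a side calculation): since the number of successes in 2000 trials is Binomial, the central 95 percent interval of $\hat{p}$ is given by two Binomial quantiles divided by the sample size.

```r
## exact central 95 percent interval of the sampling distribution of
## pHat for a sample size of 2000 and p = 0.1
qbinom(c(0.025, 0.975), size = 2000, prob = 0.1) / 2000
```

The resulting interval confirms that estimates more than about 0.013 away from the true value are rare, but not impossible.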

What we have so far is an approximation to the real distribution of the estimator $\hat{p}$. For continuous random variables, a distribution function contains information about the relative likelihoods of all of the infinitely many individual outcomes. This information is generally hard to grasp, so we now want to focus on some main characteristics of the distribution only. This way, we can reduce the information contained in the continuous distribution to a few simple numbers, so that, for example, several distribution functions can be compared more easily. One common way to reduce the distribution function to only a few quantities is to extract its median, together with some further quantiles of the distribution. It is this principle that one of the most common visualization approaches is based on: the box plot.

The box plot visualizes the underlying structure of a data sample. An inner box reaches from the 25 to the 75 percent quantile, and hence covers the inner 50 percent of data points. The length of this box is called the interquartile range. Depending on whether the upper and lower bounds are equally distant to the median, the distribution is symmetric or skewed. Further outreaching lines end at the last data point within 1.5 times the interquartile range, measured from the bounds of the inner box. These lines are called whiskers. All data points beyond the whiskers are called outliers, and are visualized as points. The following plot shall further visualize the construction of a box plot for the current sample of estimated values $\hat{p}$. It is heavily inspired by some code written by Dennis Murphy on the ggplot mailing list.

```r
x <- pHat

## get values of kernel density at sample points
df1 <- data.frame(x = density(x)$x, y = density(x)$y)

## get boxplot statistics: quantiles, median, min and max
df2 <- data.frame(t(boxplot.stats(x)$stats))
names(df2) <- c('xmin', 'fq', 'med', 'tq', 'xmax')

## get midpoint of maximum density value for height of whiskers in
## graphics
df2$ymx <- max(df1$y)
df2$ymid <- df2$ymx/2

## approximate kernel density at median
yMed <- approx(df1$x, df1$y, median(x))
df3 <- data.frame(x = df2$med, y = yMed$y)

## determine endpoints of the horizontal whiskers
df2$iqr <- with(df2, tq - fq)
df2$wmin <- with(df2, max(xmin, fq - 1.5 * iqr))
df2$wmax <- with(df2, min(xmax, tq + 1.5 * iqr))

## determine outliers beyond the whiskers
outliers <- c(x[x < df2$wmin], x[x > df2$wmax])
dfOutliers <- data.frame(x = outliers,
                         y = rep(df2$ymid, length(outliers)))

pp0 <- ggplot() +
    geom_line(data = df1, aes(x, y), size = 1, color = 'blue') +
    ## vertical segment at the median
    geom_segment(data = df3, aes(x = x, y = y, xend = x, yend = 0)) +
    ## left and right segments, respectively
    geom_segment(data = df2, aes(x = wmin, y = ymid,
                                 xend = fq, yend = ymid)) +
    geom_segment(data = df2, aes(x = tq, y = ymid,
                                 xend = wmax, yend = ymid)) +
    ## vertical whiskers - reset 0.02 to change height
    geom_segment(data = df2, aes(x = xmin, y = ymid - 0.02,
                                 xend = xmin, yend = ymid + 0.02)) +
    geom_segment(data = df2, aes(x = xmax, y = ymid - 0.02,
                                 xend = xmax, yend = ymid + 0.02)) +
    ## filled 'box'
    geom_ribbon(data = subset(df1, x >= df2$fq & x <= df2$tq),
                aes(x = x, ymin = 0, ymax = y),
                fill = 'green', alpha = 0.4) +
    geom_point(data = dfOutliers, aes(x = x, y = y)) +
    xlab(expression(hat(p))) + ylab("frequency")

## create resulting box-plot only
pp2 <- ggplot(pHatDistr) +
    geom_boxplot(aes(x=1, y=pHat)) +
    coord_flip() +
    theme(axis.text.y = element_blank(),
          axis.ticks = element_blank()) +
    ylab(expression(hat(p))) + xlab("resulting box-plot")

## combine plots in one graphic
multiplot(pp0, pp2)
```

As can be seen from the box plot, the median of the estimated values perfectly coincides with the true underlying value of $p = 0.1$. Still, from time to time, there are data samples that result in estimated values lying outside of the whiskers.

Next, we want to use box plots to examine how the distribution of the estimator depends on the sample size. Thereby, sample size simply means the amount of data used for the derivation of $\hat{p}$. Hence, we now perform the same simulation study as before, but this time for several different sample sizes. The resulting distributions of estimated values for the individual sample sizes will be compared based on box plots.

```r
## define sample sizes for which to conduct simulation study
sampleSizes <- c(10, 50, 100, 200, 1000, 20000)

## define number of repetitions
nReps <- 10000

## preallocate matrix to save estimated values
pHats <- matrix(NA, nrow=nReps, ncol=length(sampleSizes))

## in each experiment draw sample and evaluate estimator
for (jj in 1:length(sampleSizes)){
    ## get current sample size
    nEvents <- sampleSizes[jj]
    for (ii in 1:nReps){
        ## draw samples
        simVals <- rbinom(nEvents, 1, p)
        ## calculate estimate
        pHats[ii, jj] <- applyEstimator(simVals)
    }
}

## transform to data frame
pHatDistr <- data.frame(index=1:nReps, pHats)

## reshape data to long format
library(reshape2)
pHatsMt <- melt(pHatDistr, id.vars = "index")
pHatsMt$variable <- factor(pHatsMt$variable, labels=sampleSizes)

pp1 <- ggplot(pHatsMt) +
    geom_boxplot(aes(x=variable, y=value)) +
    xlab("different sample sizes") + ylab(expression(hat(p)))
pp1
```

As the box plots of the simulation study show, the estimation of $p$ gets more precise with increasing sample size. Hence, the more historic data, the better the estimation.
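The pattern in the box plots matches the analytic standard error of the estimator, which for a Bernoulli variable is $\sqrt{p(1-p)/n}$. As a side calculation for the sample sizes used above:

```r
## analytic standard error of pHat for the sample sizes of the study:
## sqrt(p*(1-p)/n)
p <- 0.1
sampleSizes <- c(10, 50, 100, 200, 1000, 20000)
round(sqrt(p * (1 - p) / sampleSizes), 4)
```

Note that quadrupling the sample size only halves the standard error, which is why the gains in precision level off for large samples.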

Therefore, quantitative risk management only works reliably when we have sufficient historic data at our hands. Only then can we determine the parameters of the model appropriately. Hence, a reliable and comprehensive database is a crucial component of sound risk management. In turn, this also implies that quantitative risk management reaches its limits whenever no data is available at all – something that most likely happens whenever the company enters new business areas. In these cases, it may still help to formulate the risks in terms of a mathematical model. However, the parameters of the model then need to be determined based on expert knowledge.

From now on, we will temporarily exclude the issue of sample size from our consideration again, and simply treat the size of the historic data as externally given. Keep in mind, however, that in reality the quality of risk management could be improved through collection of additional data.

Let’s now get back to the issue of determining the default probability in our example. Previously, we took the true value of $p$ as given, and used it to derive the probability of default. Now, we will assume that $p$ is unknown, so that it needs to be estimated from historic data. This best guess $\hat{p}$ in turn can be used to derive the probability of default: we treat it as the true parameter, and – equivalently to above – use the model to calculate the associated probability of default.

However, in the simulation study above, we have seen that the estimated value $\hat{p}$ for a true underlying value of $p$ is a random variable: it depends on the actual sample used for estimation. Hence, the probability of default that we derive from a given best guess $\hat{p}$ will also depend on the actual sample used, and deviate from its true value. Again, we now want to examine this effect in a simulation study.

This time, however, we want to additionally take into account an effect that we neglected before: the distribution of $\hat{p}$ is, in reality, discrete. This is just an artifact due to the construction of the estimator,

$$\hat{p} = \frac{k}{n}.$$

With this definition, for any given sample size $n$, the estimated value of $p$ can only take on certain discrete values. For example, with $n = 1000$, the only possible values for $\hat{p}$ are

$$0, \frac{1}{1000}, \frac{2}{1000}, \ldots, 1.$$

This discrete nature of $\hat{p}$ will then also be carried over to the derived estimate of the default probability.
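The attainable grid of estimates can be listed directly (a small side calculation for a sample size of 1000):

```r
## with 1000 repetitions, pHat can only take values on the grid k/1000
n <- 1000
possibleEstimates <- (0:n) / n
head(possibleEstimates)
## [1] 0.000 0.001 0.002 0.003 0.004 0.005
```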

```r
######---------------------------------------------------------------
## specify settings of example
######---------------------------------------------------------------

nPersons <- 1000
claimSize <- 20
p <- 0.1
capital <- 300

## determine expected loss per person
expLoss <- p*claimSize

## determine premium payment slightly above
premiums <- expLoss*1.05

######---------------------------------------------------------------
## determine true probability of default
######---------------------------------------------------------------

getPd <- function(p){
    ## determine probability of default depending on loss event
    ## probability

    ## get possible expenditures
    possibleNumberOccurrences <- rev(0:nPersons)
    possibleLosses <- possibleNumberOccurrences*claimSize

    ## get possible profits
    profits <- nPersons*premiums - possibleLosses

    ## get possible wealth outcomes
    wealth <- profits + capital

    ## get associated probabilities for losses
    binomProbs <- dbinom(possibleNumberOccurrences,
                         size = nPersons, prob = p)

    ## get default probability
    pd <- sum(binomProbs[wealth < 0])
}

## display true probability of default
(truePd <- getPd(0.1))

######---------------------------------------------------------------
## specify settings for simulation study
######---------------------------------------------------------------

## define number of repetitions for simulation study
nReps <- 10000

## define sample size
sampleSize <- 1000

######---------------------------------------------------------------
## estimate p values
######---------------------------------------------------------------

## define estimator function
applyEstimator <- function(simVals){
    pHat <- sum(simVals)/length(simVals)
}

## preallocate vector to save estimated values
pHat <- numeric(nReps)

## in each experiment draw sample and evaluate estimator
for (ii in 1:nReps){
    ## draw samples
    simVals <- rbinom(sampleSize, 1, p)
    ## calculate estimate
    pHat[ii] <- applyEstimator(simVals)
}

## get discrete distribution of estimated p values
uniquePHat <- as.data.frame(table(pHat))
pHatDistr <- data.frame(pVals = as.numeric(as.character(uniquePHat$pHat)),
                        frequency = uniquePHat$Freq)

## plot distribution of estimated p values
pp1 <- ggplot(pHatDistr) +
    geom_bar(aes(x=pVals, y=frequency), stat="identity",
             width=1/(2*nPersons)) +
    geom_vline(xintercept = p, color="red") +
    xlab(expression(hat(p))) + ylab("frequency")

######---------------------------------------------------------------
## for estimated p values, derive associated probabilities of default
######---------------------------------------------------------------

## get probability of default for given p values
pds <- sapply(pHat, getPd)

## define function to count frequencies of unique values
freqUniqueOccur <- function(x){
    uniqueVals <- as.data.frame(table(x))
    uniqueDistr <- data.frame(vals = as.numeric(as.character(uniqueVals[, 1])),
                              frequency = uniqueVals[, 2])
}

## get discrete distribution of estimated pds
pdDistr <- freqUniqueOccur(pds)

## plot distribution of estimated pds
pp2 <- ggplot(pdDistr) +
    geom_bar(aes(x=vals, y=frequency), stat="identity", width=0.005) +
    geom_vline(xintercept = truePd, color="red") +
    xlab(expression(hat(PD))) + ylab("frequency")

multiplot(pp1, pp2, cols = 2)
```

The figure depicts the distribution of estimated values $\hat{p}$, next to the distribution of estimated default probabilities. As we can see, the imprecision in the estimation of $p$ seems to be magnified even further for default probabilities – at least for the cases where $p$ is overestimated.

One explanation for this sensitivity to overestimation of $p$ is that we kept premium payments fixed. That is, premium payments were calculated with respect to the true value of $p$. If we now suddenly assess a substantially larger value for the loss event probability, the expected profits from premium payments disappear. Overestimating $p$ too much makes us think that current premium payments are insufficient to compensate expected expenditures, so that we expect to incur losses on a regular basis. This, in turn, substantially increases the perceived default probability, if we do not offset it through an increased level of capital.
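This break-even effect is easy to make explicit (a side calculation with the settings from above): premiums of 2.1 per person cover an assessed loss probability of at most $2.1 / 20 = 0.105$, so any estimate above that value implies an expected loss per policy holder.

```r
## expected underwriting profit per policy holder as a function of the
## assessed loss probability (premiums calibrated to the true p = 0.1)
claimSize <- 20
premiums <- 0.1 * claimSize * 1.05        # 2.1
pHatGrid <- c(0.09, 0.10, 0.105, 0.11, 0.12)
expProfit <- premiums - pHatGrid * claimSize
rbind(pHat = pHatGrid, expProfit = expProfit)
```

With a negative expected profit per policy, default is no longer a tail event but the expected outcome, which explains the blow-up of the perceived default probability.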

The same analysis can also be done for value-at-risk, which in reality is one of the most important risk quantities that need to be estimated. Again, we investigate how the estimated value-at-risk deviates from its true value, given that it is derived from $\hat{p}$.

```r
######---------------------------------------------------------------
## derive value-at-risk for given p
######---------------------------------------------------------------

## specify confidence level
alpha <- 0.98

## function to calculate value-at-risk
getValRisk <- function(p){
    valRisk <- qbinom(alpha, nPersons, p)*claimSize - nPersons*premiums
}

## display true value-at-risk
(trueValRisk <- getValRisk(0.1))

## calculate value-at-risk for estimated p values
estValRisk <- sapply(pHat, getValRisk)

## get discrete distribution for value-at-risk
valRiskDistr <- freqUniqueOccur(estValRisk)

## plot distribution of estimated values-at-risk
pp2 <- ggplot(valRiskDistr) +
    geom_bar(aes(x=vals, y=frequency), stat="identity", width=5) +
    geom_vline(xintercept = trueValRisk, color="red") +
    xlab(expression(hat(VaR))) + ylab("frequency")

multiplot(pp1, pp2, cols = 2)
```

We can see how estimation errors in loss event probabilities also enter into the estimation of value-at-risk. While in reality capital reserves of 300 would be required to match the specified risk appetite, we will falsely be tempted to provide deviating amounts. In some situations, for example, we will be inclined to hold substantially higher capital reserves, greatly surpassing the true amount of required capital. And remember: holding too much capital in reserve is a costly business.

At the same time, the incorrect estimation of value-at-risk will also change the true probability of default, once we rely and act upon it. As the probability of default of a company depends on the amount of capital piled up as reserves, it is influenced by our decision of how much capital to provide. Hence, we will inevitably push the default probability in unintended directions, once we set our capital level with respect to wrongly estimated values of value-at-risk. This way, we will end up with overly high protection against business risks in some cases, while we will take on unreasonably high risks in others.

We now visualize this interconnection of estimated values of value-at-risk and resulting probabilities of default that deviate from the intended level.

```r
######---------------------------------------------------------------
## with true values, get PD for different levels of capital
######---------------------------------------------------------------

## define capital levels
capitalLevels <- seq(-500, 1500, 10)

## get possible expenditures
possibleNumberOccurrences <- rev(0:nPersons)
possibleLosses <- possibleNumberOccurrences*claimSize

## get possible profits
profits <- nPersons*premiums - possibleLosses

## get associated probabilities for losses
binomProbs <- dbinom(rev(0:nPersons), size = nPersons, prob = p)

## define function to get probability of default
getPdForCapital <- function(capitalLevel){
   ## get wealth outcomes
   wealth <- profits + capitalLevel

   ## sum up probabilities of default events
   pd <- sum(binomProbs[wealth < 0])
}

## get true pd for each capital level
pdForCapital <- data.frame(capital = capitalLevels,
                           pd = sapply(capitalLevels, getPdForCapital))

pp3 <- ggplot(pdForCapital) + geom_line(aes(capital, pd))
pp3

## get pds for estimated value-at-risk capital levels
realPds <- sapply(estValRisk, getPdForCapital)

## get discrete distribution for real default probabilities
realPdDistr <- freqUniqueOccur(realPds)

## plot distribution of resulting pds
pp4 <- ggplot(realPdDistr) +
   geom_bar(aes(x=vals, y=frequency), stat="identity", width=0.01) +
   geom_vline(xintercept = 1-alpha, color="red") +
   xlab("PD") + ylab("frequency")
multiplot(pp2, pp3, pp4, cols = 1)
```

In the upper part of the figure we see the estimated values of value-at-risk. The middle part shows how capital levels in turn translate into probabilities of default. Then, in the bottom part, we see the risk that we will really end up taking if we blindly rely on the estimated value-at-risk. For comparison, the originally intended level of risk is highlighted with a red line.

As this bottom figure looks very much like the plot of default probabilities that we created above, let's assure ourselves that we did not plot the same distribution again. Previously, we held the capital level fixed, while we tried to estimate the associated probability of default. Due to estimation, we ended up with slightly deviating guesses for the true probability. However, this only influenced our perception of risk – the true level of risk was never affected, since we did not adapt the capital level according to our estimate. In the current scenario, on the other hand, we act on the estimated value-at-risk and adapt our capital level accordingly. This way, we influence the true probability of default, most likely in unintended ways. This is called estimation risk.

As we have seen, these two approaches lead to rather similar plots, as they are both based on the imprecision associated with the estimation of $p$. When shown side by side in one figure, however, we will see that these distributions are not the same.

```r
nDifferentValues <- length(realPdDistr$vals)
comparisonPDs <- data.frame(pds = c(pdDistr$pds, realPdDistr$vals),
                            freq = c(pdDistr$frequency, realPdDistr$frequency),
                            variable = factor(c(rep(1, nDifferentValues),
                                                rep(2, nDifferentValues))))

pp5 <- ggplot(comparisonPDs) +
   geom_bar(aes(x=pds, y=freq, color=variable), stat="identity", width=0.005) +
   geom_vline(xintercept = truePd, color="red") +
   xlab("PD") + ylab("frequency")
pp5
```

Summing up, in this part we have seen the risk that arises from the fact that the true parameters of our model are unknown and need to be estimated from historic data. This way, our assessment of risk will always deviate from the true level that we face. And, whenever we try to manage risks based on imprecise estimations, the true risk level will be influenced in unintended ways, too.

However, estimation risk still follows probabilistic laws. Hence, at least we can quantify the likelihood of deviations of estimated parameters rather well. This is knowledge that we could incorporate into our risk strategy, for example, by providing an adequate add-on to our risk capital. Also, estimation risk vanishes with increasing sample size – a positive outlook, given that we nowadays keep track of and store more data than at any other point in history.
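To get a feeling for how fast estimation risk shrinks, we can sketch the standard error of the estimated loss event probability for growing sample sizes (a minimal illustration using the standard binomial formula $\sqrt{p(1-p)/n}$; the sample sizes are hypothetical):

```r
## standard error of the estimated loss probability pHat for a growing
## number n of independent Bernoulli observations: sqrt(p*(1-p)/n)
p <- 0.1
sampleSizes <- c(100, 1000, 10000, 100000)
stdErrs <- sqrt(p*(1-p)/sampleSizes)
round(stdErrs, 4)   # 0.0300 0.0095 0.0030 0.0009
```

Each hundredfold increase in sample size cuts the standard error by a factor of ten.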

## 3 Model risk

Still, the target of this part is the assessment of risk as it is measured through value-at-risk. Hence, our interest lies in the determination of an adequate capital cushion that complies with a pre-set level of security. Ultimately, value-at-risk is just the value of a quantile, and hence it is a property of the overall loss distribution.

So far, we have concentrated on only one possible way to determine value-at-risk, where we derived it from the loss events of individual policy holders. However, as the saying goes, many roads lead to Rome, and the same applies to value-at-risk as well. In particular, you might wonder why we took the detour via individual loss events in the first place, instead of just directly estimating the quantile as a property of the loss distribution. Whenever we have data on individual loss events, we also have data on the loss distribution itself, so why not just derive value-at-risk, for example, as the empirical quantile of this loss distribution data?
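As a sketch of this alternative route, the empirical quantile of simulated yearly profit data directly yields a value-at-risk estimate (illustrative code using the same portfolio settings as in this post; `empValRisk` is a name chosen here, not from the original code):

```r
## simulate yearly profits and estimate value-at-risk as the empirical
## quantile of yearly losses (losses = negative profits)
set.seed(123)
nPersons <- 1000; nYears <- 50
p <- 0.1; claimSize <- 20
premiums <- p*claimSize*1.05
alpha <- 0.98

yearlyProfits <- replicate(nYears,
   nPersons*premiums - sum(runif(nPersons) < p)*claimSize)

(empValRisk <- unname(quantile(-yearlyProfits, alpha)))
```

With only 50 yearly observations, this estimate will scatter widely around the true value-at-risk, which is exactly the point developed below.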

To really understand why these are two distinct approaches that generally will lead to different outcomes, we now directly dive into a concrete example. This way, differences between both approaches will immediately become obvious.

This example builds upon the setup that we already used before: we still hold a portfolio of 1000 independent insurance policies, each with a claim size of 20 and loss probability $p = 0.1$. Only this time we will sample the historic data in a more realistic way: we keep record of loss events for each person individually over a time period of 50 years. Thereby, each person shall have either 0 or 1 loss event per year, such that multiple loss events per year and person are not allowed.

At first, we will try to retrieve the true loss distribution the same way that we did before. We use the complete loss event data of all persons and all years to get an estimate of the loss event probability. Then, we use this estimate to derive the overall profit distribution, where probabilities of profit events are generated from the binomial distribution that holds for loss events. Looking at the comparison between the estimated profit distribution and the true distribution, it is already quite hard to see any difference. Only when we visualize the differences directly in the bottom panel of the figure can we really see any deviations. Hence, the estimated profit distribution that we obtained this way is extremely close to the true one.

```r
library(ggplot2)
source("rlib/multiplot.R")

######---------------------------------------------------------------
## specify settings
######---------------------------------------------------------------

## specify individual person settings
p <- 0.1
claimSize <- 20
premiums <- p*claimSize*1.05

## specify portfolio settings
nPersons <- 1000
nYears <- 50

######---------------------------------------------------------------
## get true loss distribution
######---------------------------------------------------------------

## get probabilities from binomial distribution
eventProbs <- dbinom(rev(0:nPersons), size = nPersons, prob = p)

## get possible profits
profitEvents <- nPersons*premiums - rev(0:nPersons)*claimSize

## profit distribution as data frame
trueProfitDistr <- data.frame(xVals = profitEvents, pVals = eventProbs)

######---------------------------------------------------------------
## simulate profit data
######---------------------------------------------------------------

## simulate historic data
simVals <- matrix(runif(nPersons*nYears), nrow = nYears, ncol = nPersons)
lossEvents <- matrix(0, nrow = nYears, ncol = nPersons)
lossEvents[simVals < p] <- 1
expenditures <- claimSize*lossEvents

## get associated profits
profits <- premiums - expenditures

######---------------------------------------------------------------
## estimate p and derive associated loss distribution
######---------------------------------------------------------------

## get loss probability per person per year
pHat <- sum(lossEvents)/nPersons/nYears

## get estimated probabilities for profit distribution
estimatedEventProbs <- dbinom(rev(0:nPersons), size = nPersons, prob = pHat)

## get estimated profit distribution as data frame
estimatedProfitDistr <- data.frame(xVals = profitEvents,
                                   pVals = estimatedEventProbs)

## get deviations from true probabilities
differenceDistr <- data.frame(xVals = profitEvents,
                              pVals = eventProbs - estimatedEventProbs)

## plot estimated profit distribution next to real one
pp1 <- ggplot(trueProfitDistr) +
   geom_bar(aes(x=xVals, y=pVals), stat="identity") +
   xlim(-800, 1000) + xlab("profit") + ylab("probability")
pp2 <- pp1 %+% estimatedProfitDistr + ylab("estimated probability")
pp3 <- pp1 %+% differenceDistr + ylab("difference")
multiplot(pp1, pp2, pp3, cols = 1)
```

Now we want to try a different approach to the estimation of the profit distribution. The main difference between both approaches is that we interpret the observations of the historic loss data as associated with different random variables. In the first approach, we treated entries in the historic loss data as realizations of individual loss events. Hence, we used the observations to estimate the probability of an individual loss event. This time, however, we will aggregate individual loss events and premium payments on a yearly basis. What we get after this aggregation are observations directly from the quantity of interest: profit per year. However, due to aggregation, the sample of historic observations of profit per year consists of only 50 observations – one observation per year. The observations are shown in the following figure.

```r
######---------------------------------------------------------------
## visualize profit observations
######---------------------------------------------------------------

## time series of profits
yearlyProfits <- rowSums(profits)
yearlyProfitsDf <- data.frame(profit = yearlyProfits, year = 1:nYears)

## plot yearly profit observations
pp2 <- ggplot(yearlyProfitsDf) +
   geom_point(aes(x=year, y=profit)) +
   ylab("profit per year")
pp2
```

We now want to use this sample of observations in order to retrieve the distribution that was used to generate them. Therefore, we will make use of two basic statistical methods: plotting a histogram of the sample, as well as estimating a kernel density to get a smooth representation of the data.

```r
pp3 <- ggplot(yearlyProfitsDf, aes(x=profit)) +
   geom_histogram(aes(y=..density..),   # density instead of count on y-axis
                  binwidth=50, colour="black", fill="white") +
   geom_density(alpha=.2, fill="#FF6666") +
   xlab("profit per year") + xlim(-800, 1000)
multiplot(pp3, pp1, cols = 1)
```

As we can see from the comparison in the figure, the estimate of the profit distribution obtained this way deviates substantially from the true distribution. The reason for this is the rather small number of observations that could be used.

In order to assure ourselves that the large deviations are not due to any mistakes in our implementation, we conduct the estimation approach once again. This time, however, we choose a much larger sample size of 50000 observations, in order to eliminate the estimation error almost completely.

```r
## specify portfolio settings
nYears <- 50000

######---------------------------------------------------------------
## simulate profit data for new sample size
######---------------------------------------------------------------

## simulate historic data
simVals <- matrix(runif(nPersons*nYears), nrow = nYears, ncol = nPersons)
lossEvents <- matrix(0, nrow = nYears, ncol = nPersons)
lossEvents[simVals < p] <- 1
expenditures <- claimSize*lossEvents

## get associated profits
profits <- premiums - expenditures

######---------------------------------------------------------------
## visualize profit distribution
######---------------------------------------------------------------

## time series of profits
yearlyProfits <- rowSums(profits)
yearlyProfitsDf <- data.frame(profit = yearlyProfits, year = 1:nYears)

## plot histogram with density as approximation of profit distribution
pp3 <- ggplot(yearlyProfitsDf, aes(x=profit)) +
   geom_histogram(aes(y=..density..),   # density instead of count on y-axis
                  binwidth=20, colour="black", fill="white") +
   geom_density(alpha=.2, fill="#FF6666") +
   xlab("profit per year") + xlim(-800, 1000)
multiplot(pp3, pp1, cols = 1)
```

As we can see from the figure, this time the deviation from the true underlying profit distribution is only minimal. Hence, our implementation is correct, and the imprecise estimation in the first case is only due to the small sample of 50 years of data. With such a small sample size, it is nearly impossible to get the underlying distribution right.

Summing up, the crucial point of this exercise is that there are many different estimators for any given target quantity. Furthermore, these estimators make use of data more or less efficiently. Below the surface, however, increased efficiency of data usage can only be accomplished through one technique: incorporating additional assumptions! In general, it is impossible to get more out of some data than there really is; efficient data usage is not a free lunch. Using data more efficiently is only possible when you choose to incorporate additional assumptions. And this, in turn, is a very critical move: you always face the risk of accidentally including false assumptions, thereby introducing bias into the analysis. Hence, choosing between different estimators always is a tradeoff: more efficient data usage in the best case, opposed to biased results in case you get the assumptions wrong.
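This tradeoff can be made visible with a small simulation: when the binomial model is correct, the model-based value-at-risk estimator fluctuates far less between samples than the empirical quantile estimator (a sketch with assumed settings; yearly loss counts are drawn directly from the binomial model here):

```r
## compare sampling variability of two value-at-risk estimators
set.seed(123)
nPersons <- 1000; nYears <- 50; p <- 0.1
claimSize <- 20; premiums <- p*claimSize*1.05
alpha <- 0.98

nTrials <- 200
modelVaR <- empVaR <- numeric(nTrials)
for(ii in 1:nTrials){
   lossCounts <- rbinom(nYears, nPersons, p)
   losses <- lossCounts*claimSize - nPersons*premiums

   ## estimator 1: estimate p, then derive VaR from the binomial model
   pHat <- sum(lossCounts)/(nYears*nPersons)
   modelVaR[ii] <- qbinom(alpha, nPersons, pHat)*claimSize -
      nPersons*premiums

   ## estimator 2: empirical quantile of yearly losses
   empVaR[ii] <- quantile(losses, alpha)
}

## the model-based estimator makes far more efficient use of the data
c(modelBased = sd(modelVaR), empirical = sd(empVaR))
```

The gain in efficiency comes exclusively from the additional binomial assumptions, which happen to be true in this simulation.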

The tricky part, however, is that not all assumptions implied by a certain model may be obvious. For example, the two estimators used in the example above rely on assumptions to quite different degrees. The second estimator basically tries to visualize the underlying structure of the data with only a few technical interventions: the bin size for the histogram, and some settings to determine the flexibility of the smoothly estimated density function. The first estimator, however, implicitly uses some structural assumptions that are necessary to aggregate the individual loss events into an overall profit distribution. They are:

- equal loss probabilities for each person
- independent loss events

In the example, the estimator showed such a good performance only because these assumptions complied with the real underlying structure of the data. However, the assumptions are of course not necessarily fulfilled in reality.

Accordingly, we will now take a look at the performance of the estimator in a case where the assumptions are not fulfilled anymore. More precisely, we will now break the assumption of independence. One possible way to achieve this is by rendering the true loss event probability time-varying. The logic behind this is quite simple. Imagine a group of people that are insured against flood. Now, whenever rainfalls within a year are above average, this will increase the probability of flood damages simultaneously for all policy holders. In contrast, there are also years with very low precipitation, thereby simultaneously decreasing the probability of flood damages for all insured people. Mathematically, we can model this situation through the introduction of an additional variable that keeps track of the state of rainfalls for each year: high rainfalls (state 1) or low rainfalls (state 2). These states will be called regimes, denoted by the variable $R$. We do not know in advance which state of rainfalls the current year will bring, but we only know the probabilities of both states:

$$P(R = 1) = w_1, \qquad P(R = 2) = w_2 = 1 - w_1$$

Furthermore, in order to take into account the varying probabilities of flood damages for each of the regimes, each regime now gets assigned its own loss event probability. Denoting by $L_i$ the loss event indicator of policy holder $i$, we have:

$$P(L_i = 1 \mid R = 1) = p_1, \qquad P(L_i = 1 \mid R = 2) = p_2$$

In other words: if we are in regime 1 with high rainfalls, the probability of a flood damage will be $p_1$ for each policy holder. In contrast, this probability will be $p_2$ in years of low rainfall.

Still, however, the regime probabilities are set such that the overall probability of a loss event equals the pre-set level of $p = 0.1$. Hence, unconditionally, if we do not know whether we are in a year of high or low rainfalls, the probability of a flood damage should equal $p$ for any person $i$:

$$P(L_i = 1) = w_1 p_1 + w_2 p_2 = p$$
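This mixture identity is easy to verify numerically with the regime values used in this example ($p_1 = 0.3$, $w_1 = 0.1$):

```r
## check that the regime mixture reproduces the unconditional loss
## probability: w1*p1 + w2*p2 = p
p <- 0.1; p1 <- 0.3; w1 <- 0.1
w2 <- 1 - w1
p2 <- (p - p1*w1)/(1 - w1)
w1*p1 + w2*p2   # equals p = 0.1
```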

This way, the distribution of any individual loss event is completely identical to the previous setting. The only thing that changes is that the loss variables now become dependent. In order to measure this, we make use of the concepts of covariance and linear correlation:

$$\operatorname{Cov}(L_i, L_j) = E\big[(L_i - E[L_i])(L_j - E[L_j])\big], \qquad \rho(L_i, L_j) = \frac{\operatorname{Cov}(L_i, L_j)}{\sigma(L_i)\,\sigma(L_j)}$$

For the case of independent loss events, covariance and correlation are equal to zero:

$$\operatorname{Cov}(L_i, L_j) = E\big[(L_i - p)(L_j - p)\big] = E[L_i - p]\,E[L_j - p] = 0$$

This also could be calculated more easily using the simplified formula for covariance:

$$\operatorname{Cov}(L_i, L_j) = E[L_i L_j] - E[L_i]\,E[L_j]$$

This way, since for Bernoulli variables $E[L_i L_j] = P(L_i = 1, L_j = 1)$, which under independence equals $p^2$, we get:

$$\operatorname{Cov}(L_i, L_j) = P(L_i = 1, L_j = 1) - p^2 = p \cdot p - p^2 = 0$$

In order to get the covariance for the case of two regimes, we first derive the probability of joint losses. Conditional on the regime, loss events are independent, each with the regime's loss probability, so that:

$$P(L_i = 1, L_j = 1) = w_1 p_1^2 + w_2 p_2^2$$

Plugging this into the formula for covariance, we get:

$$\operatorname{Cov}(L_i, L_j) = w_1 p_1^2 + w_2 p_2^2 - p^2$$

And, for the correlation, using $\operatorname{Var}(L_i) = p(1-p)$:

$$\rho(L_i, L_j) = \frac{w_1 p_1^2 + w_2 p_2^2 - p^2}{p(1-p)}$$
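Before putting the formulas to use, we can cross-check them by brute force: simulating many regime draws and pairs of loss events, and comparing empirical with analytic moments (a sketch; the variable names are chosen here for illustration):

```r
## verify covariance and correlation of two loss events under the
## regime model by simulation
set.seed(123)
p <- 0.1; p1 <- 0.3; w1 <- 0.1
p2 <- (p - p1*w1)/(1 - w1)

nSim <- 200000
pRegime <- ifelse(runif(nSim) < w1, p1, p2)   # one regime per period
li <- as.numeric(runif(nSim) < pRegime)       # loss event of person i
lj <- as.numeric(runif(nSim) < pRegime)       # loss event of person j

## analytic values
covAnalytic <- w1*p1^2 + (1-w1)*p2^2 - p^2
corAnalytic <- covAnalytic/(p*(1-p))

c(empirical = cov(li, lj), analytic = covAnalytic)
c(empirical = cor(li, lj), analytic = corAnalytic)
```

The empirical values should land close to the analytic ones, confirming the derivation.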

Now that we have introduced the main formulas for the analysis, we set up the values for a concrete example. For given values of $p_1$ and $w_1$, we first need to calculate $p_2$. Then, we plot the different distributions that are obtained for each regime next to each other.

```r
######---------------------------------------------------------------
## determine settings for time-varying loss event probability
######---------------------------------------------------------------

## set p1 and w1
p <- 0.1
p1 <- 0.3
w1 <- 0.1
w2 <- 1-w1

## determine p2 from p = p1*w1 + p2*(1 - w1)
(p2 <- (p - p1*w1)/(1 - w1))

## get distribution for each regime
uncondDistr <- data.frame(xVals = c(0, 1), pVals = c(1-p, p))
regimeDistr1 <- data.frame(xVals = c(0, 1), pVals = c(1-p1, p1))
regimeDistr2 <- data.frame(xVals = c(0, 1), pVals = c(1-p2, p2))

## plot each distribution
pp1 <- ggplot(uncondDistr) +
   geom_bar(aes(x=xVals, y=pVals), stat="identity", width=0.02) +
   xlab("loss event") + ylab("probability")
pp2 <- ggplot(regimeDistr1) +
   geom_bar(aes(x=xVals, y=pVals), stat="identity", width=0.02) +
   xlab("loss event") + ylab("probability")
pp3 <- ggplot(regimeDistr2) +
   geom_bar(aes(x=xVals, y=pVals), stat="identity", width=0.02) +
   xlab("loss event") + ylab("probability")
multiplot(pp2, pp3, cols = 2)
```

With $p_1 = 0.3$ and $w_1 = 0.1$, we get for the loss event probability of the second regime:

```
[1] 0.07777778
```

Hence, we now have one regime with loss event probability larger than $p = 0.1$, and one with a lower probability.

By construction, the individual loss events of different policy holders now are not independent anymore, as they enter regimes of high and low rainfall simultaneously. This way, their loss event probabilities are affected in similar ways, which ultimately will lead to dependency. We can easily check this through computation of covariance and correlation.

```r
## calculate covariance
(covar <- p1^2 * w1 + p2^2 * w2 - p^2)
```

```
[1] 0.004444444
```

```r
## calculate correlation
(correl <- covar/(p*(1-p)))
```

```
[1] 0.04938272
```

So what are the effects of this dependency on the yearly profit distribution? By construction, the individual loss events are still independent once the uncertainty about the underlying regime has been resolved. Hence, within each regime, joint loss events still follow a binomial distribution. The overall distribution of profits then emerges as a simple weighted sum of the profit distributions of both regimes. This way, profits will exhibit a bimodal distribution.

```r
## determine profit distribution for each regime
eventProbsLow <- dbinom(rev(0:nPersons), nPersons, prob = p2)
eventProbsHigh <- dbinom(rev(0:nPersons), nPersons, prob = p1)
eventProbsRegimes <- w1*eventProbsHigh + w2*eventProbsLow

## determine true profit distribution
profitDistrRegime <- data.frame(xVals = profitEvents,
                                pVals = eventProbsRegimes)

## plot true profit distribution with dependency
pp1 <- ggplot(profitDistrRegime) +
   geom_bar(aes(x=xVals, y=pVals), stat="identity") +
   xlab("profit") + ylab("probability")
pp1
```

In order to see the effects of wrong model assumptions, we will now sample from the new underlying structure with dependency, but try to estimate the value-at-risk under the assumption of independence. As estimation from data always incorporates estimation risk, too, we choose a rather large sample size. This way, estimation risk will be mostly eliminated, and we are left with only the bias due to wrong model specification. First, however, we want to assure ourselves of the dependence structure that now underlies the sample data. Therefore, we compare empirical correlations estimated from the new model with empirical correlations of sample data from the model with independence.

```r
## number of repetitions
nYears <- 20000

## simulate regimes
regimes <- rep(p2, nYears)
regimes[runif(nYears) < w1] <- p1

## simulate loss events for given loss event probability
simLosses <- function(p){
   yearLosses <- rep(0, nPersons)
   yearLosses[runif(nPersons) < p] <- 1
   yearLosses
}

library(plyr)

## simulate data for regime structure and independent loss events
simLossesRegime <- aaply(regimes, 1, simLosses)
simLossesConstant <- aaply(rep(p, nYears), 1, simLosses)

######---------------------------------------------------------------
## visualize empirical correlations
######---------------------------------------------------------------

## get some empirical correlations
nCorrs <- 1000
empCorrsRegime <- rep(0, nCorrs)
empCorrsConst <- rep(0, nCorrs)
for(ii in 1:nCorrs){
   samplePersons <- sample(1:nPersons, size = 2)
   empCorrsRegime[ii] <- cor(simLossesRegime[, samplePersons[1]],
                             simLossesRegime[, samplePersons[2]])
   empCorrsConst[ii] <- cor(simLossesConstant[, samplePersons[1]],
                            simLossesConstant[, samplePersons[2]])
}

## plot histograms of empirical correlations for both models
empCorrsRegimeDf <- as.data.frame(empCorrsRegime)
empCorrsConstDf <- as.data.frame(empCorrsConst)
pp1 <- ggplot(empCorrsRegimeDf) +
   geom_histogram(aes(x=empCorrsRegime)) +
   xlab("empirical correlations")
pp2 <- ggplot(empCorrsConstDf) +
   geom_histogram(aes(x=empCorrsConst)) +
   xlab("empirical correlations")
multiplot(pp1, pp2, cols = 2)
```

Clearly, the sample data of the new model does show positive empirical correlations.

Finally, we now want to check what would have happened if the underlying dependence structure had remained undetected. Therefore, we assume that we set the capital level according to the estimated value-at-risk from the wrong model, and calculate the real probability of default.

```r
######---------------------------------------------------------------
## estimate probability of default for value-at-risk with wrong
## assumption
######---------------------------------------------------------------

## set confidence level of value-at-risk
alpha <- 0.98

## estimate parameter p
pHat <- sum(simLossesRegime)/(nYears*nPersons)

## get quantile of loss event distribution
quantLossEvents <- qbinom(alpha, size = nPersons, prob = pHat)

## get value-at-risk
valRisk <- quantLossEvents*claimSize - nPersons*premiums

## get real probability of default for estimated value-at-risk
(pd <- sum(eventProbsRegimes[profitEvents < 0]))
```

Instead of the 2 percent probability of default that was intended, we now face a probability of

```
[1] 0.1007771
```

Concluding, we have now seen the problems that may arise whenever the model we use builds on wrong assumptions. In the example, the wrong assumptions were quite obvious, especially since we could observe exactly at what point we incorporated them into the model ourselves. In reality, of course, model flaws will not be pointed out equally clearly. Here, we have to search for hidden misspecifications on our own. Even worse, sample sizes in reality will usually be much smaller, so that deviations between model assumptions and reality will be much harder to detect, especially since we simultaneously have to deal with estimation risk.

As a side effect of the example, we also got a first impression of the implications that dependency can have on risk. In general, positive dependence will diminish the effects of diversification, thereby preserving fluctuations around expected values more than under independence. This usually leads to higher risks.
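The impact of this loss of diversification can be quantified with the variance of the total number of losses, $\operatorname{Var}(S) = n\,\operatorname{Var}(L_i) + n(n-1)\operatorname{Cov}(L_i, L_j)$ (a sketch with the settings of this post):

```r
## standard deviation of the total number of losses, with and without
## the regime dependency, for n = 1000 policies
n <- 1000; p <- 0.1; p1 <- 0.3; w1 <- 0.1
p2 <- (p - p1*w1)/(1 - w1)

varL <- p*(1-p)                          # variance of a single loss event
covL <- w1*p1^2 + (1-w1)*p2^2 - p^2      # covariance under the regimes

sdIndep <- sqrt(n*varL)                  # independent loss events
sdRegime <- sqrt(n*varL + n*(n-1)*covL)  # dependent loss events
c(independent = sdIndep, regimes = sdRegime)
```

Even the small correlation of roughly 0.05 inflates the standard deviation of total losses by a factor of about seven.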

Posted on 2013/12/02, in financial econometrics, R and tagged insurance, Rbloggers, risk management. Bookmark the permalink. 2 Comments.

impressive. by the way, recently i read another impressive work about mean-VaR in contrast to mean-variance, which is also attributed to Markowitz as you know. it sounds like it builds a bridge between what i learned from portfolio analysis and risk management. did you know or hear any comments about mean-VaR from e.g. Prof. Mittnik or others?

So far I never really delved into this topic. But VaR is only a more informative measure than standard deviation if you leave the elliptical world. This, in turn, will most likely exclude analytical solutions (something I also wrote about once), so that you will have to rely on numerical optimizations. I’d expect this to be a rather cumbersome approach in higher dimensions. Of course, it would be highly desirable to have an alternative to Markowitz that better accounts for stylized effects of financial time series.