Analytical solutions in risk management

In risk management, most traditional approaches are built around a fundamental desire for analytical tractability. The reason is obvious: when computers offered only modest computational power, the only way to guarantee quick solutions was through mathematics. And in practice, when risk assessment and portfolio optimization become part of your business, you do not want a model that needs hours or days to come up with insights into your risks.

In reality, however, the demands on your analytical risk management model are rather high. It is almost certain that your model will not have to deal with one-dimensional risks only. Whether high dimensionality arises through multiple assets or through multi-period risk assessment of a single asset: both cases force you to deal with interactions of multiple random variables, which complicates an analytical solution enormously. And, of course, in most real-world situations you will encounter both cases at once. Life just ain't easy…

In this post we will try to work out the circumstances that need to prevail for us to still have hope of finding an analytical solution. Whether such convenient circumstances actually occur in reality is something we will need to give further thought afterwards.

1 Aggregating returns

At the very core of risk management, we will always have a function of multiple random variables that aggregates our high-dimensional vector of individual risk components into a one-dimensional risk distribution: a portfolio and/or a multi-period return. And, basically, the only way to not rule out an analytical solution already at this fundamental step of the model is to use a sufficiently well-behaved aggregation function. In other words: if you try to use anything other than a linear function on your multi-dimensional risk factors, you can immediately bury your hopes of any analytical solution!

What does this requirement of linearity mean for risk management in practice? Is this really a restriction, or are aggregations strictly linear anyway?

1.1 Aggregating discrete returns

Let’s first assume that we deal with discrete net returns:

\displaystyle  r_{t}^{discr}:=\frac{P_{t}-P_{t-1}}{P_{t-1}}

Given discrete returns r_{i,t}^{discr} on individual assets, the portfolio return is indeed given through a linear function:

\displaystyle  r_{P,t}^{discr}= \sum_{i=1}^{N}w_{i}r_{i,t}^{discr}

Here, w_{i} denotes the weight of the i-th asset: the relative proportion of our overall budget invested in asset i.

So far, so good. Yes, but only so far… As soon as aggregation over time becomes part of the challenge, linearity is lost for discrete returns: n-period returns are calculated as a product of one-period returns:

\displaystyle  r_{t,t+n}^{discr}=\prod_{i=1}^{n}(1+r_{t+i}^{discr}) -1
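To make the two aggregation rules concrete, here is a minimal numerical sketch (all numbers hypothetical, using Python/NumPy):

```python
import numpy as np

# multi-period aggregation of discrete returns: product of gross returns, minus one
r = np.array([0.02, -0.01, 0.03])          # hypothetical one-period returns of one asset
r_multi = np.prod(1 + r) - 1               # non-linear in the one-period returns
print(r_multi)                             # ~0.0401

# one-period portfolio aggregation: a plain weighted sum
w = np.array([0.5, 0.3, 0.2])              # hypothetical portfolio weights
r_assets = np.array([0.01, -0.02, 0.015])  # hypothetical one-period asset returns
print(w @ r_assets)                        # 0.002
```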

Concluding: with discrete returns and a time horizon of only one period, aggregation requires linear transformations only. But what about multiple periods? Any chance to get around the non-linearity here?

1.2 Aggregating logarithmic returns

Instead of discrete returns, let’s now take a look at the case of logarithmic returns:

\displaystyle \begin{aligned} r_{t}:&=\log(1+r_{t}^{discr})\\  &=\log\left(\frac{P_{t}}{P_{t-1}}\right)\\  &=\log(P_{t})-\log(P_{t-1}) \end{aligned}

Now, aggregation over time becomes linear:

\displaystyle \begin{aligned} r_{t,t+n}&=\log (1 + r_{t,t+n}^{discr})\\ &=\log \left[1+ \prod^{n}_{i=1}(1+r_{t+i}^{discr}) -1\right] \\ &=\log[(1+r^{discr}_{t+1})(1+r^{discr}_{t+2})\ldots (1+r^{discr}_{t+n})]\\ &=\log(1+r^{discr}_{t+1}) + \log(1+r^{discr}_{t+2}) + \ldots + \log(1+r^{discr}_{t+n})\\ &=r_{t+1} + r_{t+2} + \ldots + r_{t+n} \end{aligned}
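A two-line sanity check of this additivity, with some hypothetical prices:

```python
import numpy as np

P = np.array([100.0, 101.5, 100.8, 103.2])  # hypothetical daily prices
log_returns = np.diff(np.log(P))            # one-period logarithmic returns
print(np.isclose(log_returns.sum(), np.log(P[-1] / P[0])))  # True: the sum equals the n-period log return
```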

So, logarithmic returns solve the non-linearity problem? Well, don't count your chickens before they hatch… Yes – we now have linearity with respect to aggregation over time. But, sadly, fixing the one aggregation simultaneously breaks the other: this time, the problem arises for the case of multiple assets.

\displaystyle \begin{aligned}  r_{P}&= \log\left(1+r_{P}^{discr}\right)\\  &=\log\left( 1 + \sum^{N}_{i=1}w_{i}r_{i}^{discr}\right)\\  &=\log\left( 1+ \sum^{N}_{i=1}w_{i}[\exp(\log(1+r_{i}^{discr}))-1]\right) \\  &=\log\left(1+ \sum^{N}_{i=1}w_{i}[\exp(r_{i})-1]\right)\\  &=\log\left(\sum^{N}_{i=1}w_{i}\exp(r_{i})\right) \end{aligned}

Note that the last step uses the fact that the portfolio weights sum to one: \sum_{i=1}^{N}w_{i}=1.

At the end of the day, we will always be able to achieve linearity with respect to only one type of aggregation. Either we preserve mathematical tractability for the multiple-assets case (with discrete returns), or we succeed with aggregation over time (with logarithmic returns). Fixing both simultaneously is impossible!

Given that modern risk management should always entail both dimensions of aggregation, what does this tell us? Does striving for analytical solutions become hopeless? Well, not yet! Let's first think about what further tricks we have in our repertoire.

Hmm, let’s see… We have a non-linear function, but want to have a linear one instead. Any bells ringing?

1.3 Linear approximations

Yes, linearization is what we're up for now! As you probably recall from Analysis I: any sufficiently smooth function can be approximated in the neighborhood of a given point x_{0} through its Taylor expansion. And, since we want a linear approximation without any higher-order polynomial terms, we simply cut the Taylor expansion off after the first-order term.

\displaystyle \begin{aligned} f(x)&\approx \sum_{n=0}^{\infty}\frac{f^{(n)}(x_{0})}{n!}(x-x_{0})^{n}\\ &\approx f(x_{0}) + f'(x_{0})(x-x_{0}) \end{aligned}

Hence, what we are working with is a first-order Taylor expansion. Using this concept, the goal now is to linearly approximate the logarithmic portfolio return:

\displaystyle  r_{P}=\log\left(\sum^{N}_{i=1}w_{i}\exp(r_{i})\right)

As a first step, we will only deal with the linearization of the innermost function: \exp(r_{i}). Since the linearization is done through a Taylor expansion, we only need to think about the "anchor point": a point x_{0} such that all daily returns lie in its vicinity. Since daily logarithmic returns are generally small, with mean value approximately equal to zero, we will approximate \exp(r_{i}) for values of r_{i} close to x_{0}=0:

\displaystyle \begin{aligned} \exp(r_{i})&\approx\exp(x_{0}) + \exp'(x_{0})(r_{i}-x_{0})\\ &=\exp(0) + \exp(0)(r_{i} - 0) \\ &=1 + r_{i} \end{aligned}
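For daily-sized returns this approximation is very tight; a quick arithmetic check with a hypothetical return of 1%:

```python
import numpy as np

r = 0.01                      # hypothetical daily log return
print(np.exp(r), 1 + r)       # 1.01005... vs 1.01
print(np.exp(r) - (1 + r))    # error of roughly r**2 / 2 = 5e-05
```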

Using this result, the portfolio formula becomes:

\displaystyle \begin{aligned} r_{P}&=\log\left(\sum^{N}_{i=1}w_{i}\exp(r_{i})\right)\\ &\approx\log\left(\sum_{i=1}^{N}w_{i}(r_{i}+1)\right)\\ \end{aligned}

Hence, we still need to get rid of the logarithm. To do so, we follow a similar logic. With r_{i}\approx 0 for daily logarithmic returns, the argument of the logarithm is approximately equal to 1:

\displaystyle \begin{aligned} \sum_{i=1}^{N}w_{i}(r_{i}+1)&=1+\sum_{i=1}^{N}w_{i}r_{i}\\ &\approx 1 + \sum_{i=1}^{N}w_{i}\cdot 0 \\ &= 1 \end{aligned}

The logarithm thus needs to be approximated in a vicinity of x_{0}=1. Using the Taylor expansion of order 1 we get:

\displaystyle \begin{aligned} \log(x)&\approx \log(x_{0}) + \frac{1}{x_{0}}(x-x_{0})\\ &=\log(1) + 1 (x-1)\\ &=x-1 \end{aligned}

Putting both approximations together, we finally get

\displaystyle \begin{aligned} \mathbf{r_{P}}&\mathbf{=}\log\left(\sum^{N}_{i=1}w_{i}\exp(r_{i})\right)\\ &\approx\log\left(\sum_{i=1}^{N}w_{i}(r_{i}+1)\right)\\ &\approx\sum_{i=1}^{N}w_{i}(r_{i}+1)-1\\ &\mathbf{=\sum_{i=1}^{N}w_{i}r_{i}} \end{aligned}
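To get a feeling for the quality of this approximation, here is a small simulation sketch with hypothetical weights and normally distributed daily log returns of roughly 1% volatility:

```python
import numpy as np

rng = np.random.default_rng(0)

w = np.array([0.4, 0.35, 0.25])              # hypothetical weights, summing to one
r = rng.normal(0.0, 0.01, size=(1000, 3))    # hypothetical daily log returns

exact = np.log((w * np.exp(r)).sum(axis=1))  # exact portfolio log return
approx = r @ w                               # linearized portfolio log return

print(np.abs(exact - approx).max())          # typically on the order of 1e-4
```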

This result is basically what every analytically tractable model builds on. However, keep in mind: the formula is just an approximation – close to the real value, but never right there …

2 Additional requirements

Now we have linear aggregation of logarithmic returns both over multiple assets and over multiple periods. However, in order to get an analytically tractable model, a linear aggregation function is only one part of the deal! Additional requirements have to be imposed on the distribution of the risk factors, with the requirements differing depending on the actual risk measure used. We thereby distinguish between two broad classes of risk measures: moment-based risk measures like variance / standard deviation, as opposed to distribution-based risk measures like Value-at-Risk (VaR) and Expected Shortfall (ES).

So what does it mean that variance is a moment-based rather than a distribution-based risk measure? That variance does not depend on the underlying distribution? Of course not – variance clearly can be written as a property of the distribution function F:

\displaystyle \mathbb{V}[X]=\int (x-\mathbb{E}[X])^{2} dF(x)

Nevertheless, there is a huge difference between variance and, for example, VaR. But this difference only becomes apparent when X is not just considered as a variable by itself, but as a linear aggregation of other random variables. For both the expectation and the variance of aggregated random variables, there exist formulas that allow computation without knowledge of the underlying joint distribution – knowing the respective moments (and cross-moments) of the individual components is enough!

For example, for a sum of random variables we get

\displaystyle  \mathbb{E}\left[\sum_{i=1}^{n}X_{i}\right]=\sum^{n}_{i=1}\mathbb{E}[X_{i}]

for expectations, and

\displaystyle \begin{aligned} \mathbb{V} \left[\sum_{i=1}^{n}X_{i} \right]&= \sum_{i=1}^{n}\mathbb{V}[X_{i}] + 2\sum_{1\leq i<j\leq n}{Cov(X_{i},X_{j})} \\ &=\sum_{i=1}^{n}\mathbb{V}[X_{i}] + 2\sum_{1\leq i<j\leq n}{\sigma_{i}\sigma_{j}\rho_{ij}} \end{aligned}

for variances. Hence, as long as we know the first two moments of the individual components, we do not need to specify any further properties of the distributions involved. Using the additional property for linear transformations of univariate random variables,

\displaystyle  \mathbb{V}[aX+b]=a^{2} \mathbb{V}[X],

we can even derive a formula for portfolio variance this way:

\displaystyle \begin{aligned} \mathbb{V} \left[\sum_{i=1}^{n}w_{i}r_{i} \right] &=\sum_{i=1}^{n}w_{i}^{2}\mathbb{V}[r_{i}] + 2\sum_{1\leq i<j\leq n}{w_{i}w_{j}\sigma_{i}\sigma_{j}\rho_{ij}} \end{aligned}
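Equivalently, this is the quadratic form w^{\top}\Sigma w with the covariance matrix \Sigma. A minimal sketch with hypothetical volatilities and correlations:

```python
import numpy as np

w = np.array([0.5, 0.3, 0.2])           # hypothetical portfolio weights
sigma = np.array([0.02, 0.015, 0.03])   # hypothetical standard deviations
rho = np.array([[1.0, 0.3, 0.1],
                [0.3, 1.0, 0.4],
                [0.1, 0.4, 1.0]])       # hypothetical correlation matrix

cov = np.outer(sigma, sigma) * rho      # covariance matrix Sigma
port_var = w @ cov @ w                  # portfolio variance w' Sigma w
print(port_var, np.sqrt(port_var))      # variance and volatility of the portfolio
```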

Let's pause for a moment to think about the implications of this result. Given that we measure portfolio risk in terms of variance, this is an extraordinarily powerful result. The only thing we need to do now is estimate the first two moments of the individual portfolio components, and we are done with our risk calculations in a second! No need for years of studying probability theory, statistics and the like – once you know the meaning of \Sigma, you're basically done: sum up the individual components – finished! Fantastic result, isn't it?

Well, usually I'm one of the "glass half-full" guys, focusing on positive aspects. But in this case, please take a deeper look at the negative aspects of the formula, too, before you turn your back on probability theory prematurely. Of course, it is very appealing to have a simple and fast formula that holds for a very general class of distributions. However, if it is independent of the exact specification of the distributions, then shouldn't it be a rather uninformative way of characterizing distributions, and thereby risks? Well, that's exactly the point.

Try to think about risk measures like this: in some sense, the information incorporated in a continuous distribution function is infinite. That is, for each of infinitely many possible realizations, we know the exact likelihood of occurrence. Now, whenever you compress this infinite amount of information into one single number, like variance, VaR or ES, you automatically reduce the informational content. This can be seen most easily with the following example: imagine two different distributions, both with equal variance. As long as you have full information about the distributions, you can distinguish between them. However, after reducing them to the value of their variance, they suddenly become indistinguishable. Hence, on grounds of variance alone, you have lost the information that enabled you to tell them apart. An argument that also applies to VaR and ES.

Or, to express this argument on an even more fundamental level: the loss of information through reduction to a single number is in some way similar to rounding. Here, too, two different numbers, for example 3.156 and 3.2453, are mapped to the same result after rounding: 3. And, given only the rounded results, it is impossible to distinguish between the two numbers anymore.

Now, staying with the metaphor of rounding, you need to think of variance and VaR as two different rounding mechanisms, each with different properties. Speaking more precisely: variance is not a coherent risk measure (a definition that we will not tackle further at this point), and hence fails to fulfill some desirable properties. An intuitive way of pointing out the deficiencies of variance is through comparison to rounding again. When rounding real numbers, we know one very important thing: rounding each individual component of a sum will usually be less precise than rounding only the end result of the sum. That is, by aggregating rounded components, we also aggregate the small errors we make at each step, thereby getting less precise. Remarkably, when looking at the portfolio variance formula, this logic does not hold for the "rounding mechanism" variance. Here, we get the same result both ways: whether we aggregate the individual distributions and only then map the overall distribution to the value of its variance, or whether we map each individual component to its variance first and combine these individual variances (and correlations) via the portfolio variance formula – the result is identical! Something that does not generally hold for more sophisticated measures like VaR and ES.
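A small simulation sketch of exactly this contrast, with hypothetical correlated lognormal losses: the variance of the sum is recovered exactly from the component variances and their covariance, while the corresponding quantile (a 99% VaR) is not.

```python
import numpy as np

rng = np.random.default_rng(1)

# two hypothetical, correlated, skewed loss components
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=100_000)
x, y = np.exp(z[:, 0]), np.exp(z[:, 1])
total = x + y

# variance: aggregating component variances and the covariance is exact (an algebraic identity)
var_from_parts = x.var() + y.var() + 2 * np.cov(x, y, ddof=0)[0, 1]
print(np.isclose(var_from_parts, total.var()))            # True

# 99% quantile: no such exact decomposition from the components alone
print(np.quantile(x, 0.99) + np.quantile(y, 0.99), np.quantile(total, 0.99))  # different numbers
```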

So what's the explanation for that? Is variance suddenly capable of suspending the natural laws of rounding? Well, kind of – but only because the concept of variance is just too simple in its structure. In a way, once you apply information reduction through variance at the portfolio level, you already lose so much information that you could just as well have reduced the information at each individual step. Or, in other words: variance simply is not an adequate risk measure.

This can also be seen very easily with the following example. Let's assume that your portfolio follows the loss distribution L. Calculating the risk \mathfrak{R} of your portfolio through variance, you get \mathfrak{R}=\mathbb{V}(L). Now assume that you add a riskless component to your portfolio that pays you an amount a for sure (this reduces your losses by a). Hence, your portfolio realization will now always be better than in the first case, but your risk will still be

\displaystyle \begin{aligned} \mathfrak{R}&= \mathbb{V}(L-a) \\  &=\mathbb{V}(L) \end{aligned}

Even in this absolutely simple case, measuring risk through variance fails to recognize the difference between the two portfolios. Shifting the mean of the portfolio remains undetected.
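The same point in a two-line sketch (hypothetical numbers):

```python
import numpy as np

rng = np.random.default_rng(2)
L = rng.normal(0.0, 1.0, size=100_000)         # hypothetical loss distribution
print(np.isclose(np.var(L), np.var(L - 5.0)))  # True: the riskless payoff goes unnoticed
```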

Concluding: what we get with moment-based risk measures like variance, in combination with linear aggregation functions, is an analytically tractable and very simple formula for portfolio risk. For the portfolio case, we are not even required to specify a complete distribution function at any point! However, this overly simplistic structure, which even allows information reduction before aggregation, leads to a "rounding mechanism" with inadequate properties as a risk measure. In contrast, VaR and ES cannot be expressed as an easy aggregation of individual components – a more complicated structure that also leads to better suited risk measures.

3 Risk measures with distribution required

In the last part, we have derived – although more intuitively than within a rigorous mathematical framework – some deficiencies of variance as a risk measure. The keyword here is coherence, an umbrella term for a number of desirable properties that a good risk measure should fulfill. Variance, lacking some of these properties, hence is not a coherent risk measure.

By now, the deficiencies of variance are commonly acknowledged in academia and practice. The next step hence is to search for an analytical solution for a better suited risk measure. For simplicity, we will mostly focus on VaR in this part – a risk measure better suited than variance, although it still lacks some desirable properties. In contrast to variance, and without any further assumptions, there generally do not exist formulas that express VaR as a function of moments or risk-measure values of the individual components. Hence, in order to derive VaR, we generally need to derive the portfolio distribution first.

The question now is: how can we end up with an analytical solution for the overall portfolio distribution? Only then could we hope to derive the inverse cumulative distribution function that is required to determine the quantile. Again, the arguments in this chapter will not meet rigorous mathematical standards. The focus is more on understanding which building blocks of the model could prevent an analytical solution. And, as we will see, the individual model components need to be chosen very carefully in order not to break the analytical solution.

So let's get back to the main task: determining the return distribution. As we have already seen, our overall return distribution results from an aggregation function applied to multiple random variables. And, through linearization, we already managed to reduce that function to the easiest possible case. Still, linear aggregation involves integration, and hence can be solved analytically only in special cases – all the more so with increasing dimension of the problem.

To incorporate at least some mathematical foundation, let's now take a look at the formulas for distributions of only two linearly aggregated random variables. For a nice summary of some basic formulas on transformations of random variables, take a look at the corresponding page of the University of Alabama in Huntsville.

Let (X_{1},X_{2}) be a two-dimensional random vector with joint density function g(x_{1},x_{2}). Then the random variable Z=X_{1}+X_{2} has density p(z) given by

\displaystyle p(z)=\int g(y,z-y)dy

Or, for the case of independence,

\displaystyle p(z)= \int f_{1}(y)f_{2}(z-y)dy,

where f_{1} and f_{2} denote the marginal densities.
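As a sketch, the independence formula can be evaluated numerically for two standard normal marginals, where the analytical answer, a N(0, 2) density, is known:

```python
import numpy as np
from scipy.stats import norm

# density of Z = X1 + X2 for independent X1, X2 ~ N(0, 1), via the convolution integral
y = np.linspace(-10.0, 10.0, 4001)
dy = y[1] - y[0]

def p(z):
    return np.sum(norm.pdf(y) * norm.pdf(z - y)) * dy  # Riemann-sum approximation of the integral

z_grid = np.array([-1.0, 0.0, 1.5])
numeric = np.array([p(z) for z in z_grid])
analytic = norm.pdf(z_grid, scale=np.sqrt(2.0))         # known closed form: N(0, 2)
print(np.max(np.abs(numeric - analytic)))               # tiny numerical error
```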

Hence, for each additional component of the portfolio, we need to evaluate this integral over the joint distribution of the previously aggregated portfolio and the new component. And even then, we only have the probability density function, which we still need to integrate and invert in order to obtain the portfolio quantile analytically. In other words: a very complex procedure, where we can only hope to find an analytical solution in a few special cases! These special cases – to the best of my knowledge – mainly comprise normally distributed random variables and alpha-stable distributions. So, what is so special about these distributions that analytical solutions exist?

The crucial point in these cases is that the distributions are closed under summation. That means: if you add up two components from the same parametric distribution family, you end up with a distribution of the same family (up to rescaling) again. For example, summing two independent normally distributed random variables gives you a normally distributed random variable again. However – and now I really need your full attention – this closure under linear aggregation does not hold for arbitrary dependence structures! In other words: the sum of two normally distributed random variables is not necessarily normally distributed!

At this point – I know – you will definitely have your doubts. I mean, haven't you heard the exact opposite dozens of times? Well, then let me elaborate a little further on this point. When people refer to the closure of the normal distribution, they frequently omit one very crucial word of the theorem: it only holds for jointly normally distributed random variables. With two random variables X_{1} and X_{2}, it is not sufficient to know that each individual random variable is normally distributed. The way they are combined into a bivariate joint distribution is crucial as well. Generally speaking, there are infinitely many ways to combine two normally distributed univariate random variables into a joint distribution, and only for rather well-behaved, linear ways of combination will the joint distribution be a bivariate normal distribution. In other words, you can very easily combine them into a bivariate distribution such that (X_{1},X_{2}) is not jointly normally distributed. And then we have a case where the individual components each follow a univariate normal distribution, but the sum of these components is nevertheless not normally distributed. A concrete construction of this kind is sketched below.
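One classical counterexample, as a simulation sketch: take X standard normal and define Y by flipping the sign of X whenever |X| exceeds some threshold. By symmetry, Y is again exactly standard normal, but X + Y piles up a point mass at zero and therefore cannot be normal.

```python
import numpy as np
from scipy.stats import kstest, norm

rng = np.random.default_rng(3)

x = rng.standard_normal(200_000)
c = 1.0                                    # arbitrary threshold
y = np.where(np.abs(x) <= c, x, -x)        # flip the sign in the tails

# each margin is standard normal ...
print(kstest(y, norm.cdf).pvalue)          # typically a large p-value: Y is (exactly) N(0, 1)

# ... but the sum is not: it has a point mass at zero
s = x + y
print(np.mean(s == 0.0))                   # ~0.32, i.e. P(|X| > 1)
print(kstest(s, norm(scale=s.std()).cdf).pvalue)  # essentially zero: clearly not normal
```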

Granted, the general mechanism behind such constructions might still sound a little abstract. The problem is just that we are often not used to dealing with the subtleties of constructing multivariate distributions in greater detail. That is, we tend to think that the multivariate normal distribution is the only possible way of extending univariate normal distributions to higher dimensions. However, there is a general way of combining individual random variables into a joint distribution. We only need something to glue the individual parts together: a copula function. I do not want to go into further details on copulas at this point. If you have never heard of the concept, just try to be aware of the fact that there are infinitely many ways to glue individual random variables together – something that, mathematically, is done through copula functions. And these copula functions enable combinations of random variables in more complex ways, through non-linear dependence structures.

So much for copulas – now let's get back to our task of finding the portfolio distribution. As we have seen, even in the case of linear aggregation, the portfolio distribution has to be obtained by solving multi-dimensional integrals. These integrals depend on the joint distribution of the individual portfolio components, so we cannot hope to find an analytical solution in the general case. The fact that the complete joint distribution is involved, in turn, implies that there are two ways to break an analytical solution: either the marginal distributions or the dependence structure may fail to be well-behaved. Only in very special cases can the multi-dimensional integral be solved by recursively adding components that are closed under linear aggregation – the most well-known such case for financial data being jointly normally distributed random variables.

Although we have now seen that analytical solutions only exist for some special cases of multivariate distributions, we have not yet pointed out the exact consequences for modeling financial risks. Every model is only an approximation to reality. The question is how good this approximation is, given that we rely on special cases with analytical solutions. And, even if there are better-fitting models out there, we still have to find the right balance between precision and speed: even an appropriate model is useless if it takes days to come up with an estimate of our risk. However, these are questions that everybody needs to answer for themselves. The final decision of whether to strive for an analytically solvable model should hence always be made against the background of one's own preferences.

Nevertheless, at this point I want to allow myself the freedom of pointing out my own current thoughts regarding this trade-off. That seems only fair, since I most likely already failed to preserve full objectivity on the topic anyway, and I do not want to secretly shape anyone's opinions through the backdoor.

So, first: I am completely convinced that there are much more accurate models than the currently known ones with analytical solutions, which simply rely on too many restrictions imposed for the sake of tractability. Hence, there are a lot of models that, through their greater flexibility, are better suited to fit the stylized facts commonly observed in financial markets.

Second, in my opinion it is also highly unlikely that we will find analytical solutions to models with greater flexibility anytime soon. Call me a pessimist, but I just don't see any promising ways to significantly enlarge the scope of multidimensional integration rules. Nevertheless, I must admit that integration is something I would never list among my top skills – irrespective of my degree in mathematics. Hence, maybe we just need to wait for the birth of the next Henri Léon Lebesgue…

Third, although the linearization of logarithmic portfolio returns is commonly used in academia and practice, I still haven't seen any large study on the appropriateness of this approximation. Maybe we already lose too much precision at this point. This, in turn, would make any quest for analytical tractability almost impossible, since non-linearity substantially complicates integration.

Last but not least: the trade-off between precision and speed. This is by far the most ambiguous point for me. Quite logically, this trade-off should always be resolved for each application individually. If you are forced to estimate risk on a daily basis, then any model with a processing time of more than a day is definitely worthless. However, I have settled on something like a minimum precision standard: I do not believe that models built solely from analytically tractable parts will satisfy our needs in risk assessment in the long run. Given that we have to deal with an ever larger variety of data, it becomes highly likely that some aspects of the real risk distribution fit only very badly into an analytically tractable framework. Hence, we should be better off with a framework that allows some individual components to be incorporated, for example, through Monte Carlo simulation. Of course, with an increasing number of risk factors, the degree of approximation that we use will have to increase as well. Thereby, however, I think it is crucial that we know at each point exactly which approximations we are relying on. Only if we understand our model deficiencies, only if we know of their existence, will we be able to find a better solution sometime in the future. However, I doubt that we have enough of this awareness. Hence, from time to time, we had better ask ourselves: are we really matching models to reality, or are we just trying to match reality to the models that we are so fond of?

At this point, I think, it only remains for me to say a few words on the lack of formality in the derivation of most results. Coming from a mathematical background, this is something I'm not used to myself. However, the reasons for the lack of proofs are quite simple. First, it is probably impossible to formally prove that some general multidimensional integrals cannot be solved analytically. Perhaps a solution exists somewhere out there; however, as long as we don't find it, its existence does not help us any further. And second, as you could already tell from my personal remarks on analytically tractable models, I'd rather put my bets on a simulation-based approach to risk modeling. Hence, I'm not too deep into, for example, alpha-stable distributions or other promising parametric distributions out there. So if there are any experts out there, let me know if something I wrote needs additional remarks or corrections. Thanks.


Posted on 2013/07/27 in financial econometrics. 2 Comments.

  1. Fabian Spanhel

    Great article!

  2. Fabian Spanhel

    Just a few remarks… I am not sure whether sums from a jointly stable random vector are stable distributed. It might be that this only holds for iid stable random variables. And maybe it’s more promising to get analytical solutions or approximations for the characteristic function of the sum of RV first and then derive the cdf than trying to get the cdf from computing the pdf of the sum? Nevertheless, I also don’t expect that there will be a model that is true to the stylized facts of the data and admits a closed-form expression for VaR etc.
