Maximum Likelihood Estimation

Below are some of the most commonly encountered distributions whose parameters can be estimated by MLE.

  • Binomial distribution
  • Poisson distribution
  • Exponential distribution

First, we write down the likelihood function and take its logarithm to obtain the log-likelihood. Taking the first derivative of the log-likelihood with respect to the unknown parameters gives the score vector, which is set to zero and solved for the parameter estimates. Second, to find the lower bound on the variance-covariance matrix, take the second derivatives (the Hessian) and form the information matrix, i.e. the negative of the expected values of the second-order derivatives; the inverse of the information matrix is the Cramér-Rao variance bound. From this we can deduce the asymptotic distribution of the estimators and hence verify the usual MLE properties.
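To make these steps concrete, here is a minimal Python sketch for the Poisson case. The sample `counts` is made up for illustration and the function names are my own, not taken from any library. For the Poisson distribution the score equation has a closed-form root (the sample mean), so no numerical optimiser is needed.

```python
import math

def poisson_log_likelihood(lam, counts):
    """Log-likelihood of i.i.d. Poisson(lam) counts."""
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in counts)

def poisson_score(lam, counts):
    """First derivative of the log-likelihood with respect to lam."""
    return sum(counts) / lam - len(counts)

def poisson_mle(counts):
    """Solve score = 0, then invert the expected information."""
    n = len(counts)
    lam_hat = sum(counts) / n        # root of the score equation
    information = n / lam_hat        # minus the expected second derivative, at lam_hat
    variance = 1.0 / information     # Cramer-Rao bound, equal to lam_hat / n
    return lam_hat, variance

counts = [2, 4, 3, 5, 1, 3, 4, 2]    # hypothetical sample
lam_hat, var_hat = poisson_mle(counts)
```

The same recipe applies to the binomial and exponential cases; only the log-likelihood and its derivatives change.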

Lemma for MLE

E[L(\theta;U|V)] \leq E[L(\hat{\theta};U|V)]

Suppose we have the regression model

y_t = x_t' \beta + \epsilon_t

where \epsilon_t \sim N(0, \sigma_t^2) and

\sigma_t^2 = \omega + \alpha_0 \epsilon_{t-1}^2 + \alpha_1 \epsilon_{t-2}^2

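Now let’s see how to derive the likelihood function for this model, given the unknown parameters \mu' = \{\theta, \beta, \omega, \alpha_0, \alpha_1 \}. Because \epsilon_t is conditionally normal with variance \sigma_t^2, each observation contributes the density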

L(Y_t|X_t, I_t;\mu') = \frac{1}{\sqrt{2 \pi \sigma_t^2}} \exp\left( -\frac{(y_t - x_t' \beta)^2}{2 \sigma_t^2} \right)
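As a rough illustration, the log of this likelihood can be summed over the sample in a few lines of Python. This is only a sketch under some assumptions of mine: a single scalar regressor `x`, a Gaussian density of the form (1 / sqrt(2 pi sigma_t^2)) exp(-(y_t - x_t beta)^2 / (2 sigma_t^2)), and the first two observations used only to seed the lagged residuals. The function name is hypothetical.

```python
import math

def arch2_log_likelihood(y, x, beta, omega, alpha0, alpha1):
    """Gaussian log-likelihood for y_t = x_t * beta + eps_t with ARCH(2) variance.

    x is a single scalar regressor (an assumption made for brevity).
    The first two observations only provide the lagged residuals.
    """
    eps = [yt - xt * beta for yt, xt in zip(y, x)]
    loglik = 0.0
    for t in range(2, len(eps)):
        # Conditional variance from the two previous squared residuals
        sigma2 = omega + alpha0 * eps[t - 1] ** 2 + alpha1 * eps[t - 2] ** 2
        # Log of the normal density of eps_t with variance sigma2
        loglik += -0.5 * (math.log(2 * math.pi * sigma2) + eps[t] ** 2 / sigma2)
    return loglik
```

Maximising this function over (beta, omega, alpha0, alpha1) with a numerical optimiser would give the MLE; the optimiser itself is left out here.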

Cornish Fisher Approximations

Sometimes we are not sure about the exact distribution of the underlying random variable (r.v.) X but we know its first four moments. In that case we can approximate the quantile of the standardised variable as below,

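q_x(\alpha) = z_{\alpha} + S \frac{z_{\alpha}^2 - 1}{6} + (k-3) \frac{z_{\alpha}^3 - 3 z_{\alpha}}{24} - S^2 \frac{2 z_{\alpha}^3 - 5 z_{\alpha}}{36}

where z_{\alpha} is the standard normal quantile, S is the skewness and k is the kurtosis of X.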

The closer the distribution is to the standard normal, the better the approximation. Note that the standard normal quantile carries a negative sign in the left tail; for instance, z_{0.01} = -2.33.
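Here is a small Python sketch of the approximation. It reads the coefficient of the (z^2 - 1)/6 term as the skewness and k - 3 as the excess kurtosis, and it includes the usual skewness-squared correction -(2z^3 - 5z) S^2 / 36 from the standard fourth-moment expansion. The function name and argument names are my own.

```python
from statistics import NormalDist

def cornish_fisher_quantile(alpha, skew, kurt):
    """Approximate alpha-quantile of a standardised variable.

    skew is the skewness and kurt the (non-excess) kurtosis, so
    kurt - 3 is the excess kurtosis used in the expansion.
    """
    z = NormalDist().inv_cdf(alpha)   # standard normal quantile, negative in the left tail
    return (z
            + skew * (z ** 2 - 1) / 6
            + (kurt - 3) * (z ** 3 - 3 * z) / 24
            - skew ** 2 * (2 * z ** 3 - 5 * z) / 36)

q_normal = cornish_fisher_quantile(0.01, skew=0.0, kurt=3.0)    # collapses to z_0.01
q_skewed = cornish_fisher_quantile(0.01, skew=-0.5, kurt=3.0)   # fatter left tail
```

With zero skewness and a kurtosis of 3 the correction terms vanish and the result is just the normal quantile; negative skewness pushes the 1% quantile further into the left tail.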

EWMA, GARCH Volatilities

Let’s look at a few volatility models based on past innovations (meaning past observations). Some models, such as the simple moving average, give equal weight to all past observations, but we will focus on the models most commonly used in the industry. One of these is EWMA, the Exponentially Weighted Moving Average.

The general approach is first to calculate the weights,

p_m = \frac{1 - \lambda}{\lambda(1 - \lambda^M)} \lambda^m

where m = 1, 2, ..., M, so that the weights sum to one and decay geometrically with the age of the observation.

Then apply the weights to the past squared returns to predict volatility,

\sigma_{t+1} = \sqrt{\sum_{m=1}^{M} p_m R^2_{t-m+1}}
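The two steps can be sketched in Python as follows. I am assuming the forecast is the square root of the weighted sum of squared returns over m = 1, ..., M, with the most recent return R_t getting the weight for m = 1. The function names are illustrative only.

```python
import math

def ewma_weights(lam, M):
    """Weights proportional to lam**m for m = 1..M, normalised to sum to 1."""
    scale = (1 - lam) / (lam * (1 - lam ** M))
    return [scale * lam ** m for m in range(1, M + 1)]

def ewma_volatility(returns, lam):
    """Forecast sigma_{t+1} from a list of returns, where returns[-1] is R_t."""
    M = len(returns)
    weights = ewma_weights(lam, M)
    # weights[m - 1] multiplies R_{t-m+1}, the m-th most recent return
    variance = sum(p * r ** 2 for p, r in zip(weights, reversed(returns)))
    return math.sqrt(variance)
```

Because the weights decay with m, a large move yesterday raises the forecast more than the same move a month ago.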

When M is large, this can be written as a simpler recursion. This is the RiskMetrics model, originally developed by JP Morgan, which uses a \lambda close to 1; \lambda controls the trade-off between volatility persistence and reactivity to new market moves.

\sigma^2_{t+1} = \lambda \sigma^2_t + (1-\lambda) R^2_t
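A minimal Python sketch of the recursion, written in variance form (sigma^2_{t+1} = lambda sigma^2_t + (1 - lambda) R^2_t). The default lambda of 0.94 is the value commonly quoted for daily data in RiskMetrics; seeding the recursion with the first squared return is my own assumption, as is the function name.

```python
def riskmetrics_variance(returns, lam=0.94, initial_variance=None):
    """Run sigma2_{t+1} = lam * sigma2_t + (1 - lam) * R_t**2 over the returns.

    Returns the variance forecast for the period after the last return.
    """
    # Assumed seed: the first squared return, unless a starting variance is given
    sigma2 = returns[0] ** 2 if initial_variance is None else initial_variance
    for r in returns:
        sigma2 = lam * sigma2 + (1 - lam) * r ** 2
    return sigma2
```

Only the previous variance and the latest return are needed at each step, which is what makes this form convenient when M is large.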

Let’s now look at another volatility model, the most widely used and industry-adopted one: GARCH, Generalised AutoRegressive (meaning the current variance depends on its own lagged values and on past observations) Conditional Heteroskedasticity (meaning non-constant, i.e. non-homoskedastic, variance across observations).

The GARCH(1,1) formula for the variance is given below,

\sigma^2_{t+1} = \omega + \alpha R^2_t + \beta \sigma^2_t

where the long-run variance is \hat{\sigma}^2 = \frac{\omega}{1 - (\alpha + \beta)}, which requires \alpha + \beta < 1.
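As a sketch, the GARCH(1,1) recursion in variance form (sigma^2_{t+1} = omega + alpha R^2_t + beta sigma^2_t) and the long-run variance look like this in Python. Starting the recursion at the long-run variance is an assumption on my part, and the function names are hypothetical.

```python
def garch_long_run_variance(omega, alpha, beta):
    """Unconditional variance omega / (1 - (alpha + beta))."""
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be below 1 for a finite long-run variance")
    return omega / (1 - (alpha + beta))

def garch_variance_path(returns, omega, alpha, beta):
    """Return [sigma2_1, ..., sigma2_{T+1}] given the returns R_1..R_T."""
    sigma2 = garch_long_run_variance(omega, alpha, beta)   # assumed starting value
    path = [sigma2]
    for r in returns:
        sigma2 = omega + alpha * r ** 2 + beta * sigma2
        path.append(sigma2)
    return path
```

Unlike the RiskMetrics recursion, the constant omega pulls the variance back toward a finite long-run level whenever alpha + beta is below 1.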

Estimating GARCH

Estimating the GARCH parameters is a two-step process.

First, estimate the long-run variance from the M past returns,

\hat{\sigma}^2 = \frac{1}{M} \sum_{m=1}^{M} R^2_m

Then, estimate GARCH with \omega replaced by (1 - \alpha - \beta) \hat{\sigma}^2,

E[R_{t+1}^2 | I_t] = \sigma_{t+1}^2 = (1 - \alpha - \beta) \hat{\sigma}^2 + \alpha R^2_t + \beta \sigma^2_t

The remaining parameters \alpha and \beta can now be estimated by maximum likelihood, assuming the standardised innovations R_t / \sigma_t are i.i.d.
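The two-step procedure can be sketched in plain Python. To keep the example self-contained I use a coarse grid search over (alpha, beta) in place of a proper numerical optimiser, assume Gaussian innovations, fix omega at (1 - alpha - beta) times the sample variance, and start the recursion at the long-run variance. All of these are simplifying assumptions, and the function names are my own.

```python
import math

def garch_log_likelihood(returns, long_run_var, alpha, beta):
    """Gaussian log-likelihood with omega fixed by the long-run variance."""
    omega = long_run_var * (1 - alpha - beta)
    sigma2 = long_run_var                      # assumed starting variance
    loglik = 0.0
    for r in returns:
        loglik += -0.5 * (math.log(2 * math.pi * sigma2) + r ** 2 / sigma2)
        sigma2 = omega + alpha * r ** 2 + beta * sigma2
    return loglik

def fit_garch_grid(returns, step=0.05):
    """Step 1: sample variance. Step 2: coarse grid search over (alpha, beta)."""
    long_run_var = sum(r ** 2 for r in returns) / len(returns)
    n = int(round(1 / step))
    best = None
    for i in range(n):
        for j in range(n - i):
            alpha, beta = i * step, j * step   # alpha + beta < 1 by construction
            ll = garch_log_likelihood(returns, long_run_var, alpha, beta)
            if best is None or ll > best[0]:
                best = (ll, alpha, beta)
    return {"long_run_var": long_run_var, "alpha": best[1],
            "beta": best[2], "loglik": best[0]}
```

In practice the grid would be replaced by a gradient-based optimiser, and the returns must not all be zero, otherwise the sample variance (and the log) degenerates.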

GARCH Forecasting

With GARCH, you can forecast the variance k steps ahead using the general formula,

E[\sigma^2_{t+k}|I_t] = \hat{\sigma}^2 + (\alpha + \beta)^{k-1} (\sigma^2_{t+1} - \hat{\sigma}^2)

where \hat{\sigma}^2 = \frac{\omega}{1 - (\alpha + \beta)}, so the forecast reverts to the long-run variance as k grows.
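A short Python sketch of the k-step forecast, assuming the mean-reverting form in which the gap between next period's variance and the long-run variance shrinks by a factor (alpha + beta) per step. The function name is illustrative.

```python
def garch_forecast(sigma2_next, omega, alpha, beta, k):
    """E[sigma2_{t+k} | I_t] for k >= 1, given sigma2_{t+1}."""
    long_run = omega / (1 - (alpha + beta))
    # The deviation from the long-run variance decays geometrically with the horizon
    return long_run + (alpha + beta) ** (k - 1) * (sigma2_next - long_run)
```

For k = 1 this returns the known sigma^2_{t+1}, for k = 2 it equals omega + (alpha + beta) sigma^2_{t+1}, and for large k it settles at the long-run variance.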

Why is volatility skewed?

Most of the time, returns are mean reverting, meaning that they fluctuate around their average, whereas volatility is highly skewed. The reason is that when returns are high we don’t rush to invest in the underlyings, but when the reverse happens, i.e. returns are low, everyone rushes to disinvest and move their assets into safe Treasury bonds or cash. This behaviour has a ripple effect: assets are correlated with the systematic factor, so, by CAPM, an event that affects the general market also ripples through to other assets, whereas idiosyncratic effects can be diversified away.

Hence volatility modelling is a huge subject that involves studying the pattern of past returns and predicting future volatility. There are a number of existing models. To name a few, EWMA, ARCH, ARMA and GARCH forecast future volatility from past historical returns, while parametric models such as Heston and Hagan’s Stochastic Alpha, Beta, Rho (SABR) are calibrated to market-traded derivatives and so use the market’s perception of future volatility. We will look at each of these models in future blogs and discuss the pros and cons of each approach in depth.