## Econometric Methods

Our database contains daily (close to close) financial returns

$$r_1,r_2,...,r_{T}$$

and a corresponding sequence of daily realised measures

$$RM_1,RM_2,...,RM_{T}$$

Realised measures are theoretically sound, nonparametric estimators, based on high-frequency data, of the variation of the price path of an asset during the times at which the asset trades frequently on an exchange. Realised measures ignore the variation of prices overnight and sometimes the variation in the first few minutes of the trading day, when recorded prices may contain large errors. The background to realised measures can be found in the survey articles by Andersen, Bollerslev and Diebold (2008) and Barndorff-Nielsen and Shephard (2007).

The way the statistics reported in the library are generated is spelt out in Shephard and Sheppard (2009). Here we give a brief summary.

The simplest realised measure is realised variance

$$RM_{t}= \sum_{j} x_{j,t}^2$$

where

$$x_{j,t}=X_{t_{j,t}}-X_{t_{j-1,t}}$$

and $$t_{j,t}$$ are the times of trades or quotes (or a subset of them) on the $$t$$-th day. The theoretical justification of this measure is that, if prices are observed without noise, then as $$\min_{j}|t_{j,t}-t_{j-1,t}| \downarrow 0$$ it consistently estimates the quadratic variation of the price process on the $$t$$-th day. It was formalised econometrically by Andersen, Bollerslev, Diebold and Labys (2001) and Barndorff-Nielsen and Shephard (2002).
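The realised variance formula above can be illustrated with a minimal sketch. It assumes that the intraday observations for one day are supplied as an array of log prices ordered in time; the function name `realised_variance` and the input layout are hypothetical choices for illustration, not the library's own code.

```python
import numpy as np


def realised_variance(log_prices):
    """Sum of squared intraday returns x_{j,t} for a single day.

    `log_prices` is assumed to hold the log prices X observed at the
    times t_{0,t}, t_{1,t}, ... of one trading day, in time order.
    """
    p = np.asarray(log_prices, dtype=float)
    # x_{j,t} = X_{t_{j,t}} - X_{t_{j-1,t}}
    x = np.diff(p)
    return float(np.sum(x ** 2))


# Illustrative input only: four made-up log prices for one day.
example_rv = realised_variance([0.0, 0.1, -0.1, 0.2])
```

No mean adjustment is applied and the sum is not divided by the number of returns, in line with the definition above.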

In practice, market microstructure noise plays an important part, and the above authors use 1-5 minute return data or a subset of trades or quotes (e.g. every 15th trade) to mitigate the effect of the noise. Hansen and Lunde (2006) systematically study the impact of noise on realised variance. If a subset of the data is used to compute the realised variance, then it is possible to average across many such estimators, each using a different subset. This is called subsampling. When we report RV estimators we always subsample them to the maximum degree possible from the data, as this averaging is always theoretically beneficial, especially in the presence of modest amounts of noise.
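A minimal sketch of subsampling follows. It assumes the sparse grids are formed by taking every `k`-th observation starting from each of the offsets `0, ..., k-1`, and that the resulting realised variances are combined by a plain average. Both are illustrative assumptions; the exact subsampling scheme and any end-of-day adjustments used for the reported statistics are described in Shephard and Sheppard (2009) and are not reproduced here. The name `subsampled_rv` is hypothetical.

```python
import numpy as np


def subsampled_rv(log_prices, k):
    """Average realised variance over the k offset grids of every k-th price."""
    p = np.asarray(log_prices, dtype=float)
    if k < 1:
        raise ValueError("k must be a positive integer")
    rvs = []
    for offset in range(k):
        sparse = p[offset::k]          # every k-th observation from this offset
        if sparse.size >= 2:           # need at least one return on the grid
            rvs.append(np.sum(np.diff(sparse) ** 2))
    if not rvs:
        raise ValueError("not enough observations for the chosen k")
    # Plain average across grids (an assumption; no end-effect correction).
    return float(np.mean(rvs))


# Illustrative input only: five made-up log prices, every 2nd observation.
example_subsampled = subsampled_rv([0.0, 1.0, 2.0, 3.0, 4.0], k=2)
```

With `k=1` the function reduces to the ordinary realised variance computed from all observations.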

Three classes of estimators which are somewhat robust to noise have been suggested in the literature: preaveraging (Jacod, Li, Mykland, Podolskij and Vetter (2007)), multiscale (Zhang (2007) and Zhang, Mykland and Ait-Sahalia (2005)) and realised kernel (Barndorff-Nielsen, Hansen, Lunde and Shephard (2008)).

Here we focus on the realised kernel in the case where we use a Parzen weight function. It has the familiar form of a HAC-type estimator (except that there is no adjustment for the mean and the sums are not scaled by their sample size)

$$RM_{t}=\sum_{h=-H}^{H} k(h/(H+1))\gamma_{h}$$

where

$$\gamma_{h}=\sum_{j=|h|+1}^n x_{j,t}x_{j-|h|,t}$$

and $$k(x)$$ is the Parzen kernel function and $$n$$ is the number of intraday returns on the $$t$$-th day. It is necessary for $$H$$ to increase with the sample size in order to consistently estimate the increments of quadratic variation in the presence of noise. We follow exactly the choice of the bandwidth $$H$$ spelt out in Barndorff-Nielsen, Hansen, Lunde and Shephard (2009), to which we refer the reader for details. This realised kernel is guaranteed to be non-negative, which is quite important as some of our time series methods rely on this property.
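The realised kernel formula can be sketched as follows, using the standard definition of the Parzen kernel: $$k(x)=1-6x^2+6|x|^3$$ for $$|x| \le 1/2$$, $$k(x)=2(1-|x|)^3$$ for $$1/2 < |x| \le 1$$, and zero otherwise. The sketch assumes the intraday returns for one day are already available as an array and that the bandwidth `H` is supplied by the caller; it does not implement the bandwidth selection or any end-point treatment from Barndorff-Nielsen, Hansen, Lunde and Shephard (2009). The function names are hypothetical.

```python
import numpy as np


def parzen(x):
    """Parzen kernel weight k(x)."""
    a = abs(x)
    if a <= 0.5:
        return 1.0 - 6.0 * a ** 2 + 6.0 * a ** 3
    if a <= 1.0:
        return 2.0 * (1.0 - a) ** 3
    return 0.0


def realised_kernel(returns, H):
    """Sum over h = -H..H of k(h / (H + 1)) * gamma_h for one day.

    `returns` holds the intraday returns x_{1,t}, ..., x_{n,t};
    `H` is a non-negative integer bandwidth chosen by the caller.
    """
    x = np.asarray(returns, dtype=float)
    n = x.size
    total = 0.0
    for h in range(-H, H + 1):
        lag = abs(h)
        if lag >= n:
            continue  # no pairs of returns this far apart
        # gamma_h = sum_{j=|h|+1}^{n} x_j * x_{j-|h|}
        gamma_h = float(np.dot(x[lag:], x[: n - lag]))
        total += parzen(h / (H + 1)) * gamma_h
    return total


# Illustrative input only: three made-up returns and a bandwidth of 1.
example_rk = realised_kernel([0.1, -0.2, 0.3], H=1)
```

Setting `H=0` leaves only the $$\gamma_0$$ term, so the estimator then coincides with the realised variance of the same returns.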