Niccolò's Blog: 10/11

October 30, 2011

An analysis based on the TAQ Database: 10 stocks traded on 10 venues

The aim of this project is the cross-sectional analysis of data collected from different exchange venues for different stocks, obtained from the Trades and Quotes (TAQ) Database.

The data set includes information for trades, BBO and quotes of 10 stocks (BAC, CCU, DE, EXXI, KLAC, KO, MIPS, MS, NCR, SSCC) gathered from 10 venues (NYSE Amex, Boston, Cincinnati, NASD ADF and TRF, Chicago, NYSE, Pacific (NYSE Arca), NASD, X Philadelphia, CBOE), and reported to the Consolidated Quote System.

The original TAQ data set can be found here:

http://www.2shared.com/file/o4aX2Wvn/20110201_MS_trades.html
http://www.2shared.com/file/E3RILlc5/bbo.html
http://www.2shared.com/file/h8ON3AJT/quotes.html
http://www.2shared.com/file/nUridLRq/trades.html

The MATLAB code which provides the results discussed below is accessible here:

TAQAnalysis.m

The code is self-contained: running it reproduces all the results presented below, provided that you save the 20110201_MS_trades.csv file in the same folder as the other trades files and update the paths to the data files in the code accordingly.

(a)

First, the Market Share (MS) of each stock is computed as the ratio of its Market Cap to the Total Market Cap of the ten stocks. Denoting price with P and size with S, the Market Share for stock i is given by

$$MS_i = \frac{P_i S_i}{\sum_{j=1}^{N} P_j S_j}$$

where N = 10. The following table and figure reproduce the results for each stock.

Table 1 - Market Share for the ten stocks

Stock	MS (%)
BAC	61.45%
CCU	0.04%
DE	11.58%
EXXI	0.80%
KLAC	2.73%
KO	10.30%
MIPS	0.87%
MS	7.09%
NCR	0.52%
SSCC	4.60%

Figure 1 - Market Share for the ten stocks

From the results, BAC clearly dominates, accounting for 61.45% of the total Market Cap of the considered stocks. DE and KO follow, each holding a Market Share of about 10% (11.58% and 10.30%, respectively).
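As a minimal sketch of the Market Share definition (MS_i = P_i S_i / Σ_j P_j S_j), the function below computes each stock's share of the total Market Cap from one price and one size per stock. The tickers and numbers are hypothetical placeholders, not the TAQ values, and which price and size fields are taken from the raw files is left to the caller.

```python
def market_shares(prices, sizes):
    """Return MS_i = P_i * S_i / sum_j P_j * S_j for every ticker.

    prices, sizes: dicts keyed by ticker (hypothetical inputs).
    """
    caps = {tkr: prices[tkr] * sizes[tkr] for tkr in prices}
    total = sum(caps.values())
    return {tkr: cap / total for tkr, cap in caps.items()}


# Hypothetical tickers and values, for illustration only.
shares = market_shares(
    prices={"AAA": 10.0, "BBB": 20.0, "CCC": 5.0},
    sizes={"AAA": 300, "BBB": 100, "CCC": 200},
)
for tkr, ms in shares.items():
    print(f"{tkr}\t{ms:.2%}")
```

By construction the shares sum to one, which is a quick sanity check on Table 1 (its entries add up to 100% up to rounding).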

(b)

Second, the percentage of sub-penny executions (PSPE), i.e. trades executed at a price that is not a whole number of cents, is reported for each stock in the table below.

Table 2 - Percentage of sub-penny transactions for the ten stocks

Stock	PSPE (%)
BAC	7.56%
CCU	4.58%
DE	4.33%
EXXI	2.31%
KLAC	2.87%
KO	3.91%
MIPS	6.72%
MS	3.06%
NCR	3.93%
SSCC	1.08%

Figure 2 - Percentage of sub-penny transactions for the ten stocks
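The PSPE values in Table 2 depend on how a sub-penny execution is identified. A minimal sketch, assuming PSPE is the share of trades by count (not by volume) whose price is not a whole number of cents. Prices are parsed as decimal strings so that binary floating-point rounding cannot misclassify a price such as 12.34.

```python
from decimal import Decimal


def is_sub_penny(price: str) -> bool:
    """True if the price is not an exact multiple of one cent (e.g. 12.345)."""
    # Decimal arithmetic is exact here, unlike float (where 0.1 + 0.2 != 0.3).
    return Decimal(price) % Decimal("0.01") != 0


def pspe(prices) -> float:
    """Percentage of sub-penny executions among a list of price strings."""
    flags = [is_sub_penny(p) for p in prices]
    return 100.0 * sum(flags) / len(flags)


# Hypothetical trade prices, kept as strings as they would be read from a CSV file.
sample = ["12.34", "12.345", "12.3401", "12.35"]
print(pspe(sample))  # 50.0
```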

(c)

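To assess how the share of D transactions (SD), i.e. those occurring on the NASD ADF and TRF markets, depends on (i) the average bid-ask spread (BAS), (ii) the realized mid-quote volatility (MQV), (iii) the average price (AP), (iv) the market cap (MC), and (v) the trading volume (TV), the following cross-sectional regression over the ten stocks was estimated:

$$SD_i = \beta_0 + \beta_1 BAS_i + \beta_2 MQV_i + \beta_3 AP_i + \beta_4 MC_i + \beta_5 TV_i + \varepsilon_i, \qquad i = 1, \dots, 10$$

The results are provided in the following tables.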

Table 3 - OLS estimation results for the single coefficients

	OLS	t	p-value
Constant	4.77E-01	1.54E+01	5.24E-05
BAS	-1.41E+00	-2.10E+00	5.18E-02
MQV	3.45E-04	1.24E+00	1.41E-01
AP	-2.74E-03	-3.15E+00	1.72E-02
MC	-6.69E-11	-1.63E+00	8.97E-02
TV	1.47E-09	2.42E+00	3.62E-02

Table 4 - OLS estimation results for the regression

adj R²	F	p-value
2.45E-01	1.58E+00	3.38E-01

From the results in Table 3, at a 5% significance level only average price and trading volume (together with the constant term) significantly affect the share of D transactions, while average bid-ask spread (p = 0.052) and market cap (p = 0.090) pass only the 10% significance test; realized mid-quote volatility does not significantly explain the variation in SD. Moreover, the adjusted coefficient of determination, which accounts for the few degrees of freedom of the model (we have only 10 data points and 5 explanatory variables), is about 0.25, implying that the regressors explain roughly a quarter of the variation in the dependent variable. Finally, the F-test (p = 0.338) shows that the regression as a whole is not statistically significant at the 5% significance level.
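The p-values in Table 3 can be cross-checked against the reported t statistics. With 10 observations and 6 estimated coefficients there are 4 residual degrees of freedom, and for 4 degrees of freedom the Student t CDF has a closed form, so the standard library suffices. The check below suggests that the reported p-values are one-sided (upper-tail) probabilities; this is an inference from the numbers, not something stated in the original output. Doubling them gives two-sided p-values, under which fewer coefficients clear the 5% and 10% thresholds.

```python
import math


def t_cdf_df4(t: float) -> float:
    """Closed-form CDF of Student's t distribution with 4 degrees of freedom."""
    s = 1.0 + t * t / 4.0
    return 0.5 + 0.375 * (t / math.sqrt(s)) * (1.0 - (t * t / s) / 12.0)


def one_sided_p(t: float) -> float:
    """Upper-tail probability P(T > |t|) for T ~ t(4)."""
    return 1.0 - t_cdf_df4(abs(t))


# t statistics as reported in Table 3 (10 observations, 6 coefficients -> 4 df).
table3_t = {"BAS": -2.10, "MQV": 1.24, "AP": -3.15, "MC": -1.63, "TV": 2.42}
for name, t in table3_t.items():
    p1 = one_sided_p(t)
    print(f"{name}\tone-sided p = {p1:.4f}\ttwo-sided p = {2 * p1:.4f}")
```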

(d)

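Now, we want to assess how the percentage of sub-penny executions (PSPE) is affected by the same independent variables as in part (c); thus, the following regression was estimated:

$$PSPE_i = \beta_0 + \beta_1 BAS_i + \beta_2 MQV_i + \beta_3 AP_i + \beta_4 MC_i + \beta_5 TV_i + \varepsilon_i, \qquad i = 1, \dots, 10$$

The results are provided in the following tables.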

Table 5 - OLS estimation results for the single coefficients

	OLS	t	p-value
Constant	3.19E-02	4.70E+00	4.66E-03
BAS	8.70E-02	5.94E-01	2.92E-01
MQV	3.16E-05	5.19E-01	3.16E-01
AP	2.42E-04	1.28E+00	1.35E-01
MC	-6.97E-11	-7.75E+00	7.48E-04
TV	1.25E-09	9.43E+00	3.52E-04

Table 6 - OLS estimation results for the regression

adj R²	F	p-value
-2.14E-01	6.83E-01	6.62E-01

From the results, at a 5% significance level only market cap and trading volume (together with the constant term) significantly affect the percentage of sub-penny executions, whereas average bid-ask spread, realized mid-quote volatility and average price do not significantly explain the variation in PSPE. Moreover, the adjusted coefficient of determination is now negative (-0.214), a telltale sign of the regression's negligible explanatory power. Finally, the F-test shows that the regression as a whole is not statistically significant at the 5% significance level; with a p-value of 0.662, it is even less significant than the regression in part (c) (p = 0.338).
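Tables 4 and 6 can be tied together through the standard identities adj R² = 1 - (1 - R²)(n - 1)/(n - k - 1) and F = (R²/k) / ((1 - R²)/(n - k - 1)), with n = 10 stocks and k = 5 regressors. The sketch below backs out the unadjusted R² from each reported adjusted R² and recomputes F; it is a consistency check, not part of the original MATLAB code. It also shows why the adjusted R² in part (d) can turn negative: with only 4 residual degrees of freedom, the penalty factor (n - 1)/(n - k - 1) = 9/4 is large.

```python
def r2_from_adjusted(adj_r2: float, n: int, k: int) -> float:
    """Invert adj R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1)."""
    return 1.0 - (1.0 - adj_r2) * (n - k - 1) / (n - 1)


def f_stat(r2: float, n: int, k: int) -> float:
    """Overall F statistic for k slope coefficients plus an intercept."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


n, k = 10, 5  # ten stocks, five explanatory variables
for label, adj in [("part (c), SD", 0.245), ("part (d), PSPE", -0.214)]:
    r2 = r2_from_adjusted(adj, n, k)
    print(f"{label}: R^2 = {r2:.3f}, F = {f_stat(r2, n, k):.2f}")
```

The recomputed F statistics agree with the reported 1.58 and 0.683 up to rounding of the adjusted R².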

October 20, 2011

Analysis of Non-Displayed Liquidity transactions based on one-day time series data set (Nasdaq ITCH feed)

The aim of this project is to find and test variables that explain hidden order flow, i.e. order flow executed against non-displayed liquidity, over a one-day time series. The data set includes the Cisco trades reported in the Nasdaq ITCH feed throughout one day, from 9:30am to 4:00pm.

The GAUSS code that produces the results discussed below, together with the data, is available here:

code
data

Hidden order flow (h) is computed as the proportion of volume executed against non-displayed liquidity in the total volume of transactions. Transactions can be of four different kinds:

1. executed order flows (E);

2. order flows executed at a price different from the one originally displayed in the order book (C);

3. cross trades, coming from Nasdaq matching sessions, generally at the beginning and end of the day (Q);

4. trades against non-displayed liquidity (P).

The proportional hidden volume of transactions is given by

$$h = \frac{P}{E + C + P}$$

In other words, cross trades (Q) are disregarded, and only E, C and P volumes are used in the search for covariates of hidden-liquidity order flow.

After splitting the relevant data into 5-minute intervals and computing the proportional hidden volume in each, it is possible to observe how h varies over the day. In particular, two peaks appear at the beginning and at the end of the day, signaling substantial hidden-liquidity order flow around the market open and close. Beyond these peaks, h is highly volatile throughout the day, as the following figure displays.
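As a minimal sketch of the 5-minute aggregation and of h = P / (E + C + P), the function below bins trade records and computes h for each interval, skipping cross trades (Q). The record layout (seconds after midnight, message type, shares) is a hypothetical simplification, not the format of the original ITCH data file or of the GAUSS code. Between 9:30am and 4:00pm this yields 78 intervals.

```python
from collections import defaultdict

OPEN, CLOSE, BIN = 9 * 3600 + 30 * 60, 16 * 3600, 300  # 9:30am, 4:00pm, 5 minutes (seconds)


def h_by_interval(trades):
    """Aggregate (seconds_after_midnight, msg_type, shares) records into 5-minute
    bins and return {bin_index: h}, with h = P / (E + C + P).

    Cross trades (Q) and records outside 9:30am-4:00pm are skipped.
    """
    vol = defaultdict(lambda: {"E": 0, "C": 0, "P": 0})
    for ts, kind, shares in trades:
        if not (OPEN <= ts < CLOSE) or kind not in ("E", "C", "P"):
            continue
        vol[(ts - OPEN) // BIN][kind] += shares
    h = {}
    for b in sorted(vol):
        total = vol[b]["E"] + vol[b]["C"] + vol[b]["P"]
        h[b] = vol[b]["P"] / total if total else float("nan")
    return h


# Hypothetical records: two trades in bin 0, two in bin 1, and one cross trade.
sample = [(34200, "E", 300), (34250, "P", 100), (34500, "P", 200),
          (34560, "C", 300), (34210, "Q", 9000)]
print(h_by_interval(sample))  # {0: 0.25, 1: 0.4}
```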

Then, in order to explain and predict h, the proportional hidden order flow can be regressed on a set of candidate variables, namely the one-period lagged hidden order flow, the signed order flow, the total volume and the squared realized volatility of the trade price:

$$h_t = \beta_0 + \beta_1 h_{t-1} + \beta_2\,\mathrm{sof}_t + \beta_3\,\mathrm{tv}_t + \beta_4\, r_t^2 + \varepsilon_t$$
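A minimal sketch of this regression in Python with NumPy (not the original GAUSS code): it builds the lagged regressor, drops the first interval, and reports OLS coefficients, classical (homoskedastic) t statistics, R² and adjusted R². How sof_t, tv_t and r_t² are computed from the raw trades is not reproduced; the function takes them as arrays, and the usage below runs on synthetic data with known coefficients rather than on the Cisco data.

```python
import numpy as np


def fit_hidden_flow_model(h, sof, tv, rv):
    """OLS of h_t on a constant, h_{t-1}, sof_t, tv_t and rv_t (= r_t^2).

    Inputs are 1-D arrays over the same 5-minute intervals; the first interval
    is dropped because h_{t-1} is unavailable for it.
    Returns (beta, t_stats, r_squared, adj_r_squared).
    """
    h, sof, tv, rv = (np.asarray(a, dtype=float) for a in (h, sof, tv, rv))
    y = h[1:]
    X = np.column_stack([np.ones(len(y)), h[:-1], sof[1:], tv[1:], rv[1:]])
    n, p = X.shape

    # Equilibrate columns: volume and squared volatility differ by many orders of magnitude.
    scale = np.linalg.norm(X, axis=0)
    Xs = X / scale
    beta_s, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ beta_s

    sigma2 = resid @ resid / (n - p)                      # residual variance
    se_s = np.sqrt(sigma2 * np.diag(np.linalg.inv(Xs.T @ Xs)))
    beta = beta_s / scale                                 # back to original units
    t_stats = beta_s / se_s                               # invariant to column scaling

    r_squared = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    adj_r_squared = 1.0 - (1.0 - r_squared) * (n - 1) / (n - p)
    return beta, t_stats, r_squared, adj_r_squared


# Synthetic check: 78 intervals (9:30am to 4:00pm), known coefficients, small noise.
rng = np.random.default_rng(0)
T = 78
sof = rng.normal(0.0, 1e4, T)
tv = rng.uniform(1e5, 1e6, T)
rv = rng.uniform(0.0, 1e-4, T)
true_beta = np.array([0.1, 0.3, 2e-6, -1e-8, 50.0])
h = np.empty(T)
h[0] = 0.15
for t in range(1, T):
    h[t] = true_beta @ [1.0, h[t - 1], sof[t], tv[t], rv[t]] + rng.normal(0.0, 1e-6)

beta, t_stats, r_sq, adj_r_sq = fit_hidden_flow_model(h, sof, tv, rv)
print(np.round(beta / true_beta, 3))  # ratios should be close to 1
```

Columns are rescaled before solving because otherwise XᵀX would be badly conditioned; the rescaling changes neither the fitted values nor the t statistics. Adding the mean price as a sixth column gives the second model below.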

The following table reports the OLS estimates of the coefficients (confidence intervals in brackets) and the Student's t statistics.

	β (95% CI)	t
constant	1.4919E-01 (1.2190E-03)	2.0195E+00
h_(t-1)	1.4394E-01 (2.2210E-02)	4.5647E-01
sof_t	-1.8165E-05 (2.9011E-09)	-1.5939E-01
tv_t	-9.1790E-08 (9.3422E-15)	-4.4883E-01
r_t²	-4.2962E-06 (7.1418E-12)	-7.5977E-01

Apart from the constant term, which is significant at a 5% significance level but not at a 1% significance level, none of the coefficients is significantly different from zero at a 5% significance level. To assess the explanatory power of the considered variables as determinants of the proportional hidden volume, consider the coefficient of determination:

R₁² = 1.4191E-01

This result implies that only about 14% of the variability in the dependent variable (proportional hidden order flow) is explained by the variability in the right-hand-side variables.

Now, let us add a further regressor: the mean price (mp) in each 5-minute interval, considered as a potentially good indicator of order flow against non-displayed liquidity. The OLS estimates of the new model, given by

$$h_t = \beta_0 + \beta_1 h_{t-1} + \beta_2\,\mathrm{sof}_t + \beta_3\,\mathrm{tv}_t + \beta_4\, r_t^2 + \beta_5\,\mathrm{mp}_t + \varepsilon_t$$

are reported in the following table.

	β (95% CI)	t
constant	3.1046E-02 (1.9108E+01)	3.3567E-03
h_(t-1)	1.4399E-01 (2.2214E-02)	4.5660E-01
sof_t	-1.7875E-05 (3.0159E-09)	-1.5383E-01
tv_t	-9.1295E-08 (9.6787E-15)	-4.3857E-01
r_t²	-4.3317E-06 (8.8718E-12)	-6.8732E-01
mp_t	6.9148E-07 (6.5454E-10)	1.2774E-02

Unfortunately, all coefficients, including the constant and the newly added regressor, are now not significantly different from zero. This loss of significance is not reflected in the coefficient of determination of the second model, which is slightly higher than R₁², as it must be, since adding a regressor can never lower the R² of an OLS regression:

R₂² = 1.4192E-01

However, correcting for the different number of regressors in the two models and comparing the two adjusted coefficients of determination,

adjR₁² = 9.4237E-02

adjR₂² = 8.1496E-02

it is apparent that the loss of significance of the estimated coefficients in the second model is mirrored by a loss of explanatory power when the mean price is included as a covariate of the proportional hidden order flow.
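The adjusted values follow from adj R² = 1 - (1 - R²)(n - 1)/(n - k - 1). The post does not state the number of observations; assuming n = 77 (78 five-minute intervals between 9:30am and 4:00pm, minus the first one lost to the lag h_(t-1)), k = 4 and k = 5 reproduce both reported values, as the short check below shows.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for n observations, k regressors and an intercept."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


n = 77  # assumed: 78 five-minute intervals minus one lost to the lag
print(f"model 1: {adjusted_r2(0.14191, n, 4):.5f}")  # four regressors
print(f"model 2: {adjusted_r2(0.14192, n, 5):.5f}")  # adds the mean price
```

The extra regressor raises R² only in the fifth decimal place, while the penalty factor grows from 76/72 to 76/71, which is why the adjusted measure falls.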