Quantcast aiQUANT » 2007 » December

Archive for December, 2007

Rescaled Range Analysis

Fractal Analysis, Statistics 5 Comments »

In my previous post I mentioned the Hurst Exponent as a measure of the predictability of a time series. I drew a data flow diagram showing the Rescaled Range algorithm, however, thanks to foquant, it has come to our attention that the paper I got the description of the algorithm from does not mention clearly that the algorithm needs to iterate for different sub-periods. foquant attempted to recreate the results in excel and found dissimilar values to the ones shown in the paper [Rasheed et. al.]. Doing a quick google search I came across this excellent presentation which confirms what really needs to be done in order to calculate the Hurst Exponent.

I have re-done the data flow diagram for the algorithm and the steps that need to be performed are shown below:

hurstAlgo2prop.png

I have provided an example below which steps through each stage of the algorithm. Hopefully this would make the algorithm easier to understand.

Algorithm Steps

Here I use a returns time series X = X1, X2, …, X1023, X1024 as an example. We shall operate under log2 domain for regression. Matrix dimensions are shown in square brackets [m x n], where m is the number of rows and n is the number of columns.

1. Calculate number of data points

We do this step so that we have a time series with an integer number of sub-periods over which to calculate the rescaled range.

dataPoints = floor(log2(length(data)));

In this example dataPoints = 10. Hence we have 10 sets of sub-periods over which to iterate the rescaled range algorithm. I shall explain what sub-periods are below.

2. Calculate sub-period boundaries

Since we have 10 data points all we need to do to calculate sub-period boundaries is to raise the base to the power of each of the data point numbers. So in this example the sub-period boundaries are 2^(1 through to 10) = 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024. These points are essentially the t values on the regression graph. So taking the log of these values reverses the procedure and gives us values 1 through to 10 which shall be the x-axis points on the regression graph.

3. Select sub-period boundary

From this point onwards we iterate through each of the sub-period boundaries. In this case we shall iterate 10 times. Select only 1 sub-period at a time.

4. Calculate the sub-period

Having selected a sub-period boundary we now calculate the sub-period.

subPeriod=floor(N/selectedBoundary);

Here N = 1024 and the selectedBoundary will be from step 2. So if selectedBoundary = 2 then subPeriod = 512. If selectedBoundary = 64 then subPeriod = 16. The table below shows this.

subPers.png

5. Calculate sub-period matrix.

Having calculated the sub-period we now evaluate the sub-period matrix. This is hard to explain using the entire time series X = X1, X2, …, X1023, X1024. But the illustration below tries to explain this using a twelve element time series.

subPeriod.png

An n sub-period matrix has n columns. So in our first iteration we will create a 512 sub-period matrix. In the second we create a 256 sub-period matrix. Notice that the table shown in step 4 is basically the dimensions of the sub-period matrix, with selectedBoundary being the number of rows and subPeriod being the number of columns. Rows are horizontal, columns are vertical. So sub-period matrix is of dimension [selectedBoundary x subPeriod].

6. Calculate mean of each sub period

For the matrix created in step 5, we calculate the mean of each column i.e. mean of each sub-period. This gives is a matrix of size [1 x subPeriod].

7. Remove mean from each sub-period

Subtract the calculated mean for each column from each row under that column. Do this step for all columns. This procedure results a matrix [selectedBoundary x subPeriod].

8. Calculate cumulative deviation from sub-period mean.

It’s hard to explain, but here is an example with a 3×6 matrix:

cumsum.png

 

This procedure does not alter the size of the matrix and remains as [selectedBoundary x subPeriod].

9. Calculate min and max of cumulative deviation

Take the min of each column of the cumulative deviation and subtract it from the max of each column. This gives a matrix [1x subPeriod] large.

10. Calculate standard deviation of each sub-period.

Here we calculate the standard deviation of each column of the sub-period matrix. We end up with a matrix of size [1 x subPeriod] large.

11. Calculate rescaled range

Divide matrix from step 9 with the standard deviation from step 10. Note that the matrices from steps 9 and 10 are of similar dimension.

12. Take logarithm of mean of rescaled range

To get the y-axis log(R/S) point for the particular sub-period we average the rescaled range matrix from step 11 and take log2 of the calculated average.

13. Take logarithm of sub-period boundary

To complete we take the log2 of the selected sub-period boundary from step 3. This is the x-axis log2(t) point. We now have a log2(R/S) and log2(t) pair to plot on a graph. We go back to step 3 and repeat for a newly selected sub-period boundary.
Having iterated through all sub-period boundary values and plotted them, we perform a regression to get the line of best fit. The slope of this line corresponds to the Hurst Exponent of the returns time series.

I’m not too sure how this procedure would be implemented in Excel, but I imagine it requiring some custom VBA code, particularly for evaluating the sub-period matrix.

Immune System video

Artificial Immune Systems, Eye Candy No Comments »

I found this excellent video explaining very casually how the Immune System works. I thought it would be a great addition to one of my previous posts “Financial applications of Artificial Immune Systems“. I’ve copied it below for easy view.

Measuring the Predictability of a time series

Fractal Analysis, Statistics 13 Comments »

It is reasonable to ask whether a given financial time series is predictable before going about creating a model to predict it. One of the techniques available to do this is the Rescaled Range algorithm, which provides a numerical estimate of the predictability of a time series known as the Hurst Exponent. The reason the Hurst Exponent is an estimate and not a definitive measure is because the algorithm operates under the assumption that the time series is a pure fractal, which of course is not entirely true for most financial time series. This however is of low importance and what really makes the Hurst Exponent appealing is that it provides a means of classifying time series. This is a very useful statistical measure for comparing a model’s performance across different sections of financial data. Here I describe the R/S algorithm and provide an example of a time series whose hurst we calculate.

In terms of data flow, the Rescaled Range algorithm applied to a financial time series is as follows:

hurstAlgo3.png

The value of the Hurst Exponent is in the range 0 to 1 where 0 means the returns time series is unpredictable and 1 means the returns time series is predictable. It is worth pointing out that the Hurst exponent is calculated for the returns time series, and not the actual time series showing absolute prices. More formally we have

hurstTable2.png

  • H < 0.5 indicates an anti-persistent returns time series which means future values will always have a tendency to return to a longer term mean value. The strength of this mean reversion increases as H approaches 0.
  • H = 0.5 indicates a random walk and so there is a 50% probability that future return values will go either up or down.
  • H > 0.5 indicates the returns time series is trending the strength of which increases as H approaches 1. Series of this type are easier to predict than series falling in the other two categories

Calculating the Hurst Exponent

As outlined in [Rasheed, K. et al], for a returns time series

_1.png

We apply the following steps with

0.png

1. Calculate the mean

1.png

2. Calculate the mean adjusted series

2.png

3. Calculate the cumulative deviate series

3.png

4. Calculate the range series

4.png

5. Calculate the standard deviation series

5.png

6. Calculate the rescaled range series

6.png

7. The rescaled range scales by a power-law as time increases, such that

7.png

8. In order to calculate the hurst exponent we can take the base-n logarithm of the above equation, which gives

last.png

Recall that the form of the above equation is that of a straight line. We can obtain the value of H by calculating the slope of the linear relationship between log(R/S) and log(t). This requires a regression as the line of slope will not always be perfectly straight.

Example

Lets consider the following 1024 bar time series for an equity index. This is a reproduction of results from [Rasheed, K. et al]

 

actual.png

In order to calculate the Hurst for this interval, we first need to obtain the returns time series:

 

returns.png

Applying the algorithm to the returns time series we find individual points align in a near straight line fashion, the slope of which gives us the Hurst exponent. Note that we chose 1024 points so that we can easily apply base 2 logarithm. A different base can be used instead, but base 2 allows calculation with fewest data points.

 

hurstRegress.png

Practical Applications of the Hurst Exponent

  1. The hurst provides a method of classifying time series, which can be beneficial in identifying for instance which stocks have greater short term predictability. We could create a portfolio consisting of stocks with particular hurst values and investigate their profit generating characteristics.
  2. An application involving automated trading could be something like this: If a particular asset has it’s hurst drop below a threshold value, all investment positions in this asset could be closed in response to a “regime shift”.
  3. In conjunction with biologically inspired algorithms, hurst classification can help determine which assets to forecast and which ones to ignore. This can be particularly useful in neural nets where models can focus more on time series with higher predictability.

In a future post I will show that Evolutionary Algorithms generate greater profit when applied to time series with hurst greater than 0.5 than time series with hurst below 0.5.

Trading Puzzle #2

Trading Puzzle 4 Comments »

While scanning the market for good picks, a Trader comes across a stock that exhibits properties of a Geometric Brownian motion.  Determine whether one is going to make or lose money if they decide to invest in this stock.

Hint:  For a Geometric Brownian motion, there is a 50% probability that the path will go either up or down from its current position.

Neural Net sensitivity to Lag

Neural Networks 10 Comments »

Smoothing data prior to training a neural net is important because it provides a way of filtering unwanted noise that may be present in the inputs. In addition to removing noise smoothing inevitably introduces time lag which results in degradation of any correlation the input may have with the target variable. If we over smooth then there is a possibility that the correlation deminishes so much that the neural net is unable to learn important relationships between the input and target.

Let’s consider the following example:

We want to predict the next value of a noisy sine wave. For this we shall use a 2:3:1 neural net

nnTrylr.png

Our noisy input data consists of 400 samples with a further 100 samples used for testing the neural net’s prediction.

sinNoisy.png

Experiment 1:

We first investigate the predictive quality with both input1 and input2 set to 15-bar moving averages of the training data.

1515.png

1515corr.png

MSE = 0.2342

Experiment 2:

We now investigate with input1 set to 5-bar and input2 set to 15-bar moving averages of the training data.

515.png

corr515.png

MSE = 0.1335

Key Points:

  • Excessive filtering of inputs in experiment 1 resulted in noise removal as well as degradation of correlation which shows up in the quality of forecasted output.
  • In experiment 2 we changed one of the inputs to a 5 bar MA and left the other at 15 bar MA. Because the 5-bar MA input suffered less correlation degradation the forecast quality was much better than that of experiment 1.
  • Essentially what the neural net does during training is to identify correlations between the inputs and the target data. If for instance we have reason to believe that the target is influenced by an N-bar moving average then supplying an N-bar smoothed input will do no harm. But if our purpose is solely to filter noise in the input then we have to be aware about change in correlation and it’s consequence.

Code Vectorization Algorithm for Matlab

Matlab No Comments »

Very interesting paper that proposes an algorithm for Vectorizing “loopified” Matlab code:

A Dimension Abstraction Approach to Vectorization in Matlab
Birkbeck, Neil; Levesque, Jonathan; Amaral, Jose Nelson
International Symposium on Code Generation and Optimization, 2007. CGO apos;07. Volume , Issue , 11-14 March 2007 Page(s):115 - 130

vectorizeMatlab.JPG

I briefly skimmed through this paper and found a few interesting ideas such as the “Loop Pattern Database” and the Vectorization algorithm which I think are pretty neat. There is a powerpoint presentation accompanying the paper which I have attached below for easy view.

Trading Puzzle #1

Trading Puzzle 5 Comments »

Lets say we have a trading model that has only a single parameter which can be modified in-order to change the profits the model can generate. Presently, the model’s parameter is set to 160 and generates profits of £15million / year.

A Quant thinks this parameter is not ideal, and that the best value to achieve maximum profits lies in the range 150 to 170.

Calculate the difference in profitability that may be expected as a result of tweaking this value.

Close
E-mail It