Oct 21
Part 8: Trading strategy results and Wrap-up
Our trading strategy takes the following form:
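if predicted5DayChange > 1.5
    trade = long(5);     % go long, closed automatically after 5 days
elseif predicted5DayChange < -1.5
    trade = short(5);    % go short, closed automatically after 5 days
end
% the argument to long/short is n, the number of days after which the trade is automatically closed (here n = 5)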
The performance of this strategy is compared against that of a buy-and-hold strategy. Transaction costs, slippage and other caveats are ignored. The plot below shows the cumulative profit of the trading strategy over different test sets.
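A minimal Python sketch of how such a comparison could be run. This is not the code behind the results here: the function name `backtest`, measuring profit in index points, and the rule that no new trade is opened while one is still running are all my assumptions for illustration.

```python
def backtest(prices, predicted_change, n=5, threshold=1.5):
    """Return (strategy_profit, buy_and_hold_profit) in index points.

    prices[t]           : closing level on day t
    predicted_change[t] : forecast % change over the days following day t
    A trade opened on day t is closed automatically on day t + n.
    Assumption: no new trade is opened while one is still running.
    Costs and slippage are ignored, as in the text.
    """
    profit = 0.0
    t = 0
    while t + n < len(prices):
        move = prices[t + n] - prices[t]
        if predicted_change[t] > threshold:       # long for n days
            profit += move
            t += n
        elif predicted_change[t] < -threshold:    # short for n days
            profit -= move
            t += n
        else:                                     # no signal: stay flat
            t += 1
    buy_and_hold = prices[-1] - prices[0]
    return profit, buy_and_hold


# Made-up numbers, purely illustrative
prices = [100, 101, 102, 103, 104, 105, 104, 103, 102, 101, 100]
predicted = [2.0, 0, 0, 0, 0, -2.0, 0, 0, 0, 0, 0]
print(backtest(prices, predicted))  # (10.0, 0)
```

In this toy series buy-and-hold ends flat, while the rule catches both the rise and the fall.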

A comparative summary is given in the table below:

We have shown that a simple MLP neural network can generate profits in the absence of trading costs. In summary:
- Predictive quality and hence return on investment degrades as the time-series moves further away from the training set.
- Creating a committee of networks improves predictive quality.
- In the absence of all other costs, the model generates greater profits than a simple buy-and-hold strategy.
I would say these results should be considered a lower bound on the profiteering abilities of a neural network trading system as the model can be improved in a number of ways:
- Evolving the weights via a Particle Swarm algorithm or some other evolutionary algorithm
- Modifying the network to predict market turning points rather than absolute values
- Modifying the network to detect divergence of other linked markets
- Allowing the number of days to stay in a trade to vary
I am currently in the process of improving this system so that I can deploy it at Collective2. Nonetheless this experiment ends here and we shall next look at something to do with Genetic Algorithms.
Oct 17
Part 7: Application to trading
Let’s recap what we have done so far by looking at the plot below:

We used 889 trading days of the FTSE 100 index for training our stack of twenty-five 9:6:1 MLP neural networks. We then chose 3 sets of testing data consisting of 125 trading days each. Our results showed that the committee model performed best over test set 1. Hence the quality of predicted 5-day forecasts degraded as data points moved further away from the training set. The plot below tries to reflect this:

A trading strategy could look something like this:
if predicted5DayChange > 1.5
    trade = long(n);
elseif predicted5DayChange < -1.5
    trade = short(n);
end
% n is an integer number of days, which needs to be found so as to maximise profit
Our aim now is to find the best value of n.
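One simple way to look for n is a brute-force search over candidate holding periods. The Python sketch below is an assumption about how that search might be set up, not the method actually used: `profit_for_n` and `best_holding_period` are hypothetical names, every signal is treated as its own trade (overlaps allowed), and profit is measured in index points.

```python
def profit_for_n(prices, predicted_change, n, threshold=1.5):
    """Total profit in index points if every signal is held for n days.

    Assumption: each signal is evaluated as an independent trade,
    so trades may overlap. Costs are ignored.
    """
    total = 0.0
    for t in range(len(prices) - n):
        move = prices[t + n] - prices[t]
        if predicted_change[t] > threshold:       # long signal
            total += move
        elif predicted_change[t] < -threshold:    # short signal
            total -= move
    return total


def best_holding_period(prices, predicted_change, candidates=range(1, 11)):
    """Return the candidate n with the highest total profit."""
    return max(candidates,
               key=lambda n: profit_for_n(prices, predicted_change, n))


# Made-up numbers, purely illustrative: one long signal on day 0
prices = [100, 102, 101, 105, 104, 108, 107]
predicted = [2.0, 0, 0, 0, 0, 0, 0]
print(best_holding_period(prices, predicted))  # 5
```

A search like this should probably be run on the training period only; choosing n on the test sets would let future information leak into the reported profit.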
Oct 11
Part 5: Creating a committee of networks.
In Part 4 I talked about using a single network to obtain forecasts. We can improve upon that by using a committee of networks to raise the overall accuracy of the forecasts.
Previous research [Zhang et al] shows that a group of between 20 and 30 networks is sufficient to improve on the results obtained from a single network. The reason for this can be understood as follows:
Consider a neural network with only 2 taps (or weights). If we change the value of these weights with respect to each other and measure the error produced at the output of the network and plot it, we get a surface with multiple minima similar to that shown below:

During training, the learning algorithm (Levenberg-Marquardt in this case) attempts to find the minimum points on this surface because it aims to minimise the error at the output of the network. Notice that there are four minima on the surface, two of which are global (the dark blue ones) and the other two local. Local search algorithms (like Levenberg-Marquardt) follow the path of steepest slope on the cost surface, but they need to be told where to start their search (aka the initialisation point). If we use only one initialisation point then there is a possibility of the network getting stuck at a local minimum. Creating multiple initialisation points improves the chance of reaching a global minimum. This is why we create a committee of networks, as each one will be initialised at a different point on the cost surface. We can therefore expect certain members of the committee to perform better than others, provided the cost surface has local as well as global minima. Our committee structure looks something like:

A stack of 25 networks will be used to provide a forecast which is expected to be more accurate than that of a single network.
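The effect of the initialisation point can be reproduced on a toy problem. The Python sketch below is only an illustration under stated assumptions: the two-weight `cost` surface is made up (it is not the real network error), and plain gradient descent stands in for Levenberg-Marquardt. The surface has four minima near (±1, ±1), two of them deeper than the other two.

```python
import random


def cost(w1, w2):
    # Toy cost surface (an assumption, not the real network error).
    # The w1*w2 term makes the two minima with opposite signs deeper.
    return (w1**2 - 1)**2 + (w2**2 - 1)**2 + 0.2 * w1 * w2


def descend(w1, w2, rate=0.02, steps=2000):
    """Plain gradient descent from a single initialisation point."""
    for _ in range(steps):
        g1 = 4 * w1 * (w1**2 - 1) + 0.2 * w2   # d(cost)/d(w1)
        g2 = 4 * w2 * (w2**2 - 1) + 0.2 * w1   # d(cost)/d(w2)
        w1, w2 = w1 - rate * g1, w2 - rate * g2
    return w1, w2


# A "committee" of 25 members, each started at a different random point
rng = random.Random(0)
starts = [(rng.uniform(-2, 2), rng.uniform(-2, 2)) for _ in range(25)]
members = [descend(w1, w2) for w1, w2 in starts]
costs = sorted(cost(w1, w2) for w1, w2 in members)
print("best member cost :", round(costs[0], 3))
print("worst member cost:", round(costs[-1], 3))
```

Members that start in the basin of a shallow minimum stay there, so a single unlucky initialisation would give a worse fit than the best member of the committee.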
In summary:
- We use a committee because 25 brains are better than one.. duh!
- Put more formally, certain networks in the committee will perform better than others because their weights will correspond to those at a global minimum on the cost surface.
- We need to avoid at all times networks that have weights corresponding to local minima as this can mean the difference between a good forecast and a bad forecast.
- We could have used tournament selection, but that is more applicable to genetic algorithms, something we shall discuss in due course.
More on optimising cost surfaces later on.
Oct 09
Part 4: Network training
The network structure we will use is a 9:6:1 multi-layer perceptron. That is, there are 9 input nodes, 6 hidden nodes and 1 output node. There is no particular reason why we use 6 nodes in the hidden layer. We could have chosen a different number, but trial and error shows that 6 nodes learn the time-series well without loss of generalisation. The network structure is shown below:
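To make the 9:6:1 shape concrete, here is a minimal forward pass in plain Python. It is a sketch under assumptions rather than the network used in this series: the names `init_mlp` and `forward`, the uniform random initialisation and the zero biases are mine, and the tan-sigmoid (tanh) activation is applied to the hidden and output nodes only.

```python
import math
import random


def init_mlp(sizes=(9, 6, 1), seed=0):
    """Random weights and zero biases per layer (initialisation is an assumption)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, inputs):
    """Propagate one input vector through the network, tanh at every node."""
    activations = list(inputs)
    for weights, biases in layers:
        activations = [
            math.tanh(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


net = init_mlp()
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in net)
print(n_params)                  # 9*6 + 6 + 6*1 + 1 = 67
print(forward(net, [0.1] * 9))   # a single value between -1 and 1
```

With a tanh output node the forecast is confined to (-1, 1), which is why the target has to be scaled into that range before training.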

Each of the nodes is activated by a tan-sigmoid transfer function. The training algorithm employed is the Levenberg-Marquardt algorithm, a very powerful method that blends gradient descent with the Gauss-Newton method. It is important to point out that there are a number of other neural network training algorithms which we could have used. Each has its own advantages and disadvantages, which decide which one to use for a particular problem:
- Resilient backpropagation
- Random order incremental update
- Polak-Ribiere conjugate gradient descent
- Powell-Beale conjugate gradient descent
- Bayesian regularization
These algorithms very rarely get stuck at saddle points because they have a random disturbance which alleviates the problems associated with the attraction of saddles. Once initialised, they descend until they reach a true minimum point, which might not necessarily be global. Evolutionary algorithms have the special feature of ensuring that a global minimum is reached (subject to certain constraints) during training.
Evolutionary algorithms truly enable one to have a predictive edge, as we shall see later on. Next we shall look at the forecasting ability of the trained network.
Oct 07
Part 3: Input data selection and preprocessing
We shall use a variety of indicators to drive our neural network to forecast the % 5-day change of the FTSE100 index. Although there may be thousands of such indicators that we can choose from, our aim should be to pick the ones that have a significant bearing on the target variable being forecast. In this example we use a range of technical, fundamental and intermarket indicators, namely:
- 5-day lagged % change of the FTSE100
- 20-day lagged % change of the FTSE100
- ** 10-day 5-day convergence divergence of the FTSE100
- ** 20-day 10-day convergence divergence of the FTSE100
- GBP/USD exchange rate
- S&P 500 composite index
- Brent crude (US$ per barrel)
- LIBOR 1-month deposit rate
- LIBOR 12-month deposit rate
Inputs 1 and 2 provide the neural network model with a measure of momentum in the market, giving it the added ability to discern whether a short-run 5-day trend agrees with the longer 20-day trend. Input 3 was calculated as the ratio of the 10-day smoothed to the 5-day smoothed time-series of the FTSE100 index. The smoothing was accomplished using the ** Zero-lag filter that was designed earlier. The GBP/USD exchange rate is included because changes in the GBP against a major currency can be expected to impact the domestic and overseas earnings of companies in the FTSE100 index. For similar reasons the price of Brent crude is also included. The S&P 500 index is included because it is often said that “when the US sneezes the rest of the world catches a cold”. We can safely assume there is correlation between the two indices (in fact I know there is!). Two measures of interest rates are provided, namely the LIBOR 1-month and 12-month deposit rates. Interest rates affect share prices by altering the rate of return that can be earned on competing instruments such as bonds and bank deposits, and by changing the borrowing costs of firms in the FTSE100 index. The plot below shows the input data drawn from the range 01-Jan-1992 to 15-June-1995, a total of 889 days.

This range for training was deliberately chosen because of the profound effects of Black Wednesday, which are visibly present in all the input data as well as the target variable. A deviation of this kind would enable our network to learn black swans in addition to the normal behaviour of the index. To further illustrate the importance of having outliers in your training set, we have the plot below.

Assuming there is correlation between the S&P500 and the FTSE100 (which of course there is!), the features present in the training set are representative of pretty much the whole data set. This is good because our network will be in a position to predict a fall in the FTSE100 should an event similar to Black Wednesday occur. A training set which is not representative of all possible events, including black swans, is one of the reasons why neural networks fail in their forecasting ability.
Key points:
- We have selected a range of fundamental, technical and intermarket indicators to use as inputs.
- We have selected a time-series range which includes a black swan event.
- We avoid the use of a moving average for smoothing because of its terrible lag characteristics. Instead we used the custom designed zero-lag filter.
- We will normalise all inputs and the target to the range (-1, 1) in order to satisfy the training constraints of the MLP neural network.
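The normalisation step in the last point can be sketched as a simple min-max scaling. This is an assumption on my part about how the scaling is done (the helper names `fit_scaler`, `scale` and `unscale` are hypothetical); note that this version maps the smallest and largest training values exactly onto -1 and +1.

```python
def fit_scaler(values):
    """Return (lo, hi) of a training series; these define the scaling."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return lo, hi


def scale(x, lo, hi):
    """Map x linearly so that lo -> -1 and hi -> +1."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0


def unscale(y, lo, hi):
    """Inverse of scale: turn a network output back into original units."""
    return (y + 1.0) * (hi - lo) / 2.0 + lo


# Made-up numbers, purely illustrative
training = [2.0, 4.0, 6.0]
lo, hi = fit_scaler(training)
print([scale(v, lo, hi) for v in training])  # [-1.0, 0.0, 1.0]
```

The limits should be taken from the training set only and reused unchanged on the test sets, so test values can occasionally fall slightly outside the range.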
Oct 03
Part 2: Analysing the target variable
Our case study is to examine the ability of our MLP neural networks to predict the five-day percentage change of the UK’s FTSE100 index during the period July 1995 to December 1996. We shall draw our training data from the range January 1992 to June 1995. Let’s have a look at a few perspectives on the 5-day % change of the FTSE100 index.

The returns series shows that while most changes are in the range +/-2%, larger changes are not uncommon. In fact there are a few positive black swans, which are not characteristic of a normal distribution. What we see is a leptokurtic distribution, which is a general characteristic of price changes of equities. Leptokurtic or fat-tailed distributions exhibit more frequent large positive or negative price changes than would be expected if price changes followed a normal distribution. Distributions of financial data also exhibit higher peaks than would be expected if they followed a normal distribution. Hence price changes are not normally distributed. Autocorrelation is a measure of how similar a time-series is to itself when time shifted. The plot shows the returns series is most similar to itself when it is not time shifted at all. At other lags the series exhibits slight similarity, but it is negligible. If we were to see peaks at other lags then the returns series would be classified as cyclic, which is not uncommon in financial data.
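The two diagnostics discussed above can be computed with a few lines of plain Python. The sketch below uses textbook definitions (population moments, biased autocorrelation estimate) and made-up data; the function names are hypothetical and this is not the code behind the plots.

```python
def pct_change(prices, n=5):
    """n-day percentage change of a price series."""
    return [100.0 * (prices[t] - prices[t - n]) / prices[t - n]
            for t in range(n, len(prices))]


def excess_kurtosis(x):
    """Fourth standardised moment minus 3 (zero for a normal distribution)."""
    m = sum(x) / len(x)
    var = sum((v - m) ** 2 for v in x) / len(x)
    fourth = sum((v - m) ** 4 for v in x) / len(x)
    return fourth / var ** 2 - 3.0


def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given non-negative lag."""
    m = sum(x) / len(x)
    denom = sum((v - m) ** 2 for v in x)
    num = sum((x[t] - m) * (x[t + lag] - m) for t in range(len(x) - lag))
    return num / denom


# Made-up series: mostly quiet with two large moves, i.e. fat tails
changes = [0.0] * 8 + [-4.0, 4.0]
print(excess_kurtosis(changes))      # positive, so leptokurtic
print(autocorrelation(changes, 0))   # 1 at lag 0 by construction
```

A positive excess kurtosis is the numerical counterpart of the fat tails and high peak described above, and an autocorrelation close to zero at every non-zero lag corresponds to the absence of cyclic behaviour.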
In Part 1 I mentioned briefly the performance measures we would need to use. Elaborating further, [7, 8, 9] make mention of “thick modelling”, which I believe implies that a different statistical approach from, say, the Sharpe ratio is required to measure performance correctly. In fact we have the Diebold-Mariano test, cross-validation and the Harvey-Leybourne correction, which are all applicable to neural network models. Perhaps these would be useful for comparison within a set of different NN models. So we have a rough idea about testing.
Oct 01
Part 1: Setting project goals
It may seem obvious to any market player that one of the objectives is to predict the future value of an asset. This doesn’t necessarily have to be the case, because predicting the actual price is extremely difficult, particularly for volatile asset classes. Let’s say the FTSE100 index is currently at 6500 and that it changes by an average of 50 points per day. If your objective is to forecast the future value of the index to within 5 points, your model will have to present predictions with an accuracy of about 0.08% (5 out of 6500), which is tough but not impossible. Now let’s say the goal of the model is to predict the one-day change in the index rather than the absolute value: the required accuracy drops to 10% (5 out of 50). Changing the predictive target can substantially affect the ease of the predictive task. Choice is not limited to one kind of predictive target and many others can be used. The key is to make it as simple as possible for your model to make forecasts, without compromising on the quality of predictions.
I would like to experiment with the FTSE 100 index data to build an MLP neural network model that helps predict the weekly percentage change. I shall use the model to backtest over different out-of-sample periods in order to identify its profiteering capabilities. It’s a little tricky to decide what my performance measures should be, given that there are so many to choose from. But that is to be discussed later. Let us focus on building the model first.
Sep 30
Constructing financial models is a multi-stage process consisting of the determination of goals for the project followed by data collection, data preprocessing, model construction, data postprocessing, model validation and finally model implementation. The success of any trading model/system depends on the rigor of all the steps in its development process, and not just the sophistication of the predictive algorithm embedded in it. You may have a really good algorithm, but if for instance the input data supplied is not of high quality then the predictions are likely to be of poor quality too. I now wish to develop a FTSE100 index financial model using MLPs so as to backtest its performance over historical data and compare its profiteering capabilities with other trading strategies.