Quantcast aiQUANT » Forecasting

Archive for the 'Forecasting' Category

Forecasting via Genetic Algorithm: Part 4

Forecasting, Genetic Algorithm, Model Development No Comments »

Part 4: Roulette Selection

So we have initialised our population of chromosomes each of which is made up of 13 genes.  We can represent our population in the from of a matrix where individual chromosomes are represented vertically and it’s genes horizontally.  We have something like:

matrixFitn.png

Each member of the population is assigned a fitness value based on the fitness function, which in our case is defined as

eq2.PNG

 

Why Roulette Selection?

Roulette wheel sampling was first suggested by Goldberg and provides a method of global selection when applied to members of a population.  In order for genes of fit chromosomes to be passed on to future generations, there must be some form of selection method to decide which members of the population will be selected to breed and which ones will be replaced.  It seems sensible to breed from genes belonging to the fittest chromosomes and replace the least fit ones, but this has a draw back: the GA can converge to local minimum as the search space is restricted to the genes covered by the fittest chromosomes only.  One of the mechanisms that enable genes of weaker organism to be passed on while at the same time favouring fitter chromosomes is Roulette Selection.  To understand this consider the following example:

Assume we have a population of 4 chromosomes each of which has their fitness value evaluated. To construct a Roulette wheel for this population we calculate the Roulette probability of each chromosome as a ratio of its fitness to the fitness of the entire population. That is
RouletteProbability=\frac{chromosomeFitness}{populationFitness}
The illustration below shows the Roulette wheel for our hypothetical 4-chromosome population:

rouWeeEg.png

Extending the method to the problem at hand, we have:

rouWee.png

 Note that our Roulette wheel represents a population size of 10.  Chromosome 2 for instance has a higher probability of being picked compared to to Chromosome 6.  This off-course doesn’t mean that Chromosome 2 will always be picked.  It all depends on where the Roulette wheel “stops”.  Also note that different population size would have a different representation to its Roulette wheel because every chromosome is allocated a section corresponding to its fitness value.

Main Points:

  • Roulette Selection is an efficient way of allowing excellent genes embedded in a poor performing chromosome to be passed on to future generations.
  • Roulette Selection enables convergence to global minimum.
  • Think of this method as chromosomes represented on a Roulette wheel which is “spun” whenever a parent needs to be selected for crossover/breeding.

Forecasting via Genetic Algorithm: Part 3

Forecasting, Genetic Algorithm, Model Development No Comments »

Part 3: Constraining the Search Space

 We noted previously that \small\displaystyle\alpha,\beta,\gamma,\delta,\theta,\tau,\mu,a,b,c,d,e,f are integer values which will be evolved by the algorithm.  If we are to assuming that the range of possible values for each one of these parameters range from -infinity to +infinity then our algorithm will take an eternity to evaluate each and every permutation.  In order to improve convergence to an optimum solution it is useful to specify a range of values that encompasses optimum parameter values.  For instance \small\displaystyle\\a,b,c,d,e,f can only have a minimum value of 2 and a maximum value less than or equal to 800.  But clearly numbers closer to 800 seem way off because it is very unlikely that patterns will persist that far back in time given the nature of our prediction function.  Parameters \small\displaystyle\alpha,\beta,\gamma,\delta,\theta,\tau,\mu will be decimal numbers with a resolution of one decimal place and will fall over a wider range.

  • \small\displaystyle\\a,b,c,d,e,f  fall in the integer range 2 to 65  (We will refer to these as Integer Genes)
  • \small\displaystyle\alpha,\beta,\gamma,\delta,\theta,\tau,\mu fall in the decimal range -1000 to 1000  (We will refer to these as Decimal Genes)

\large\displaystyle\overbrace{\alpha,\beta,\gamma,\delta,\theta,\tau,\mu}^{\text{Decimal\hspace{1}Genes}},\large\underbrace{a,b,c,d,e,f}_{\text{Integer\hspace{1}Genes}}

Other constants shall be set to

  • Population Size = 20
  • Mutation Probability = 0.7
  • Mate Probability = 0.8

We don’t need to limit the algorithm to a fixed set of constants.  In-fact we will see later on the effect of altering each of these constants with respect to other parameters and constants.

 1.  Initialising the Population

We know that there are 13 genes in total that form a chromosome and we also know the boundary values individual genes may take.  What is the best way to create the required population of 20 chromosomes in-order to get our algorithm started?  We shall do so by selecting random numbers and scaling them to the permitted range

\large\displaystyle\overbrace{\alpha,\beta,\gamma,\delta,\theta,\tau,\mu}^{\text{\small\\initialisation=[1000-(-1000)]*randomNumber+(-1000)}},\large\underbrace{a,b,c,d,e,f}_{\text{\small\\initialisation= round([65-(2)]*randomNumber+(2))}} 

Note that \small\displaystyle\\randomNumber is a decimal number between 0 and one.  \small\displaystyle\\round() means that the number is rounded to the nearest integer value since the corresponding parameters are Integer Genes.

Essentially what we are saying is that Integer genes will be random numbers from the range 2 to 65 and Decimal genes will be random numbers from the range -1000 to 1000.

Forecasting via Genetic Algorithm: Part 2

Forecasting, Genetic Algorithm, Model Development No Comments »

Part 2:  Algorithm Overview

 The Genetic Algorithm is a subset of Evolutionary Algorithms that model biological processes to optimise highly non-linear cost functions.  They require multiple initialisation points on the cost surface hence giving the algorithm a global outlook towards the minimisation or maximisation task.  A few other advantages of these algorithms over traditional local search algorithms include

  • Don’t need to compute the derivative of the function
  • They can escape local minima
  • They can provide a list of suitable solutions and not just one
  • Simultaneous searching due to wide sampling of the cost surface
  • etc etc

  The figure below shows the main steps in the Genetic Algorithm.

genAlgo.png

1.  Set Constants

Constants are the parameters that drive the rate and quality of evolution in the Genetic Algorithm.  In out example we have three constants namely

  1. Population Size
  2. Mutation Probability
  3. Mating Probability

There is no fixed rule on how to choose the best values for these constants since they can differ from problem to problem.  The Population Size is the number of organisms or sampling points on the cost surface.  The fittest organisms of this population, which is calculated via a fitness function are allowed to breed via crossover with the premise that they will produce fitter organisms.  Mutation adds random diversity to future generations of the population.  Mating Probability ensures certain genes during crossover are unchanged hence adding more diversity.

2.  Initialise Population

In our example the population is composed of different samplings of the parameters we want to evolve.  A single member organism or chromosome is shown below: 

chromo.PNG

A gene refers to an individual entity or parameter belonging to the the prediction function.  Recall that our prediction function is:

eq1.PNG

 Note that the words organism and chromosome are used interchangeably.

3.  Evaluate Fitness

The fitness function determines how fit individual members of the population are with regards to the prediction function.  In our case we shall use a fitness function that compares the predicted price movement with the actial price movement, averaging the absolute value of the difference over a time period of n days.   

eq2.PNG

I shall explain later the variable n.

4.  Sort by Fitness

Each member of the poplation is assigned a fitness value.  If the fitness is above a pre-defined threshold the algorithm is terminates.  Otherwise the algorithm proceeds to inter-breed the population so as to produce fitter organisms.

5.  Select Parents

There are many ways of selecting parents, which for instance can be based on their fitness ranking within the population.  However the actual selection mechanism may be random such as Roulette Selection.  We will use Roulette Selection in our example and I shall talk more about this later on.

6.  Breed

The genes of the selected parents are crossed over with the aim of producing a fitter organism/chromosome.  The Mating Probability ensures that certain genes are unaltered, hence adding diversification to the resulting generation.

7.  Mutate

The downside of cross over during breeding is that values are exchanged between parents but no new ones are introduced.  Mutation allows new genes to show up, hence adding even more diversification to the population.   We are emphasising on diversification because this prevents the algorithm from getting stuck at a local minimum as different members of the population represent a wider range of sampling points on the cost surface.

8.  Remove Twins

The existance of two organisms with identical genomes has limited utility in a Genetic Algorithm.  In our case, we search for two similar organisms and replace the genes of one of the organism with random values.

Forecasting via Genetic Algorithm: Part 1

Forecasting, Genetic Algorithm, Model Development 2 Comments »

So far we have looked at a case study where the MLP Neural Network was applied to forecast a Financial Index.  A derived trading model showed that Neural Networks generate higher returns than that of the buy and hold strategy for an Index traded Fund.  Predictive Edge?… Probably not given the simplicity of the model we have been dealing with.  But we have learnt and discovered a few things about MLPs which will be usefull when we shall build more complicated models.

We have seen that

  • A committee of MLP NNs outperforms a single MLP structure.
  • Dimensionally reduced inputs improve the quality of your NN outputs.  Garbage in garbage out.
  • Smoothing your inputs via minimum lag filters enables your outputs to appear in a timely fashion.

Some challenges to overcome with MLPs:

  • The learning algorithms explored thus far only perform local searches.  We dont know if the weights of our trained NN is occupying a local minimum.  A committee NN structure tries to deal with this problem by having different initialisations for each network, but this does not provide a thorough global search mechanism.
  • We limit ourselves to a fixed MLP structure - how do we know our chosen structure is the best given the infinite set of possible structures we haven’t tested?
  • Our forecasts are relatively good for out-of-sample sets closer to the training set.  How do we know what length of walk forward window to use for re-training without compromising the quality of forecasts?
  • etc, etc.

We now move on to look at another Biologically Inspired Algorithm, the Genetic Algorithm.  We would like to apply the Genetic Algorithm to search for suitable parameters for our stock price prediction function.  To start with we will look at a simple prediction function whose structure is fixed, which will form the basis for developing self-evolving prediction functions later on.  We will then move on to look at a form of Genetic Programming known as Grammatical Evolution which allows us to evolve trading rules.  Hopefully by then we would have enough material to develop a Genetically inspired trading model incorporating all the ideas we would have covered by then. 

Part 1: The case of a fixed prediction function

Our prediction function aims to predict the difference between the closing value and the opening value of the FTSE100 index based on historical values.  Because we are limiting ourselves to using historical values for the FTSE100 time-series, our prediction function needs to incorporate the generalised forms of various technical trading strategies that use historical data to forecast and hence trade the index.  A function with terms relating to filters, closing prices, opening prices etc seems appropriate for this task.  The prediction function shall take the form:

eq1.PNG

Where:

  • eq3.PNG is the difference between the closing price and the opening price of the FTSE100 index on day T
  • eq5.PNG is the closing price on day T-1 minus the average of of the closing prices of T-2 up to and including day T-x
  • eq6.PNG is the closing price on day T-1 minus the the lowest price between days T-2 and T-x
  • eq7.PNG is the closing price on day T-1 minus the the highest price between days T-2 and T-x
  • eq4.PNG are all integer values which will be evolved by our genetic algorithm.  So we have 13 integer parameters to evolve.

The time-series of interest is taken from the range 10/07/2000 to 28/12/2006.  We make a slight modification such that our training set occupies the middle of the time-series.  This would allow us to examine how well our model copes with a range it was trained with as well as a range it needs to forecast. 

  • Test Set 1: 800 trading days.  To check how well our model generalises its forecasts and aviods overfitting.
  • Test Set 2: 36 trading days.  This will be the “true” out-of-sample range to forecast.
  • Training Set: 800 trading days.  This range will be used by the Genetic Algorithm to evolve the 13 parameters of our Prediction Function.

range1.png

Financial Index prediction using MLP Neural Networks: Part 8

Forecasting, Model Development, Neural Networks, Trading Systems 5 Comments »

Part 8: Trading strategy results and Wrap-up

 Our trading strategy takes the following form:

  1.  
  2. if(predicted5DayChange > 1.5) 
  3.     trade = long(5);
  4. elseif(predicted5DayChange < -1.5)
  5.     trade = short(5);
  6. end
  7. % n is the number of days after which the trade is automatically closed

The performance of this strategy is compared against that of a buy-and-hold strategy.  Transaction costs, slippage and other caveats are ignored.  The plot below shows the cumulative profit of the trading strategy over different test sets.

cumReturn.png

A comparative summary is given in the table below:

retnTable.PNG

We have shown that a simple MLP nerual network can generate profits in the absence of trading costs.  In summary:

  • Predictive quality and hence return on investment degrades as the time-series moves further away from the training set.
  • Creating a committee of networks improves predictive quality.
  • In the absence of all other costs, the model generates greater profits than a simple buy-and-hold strategy.

I would say these results should be considered a lower bound on the profiteering abilities of a neural network trading system as the model can be improved in a number of ways:

  • Evolving weights via Particle Swarm algorithm or some other Evolutionary algorithm
  • Modify network to predict market turning points rather than absolute values
  • Modify network to detect divergence of other linked markets
  • Allow the number of days to stay in a trade to vary.

I am currently in the process of improving this system so that I can deploy it at Collective2.  Nonetheless this experiment ends here and we shall next look at something to do with Genetic Algorithms.

Financial Index prediction using MLP Neural Networks: Part 7

Forecasting, Model Development, Neural Networks, Trading Systems No Comments »

Part 7: Application to trading

Lets recap what we have been done so far by looking at the plot below:

ftseTrainTest.png

 We used 889 trading days of the FTSE 100 index for training our stack of twenty-five 9:6:1 MLP neural networks.  We then chose 3 sets of testing data consisting of 125 trading days each.  Our results showed that the committee model performed best over test set 1.  Hence the quality of predicted 5-day forecasts degraded as data points moved further away from the training set.  The plot below tries to reflect this:

forcQuality.png

A trading strategy can be something like this.

  1.  
  2. if(predicted5DayChange > 1.5)
  3.     trade = long(n);
  4. elseif(predicted5DayChange < -1.5)
  5.     trade = short(n);
  6. end
  7. % n is an integer number of days which needs to be found so as to maximise profit

Our aim now is to find the best value of n.

Financial Index prediction using MLP Neural Networks: Part 6

Forecasting, Model Development, Neural Networks 3 Comments »

Part 6:  Network performance measures

Tests are performed with three out-of sample data sets consisting of 125 trading days each.

  • Test set 1: 01/07/95 to 31/12/95
  • Test set 2: 01/01/96 to 28/06/96
  • Test set 3: 01/07/96 to 31/12/96

The figures below shows the mean square error over the entire test set range, with a smaller MSE value indicating better performance.

mseResults.png

 There are two visible trends here: The committee model outperforms the average of individual models across all test sets, although not as much in the third set.  Also the predictive quality of the models degrades over time since the error in both the committee and individual models increase.  This is further investigated in terms of correlation coefficient, which provides an indication of how similar the predictions are with the actual time-series.

corrResults.png

Predictions made as a result of a committee stack show higher correlation (ie similarity) to the actual five-day change than do the average correlations of the individual models.  As expected the correlation coefficients degrades over time, thus confirming that forecasts made on out-of-sample data further away from the training set are prone to being less accurate. 

Financial Index prediction using MLP Neural Networks: Part 5

Forecasting, Model Development, Neural Networks, Trading Systems 1 Comment »

Part 5:  Creating a committee of networks.

In Part 4 I talked about using a single network to obtain forecasts.  We can improve upon that by using a committee of networks so as to improve overall accuracy of forecasts.

Previous research [Zhang et al] shows that using a group of between 20 and 30 networks are sufficient to improve results obtained from a single network.  The reason for this can be understood as follows:

Consider a neural network with only 2 taps (or weights).  If we change the value of these weights with respect to each other and measure the error produced at the output of the network and plot it, we get a surface with multiple minima similar to that shown below:

multiCost.PNG

During training, the learning algorithm (Levenberg-Marquardt in this case) attempts to find the minimum points on this surface because it aime to minimise the error at the output of the network.  Notice that there are four minima on the surface, two of which are global (dark blue ones) and the other two are local (see this).  Local search algorithms (like Levenberg-Marquardt) behave such that they follow the path of steepest slope on the cost surface.  But they need to be told where to start their search (aka the initialisation point).  If we use only one initialisation point then there is a possibility of the network getting stuck at a local minimum.  Creating multiple instantiations of initialisation points improves the possibility of reaching a global minimum.  This is why we create a committee of networks, as each one will be initialised at a different point on the cost surface.  We can therefore expect certain members of the committee to perform better than others, provided the cost surface has local as well as global minima.  Our committee structure looks something like:

nn961comte.PNG

A stack of 25 networks will be used to provide a forecast which is expected to be more accurate than that of a single network.

In summary:

  • We use a committee because 25 brains are better than one.. duh!
  • Put more formally, certain networks in the committee will perform better than others because their weights will correspond to those at a global minima on the cost surface.
  • We need to avoid at all times networks that have weights corresponding to local minima as this can mean the difference between a good forecast and a bad forecast.
  • We could have used tournament selection, but that is more applicable to genetic algorithms, something we shall discuss in due course.

More on optimising cost surfaces later on.

Financial Index prediction using MLP Neural Networks: Part 4

Forecasting, Model Development, Neural Networks, Trading Systems 2 Comments »

Part 4:  Network training

The network structure we will use is a 9:6:1 multi-layer perceptron.  That is, there are 9 input nodes, 6 hidden nodes and 1 output node.  The is no particular reason why we use 6 nodes in the hidden layer.  We chould have chosen a different number, but trial and error shows that 6 nodes learns the time-series well without loss of generality.  The network structure is shown below:

 

nn961.png

 Each of the nodes is activated by a tan-sigmoid transfer function.  The training algorithm employed is the Levenberg-Marquardt algorithm, which is a very powerfull gradient descent algorithm.  It is important to point out that there are a number of neural network training algorithms which we could have used.  They have their own advantages and disadvantages provides a framework to decide which one to use for a particular problem:

  • Resilient backpropagation

  • Random order incremental update

  • Polak-Ribiere conjugate gradient descent

  • Powell-Beale conjugate gradient descent

  • Bayesian regularization

These algorithms very rarely get stuck at saddle points because they have a random disturbance which alleviates problems associated with attraction of saddles.  Once initialised, they descend until they reach a true minimum point, which might not necessarily be global.  Evolutionary algorithms have a special feature of ensuring that a global minimum reached (subject to certain constraints) during training:

  • Simulated Annealing

  • Stochastic search via genetic algorithm

  • Stochastic search via particle swarm

  • Grammatical evolution

Evolutionary algoritms truely enable one to have a predictive edge, as we shall see later on.  Next we shall look at the forcasting ability of the trained network.

Related:

Financial Index prediction using MLP Neural Networks: Part 3

Forecasting, Model Development, Neural Networks, Trading Systems No Comments »

Part 1:  Setting project goals

 Part 2:  Analysing the target variable

 

Part 3:  Input data selection and preprocessing

We shall use a variety of indicators to drive out neural network for forecast the % 5-day change of the FTSE100 index.  Although there may be thousands of such indicators that we can chose from, our aim should be to pick the ones that have a significant bearing on the target variable being forecasted.  In this example we use a range of technical, fundamental and intermarket indicators namely:

  1. 5-day lagged % change of the FTSE100

  2. 20-day lagged % change of the FTSE100

  3. ** 10-day 5-day convergence divergence of the FTSE100

  4. ** 20-day 10-day convergence divergence of the FTSE100

  5. GBP/USD exchange rate

  6. S&P 500 composite index

  7. Brent crude (US$ per barrel)

  8. LIBOR 1-month deposit rate

  9. LIBOR 12-month deposit rate

Inputs 1 and 2 provide the neural network model with a measure of momentum in the market, giving the added ability to discern whether a short run 5-day trend agrees with the longer 20-day trend.  Input 3 was calculated as the ratio of the 10-day smoothed vs the 5-day smoothed time-series of the FTSE100 index.  The smoothing was accomplished using the ** Zero-lag filter that was designed earlier.  The GBP/USD exchange rate is included as changes of the GBP against a major currency pair can be expected to impact the domestic and overseas earnings of companies in the FTSE100 index.  For similar reasons the price of Brent Crude is also included.  The S&P 500 index is included because it is often said that “when the US sneezes the rest of the world catches a cold”.  We can safely assume there is correlation between the two indices (infact I know there is!).  Two measures of interest rates are provided. namely the LIBOR 1-month and 12-month.  Interest rates affect share prices  by altering the rate of return that can be earned on competing instruments such as bonds, bank deposits etc since they impact the borrowing cost of firms in the FTSE100 index.  The plot below shows the input data drawn from the range 01-Jan-1992 to 15-June-1995, a total of 889 days.

the9inputs.png

This range for training was deliberately chosen due to the profound effects of black wednesday that is visibly present in all input data as well as the target variable.  A deviation of this kind would enable our network to learn black swans in addition to the normal behiavour of the index.  To further illustrate the importance of having outliers in your training set, we have the plot below.

 

featuresPlot.png

Assuming there is correlation between the S&P500 and the FTSE100 (which of course there is!), features present in the training set is representative of pretty much the whole data set.  This is good because our network will be in a position to predict a fall in the FTSE100 should a should an event similar to black wednesday occur.  A training set which is not representative of all possible events, including black swans is one of the reasons why neural networks fail in their forecasting ability.

Key points:

  • We have selected a range of fundamental, technical and intermarket indicators to use as inputs.

  • We have selected time-series range which includes a black swan event.

  • We avoid the use of a moving average for smoothing because of its terrible lag characteristics.  Instead we used the custom designed zero-lag filter.

  • We will normalise all inputs and target to the range (-1, 1) inorder to satisfy the training constraints of the MLP neural network.

Close
E-mail It