A neural network can help forecast future values of a Brownian path. For this purpose we use a multi-layer perceptron (MLP), a network that consists of multiple layers of interconnected computing units called nodes. The MLP is feed-forward: activation flows in one direction only, from the input layer to the output layer. During training, the network aims to learn patterns present in the input data by minimising a cost function that measures how well the network performs. MLPs are trained using a supervised learning paradigm, in which a set of input vectors whose outputs are already known is presented to the network. A training algorithm maps the input vectors to the known outputs by systematically adjusting the network weights, thereby minimising the cost function.
For our experiment I divide the time series into a training set and a test set. As inputs within the training set I use moving average vectors derived from values of the Brownian time series. For real-world problems of this nature I would not recommend using moving averages, because they lag the underlying series and that lag carries over to the output. Lag is a major issue with neural networks: if a network is not properly trained, the forecasted values may appear shifted to the right of the actual time series. The real challenge is to ensure your inputs are not delayed relative to your target vector. At least this is true for the case where the inputs are derived from the target. I could have used methods other than moving averages to derive the input vectors, but here I shall stick with them because our aim is not to develop a reliable forecasting system, but to show that a time series can be learned.
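To make the input construction concrete, here is a minimal sketch in Python with NumPy. It is not the original setup: the window lengths (2, 4, 8, 16), the one-step forecast horizon, the 80/20 split and all function names are assumptions chosen for illustration. Each moving average is trailing, so an input at time t only uses values up to and including t, and the target is the next value of the series.

```python
import numpy as np

def brownian_path(n_steps, dt=1.0, seed=0):
    """Simulate a Brownian path as a cumulative sum of Gaussian increments."""
    rng = np.random.default_rng(seed)
    increments = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    return np.cumsum(increments)

def moving_average(series, window):
    """Simple moving average; output[j] is the mean of series[j : j + window]."""
    kernel = np.ones(window) / window
    return np.convolve(series, kernel, mode="valid")

def build_dataset(series, windows=(2, 4, 8, 16), horizon=1):
    """Inputs: trailing moving averages ending at time t. Target: series[t + horizon].

    The window lengths and the horizon are assumed values, not taken from the text.
    """
    t_start = max(windows) - 1          # first t at which every average exists
    t_end = len(series) - horizon       # exclusive; the target must exist too
    times = np.arange(t_start, t_end)
    columns = []
    for w in windows:
        ma = moving_average(series, w)  # ma[j] ends at time j + w - 1
        columns.append(ma[times - (w - 1)])
    X = np.column_stack(columns)
    y = series[times + horizon]
    return X, y

series = brownian_path(500)
X, y = build_dataset(series)

# Chronological split: the test set lies strictly after the training set.
split = int(0.8 * len(X))
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]
print(X_train.shape, X_test.shape)
```

The split is chronological rather than shuffled, so the test set is never older than the data the network was trained on.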
The MLP takes a 4:5:1 structure: 4 inputs, 5 hidden nodes and 1 output. There is no fixed rule for the number of hidden nodes to use. I have not found the number of hidden nodes particularly critical, although it must be pointed out that using too many may result in the network memorising rather than learning input-output relationships. Each node is connected to the nodes in the previous layer via weights, which change during network training. A popular method called network pruning improves the functionality of the NN by removing memorised patterns embedded within the network structure. For this task I shall not prune the network, as the specific reasons for doing so do not crop up here.
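The 4:5:1 structure can be sketched in a few lines of NumPy. The text does not say which activation function or training algorithm was used, so this sketch assumes tanh hidden nodes, a linear output node, a mean squared error cost and plain batch gradient descent. The class name MLP451, the learning rate and the toy data are all hypothetical.

```python
import numpy as np

class MLP451:
    """Feed-forward network with 4 inputs, 5 hidden nodes and 1 output."""

    def __init__(self, n_in=4, n_hidden=5, n_out=1, seed=0):
        rng = np.random.default_rng(seed)
        # One weight per connection between adjacent layers, plus biases.
        self.W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, X):
        # Activation flows one way: input -> hidden (tanh) -> output (linear).
        self.h = np.tanh(X @ self.W1 + self.b1)
        return self.h @ self.W2 + self.b2

    def train_step(self, X, y, lr=0.05):
        """One batch gradient-descent step; returns the MSE before the update."""
        pred = self.forward(X)
        err = pred - y.reshape(-1, 1)
        grad_out = 2.0 * err / len(X)                       # d(MSE)/d(pred)
        grad_W2 = self.h.T @ grad_out
        grad_b2 = grad_out.sum(axis=0)
        grad_h = (grad_out @ self.W2.T) * (1.0 - self.h ** 2)  # tanh derivative
        grad_W1 = X.T @ grad_h
        grad_b1 = grad_h.sum(axis=0)
        self.W1 -= lr * grad_W1
        self.b1 -= lr * grad_b1
        self.W2 -= lr * grad_W2
        self.b2 -= lr * grad_b2
        return float(np.mean(err ** 2))

# Toy data for illustration only; not the Brownian series from the experiment.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(64, 4))
y_demo = 0.1 * X_demo.sum(axis=1)

net = MLP451()
losses = [net.train_step(X_demo, y_demo) for _ in range(200)]
print(f"MSE: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

With these sizes the network has 4*5 + 5 + 5*1 + 1 = 31 adjustable parameters, which is small enough that memorisation is less of a concern than it would be with many more hidden nodes.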
The graph shows the network learning curve over 50 epochs. The drop in mean squared error (MSE) during the early part of training is far greater than the drop many epochs later. The challenge is choosing a number of epochs that allows the network to learn generalisations rather than specific relationships. The graphs below show the network's forecasting ability after being trained for different numbers of epochs.
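One common way to pick the number of epochs, which is not necessarily what was done in this experiment, is to hold back part of the training data as a validation set, record its MSE after every epoch, and stop once it has not improved for a while. The sketch below assumes such a per-epoch validation curve already exists. The function name, the patience value and the numbers in the example curve are made up purely for illustration and are not results from the experiment.

```python
import numpy as np

def pick_stopping_epoch(val_mse, patience=5):
    """Return the 1-based epoch with the lowest validation MSE seen before
    `patience` consecutive epochs pass without any improvement."""
    best_epoch, best_mse = 0, np.inf
    for epoch, mse in enumerate(val_mse, start=1):
        if mse < best_mse:
            best_mse, best_epoch = mse, epoch
        elif epoch - best_epoch >= patience:
            break  # validation error has stopped improving
    return best_epoch

# Illustrative curve only: falls quickly at first, then creeps back up
# as the network starts fitting specifics of the training set.
example_curve = [0.90, 0.50, 0.30, 0.25, 0.24, 0.26, 0.27, 0.30, 0.31, 0.35]
print(pick_stopping_epoch(example_curve))  # prints 5
```

A larger patience tolerates noisier curves at the cost of training for longer before stopping.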
Using NNs to forecast a Brownian motion requires a lot of care, particularly when deciding the nature of the input vectors to use and the number of training epochs at which to stop. This experiment hopefully highlights some of these issues.
