Quantcast aiQUANT » 2008 » January

Archive for January, 2008

Links for 31-01-2008

Links 1 Comment »

I have added a few links to my blogroll:

  1. Tom at Neural Market Trends shows how develop AI based financial models using an open source tool called Rapid Miner. He provides a number of tutorials and even has a member’s section where he shares his models through videos and examples. I think this is an excellent way for one to get started developing AI financial models. From what I hear the Rapid Miner tool has a steep learning curve, but Tom alleviates this a great deal through his easy to understand tutorials.
  2. Ali’s blog talks about mostly quant stuff with occasional digressions. I recommend this site for your daily food for thought.
  3. foquant runs a quant blog. He recently developed an SMS order confirmation tool for trading which can be downloaded from the models page at his site. He provides a number of other quant finance models and is always adding newer ones.
  4. Two fish’s blog covers a wide variety of topics ranging from quant finance to ghosts and immigration. Fascinating stuff!
  5. Visit the One Unified Blog for your dose on programming and IT related issues. The blog shares a model for Trade Optimization by Genetic Programming, written for the QuantDeveloper platform.
  6. The Econophysics blog provides a rare blend philosophical issues surrounding quantitative finance and quantitative social science. A must read!

Thoughts on GAs and the like

Artificial Intelligence 8 Comments »

Every once in a while I receive an email from someone somewhere asking me how to develop killer Genetic Algorithm trading model. I find there is a some confusion surrounding the GA, NNs and the like so in this post I wish to clear misconceptions that may lead one to believe that Evolutionary Algorithms will give answers straight away.

  • Lets be clear about it, Genetic Algorithms may sound sexy but under certain cases they are no silver bullet. I say this based experience from applying them to solve different kinds of problems. Under such cases it is not the optimization algorithm which fails you - its a simple case that there is no good solution to the problem at all.
  • You cannot use the GA or its derivatives directly to predict the market, but rather you use it to optimize a pre-defined, or nearly defined model in order to improve its performance. In other words one aims to find the best parameters to the model, and it’s the emerging model which goes about doing the predicting or trading and not the GA itself.
  • On training Neural nets you want to be careful not to use a local search algorithm e.g. the Back propagation algorithm, but rather a global search mechanism such as swarm, simulated annealing, the GA or its derivatives. The reasoning behind this is that local search algorithms will converge to any minimum (which may be local or global). A global search algorithms on the other hand will have a near accurate view of the optimization surface and as a result are more likely to converge to global solutions.
  • Neural Nets have received some bad press about over fitting problems. There are a number of ways of dealing with this, the most notable one being filtering. If the data you are fitting to is clean then your solution will not over fit. If your data is noisy then filtering the noise will ensure that over fitting does not occur.

If there are good solutions to find, then Evolutionary Algorithms are good at discovering these solutions, but if there are no good solutions (which may be an inherent characteristic of the problem), then it is impossible for Evolutionary Algorithm to discover any. The message to take away from here is that one should focus on problems where there is a strong likelihood of something to be discovered, rather than believe that a solution will be discovered out of the blue for any arbitrary problem.

A filter with negative lag

Filter Design, Technical Indicators 6 Comments »

The problem with technical indicators is that they suffer lag, resulting in signals appearing after events have occurred in time series context. Recently I’ve been investigating ways of reducing lag in these filters, building upon the results from a previous post which shows the workings of a zero-lag indicator derived from first principles. It seems something can be done, although the technique as it stands needs further refining. The graph below shows how it compares with the Exponential Moving Average (EMA) and the Moving Average (MA):

predictFil.png

The thing with this filter is that it has negative lag when the market is trending but has high gain when the market is in cycle mode. The reason for this is that in the filtering algorithm there is a part where I multiply the actual price by itself which results in noise components being amplified. To understand this let us consider the following:

Price data can be seen as comprising of two components: a signal part which is useful and a noise part which is not useful.

\small \text{Price=Signal+Noise}

The effect of multiplying Price by itself is a quadratic equation with signal and noise terms:

\small \text{Price*Price}=\left\{\text{(S+N)(S+N)} \\ \text{S^{2}+SN+NS+N^{2}}\right.

We have three noise terms, two of which affect our signal. I think this is the cause of the laggish behavior when the market is cyclic because the high frequency noise components are amplified.

Now there are two ways I can think of improving the performance of this filter

  1. Find a means of modifying the value of alpha when the filter detects that the market is no longer trending. This will require an additional algorithm to estimate whether the market is in trend mode or cycle mode - something which the Hilbert Transform can take care of.
  2. Find an alternative to multiplying the price by itself so that additional noise terms are not introduced.

I am biased towards 1 so shall spend more time on that. Because I consider this work in progress, I am tempted to discuss the actual workings of the filtering algorithm at a future date when we have some acceptable performance.

Transform Algebra in Mathematical Finance

Transform Algebra 2 Comments »

I have hinted at Z-transforms and Hilbert transforms and how they are applied to financial time series. But generally speaking, there are a number of other mathematical transforms which find their use in financial modeling problems.

Mathematical Transforms are important because they help convert functions into other domains thereby making them easier to understand and solve.  Here I give a qualitative description of common transforms and some examples of how they are applied to problems in finance.

traformTable2.png

Obviously the choice for invoking a particular transform depends on the problem being solved.

1. Fischer Transform

The Fischer Transform converts any data set into a modified data set whose probability density function is approximately gaussian. An immediate benefit is that one can then analyze the transformed data set in terms of its deviation from the mean - something which might not have been possible prior to the transformation. Consider for instance absolute prices on a bar chart. Are they normally distributed (i.e. bell shaped)? No. The returns density of the time series are usually near bell shaped, but the actual prices are no where close to bell shaped. This means attaching Gaussian confidence intervals to absolute price data is impossible, but Fischer transformed data will modify the data so that they become bell shaped hence allowing the majority of Gaussian statistical functions to be applied. The benefit is seen in modified technical indicators that are similar to, but more responsive than conventional oscillators such as the Commodity Channel Index (CCI) or Moving Average Convergence Divergence (MACD).

2. Fourier Transform

The Fourier Transform enables conversion of functions or data sets from the time-domain to the frequency domain and vice-versa. In Signal Processing this is essentially what describes the relationship between the time domain and the frequency domain. In terms of option pricing, the Fourier Transform provides a framework for fast price calculation compared to Monte Carlo methods. As shown in [Szymon et all] the Fast Fourier Transform (FFT) algorithm is about 3000 times faster than Monte Carlo simulation.

fftSimTime.png

It is worth noting that FFTs are inappropriate for direct time series analysis because they deliver poor resolution in terms of cycle length. They are only capable of recording integer number of cycles, thereby missing any cycles that have a length falling in between two integer boundaries. A good algorithm which performs better than the FFT for measuring cycle period in financial time series is the Pisarenko harmonic decomposition.

3. Hilbert Transform

The Hilbert Transform is a procedure to create complex signals from the real price data that is plotted on the bar chart. With complex signals available one can compute more accurate and responsive indicators as well as create other indicators which cannot be calculated without the Hilbert Transform. Computations such as Signal-to-Noise ratio, Power Spectral Density and Cycle Period measurement can only be achieved by calculating the Hilbert Transform.

4. Laplace Transform

Laplace Transforms are useful for solving transient state systems that are described by differential equations. The transform helps simplify complex differential equations; notably in the case of partial differential equations used in option pricing formulas. Simplified equations in the Laplace domain are solved algebraically before the inverse Laplace transform is applied to return the solution to the original domain.

5. Wavelet Transform

The Wavelet Transform is similar to the Fourier Transform in that it converts time domain data into frequency domain representations. The main difference is that wavelets provide a number of ways for doing the conversion through a series of wavelet families. Each wavelet family has its own property which is exploited for doing more specific decomposition tasks such feature extraction, noise removal or cycle period estimation of time series. Decomposed time series can be analyzed in several ways to provide more detain into it’s intrinsic properties.

6. Z - Transform

The Z Transform converts discrete time-domain data points into a complex frequency-domain representation. Unlike the Laplace and Fourier Transforms which are applicable to differential equations, the Z Transform provides an excellent framework for working with difference equations in the frequency domain. One can go about describing filters in terms of their transfer functions and apply a host of algebraic techniques found in Digital Signal Processing to design better filters for time series analysis.

Summary

  • Transforms are performed to make problems easier to understand and solve.
  • Any Transform has an accompanying Inverse Transform algorithm so that a dual relationship is maintained between two domains.
  • The Z, Wavelet, Fourier and Laplace transforms are related in that they maintain a relationship between the time domain and the frequency domain.
  • Laplace Transforms are useful for analyzing transient systems where as Fourier Transforms are more suitable for steady state systems.
  • aiQUANT thinks technical indicators and oscillators should be designed in the Z domain. This gives greater understanding of the frequency characteristics - something which is not obvious with just the time domain representation of the indicator.

Matlab Compiler notes

Matlab No Comments »

This is a collection of notes made when using the Matlab compiler under Matlab 6 (2003) and Matlab 7 (2006) which may be of use to Matlab compiler users.

Matlab 6 notes

The Matlab compiler is a useful utility which enables you to turn your Matlab code into C/C++ and compile it. The compiler doesn’t really compile your code at all… it translates your Matlab code into C line for line. The C/C++ code produced is unreadable do don’t count on being able to modify it or even understand it! Mex files can be included, watch out for relative paths names though, some documentation on this on Matlab homepage.

Don’t use the remez filter generation command in Matlab as the compiler can’t compile it. There is a fix but it is complicated, see this.

Other functions such as eval and keyboard are unsupported by the compiler. The save command only supports .mat format, not ascii format.

Make sure you test the executable you’ve produced on other machines especially if you intend to deploy the code for other users! I have found that code would work fine on the machine it was compiled on but produce run-time errors on other machines due to missing dlls & incorrect relative paths.

When you supply a command line argument to the executable when running from the DOS command line, it is passed to the internal code as a string even if it is a number. You need to use a command such as

matComp6.PNG

to convert these into numbers. Of course now if you run this code under Matlab, it doesn’t work… I used a compile variable to execute different parts of code depending on whether we were running under Matlab or in a compiled version of the code:

matComp7.PNG

If you run the compiler command (mcc) from a DOS prompt or Perl script etc, you need to make sure you’ve got the Matlab path saved as a Windows variable or else the compiler can’t find any libraries such as the wavelet toolbox. Use the mccsavepath utility to do this.

When giving executable models to other users, you also need to supply the Matlab run-time libraries. These are supplied with matlab compiler, but can also be downloaded from the mathworks homepage.

Matlab 7 notes

For Matlab 7, the compiler has been revised extensively. It now encrypts your code along with the built-in Matlab functions you’ve used. It presumably decrypts these at run-time.

A few more issues:

You need to be VERY CAREFUL here as the compiler makes no effort to change either directory structures or filenames. If you are using some bespoke algorithm/ method in your system which you regard as proprietary and you name your filename after this, e.g. 16TapRLS.m then the one who uses your deployed application is going to see it!

Compiler doesn’t like working over networked file stores, sometimes it fails for strange reasons.

C files with a mex interface used to be called .dll files are now called .mexw32. Matlab 7 only uses .mexw32 file and will helpfully rename your .dll files to .dll.old when you recompile your .c files rendering them useless under Matlab 6.

Some things have been improved since version 6:

Introduction of isdeployed function to indicate if you are running in Matlab or compiled environment. The following code can be used to handle this whilst retaining backwards run compatability to Matlab 6.

matComp1.PNG

Compiled code still can’t handle numerical inputs from command line.

Compiled code can now load .mat files in same way as native code i.e.

matComp2.PNG

The mex function has the following format:

mexInt.jpg

The Matlab mex documentation gives a full explanation of how to access arguments and return values. The mex function allows multiple values to be returned from a C-function. However, the function arguments are passed by address and can therefore be changed within the C-code; these changes are reflected in the matlab environment.
An example shows a mex function called product, written in a file called product.c as:

matComp3.PNG

The mex call in matlab could be:

matComp4.PNG

The value of c will be 200, b will be 20 but a will now be 200 also. The example seems obvious but can be easily overlooked when dealing with more complex functions.

Other notes:

  • If the mex function creates static variables, these are created dynamically the first time the function is called and remain persistent for future calls.
  • The matlab clear function will clear only matlab variables from the workspace; clear all will properly clear any variables, such as static variables, allocated by the mex function.
  • The mex compiler error LINK : fatal error LNK1168: cannot open your_mexfunction.dll for writing is caused by static variables persisting after your_mexfunction has been called in the Matlab environment. The clear all command will remedy this.
  • The compiled mex function has a .dll extension under Matlab 6 (Matlab 7 uses .mexw32). If the function has been called, the DLL file cannot be deleted until clear your_mexfunction or clear all is used.
  • Creating a mex build script is difficult since matlab string variables cannot be referenced within the mex command. However, the mex command can be simplified by using the triple dot ‘…’ directive to separate the command. Particularly useful for very large build commands, e.g.

matComp5.PNG

GA and GP: Similarities and differences.

Genetic Algorithm, Genetic Programming 1 Comment »

This paper argues that the Genetic Algorithm and Genetic Programming are pretty much the same - at least in as far as the differentiating features are considered:

GA or GP? That is not the question
Woodward, J.R.
Evolutionary Computation, 2003. CEC apos;03. The 2003 Congress on
Volume 2, Issue , 8-12 Dec. 2003 Page(s): 1056 - 1063 Vol.2

GaGpOrNot.gif

Key points emerging from the paper:

  • The greatest distinction between GAs and GP is that of fixed or variable length. In some cases, the size of the required solution sought may be known beforehand, but in other cases, such as evolution of trading rules, imposing a fixed size could reduce the search space hence neglecting optimal solutions.
  • GAs do not necessarily represent the evolution of algorithms, but rather a subset of all possible algorithms. GP typically does not evolve computer programs as we would expect, but typically logical or mathematical expressions are evolved, or if then type rules of a classifier system.
  • In GAs, the mutation of bit strings takes the form of flipping a single bit. Two main crossover operators have been used; uniform crossover uniformly selects a bit from each position from one of the two parents, one point crossover selects a point at random along the genome of two parents
    and takes the left part of parent one and combines this with the right part parent two. In GP, mutation takes the form of replacing a randomly selected sub tree in a program with a randomly generated sub tree. Crossover swaps randomly selected sub trees of two programs.

Essentially the paper states that the differences are unimportant because whether we are considering GAs or GP, whatever the high level representation is, it will ultimately be stored in the computer’s memory as a bit string. I think it’s a fair enough argument but really does not really our day to day understanding and application of the of the two algorithm variants. Because even more “artificial” forms of the algorithm such as Grammatical Evolution (GE) use bit string at lower levels, but this is irrelevant to the problem solver.

Trees in GP

It might be worth hinting at tree representation in GP here.  I have shown two examples of tree representations with their equivalent expression:

gpTree1.png

gpTree2.png

You might want to try drawing trees for the following two expressions. Note that there is a comma at the end of the first line of each expression:

1.
times(times(times(X1,plus(cos(cos(X1)),X1)), sin(plus(cos(X1),sin(plus(cos(X1),X1))))),plus(cos(minus(X1,X1)),X1))

2.
nor(nor(nor(or(X2,X2),or(X2,X3)),and(nand(X2,X3),X1)), and(nand(and(X1,X1),X3),and(nand(X2,X1),or(X3,X1))))

You can find the tree representations for 1 here and for 2 here. 

I shall be using such representations in future posts so this post serves as a reference for any unfamiliarities with tree representations/expressions that may emerge.

Matlab Object Oriented Programming

Matlab No Comments »

I recently started to teach myself OOP techniques for Matlab. To start with I looked for free online resources and found some good links:

Following a specialized textbook on the subject has its own benefits and so I’ve decided to buy the following book:

It covers pretty much everything you need to know about Matlab OOP:

A Guide to MATLAB Object-Oriented Programming is the first book to deliver broad coverage of the documented and undocumented object-oriented features of MATLAB®. Unlike the typical approach of other resources, this guide explains why each feature is important, demonstrates how each feature is used, and promotes an understanding of the interactions between features. Assuming an intermediate level of MATLAB programming knowledge, the book not only concentrates on MATLAB coding techniques but also discusses topics critical to general software development. It introduces fundamentals first before integrating these concepts into example applications. In the first section, the book discusses eight basic functions: constructor, subsref, subsasgn, display, struct, fieldnames, get, and set. Building on the previous section, it explores inheritance topics and presents the Class Wizard, a powerful MATLAB class generation tool. The final section delves into advanced strategies, including containers, static variables, and function fronts. With more than 20 years of experience designing and implementing object-oriented software, the expert author has developed an accessible and comprehensive book that aids readers in creating effective object-oriented software using MATLAB.

Hopefully the concepts discussed in the book will come in handy when I decide to create my own Matlab toolboxes.

Statistical Arbitrage and Genetic Programming

Genetic Programming, Statistical Arbitrage 1 Comment »

The main idea in this presentation is that of co-evolving 2-branch type trees, where one branch represents buy rules and the other represents sell rules for trading. The intuition is that when the branches are evolved together, your final genetic program ends up with buy and sell rules that are duals of each other. Hence an optimum buy rule can be paired with an optimum sell rule - which forms a dual. This is unlike the approach where a sell rule is triggered only if certain criteria for a buy are not met (which may not necessarily be optimum condition for the sell). Although the results in the presentation are somewhat ambiguous, the author points out that it is possible to discover profitable trading rules in the presence of transaction costs under a statistical arbitrage framework. The author also cites a few papers which have applied GP techniques to Statistical Arbitrage. I shall dig up these papers to investigate the approach taken.

Close
E-mail It