As a Matlab user, I have found that code working with very large data sets can sometimes run very slowly.  Memory performance has not improved at the same rate as CPU performance, so code that is “memory-bound” gains little from a faster processor.  But with a little knowledge about how Matlab stores and accesses data, we can avoid inefficient memory usage and hence improve execution time.

As an example, let us consider the two similar snippets of code below:

[Image: A1A2.PNG, code snippets A1 and A2]

Code A1 time = 56.56 seconds.  Code A2 time = 0.032 seconds

There is a very big difference in execution time between the two snippets, even though they do the same thing!  So why is code A2 faster?  Matlab does not require the user to declare the types and sizes of variables before they are used.  The drawback is that each time an array grows beyond its current size, for instance when an element is appended inside a loop, Matlab must allocate memory for a new, larger array and copy the existing data into it.  This is memory-intensive and slows the code down, as in A1.  Code A2 allocates the memory it needs before it enters the for loop and, as a result, runs roughly 1,770 times faster (56.56 s versus 0.032 s).  Try it!
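The original snippets are only available as an image, so here is a minimal sketch of what the two variants might look like, assuming a vector that is filled element by element inside a for loop.  The names n, a1 and a2 and the array size are illustrative assumptions, not the values behind the timings quoted above.

```matlab
n = 1e5;                  % illustrative size (assumption, not the original one)

% In the spirit of A1: the array starts empty and grows on every iteration.
tic
a1 = [];
for k = 1:n
    a1(k) = k^2;          % Matlab may have to reallocate and copy as a1 grows
end
tGrow = toc;

% In the spirit of A2: the full array is allocated once, before the loop.
tic
a2 = zeros(1, n);
for k = 1:n
    a2(k) = k^2;          % writes into memory that already exists
end
tPrealloc = toc;

fprintf('growing: %.4f s, preallocated: %.4f s\n', tGrow, tPrealloc);
```

Both loops produce the same vector; only the allocation pattern differs.  The size of the gap depends on the Matlab version, the array size and the machine, so expect your own numbers to differ from the ones quoted above.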

Another example concerns the fact that Matlab stores arrays column by column, and so favours columns over rows:

[Image: B1B2.PNG, code snippets B1 and B2]

Code B1 time = 1.34 seconds.  Code B2 time = 1.01 seconds

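Code B2 is faster because its inner loop traverses the elements of the 2-D array by going down the columns.  Matlab stores arrays in column-major order, so consecutive elements of a column sit next to each other in memory, and the CPU's fast cache, which fetches neighbouring memory locations together, is used more effectively than when the inner loop jumps from one column to the next along a row.  Similar results are observed for N-dimensional arrays.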
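Since the original B1 and B2 snippets are only available as an image, the following is a rough sketch of the comparison, assuming both variants sum the elements of a square matrix with nested loops.  The matrix size and the names M, sRow and sCol are illustrative assumptions.

```matlab
% Column-major storage: linear index 3 of a 2-by-3 array is element (1,2).
A = reshape(1:6, 2, 3);   % A = [1 3 5; 2 4 6]

n = 2000;                 % illustrative size (assumption)
M = rand(n);

% In the spirit of B1: the inner loop walks along a row (large memory stride).
tic
sRow = 0;
for r = 1:n
    for c = 1:n
        sRow = sRow + M(r, c);
    end
end
tRow = toc;

% In the spirit of B2: the inner loop walks down a column (adjacent memory).
tic
sCol = 0;
for c = 1:n
    for r = 1:n
        sCol = sCol + M(r, c);
    end
end
tCol = toc;

fprintf('row-wise: %.4f s, column-wise: %.4f s\n', tRow, tCol);
```

The two sums agree up to floating-point rounding, because the elements are added in a different order.  How much faster the column-wise version runs depends on the matrix size, the cache sizes and the Matlab version.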

Main points:

  • Preallocate large vectors and matrices before populating them with data.
  • When processing 2-D or N-D arrays, access your data by columns rather than by rows, so that the innermost loop runs down a column.
  • Avoid creating variables that are not essential to your algorithm.
  • Try to do most of your processing within variables that already exist.
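The last two points can be illustrated with a small sketch, assuming a large vector that goes through two processing steps.  The variable names and the operations are purely illustrative.  Whether Matlab can actually perform the update in place depends on the version and the context, but reusing a variable at least keeps fewer large arrays alive in the workspace.

```matlab
x = linspace(0, 1, 1e6);  % illustrative data (assumption)

% With extra variables: three large arrays are held in the workspace.
squared = x .^ 2;
shifted = squared + 1;

% Reusing the existing variable: only one large array is kept.
x = x .^ 2;
x = x + 1;
```

Both approaches give the same result; the second simply avoids holding the intermediate arrays once they are no longer needed.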