Vectorising for speed

Problem

How do I vectorise to increase speed?

Solution

The name "Matlab" is an abbreviation of "Matrix Laboratory" because the software is designed to make working with matrices quick and efficient. The term "vectorising" refers to the process of writing code in a succinct, matrix-oriented way. Generally speaking, vectorised code is that which has been written so as to avoid excessive use of loops. Matlab contains a lot of built-in functions and idioms which make this efficient and easy.

Programming languages (Matlab included) allow you to perform the same task in various ways. Usually, however, some solutions are better than others. In Matlab it is often the case that the vectorised form executes faster and takes up fewer lines of code. You need to experiment, though, as this is not always true. Here are some examples of vectorised and non-vectorised code:

Set all numbers greater than zero to zero

%Non-vectorised
data=randn(5000); %Make a big array

%Go through each element and set to zero if necessary
tic
for ii = 1:5000
   for jj = 1:5000
     if data(ii,jj) > 0
           data(ii,jj) = 0;
     end
   end
end
toc
Elapsed time is 2.355893 seconds.



%Vectorised example one
data=randn(5000);
tic
f=find(data>0); %creating this variable takes time
data(f)=0;
toc
Elapsed time is 0.458863 seconds.


%Vectorised example two
data=randn(5000);
tic, data(data>0)=0; toc
Elapsed time is 0.258838 seconds.


%Using the built-in "min" function
data=randn(5000);
tic, data=min(data,0); toc %min with a scalar caps every element at zero
Elapsed time is 0.286484 seconds.

As you can see, the three versions without a loop are about 5 to 10 times faster than the version using the loop. Once you understand what they're doing, they're also easier to read since they take up fewer lines of code.
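For completeness, here is one more equivalent vectorised form, a sketch using the same randn data as above: multiplying by a logical mask. The comparison produces a matrix of ones and zeros, so the element-wise product keeps the non-positive values and zeroes out the rest.

```matlab
%Vectorised: multiply by a logical mask
%(data<=0) is 1 where the element should be kept and 0 where it
%should be zeroed, so the element-wise product does the job.
data=randn(5000);
tic, data = data .* (data<=0); toc
```

Timings for this form will vary by machine and Matlab release, so try it yourself with tic and toc.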

Now let's do something a little harder. We'll make one large matrix (big) composed of decimals and a second, smaller, matrix (small) composed of integers. For each integer in small we want to find the index in big that contains the decimal to which it is closest in value. We'll do this the vectorised and the non-vectorised way.

%Let's make up a big data array and add some uniformly-distributed noise. 
n=1E5;
big = (1:n) + rand(1,n) ; %note the parentheses: the colon operator binds more loosely than +

%Which indices in "big" contain decimals closest to these?
small = [2,50,101,3420,4521,7301,9020] ;

%One non-vectorised solution:
tic
IND=zeros(1,length(small)); %preallocate so IND does not grow on each pass
for ii=1:length(small)
   tmp=abs(big-small(ii));
   [A,IND(ii)]=min(tmp);
end
toc
Elapsed time is 0.698745 seconds.


%As one would expect, each index lands at (or just below) the
%corresponding value, since big(k) is roughly k:
>> IND
 IND = 
   2     50    101   3420   4521   7301   9020


% The vectorised solution
tic
[N,IND] = min(abs(repmat(big',1, length(small)) - repmat(small,length(big),1) )) ;
toc
Elapsed time is 0.596945 seconds.

In this case there is little difference between the vectorised and non-vectorised solutions. Why is that? One reason is that I've cheated here! Remember that many built-in functions are "vectorised", in the sense that they take an operation that could be applied to each individual element and apply it to the whole array at once. That's what the loop-free solutions in the first example did. In this second example we're not really comparing like with like. Yes, our "non-vectorised" version contains a loop, but look what else it contains: two lines of code that are themselves vectorised. Both tmp=abs(big-small(ii)) and [A,IND(ii)]=min(tmp) operate on the whole of big at once. The so-called vectorised solution is a one-liner with no loops and is, indeed, both vectorised and slightly faster. However, in this case these advantages come at a cost: it's harder to read and takes up more RAM.

It's harder to read because it packs rather a lot into a small space. Code written like this needs more comments and can be harder to maintain. I will leave it as an exercise for the reader to figure out how it works (it's not hard). You should find you can make sense of the "non-vectorised" version quicker than you can the "vectorised" version.

It takes up more RAM because the two repmat commands each produce a matrix with 100000 rows (the length of big) and 7 columns (the length of small). The amount of memory needed therefore grows with the product of the two lengths, not just with the length of small. You will run into problems if your small vector isn't small!
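As an aside, if you are on a recent release (this assumes R2016b or later), implicit expansion lets you drop the repmat calls entirely: Matlab expands a column vector and a row vector to a common size during the subtraction. The intermediate length(big)-by-length(small) matrix is still formed internally, so the memory caveat above still applies, but the code is shorter and avoids the two explicit copies. A sketch:

```matlab
%Implicit expansion (R2016b and later): big' is a column vector and
%small is a row vector, so the subtraction automatically expands to a
%length(big)-by-length(small) matrix, as the repmat version did explicitly.
n=1E5;
big = (1:n) + rand(1,n);
small = [2,50,101,3420,4521,7301,9020];
tic
[N,IND] = min(abs(big' - small));
toc
```

On releases older than R2016b the same expansion can be written as bsxfun(@minus, big', small).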

So this second example illustrates an important point. Whilst it's often possible to write a clever one-liner, these may come at a cost. You need to weigh up if the speed increase is worth it.
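One caveat about the timings quoted throughout: tic and toc measure a single run, so the numbers jump around from run to run. If you are on R2013b or later (an assumption on my part; that is when the function was introduced), timeit gives a more stable figure by running the code repeatedly and reporting a median. A sketch for the vectorised thresholding case:

```matlab
%timeit calls the function handle many times and reports a median
%execution time, smoothing out run-to-run variation.
data = randn(5000);
t = timeit(@() min(data,0));
fprintf('Vectorised thresholding: %f seconds\n', t)
```

Timing the loop version this way needs the loop wrapped in a named function, since anonymous functions cannot contain assignments.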