Why does vectorized code run faster than for loops in MATLAB?
I've read this but I still don't understand why vectorized code is faster.
In for loops, I can use parfor for parallel computation. If vectorized code is faster, does that mean it is automatically parallelized?
No. You're mixing two important concepts:
- MATLAB is designed to perform vector operations really quickly. MATLAB is an interpreted language, which is why loops are so slow in it. MATLAB sidesteps this issue by providing extremely fast (usually written in C, and optimized for the specific architecture) and well-tested functions to operate on vectors. There is really no magic here: it is just a bunch of hard work and many years of constant small improvements.
Consider, for example, a trivial case such as the following two ways of summing a vector v:
s = sum(v);
s = 0; for i = 1:length(v), s = s + v(i); end
You should probably use tic and toc to time these two versions to convince yourself of the difference in runtime. There are about 10 similar commonly used functions that operate on vectors; examples are bsxfun, repmat, length, and find. Vectorization is a standard part of using MATLAB effectively. Until you can vectorize code effectively, you're just a tourist in the world of MATLAB, not a citizen.
- Recent versions of MATLAB provide parfor. parfor is not a silver bullet; it is a tool that can be used and misused (try parfor on the sum example above). Not all fors can be parfored. parfor is designed for task-parallel types of problems where each iteration of the loop is independent of every other iteration. This is a key requirement for using a parfor-loop.
While parfor can help a lot in many cases, the loops that can be parfored for very large gains are rare.
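To make the independence requirement concrete, here is a minimal sketch of one loop that qualifies for parfor and one that does not. The variable names (squares, running) are made up for illustration. As far as I know, parfor only runs in parallel when the Parallel Computing Toolbox is available; without it the loop simply runs serially.

```matlab
n = 8;

% Independent iterations: each pass reads and writes only its own index,
% so this loop is a candidate for parfor.
squares = zeros(1, n);
parfor k = 1:n
    squares(k) = k^2;
end

% Dependent iterations: each pass needs the result of the previous one,
% so this loop has to stay a plain for-loop.
running = zeros(1, n);
running(1) = 1;
for k = 2:n
    running(k) = running(k-1) + k;
end
```

For a loop body this cheap, the overhead of distributing the work will most likely outweigh any gain, which is the "misuse" case mentioned above.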
I agree with carlosdc on his answer. However, it is important to remember that MATLAB has included a JIT compiler since release 6.5 for speeding up for-loops and the like.
I made a quick test of your sum example with a million elements in v and got the following results:
- sum(v): 4.3 ms
- for-loop version: 16 ms
- for-loop version, no JIT: 966 ms
The JIT can be turned on and off like this:
feature accel off
feature accel on
A factor of 4 improvement from vectorizing code is of course still often worth it, but for-loops shouldn't be feared as they once were for problems where they are otherwise a good solution. Often, though, a piece of well-vectorized code can be simpler, less error-prone and faster at the same time.
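If you want to repeat the comparison yourself, a rough timing sketch could look like the following. The test vector (one million random doubles) and the variable names are my own assumptions, and the numbers you get will depend on your machine and MATLAB release.

```matlab
% Timing sketch: built-in sum versus an explicit for-loop.
v = rand(1e6, 1);          % assumed test data: one million doubles

tic;
sVec = sum(v);             % vectorized: a single call into compiled code
tVec = toc;

tic;
sLoop = 0;
for i = 1:length(v)
    sLoop = sLoop + v(i);  % loop body handled by the interpreter/JIT
end
tLoop = toc;

fprintf('sum(v):   %.2f ms\n', 1e3 * tVec);
fprintf('for-loop: %.2f ms\n', 1e3 * tLoop);
```

A single tic/toc measurement is noisy, so it is worth running each version a few times before drawing conclusions.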
In modern computers, the registers (temporary memory used for math, among other uses) have many bits and can manipulate multiple numbers together. For example, if your data is uint8 (8 bits), you can add a number to one value per CPU clock, or you can put 8 of them together in the register and add a number to all of them in one CPU clock. This way you work 8 times faster than with a for-loop.
This is in a sense parallelization, but not like parfor. parfor uses multiple cores of your CPU, whereas the above method uses one core more efficiently. If you use them both, you can achieve even higher speeds.
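As a small illustration of the uint8 case, the sketch below does the same addition element by element and as one array expression. MATLAB does not expose which CPU instructions it ends up using, so this only shows the form of the operation that lends itself to packed execution; the variable names are made up.

```matlab
data = uint8(0:7);              % eight 8-bit values, 64 bits in total

% Element by element: one addition per loop iteration.
shiftedLoop = zeros(1, 8, 'uint8');
for k = 1:numel(data)
    shiftedLoop(k) = data(k) + 10;
end

% Whole array at once: a single expression that the underlying library
% is free to execute with packed (SIMD) instructions.
shiftedVec = data + 10;
```

Both versions produce the same uint8 result; only the way the work is handed to the CPU differs.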