MATLAB is running out of memory but it should not be
The data is <16 x 1036800 double>. This runs out of memory, which would be expected except that this is a new computer with 24GB of RAM intended for data mining. MATLAB even lists the 24GB as available on a memory check.
Is MATLAB actually running out of memory while performing a PCA, or is MATLAB not using the RAM to its full potential? Any information or ideas would be helpful. (I may need to increase the virtual memory, but I assumed the 24GB would have sufficed.)
For a data matrix of size n-by-p, PRINCOMP will return a coefficient matrix of size p-by-p where each column is a principal component expressed using the original dimensions, so in your case you will create an output matrix of size:
1036800*1036800*8 bytes ~ 7.8 TB
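As a rough check of that arithmetic, the sketch below compares the storage needed for a full p-by-p coefficient matrix with that of a p-by-n one. The variable names are illustrative only and do not come from PRINCOMP itself.

```matlab
% Back-of-the-envelope memory estimate for the PCA coefficient matrix.
n = 16;                 % number of observations (rows of the data)
p = 1036800;            % number of variables (columns of the data)
bytesPerDouble = 8;     % a MATLAB double occupies 8 bytes

fullCoeffBytes = p * p * bytesPerDouble;   % full p-by-p coefficient matrix
econCoeffBytes = p * n * bytesPerDouble;   % p-by-n matrix (at most n components)

fprintf('Full p-by-p coefficients: %.2f TiB\n', fullCoeffBytes / 2^40);
fprintf('p-by-n coefficients:      %.1f MiB\n', econCoeffBytes / 2^20);
```

The full matrix is several thousand times larger than 24GB of RAM, while the p-by-n version is about the same size as the input data.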
Consider using PRINCOMP(X,'econ') to return only the PCs with significant variance.
Alternatively, consider performing PCA by SVD: in your case n<<p, and the p-by-p covariance matrix is far too large to compute. Therefore, instead of decomposing the p-by-p matrix X'X, it is sufficient to decompose only the smaller n-by-n matrix XX'. Refer to this paper for details.
Here's my implementation; the outputs of this function match those of PRINCOMP (the first three anyway):
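To illustrate why decomposing the small matrix is enough, here is a minimal sketch that recovers the principal components from the n-by-n Gram matrix, formed as X0*X0' for a centered n-by-p matrix X0. It uses a small random stand-in for the data, and all variable names are my own assumptions; it is not necessarily the exact method of the referenced paper.

```matlab
% PCA via the n-by-n Gram matrix when n << p (illustrative sketch).
rng(0);
X  = rand(16, 500);                       % small stand-in: n = 16, p = 500
X0 = bsxfun(@minus, X, mean(X, 1));       % center each column
n  = size(X0, 1);

G = X0 * X0';                             % n-by-n Gram matrix (cheap to form)
G = (G + G') / 2;                         % enforce exact symmetry
[U, D] = eig(G);                          % small n-by-n eigenproblem
[d, idx] = sort(diag(D), 'descend');      % sort eigenvalues, largest first
U = U(:, idx);

k  = n - 1;                               % centering removes one degree of freedom
PC = X0' * U(:, 1:k);                     % map eigenvectors back to p dimensions
PC = bsxfun(@rdivide, PC, sqrt(d(1:k))'); % scale each column to unit length
varPC = d(1:k)' / (n - 1);                % variance along each component
```

The nonzero eigenvalues of the n-by-n and p-by-p products are identical, so the p-dimensional components can be rebuilt from the small eigenvectors, up to sign, without ever forming a p-by-p matrix.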
function [PC,Y,varPC] = pca_by_svd(X)
    % PCA_BY_SVD
    %   X      data matrix of size n-by-p where n<<p
    %   PC     columns are first n principal components
    %   Y      data projected on those PCs
    %   varPC  variance along the PCs
    %

    X0 = bsxfun(@minus, X, mean(X,1));     % shift data to zero-mean
    [U,S,PC] = svd(X0,'econ');             % SVD decomposition
    Y = X0*PC;                             % project X on PC
    varPC = diag(S'*S)' / (size(X,1)-1);   % variance explained
end
I just tried it on my 4GB machine, and it ran just fine:
» x = rand(16,1036800);
» [PC, Y, varPC] = pca_by_svd(x);
» whos
  Name          Size                  Bytes  Class     Attributes

  PC      1036800x16              132710400  double
  Y            16x16                   2048  double
  varPC         1x16                    128  double
  x            16x1036800         132710400  double
The princomp function has been deprecated in favor of pca, introduced in R2012b, which offers many more options.
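For reference, a minimal sketch of the pca call on a small random stand-in for the data. It assumes the Statistics and Machine Learning Toolbox is installed; the 'Economy' option is on by default and is passed explicitly here only for clarity.

```matlab
% Economy-size PCA with pca (requires the Statistics and Machine Learning Toolbox).
rng(0);
X = rand(16, 1000);                 % small stand-in: 16 observations, 1000 variables

% With 'Economy' set to true and fewer observations than variables, pca
% returns only the components that can carry variance, not a 1000-by-1000 matrix.
[coeff, score, latent] = pca(X, 'Economy', true);

disp(size(coeff));                  % coefficient matrix: one column per component
disp(size(score));                  % data projected onto those components
disp(numel(latent));                % variance along each component
```

Because the data is centered by default, the number of components returned is one less than the number of observations.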
MATLAB has hardcoded limits on matrix sizes. See this link. If you think you're not exceeding those limits, then you probably have a bug in your code and actually are.
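One way to check the element-count limit is the second output of the computer function, which reports the maximum number of elements allowed in a single array on the current installation. The sketch below compares it against the sizes from this question; the variable names are illustrative.

```matlab
% Compare requested array sizes against MATLAB's per-array element limit.
n = 16;
p = 1036800;

[archStr, maxElems] = computer;     % maxElems: maximum elements in one array

fullElems = p * p;                  % elements in a full p-by-p coefficient matrix
econElems = p * n;                  % elements in a p-by-n matrix

fitsFull = fullElems <= maxElems;   % within the element-count limit?
fitsEcon = econElems <= maxElems;

fprintf('Platform: %s, limit: %.3g elements\n', archStr, maxElems);
fprintf('p-by-p: %.3g elements, within limit: %d\n', fullElems, fitsFull);
fprintf('p-by-n: %.3g elements, within limit: %d\n', econElems, fitsEcon);
```

Passing this check only means the array can be indexed. It says nothing about whether enough RAM is available to hold it.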
MathWorks engineer Stuart McGarrity recorded a nice webinar surveying diagnosis techniques and common solutions. If your data is indeed within the allowed limits, the issue might be memory fragmentation, which is easily solvable.
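As a quick diagnostic along those lines, the memory function reports the largest array MATLAB can currently allocate next to the total memory available for arrays. This is a hedged sketch: to my knowledge memory is only available on Windows, so the call is guarded with ispc.

```matlab
% Inspect available memory from within MATLAB (Windows only, as far as I know).
onWindows = ispc;
if onWindows
    userMem = memory;   % struct describing MATLAB's view of available memory
    fprintf('Largest possible array:   %.1f GiB\n', userMem.MaxPossibleArrayBytes / 2^30);
    fprintf('Available for all arrays: %.1f GiB\n', userMem.MemAvailableAllArrays / 2^30);
    fprintf('Currently used by MATLAB: %.1f GiB\n', userMem.MemUsedMATLAB / 2^30);
else
    disp('The memory function is not available on this platform.');
end
```

If the largest possible array is much smaller than the memory available for all arrays, that may point to a fragmented address space rather than a genuine shortage of RAM.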