Z-score with NaN values in MATLAB (vectorized)

I am trying to calculate the z-score for a vector of 5000 rows that has many NaN values. I have to calculate this many times, so I don't want to use a loop; I was hoping to find a vectorized solution.

The loop solution:

for i = 1:size(val,1)
   vec(i,1) = (val(i,1) - nanmean(val(:,1))) / nanstd(val(:,1));
end

A partially vectorized solution:

zscore(vec(~isnan(vec)))

But this returns a vector whose length is that of the original minus the number of NaN values, so it no longer matches the original size.

I want to calculate the z-score for the vector and then interpolate the missing data afterwards. I have to do this hundreds of times, so I am looking for a fast vectorized approach.

Answers


This is a vectorized solution:

% generate some example data with NaNs.

val = reshape(magic(4), 16, 1);
val(10) = NaN;
val(17) = NaN;

Here's the code:

valWithoutNaNs = val(~isnan(val));
valMean = mean(valWithoutNaNs);
valSD = std(valWithoutNaNs);
valZscore = (val-valMean)/valSD;

The column vector valZscore then contains the Z-scores. It has the same size as val, the original measurement data, and is NaN wherever val is NaN.
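Since the question also asks about interpolating the missing values afterwards, here is a minimal sketch of one way to do that. It assumes linear interpolation over the row index is acceptable and that the NaNs are interior points (interp1 returns NaN outside the range of the known points unless extrapolation is requested). The example data and the names ok, idx and valFilled are made up for illustration.

```matlab
% Hypothetical example data: 16 measurements, two of them missing.
val = reshape(magic(4), 16, 1);
val([5 10]) = NaN;

% Z-score from the observed values only; NaNs stay in the same rows.
ok = ~isnan(val);
valZscore = (val - mean(val(ok))) / std(val(ok));

% Fill the gaps by linear interpolation over the row index.
idx = (1:numel(val))';
valFilled = valZscore;
valFilled(~ok) = interp1(idx(ok), valZscore(ok), idx(~ok), 'linear');
```

Everything here operates on whole vectors, so it can be called repeatedly without an inner loop over rows.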


Sorry this answer is 6 months late, but for anyone else who comes across this thread:

The accepted answer isn't fully vectorised in that it doesn't do what the real zscore does so beautifully: That is, do zscores along a particular dimension of a matrix.

If you want to calculate zscores of a large number of vectors at once, as the OP says he is doing, the best solution is this:

Z = bsxfun(@rdivide, bsxfun(@minus, X, nanmean(X)), ...
                     nanstd(X));

To do it on an arbitrary dimension, just pass the dimension to nanmean and nanstd, and bsxfun takes care of the rest. Note that nanstd takes the dimension as its third argument, after the normalization flag.

nanzscore = @(X,DIM) bsxfun(@rdivide, bsxfun(@minus, X, nanmean(X,DIM)), ...
                                      nanstd(X,0,DIM));
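For readers without the Statistics Toolbox, the same idea can be sketched with the built-in mean and std and their 'omitnan' option. This is only an illustration: it assumes MATLAB R2015a or later, and the name nanzscoreBase and the small matrix X are hypothetical.

```matlab
% Toolbox-free variant (assumes the 'omitnan' option, MATLAB R2015a or later).
nanzscoreBase = @(X, dim) bsxfun(@rdivide, ...
    bsxfun(@minus, X, mean(X, dim, 'omitnan')), ...
    std(X, 0, dim, 'omitnan'));

% Hypothetical data: 3 observations (rows) of 4 variables (columns).
X = [1 2 NaN 4; ...
     2 NaN 6 8; ...
     3 6 9 NaN];

Zcols = nanzscoreBase(X, 1);   % standardize each column
Zrows = nanzscoreBase(X, 2);   % standardize each row
```

Both results have the same size as X, with NaN in the same positions. A column or row with fewer than two observed values has a standard deviation of zero (or NaN), so its Z-scores will not be finite.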

Anonymous function (for a single vector):

nanZ = @(xIn)(xIn-nanmean(xIn))/nanstd(xIn);

nanZ(vectorWithNans)


Vectorized version of the anonymous function above (assumes observations are in rows, variables in columns):

nanZ = @(xIn)(xIn-repmat(nanmean(xIn),size(xIn,1),1))./repmat(nanstd(xIn),size(xIn,1),1);
nanZ(matrixWithNans)
