# Finding islands of zeros in a sequence

Imagine you have a very long sequence. What is the most efficient way of finding the intervals where the sequence is all zeros (or more precisely the sequence drops to near-zero values abs(X)<eps):

For simplicity, lets assume the following sequence:

```sig = [1 1 0 0 0 0 1 1 1 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 0];
```

I'm trying to get the following information:

```startIndex   EndIndex    Duration
3            6           4
12           12          1
14           16          3
25           26          2
30           30          1
```

then using this information, we find the intervals with duration >= to some specified value (say 3), and returning the indices of the values in all these intervals combined:

```indices = [3 4 5 6 14 15 16];
```

That last part is related to a previous question:

This is what I have so far:

```sig = [1 1 0 0 0 0 1 1 1 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 0];
len = length(sig);
thresh = 3;

%# align the signal with itself successively shifted by one
%# v will thus contain 1 in the starting locations of the zero interval
v = true(1,len-thresh+1);
for i=1:thresh
v = v & ( sig(i:len-thresh+i) == 0 );
end

%# extend the 1's till the end of the intervals
for i=1:thresh-1
v(find(v)+1) = true;
end

%# get the final indices
v = find(v);
```

I'm looking to vectorize/optimize the code, but I'm open to other solutions. I have to stress that space and time efficiencies are very important, since I'm processing a large number of long bio-signals.

These are the steps I would take to solve your problem in a vectorized way, starting with a given vector sig:

• First, threshold the vector to get a vector tsig of zeros and ones (zeroes where the absolute value of the signal drops close enough to zero, ones elsewhere):

```tsig = (abs(sig) >= eps);  %# Using eps as the threshold
```
• Next, find the starting indices, ending indices, and duration of each string of zeroes using the functions DIFF and FIND:

```dsig = diff([1 tsig 1]);
startIndex = find(dsig < 0);
endIndex = find(dsig > 0)-1;
duration = endIndex-startIndex+1;
```
• Then, find the strings of zeroes with a duration greater than or equal to some value (such as 3, from your example):

```stringIndex = (duration >= 3);
startIndex = startIndex(stringIndex);
endIndex = endIndex(stringIndex);
```
• Finally, use the method from my answer to the linked question to generate your final set of indices:

```indices = zeros(1,max(endIndex)+1);
indices(startIndex) = 1;
indices(endIndex+1) = indices(endIndex+1)-1;
indices = find(cumsum(indices));
```

You can solve this as a string search task, by finding strings of zeros of length thresh (STRFIND function is very fast)

```startIndex = strfind(sig, zeros(1,thresh));
```

Note that longer substrings will get marked in multiple locations but will eventually be joined once we add in-between locations from intervals start at startIndex to end at start+thresh-1.

```indices = unique( bsxfun(@plus, startIndex', 0:thresh-1) )';
```

Note that you can always swap this last step with the CUMSUM/FIND solution by @gnovice from the linked question.

```function indice=sigvec(sig,thresh)
%extend sig head and tail to avoid 0 head and 0 tail

exsig=[1,sig,1];
%convolution sig with extend sig
cvexsig=conv(exsig,ones(1,thresh));
tempsig=double(cvexsig==0);

indice=find(conv(tempsig,ones(1,thresh)))-thresh;
```

the above answer by genovice can be modified to find the indices of non-zero elements in a vector as:

```    tsig = (abs(sig) >= eps);
dsig = diff([0 tsig 0]);
startIndex = find(dsig > 0);
endIndex = find(dsig < 0)-1;
duration = endIndex-startIndex+1;
```

As gnovice showed, we'll do a threshold test to make "near zero" really zero:

```logcl = abs(sig(:)) >= zero_tolerance;
```

Then find regions where the cumulative sum isn't increasing:

```cs = cumsum(logcl);
islands = cs(1+thresh:end) == cs(1:end-thresh);
```
```v = zeros(1,max(endInd)+1);   %# An array of zeroes
v(startInd) = 1;              %# Place 1 at the starts of the intervals
v(endInd+1) = v(endInd+1)-1;  %# Add -1 one index after the ends of the intervals
indices = find(cumsum(v));  %# Perform a cumulative sum and find the nonzero entries
```

We note that our islands vector already has ones in the startInd locations, and for our purposes endInd always comes thresh spots later (longer runs have runs of ones in islands)

```endcap = zeros(thresh,1);
indices = find(cumsum([islands ; endcap] - [endcap ; islands]))
```

# Test

```sig = [1 1 0 0 0 0 1 1 1 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 0];
logcl = abs(sig(:)) >= .1;
cs = cumsum(logcl);
islands = cs(1+thresh:end) == cs(1:end-thresh);
endcap = zeros(thresh,1);
indices = find(cumsum([islands ; endcap] - [endcap ; islands]))
```
```indices =

2
3
4
5
13
14
15
```

I think the most MATLAB/"vectorized" way of doing it is by computing a convolution of your signal with a filter like [-1 1]. You should look at the documentation of the function conv. Then on the output of conv use find to get the relevant indexes.