# A Better Algorithm to check for non-contiguous groupings in a 2D array

I'm making a word game. The board is represented as a 2D C array. A simplified version of my populated array might look something like this.

```[x][x]
[x][x]
[x][x][x][x][x]
[x]
```

There are no diagonal connections between x blocks, only up, down, left and right.

I want an algorithm to check if there are two non-contiguous 'blocks' of X values. This algorithm, given a particular 2D game array, will enable me to decide between:

• Yes, there is only one contiguous block of Xs
• No, there is more than one contiguous block of Xs

For the example shown above, I would find 'No', because there is more than one contiguous block of Xs.

My first simple idea is to find one contiguous block of Xs and catalogue those values. It is easy for me to get a starting point in the middle of a contiguous block, for reasons that aren't important to this question. If I can find an X anywhere that is not in my catalogued list, I can return 'No', else, 'Yes'. However, this strikes me as a pretty inefficient way of doing it.

I understand that I would need to touch every item of the 2D array to comprehensively rule out another contiguous block. I'm more interested in the quickest way of finding a 'No'.

Because the array is pretty small, I could implement my idea without any real performance problems. However, I'm sure there is a more elegant way, and I thought this problem might interest some algorithm-loving part of the stack overflow hive mind.

This could also be described somewhere in another SO question or the internet at large. Maybe I just don't know the right way to phrase the question. If you do, please enlighten me. Best solution also gets a free copy of the as-yet-to-be-completed iOS word game. :-D

Your array forms a graph, and you are asking if the graph has more than one connected component. The easiest way to do this is probably to copy the array, erase the first connected component you find, then look for another X. If you find one, there were more than one component to begin with.

Erasing is a nice application of graph search. For example, you could do it this way (in C):

```#define N_ROWS 4
#define N_COLS 7
typedef char WORD_BOX[N_ROWS][N_COLS];

void erase_cc(WORD_BOX box, int i, int j)
{
if (i < 0 || j < 0 || i >= N_ROWS || j >= N_COLS || box[i][j] != 'X')
return;
box[i][j] = 'O';
erase_cc(box, i - 1, j);
erase_cc(box, i + 1, j);
erase_cc(box, i, j - 1);
erase_cc(box, i, j + 1);
}

int find_cc(WORD_BOX box, int *i_ret, int *j_ret)
{
int i, j;
for (i = 0; i < N_ROWS; i++)
for (j = 0; j < N_COLS; j++)
if (box[i][j] == 'X') {
*i_ret = i;
*j_ret = j;
return 1;
}
return 0;
}

int more_than_one_cc(WORD_BOX box)
{
int i,  j;
WORD_BOX copy;
memcpy(copy, box, sizeof(WORD_BOX));
if (find_cc(copy, &i, &j)) {
erase_cc(copy, i, j);
return find_cc(copy, &i, &j);
}
return 0;
}
```

The following optimal solution as a running time of O(n) and computes not only whether there are more than two contiguous sequences or not, but also

• returns the exact number of sequences found (horizontal and vertical)

• returns the position and length of every sequence, in both horizontal and vertical directions (with a very small rearrangement of the code).

The solution to your problem can be solved very efficiently and simply, not by using any kind of graph theory, but simply by using your knowledge of arrays, and leveraging your knowledge about how they are laid out in memory. The following C program demonstrates the principle.

```static inline void TrackLength(int **hpnCountPtr, int *pSeqStart)
{
(*((*hpnCountPtr) ? (*hpnCountPtr) : (*hpnCountPtr = pSeqStart)))++;
}

static int FindSequences(char *pcbGrid, int nWidth, int nHeight)
{
int    nColIndex, nRowIndex, nSeqTotal;
int   *panResult, *pnHorzSeq, *panVTable;
int  **panVertSeqs;
size_t nResultLen;
char  *pcbCheckPtr;

nResultLen = (size_t) (2 * nWidth * nHeight) * sizeof(int);
panResult  = (int*) memset(alloca(nResultLen), 0, nResultLen);
panVTable  = panResult + (nWidth * nHeight);

panVertSeqs = (int**) alloca(nWidth * sizeof(int*));
for (nColIndex = 0; nColIndex < nWidth; panVertSeqs[nColIndex++] = (int*) NULL);

for (pcbCheckPtr = pcbGrid, nRowIndex = 0; nRowIndex < nHeight; nRowIndex++)
for (pnHorzSeq = (int*) NULL, nColIndex = 0; nColIndex < nWidth; nColIndex++, pcbCheckPtr++)
{
if (*pcbCheckPtr != 'X')
{
pnHorzSeq = panVertSeqs[nColIndex] = (int*) NULL;
continue;
}

TrackLength(&pnHorzSeq, panResult + (pcbCheckPtr - pcbGrid));
TrackLength(panVertSeqs + nColIndex, panVTable + (pcbCheckPtr - pcbGrid));
}

/* Insert Debug Information Here */

for (nSeqTotal = 0, nColIndex = nWidth * nHeight; nColIndex; nColIndex--)
if (*panResult++ > 1)
nSeqTotal++;

return nSeqTotal;
}
```

The key was in your requirement that only horizontal and vertical sequences need to be found, not diagonal. Being a 2D array, this means that huge optimisations can be had by knowing that 2D arrays are simply flat, contiguous, blocks of memory that can be walked from start to finish in one pass.

Leveraging the memory arrangement, horizontal sequences can be tracked using just a single counter that is incremented for every consecutive memory location that is an X, and reset to zero every time a non-X is encountered. If the counter is forcefully reset at the end of every "row" in the 2D view of the array, this will successfully track the horizontal sequences.

At the same time the horizontal sequences are being tracked, we can track the vertical ones simultaneously. However, as adjacent cells in a column in the array appear at a spacing of width bytes, we must keep an array of width counters which track all vertical sequences possible in a row as we scan the row, one sequence for each column.

Finally, the real beauty of my solution is that the counters are actually pointers into two 2D arrays (one for tracking vertical sequences, and one for tracking horizontal sequences). Each location in each array represents the potential start of a sequence, and when the function completes, each array will contain in each location the length of each sequence that starts at that location.

This is best demonstrated using the diagram below:

```   Grid      Horizontal    Vertical
Sequences    Sequences

....X...X.   0000100010   0000300030
.X..X...X.   0100100010   0300000000
.XXXX..XXX   0400000300   0011000201
.X.....X..   0100000100   0000000000
```

Each digit in the Horizontal Sequences array represents the length of a horizontal sequence that starts at that point. The same goes for the Vertical Sequences array for vertical sequences.

As the program makes one, and exactly one, pass over the input array, and the processing time for each array element is constant, this algorithm runs in O(n) and is optimal for any size of grid.

If you insert the above functions in the program below:

```#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <alloca.h>

#define GRID_WIDTH  18
#define GRID_HEIGHT  9

static char *s_szGrid =
"X X X X X X X X X "
" X X X X X X X X X"
"X X X X X X X X X "
" X X X X X X X X X"
"XXX X X X X X X X "
" X X X X X X X X X"
"XXXXX X X X X X X "
" X X X X X X X X X"
"X X XXX X X X X X ";

void main(void)
{
int nSeqTotal = FindSequences(s_szGrid, GRID_WIDTH, GRID_HEIGHT);

printf("\nGrid has %i contiguous sequences\n", nSeqTotal);

if (nSeqTotal)
printf(">>> %sone sequence detected\n", (nSeqTotal > 1) ? "more than " : "");
}
```

And you insert the following chunk of debug code into the functions where you see the /* Insert Debug Information Here */ comment:

```{
int  nXIndex, nYIndex;
int *pnValue;

printf("Horizontal:\n");
for (pnValue = panResult, nYIndex = 0; nYIndex < nHeight; nYIndex++, printf("\n"))
for (nXIndex = 0; nXIndex < nWidth; nXIndex++, printf("%2i", *pnValue++));

printf("\nVertical:\n");
for (nYIndex = 0; nYIndex < nHeight; nYIndex++, printf("\n"))
for (nXIndex = 0; nXIndex < nWidth; nXIndex++, printf("%2i", *pnValue++));
}
```

Then the program will produce the following output:

```Horizontal:
1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
3 0 0 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
5 0 0 0 0 0 1 0 1 0 1 0 1 0 1 0 1 0
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 0 1 0 3 0 0 0 1 0 1 0 1 0 1 0 1 0

Vertical:
1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
0 5 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
0 0 0 3 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
0 0 0 0 0 2 0 1 0 1 0 1 0 1 0 1 0 1
1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0

Grid has 6 contiguous sequences
>>> more than one sequence detected
```

If the input grid is changed to this, however:

```static char *s_szGrid =
"X X X X X X X X X "
" X X X X X X X X X"
"X X X X X X X X X "
" X X X X X X X X X"
"X X X X X X X X X "
" X X X X X X X X X"
"X X X X X X X X X "
" X X X X X X X X  "
"X X X X X X X X XX";
```

Then the output will be:

```Horizontal:
1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 0
1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 2 0

Vertical:
1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 0
1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 1

Grid has 1 contiguous sequences
>>> one sequence detected
```