Explanation of Merge Sort for Dummies

I found this code online:

def merge(left, right):
    result = []
    i ,j = 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result += left[i:]
    result += right[j:]
    return result

def mergesort(list):
    if len(list) < 2:
        return list
    middle = len(list) / 2
    left = mergesort(list[:middle])
    right = mergesort(list[middle:])
    return merge(left, right)

It works 100% when I run it. I just do not really get how the merge sort works or how the recursive function is able to properly order both a left and a right.

Answers


I believe that the key to understanding merge sort is understanding the following principle -- I'll call it the merge principle:

Given two separate lists A and B ordered from least to greatest, construct a list C by repeatedly comparing the least value of A to the least value of B, removing the lesser value, and appending it onto C. When one list is exhausted, append the remaining items in the other list onto C in order. The list C is then also a sorted list.

If you work this out by hand a few times, you'll see that it's correct. For example:

A = 1, 3
B = 2, 4
C = 
min(min(A), min(B)) = 1

A = 3
B = 2, 4
C = 1
min(min(A), min(B)) = 2

A = 3
B = 4
C = 1, 2
min(min(A), min(B)) = 3

A = 
B = 4
C = 1, 2, 3

Now A is exhausted, so extend C with the remaining values from B:

C = 1, 2, 3, 4

The merge principle is also easy to prove. The minimum value of A is less than all other values of A, and the minimum value of B is less than all other values of B. If the minimum value of A is less than the minimum value of B, then it must also be less than all values of B. Therefore it is less than all values of A and all values of B.

So as long as you keep appending the value that meets those criteria to C, you get a sorted list. This is what the merge function above does.

Now, given this principle, it's very easy to understand a sorting technique that sorts a list by dividing it up into smaller lists, sorting those lists, and then merging those sorted lists together. The merge_sort function is simply a function that divides a list in half, sorts those two lists, and then merges those two lists together in the manner described above.

The only catch is that because it is recursive, when it sorts the two sub-lists, it does so by passing them to itself! If you're having difficulty understanding the recursion here, I would suggest studying simpler problems first. But if you get the basics of recursion already, then all you have to realize is that a one-item list is already sorted. Merging two one-item lists generates a sorted two-item list; merging two two-item lists generates a sorted four-item list; and so on.


When I'm stumbled into diffuculty to uderstand how the algorithm works, I add debug output to check what really happens inside the algorithm.

Here the code with debug output. Try to uderstand all the steps with recursive calls of mergesort and what merge does with the output:

def merge(left, right):
    result = []
    i ,j = 0, 0
    while i < len(left) and j < len(right):
        print('left[i]: {} right[j]: {}'.format(left[i],right[j]))
        if left[i] <= right[j]:
            print('Appending {} to the result'.format(left[i]))           
            result.append(left[i])
            print('result now is {}'.format(result))
            i += 1
            print('i now is {}'.format(i))
        else:
            print('Appending {} to the result'.format(right[j]))
            result.append(right[j])
            print('result now is {}'.format(result))
            j += 1
            print('j now is {}'.format(j))
    print('One of the list is exhausted. Adding the rest of one of the lists.')
    result += left[i:]
    result += right[j:]
    print('result now is {}'.format(result))
    return result

def mergesort(L):
    print('---')
    print('mergesort on {}'.format(L))
    if len(L) < 2:
        print('length is 1: returning the list withouth changing')
        return L
    middle = len(L) / 2
    print('calling mergesort on {}'.format(L[:middle]))
    left = mergesort(L[:middle])
    print('calling mergesort on {}'.format(L[middle:]))
    right = mergesort(L[middle:])
    print('Merging left: {} and right: {}'.format(left,right))
    out = merge(left, right)
    print('exiting mergesort on {}'.format(L))
    print('#---')
    return out


mergesort([6,5,4,3,2,1])

Output:

---
mergesort on [6, 5, 4, 3, 2, 1]
calling mergesort on [6, 5, 4]
---
mergesort on [6, 5, 4]
calling mergesort on [6]
---
mergesort on [6]
length is 1: returning the list withouth changing
calling mergesort on [5, 4]
---
mergesort on [5, 4]
calling mergesort on [5]
---
mergesort on [5]
length is 1: returning the list withouth changing
calling mergesort on [4]
---
mergesort on [4]
length is 1: returning the list withouth changing
Merging left: [5] and right: [4]
left[i]: 5 right[j]: 4
Appending 4 to the result
result now is [4]
j now is 1
One of the list is exhausted. Adding the rest of one of the lists.
result now is [4, 5]
exiting mergesort on [5, 4]
#---
Merging left: [6] and right: [4, 5]
left[i]: 6 right[j]: 4
Appending 4 to the result
result now is [4]
j now is 1
left[i]: 6 right[j]: 5
Appending 5 to the result
result now is [4, 5]
j now is 2
One of the list is exhausted. Adding the rest of one of the lists.
result now is [4, 5, 6]
exiting mergesort on [6, 5, 4]
#---
calling mergesort on [3, 2, 1]
---
mergesort on [3, 2, 1]
calling mergesort on [3]
---
mergesort on [3]
length is 1: returning the list withouth changing
calling mergesort on [2, 1]
---
mergesort on [2, 1]
calling mergesort on [2]
---
mergesort on [2]
length is 1: returning the list withouth changing
calling mergesort on [1]
---
mergesort on [1]
length is 1: returning the list withouth changing
Merging left: [2] and right: [1]
left[i]: 2 right[j]: 1
Appending 1 to the result
result now is [1]
j now is 1
One of the list is exhausted. Adding the rest of one of the lists.
result now is [1, 2]
exiting mergesort on [2, 1]
#---
Merging left: [3] and right: [1, 2]
left[i]: 3 right[j]: 1
Appending 1 to the result
result now is [1]
j now is 1
left[i]: 3 right[j]: 2
Appending 2 to the result
result now is [1, 2]
j now is 2
One of the list is exhausted. Adding the rest of one of the lists.
result now is [1, 2, 3]
exiting mergesort on [3, 2, 1]
#---
Merging left: [4, 5, 6] and right: [1, 2, 3]
left[i]: 4 right[j]: 1
Appending 1 to the result
result now is [1]
j now is 1
left[i]: 4 right[j]: 2
Appending 2 to the result
result now is [1, 2]
j now is 2
left[i]: 4 right[j]: 3
Appending 3 to the result
result now is [1, 2, 3]
j now is 3
One of the list is exhausted. Adding the rest of one of the lists.
result now is [1, 2, 3, 4, 5, 6]
exiting mergesort on [6, 5, 4, 3, 2, 1]
#---

Merge sort has always been one of my favorite algorithms.

You start with short sorted sequences and keep merging them, in order, into larger sorted sequences. So simple.

The recursive part means you're working backwards - starting with the whole sequence and sorting the two halves. Each half is also split, until the sort becomes trivial when there's only zero or one element in the sequence. As the recursed functions return the sorted sequences get larger just as I said in the initial description.


A couple of ways to help yourself understand this:

Step through the code in a debugger and watch what happens. Or, Step through it on paper (with a very small example) and watch what happens.

(personally I find doing this kind of thing on paper more instructive)

Conceptually it works like this: The input list keep getting chopped up in to smaller and smaller pieces by being halved (e.g. list[:middle] is the first half). Each half is halved again and again until it has a length of less than 2. I.e. until it's nothing at all or a single element. These individual pieces are then put back together by the merge routine, by appending or interleaving the 2 sub lists to the result list, and hence you get a sorted list. Because the 2 sub lists must be sorted, the appending/interleave is a fast (O(n)) operation.

The key to this (in my view) is not the merge routine, that is pretty obvious once you understand that the inputs to it will always be sorted. The "trick" (I use quotes because it's not a trick, it's computer science:-) ) is that in order to guarantee that the inputs to merge are sorted you have to keep recursing down until you get to a list that must be sorted, and that is why you keep recursively calling mergesort until the list is less than 2 elements long.

Recursion, and by extension merge sort, can be non-obvious when you first come across them. You might want to consult a good algorithms book (e.g. DPV is available online, legally and for free), but you can get a long way by stepping through the code you have. If you realy want to get in to it the Stanford/Coursera algo course will be running again soon and he covers Merge sort in fine detail.

If you really want to understand it, read chapter 2 of that book reference, then throw away the code above and re-write from scratch. Seriously.


A picture is worth a thousand words, and an animation worth 10,000.

Checkout the following animation taken from Wikipedia that will help you visualize how the merge sort algorithm actually works.

Detailed animation with explanation for each step in the sorting process for the inquisitive ones.

Another interesting animation of various types of sorting algorithms.


basically you get your list, then you split it and then sort it, but you apply this method recursively so you end up spliting it again, and again until you have a trivial set that you can sort easily and then merge all the simples solutions to get a fully sorted array.


You can have a good visualization on how the merge sort works here:

http://www.ee.ryerson.ca/~courses/coe428/sorting/mergesort.html

I hope it helps.


As the Wikipedia article explains, there are many valuable ways to accomplish a merge sort. The way to accomplish a merge also depends upon the collection of things to be merged, certain collections enabling certain tools that the collection has at its disposal.

I'm not going to answer this question using Python, simply because I can't write it; however, a taking a part of the "merge sort" algorithm seems really to be at the heart of the question, at large. A resource that helped me is the K.I.T.E's rather outdated webpage on the algorithm (written by a professor), simply because the author of the content eliminates context-meaningful identifiers.

My answer is derived from this resource.

Remember, merge sort algorithms work by taking apart the supplied collection and then putting each of the individual pieces together again, comparing the pieces to each other as the collection is re-built.

Here, is the "code" (look to the end for a Java "fiddle"):

public class MergeSort {

/**
 * @param a     the array to divide
 * @param low   the low INDEX of the array
 * @param high  the high INDEX of the array
 */
public void divide (int[] a, int low, int high, String hilo) {


    /* The if statement, here, determines whether the array has at least two elements (more than one element). The
     * "low" and "high" variables are derived from the bounds of the array "a". So, at the first call, this if 
     * statement will evaluate to true; however, as we continue to divide the array and derive our bounds from the 
     * continually divided array, our bounds will become smaller until we can no longer divide our array (the array 
     * has one element). At this point, the "low" (beginning) and "high" (end) will be the same. And further calls 
     * to the method will immediately return. 
     * 
     * Upon return of control, the call stack is traversed, upward, and the subsequent calls to merge are made as each 
     * merge-eligible call to divide() resolves
     */
    if (low < high) {
        String source = hilo;
        // We now know that we can further divide our array into two equal parts, so we continue to prepare for the division 
        // of the array. REMEMBER, as we progress in the divide function, we are dealing with indexes (positions)

        /* Though the next statement is simple arithmetic, understanding the logic of the statement is integral. Remember, 
         * at this juncture, we know that the array has more than one element; therefore, we want to find the middle of the 
         * array so that we can continue to "divide and conquer" the remaining elements. When two elements are left, the
         * result of the evaluation will be "1". And the element in the first position [0] will be taken as one array and the
         * element at the remaining position [1] will be taken as another, separate array.
         */
        int middle = (low + high) / 2;

        divide(a, low, middle, "low");
        divide(a, middle + 1, high, "high");


        /* Remember, this is only called by those recursive iterations where the if statement evaluated to true. 
         * The call to merge() is only resolved after program control has been handed back to the calling method. 
         */
        merge(a, low, middle, high, source);
    }
}


public void merge (int a[], int low, int middle, int high, String source) {
// Merge, here, is not driven by tiny, "instantiated" sub-arrays. Rather, merge is driven by the indexes of the 
// values in the starting array, itself. Remember, we are organizing the array, itself, and are (obviously
// using the values contained within it. These indexes, as you will see, are all we need to complete the sort.  

    /* Using the respective indexes, we figure out how many elements are contained in each half. In this 
     * implementation, we will always have a half as the only way that merge can be called is if two
     * or more elements of the array are in question. We also create to "temporary" arrays for the 
     * storage of the larger array's elements so we can "play" with them and not propogate our 
     * changes until we are done. 
     */
    int first_half_element_no       = middle - low + 1;
    int second_half_element_no      = high - middle;
    int[] first_half                = new int[first_half_element_no];
    int[] second_half               = new int[second_half_element_no];

    // Here, we extract the elements. 
    for (int i = 0; i < first_half_element_no; i++) {  
        first_half[i] = a[low + i]; 
    }

    for (int i = 0; i < second_half_element_no; i++) {  
        second_half[i] = a[middle + i + 1]; // extract the elements from a
    }

    int current_first_half_index = 0;
    int current_second_half_index = 0;
    int k = low;


    while (current_first_half_index < first_half_element_no || current_second_half_index < second_half_element_no) {

        if (current_first_half_index >= first_half_element_no) {
            a[k++] = second_half[current_second_half_index++];
            continue;
        }

        if (current_second_half_index >= second_half_element_no) {
            a[k++] = first_half[current_first_half_index++];
            continue;
        }

        if (first_half[current_first_half_index] < second_half[current_second_half_index]) {
            a[k++] = first_half[current_first_half_index++];
        } else {
            a[k++] = second_half[current_second_half_index++];
        }
    }
}

I've also got a version, here, that will print out helpful information and provide a more visual representation of what is going on above. The syntax highlighting is also better, if that is helpful.


Need Your Help

How do I check (at runtime) if one class is a subclass of another?

python subclass assert

Let's say that I have a class Suit and four subclasses of suit: Heart, Spade, Diamond, Club.