Sorting in Virtual Memory

Quicksort is a well known internal sorting method. Code for an implementation of quicksort is given below, but a general understanding of the algorithm is all that is needed here.

Suppose that we use quicksort to sort a large file which is mapped into the virtual memory of an EMAS/Multics style system. Assuming that the file is much bigger than the real memory available to the sorting process, estimate the number of page transfers required to sort N items, given that M items will fit into one page. You may assume that the data is such that quicksort performs well (ie that is does not suffer from one of the possible worst case scenarios).

///////////////////////////////////////////////////////////////////////
//                  Quicksort -- quicksort.c                         //
//                                                                   //
// To sort a number of integers into ascending order.                //
//                                                                   //
///////////////////////////////////////////////////////////////////////

const int SENTINEL = 0x80000000;  // -2147483648  Smallest int value

void sort(int a[], int l, int r)
{
    int i, j;
    int v, t;
    
    if (r > l)
    {
        v = a[r];
        i = l - 1;
        j = r;
        
        for (;;)
        {
            while (a[++i] < v) { };
            while (a[--j] > v) { };
            if (i > j) break;
            t = a[i];
            a[i] = a[j];
            a[j] = t;
        }
        
        t = a[i];
        a[i] = a[r];
        a[r] = t;

        sort(a, l, j);
        sort(a, i + 1, r);
    }
}

void quicksort(int a[], int r)
{
    a[0] = SENTINEL;
    sort(a, 1, r);
}

[Quicksort partitions the data so that the values of items to the left of the partition item are less than or equal to the partition item's value and so that the values of items to the right of the partition item are greater than or equal to the partition item's value. It then calls itself recursively to sort the left and right partitions.

Apart from pathological situations the depth of recursion is typically log2(N) and the amount of work proportional to N*log2(N).]

When sorting data which is much larger than available memory, the partitioning process will cause all of the data to be paged into memory. In principle, we would need only two page frames for this as the data is fetched sequentially from the left and right ends of the data being partitioned. A clever paging system could write the pages out immediately they were no longer needed. So to do the partitioning of N items we would have to do 2*N/M page transfers.

The amount of work at each level of recursion is approximately the same so that we have to multiply by the number of levels for which page fetching and writing is necessary. This is log2(N/M). As soon as the length of a sub-array is less than M, all of the sorting takes place in memory, so no more fetching is necessary.

So the number of page transfers is about 2*(N/M)*log2(N/M).

The answer is basically what one would get if one explicitly organised the disc transfers.