If there's a bonus for doing a faster sort, with 3 stacks you can implement a bottom up merge sort O(n log(n)). As pointed out by greybeard, a poly phase bottom up merge sort (a method oriented towards tape drives or other sequential devices), should be the fastest 3 stack sort.
A simpler merge sort would move every run (initial size == 1) from A to B and C, in an alternating pattern, even runs to B, odd runs to C, then 2 way merge B and C back to A, double run size, repeat until run size >= stack size. Poly phase eliminates the move / split steps, except for an initial distribute step that moves some of the elements from A to B and C.
Setting up the initial descending / ascending state (reverses the sense of a compare), and tracking when the run size on a stack changes (+1 or -1) due to dummy elements was a bit tricky. I used a table of 47 Fibonacci integers for initial distribution setup (handles stack size up to 1/2 billion elements). Stack size is known at the start, but this could be generated by doing a single copy (copy order doesn't matter since initial run size is 1).
Initial distribution for n elements: Assume that fib(m+1) > n > fib(m). n-fib(m) elements are moved to B. fib(m+1)-n elements are moved to C. n-fib(m) elements from A and B are merged (pushed) to C. After the first merge, C ends up with n-fib(m) runs of size 2, and fib(m+1)-n runs of size 1 = fib(m-1) runs. B is emptied. A ends up with (n) - (fib(m+1)-n) - 2(n-fib(m)) = 2 fib(m) - fib(m+1) = fib(m) - fib(m-1) = fib(m-2) runs of size 1. In the case that n = fib(m), then fib(m-1) elements are moved to B, leaving fib(m-2) elements in A.
Wiki article also describes a similar situation to the 3 stack sort with tape drives written forward and read backwards, but doesn't mention the details of how to distribute dummy runs (runs of size 0) at the start, but this was probably included in that 55 year old publication mentioned by greybeard.
http://en.wikipedia.org/wiki/Polyphase_merge_sort
I wrote a C++ example, but since the question asked for Java (example code below), I'll provide a link to a zip for the C++ example. Instead of a stack class, the C++ example uses arrays with a stack pointer for each array (ppmrg3s.cpp). The zip also has a regular poly phase merge sort using arrays (ppmrg.cpp).
http://rcgldr.net/misc/ppmrg.zip
Example java code. On my system, Intel 2600K, 3.4ghz, Win 7 64 bit, it sorts 16 million doubles in about 4 seconds.
public class ppmrg3s {
static final int[] FIBTBL =
{ 0, 1, 1, 2, 3, 5,
8, 13, 21, 34, 55, 89,
144, 233, 377, 610, 987, 1597,
2584, 4181, 6765, 10946, 17711, 28657,
46368, 75025, 121393, 196418, 317811, 514229,
832040, 1346269, 2178309, 3524578, 5702887, 9227465,
14930352, 24157817, 39088169, 63245986, 102334155, 165580141,
267914296, 433494437, 701408733,1134903170,1836311903};
// return index of largest fib() <= n
static int flfib(int n)
{
int lo = 0;
int hi = 47;
while((hi - lo) > 1){
int i = (lo + hi)/2;
if(n < FIBTBL[i]){
hi = i;
continue;
}
if(n > FIBTBL[i]){
lo = i;
continue;
}
return i;
}
return lo;
}
// poly phase merge sort using 3 stacks
static void ppmrg3s(dstack a, dstack b, dstack c)
{
if(a.size() < 2)
return;
int ars = 1; // init run sizes
int brs = 1;
int asc = 0; // no size change
int bsc = 0;
int csc = 0;
int scv = 0-1; // size change value
boolean dsf; // == 1 if descending sequence
{ // block for local variable scope
int f = flfib(a.size()); // FIBTBL[f] >= size >= FIBTBL[f-1]
dsf = ((f%3) == 0); // init compare flag
if(FIBTBL[f] == a.size()){ // if exact fibonacci size,
for (int i = 0; i < FIBTBL[f - 1]; i++) { // move to b
b.push(a.pop());
}
} else { // else move to b, c
// update compare flag
dsf ^= 1 == ((a.size() - FIBTBL[f]) & 1);
// i = excess run count
int i = a.size() - FIBTBL[f];
// j = dummy run count
int j = FIBTBL[f + 1] - a.size();
// move excess elements to b
do{
b.push(a.pop());
}while(0 != --i);
// move dummy count elements to c
do{
c.push(a.pop());
}while(0 != --j);
csc = c.size();
}
} // end block scope
while(true){ // start merge pass
if(asc == a.size()){ // check for size count change
ars += scv; // (due to dummy run size == 0)
scv = 0-scv;
asc = 0;
csc = c.size();
}
if(bsc == b.size()){
brs += scv;
scv = 0-scv;
bsc = 0;
csc = c.size();
}
int arc = ars; // init run counters
int brc = brs;
while(true){ // start merge pair of runs
if(dsf ^ (a.peek() <= b.peek())){
c.push(a.pop()); // move a to c
if(--arc != 0) // if not end a
continue; // continue back to compare
do{ // else move rest of b run to c
c.push(b.pop());
}while(0 != --brc);
break; // and break
} else {
c.push(b.pop()); // move b to c
if(0 != --brc) // if not end b
continue; // continue back to compare
do{ // else move rest of a run to c
c.push(a.pop());
}while(0 != --arc);
break; // and break
}
} // end merge pair of runs
dsf ^= true; // toggle compare flag
if(b.empty()){ // if end b
if(a.empty()) // if end a, done
break;
b.swap(c); // swap b, c
brs += ars;
if (0 == asc)
bsc = csc;
} else { // else not end b
if(!a.empty()) // if not end a
continue; // continue back to merge pair
a.swap(c); // swap a, c
ars += brs;
if (0 == bsc)
asc = csc;
}
}
a.swap(c); // return sorted stack in a
}
I created a fast stack class that uses a fixed maximum size array of doubles that includes a swap function member:
class dstack{
double []ar; // array
int sz; // size
int sp; // stack pointer
public dstack(int sz){ // constructor with size
this.ar = new double[sz];
this.sz = sz;
this.sp = sz;
}
public void push(double d){
this.ar[--sp] = d;
}
public double pop(){
return this.ar[sp++];
}
public double peek(){
return this.ar[sp];
}
public boolean empty(){
return sp == sz;
}
public int size(){
return sz-sp;
}
public void swap(dstack othr){
double []tempar = othr.ar;
int tempsz = othr.sz;
int tempsp = othr.sp;
othr.ar = this.ar;
othr.sz = this.sz;
othr.sp = this.sp;
this.ar = tempar;
this.sz = tempsz;
this.sp = tempsp;
}
}
Test program. It uses random integers (nextInt), that get converted to doubles during a.push(...). This made the early debugging easier. For other platforms, or to follow with debug, use a smaller number for NUMELEM, which is the number of elements.
static final int NUMELEM = 16*1024*1024;
public static void main(String[] args) {
dstack a = new dstack(NUMELEM);
dstack b = new dstack(NUMELEM);
dstack c = new dstack(NUMELEM);
Random r = new Random();
for(int i = 0; i < NUMELEM; i++){
a.push(r.nextInt(NUMELEM));
}
long bgn, end;
bgn = System.currentTimeMillis();
ppmrg3s(a, b, c);
end = System.currentTimeMillis();
double d;
d = a.pop();
while(!a.empty()){
if(d > a.peek()){
System.out.println("error");
break;
}
d = a.pop();
}
System.out.println("milliseconds");
System.out.println(end-bgn);
}