A comparison of parallel sorting algorithms on different architectures
NANCY M. A MATO amato@cs.tamu.edu R AVISHANKAR I YER ravi@cs.tamu.edu S HARAD S UNDARESAN sharad@cs.tamu.edu YAN W U yanwu@cs.tamu.edu Department of Computer Science, Texas A&M University, College Station, TX 77843-3112
Technical Report 98-029 Department of Computer Science Texas A&M University January 1996
Abstract
In this paper, we present a comparative performance evaluation of three different parallel sorting algorithms: bitonic sort, sample sort, and parallel radix sort. In order to study the interaction between the algorithms and the architecture, we implemented all the algorithms on three different architectures: a MasPar MP1202, a mesh-connected computer with 2048 processing elements; an nCUBE 2, a message-passing hypercube with 32 processors; and a Sequent Balance, a distributed shared-memory machine with 10 processors. For each machine, we found that the choice of algorithm depends upon the number of elements to be sorted. In addition, as expected, our results show that the relative performance of the algorithms differed on the various machines. It is our hope that our results can be extrapolated to help select appropriate candidates for implementation on machines with architectures similar to those that we have studied. As evidence for this, our findings on the nCUBE 2, a 32 node hypercube, are in accordance with the results obtained by Blelloch et al. [5] on the CM-2, a hypercube with 1024 processors. In addition, preliminary results we have obtained on the SGI Power Challenge, a distributed shared-memory machine, are in accordance with our findings on the Sequent Balance.
This research supported in part by NCSA.
1 Introduction
Sorting is one of the fundamental problems of computer science, and parallel algorithms for sorting have been studied since the beginning of parallel computing. Batcher’s log2 n-depth bitonic sorting network [2] was one of