COMP 5704: Parallel Algorithms and Applications in Data Science
|
|
School of Computer Science |
Sorting is one of the most critical non-numerical algorithms and covers use cases in a wide range of scientific applications. Although we can draw on excellent research from the past decades, scaling up to thousands of processing units in modern multi-core architectures reveals a gap between theory and practice. Using a partition-based sorting algorithm, that partitions the data before distributing it further, proves to be fruitful in reducing the communication cost. Sampling helps partition data using a subset of keys to represent each partition and histogramming aids repetitive evaluation of a selected partition. So, clubbing these algorithms parallely provides not only a theoretical improvement in terms of communication and computation but also a practical development over most other algorithms. This algorithms aims to reduce the communication cost at a small cost of increase in the number of parallel steps.
Startup Paper(s):
Deliverables:
Relevant References: