COMP 5704: Parallel Algorithms and Applications in Data Science


School of Computer Science
Carleton University, Ottawa, Canada


Project Title: Histogram sort with Sampling

Name: Megha Agarwal

E-Mail: meghaagarwal@cmail.carleton.ca


Project Outline:

Sorting is one of the most critical non-numerical algorithms and covers use cases in a wide range of scientific applications. Although we can draw on excellent research from the past decades, scaling up to thousands of processing units in modern multi-core architectures reveals a gap between theory and practice. Using a partition-based sorting algorithm, that partitions the data before distributing it further, proves to be fruitful in reducing the communication cost. Sampling helps partition data using a subset of keys to represent each partition and histogramming aids repetitive evaluation of a selected partition. So, clubbing these algorithms parallely provides not only a theoretical improvement in terms of communication and computation but also a practical development over most other algorithms. This algorithms aims to reduce the communication cost at a small cost of increase in the number of parallel steps.

Startup Paper(s):

  1. Histogram Sort with Sampling
  2. Engineering a distributed histogram sort

Deliverables:

Relevant References: