Goal: Sorting a list of N values when they all fit in RAM is easy. Here we study BigSort for a list of N values, when only m (<< N) values fit in RAM.The values are stored in a big file of 'M' disk pages. Each page has upto `k' values.
Why does it matter? In big data problems, we often want to sort 100s of GBs to Terabytes and Petabytes of data, with only Gigabytes of RAM. BigSort is a fundamental building block for efficiently working with big data.
Example Input: Table with 400 values (data rows). These are stored in big file of 58 pages with 7 rows per page. Constraint: RAM can fit B=4 pages (out of 58 pages) at a time
Output: Sorted table of 58 pages with 7 rows per page
Algorithm
Notes: