5.1   String Sorts


This section under major construction.


LSD radix sort.

Program LSD.java implements LSD radix sort for fixed length strings. It includes a method for sorting 32-bit integers, treating each integer as a 4-byte string. When N is large, this algorithm is 2-3x faster than the system sort.

MSD radix sort.

Program MSD.java implements MSD radix sort.

3-way string quicksort.

Program Quick3string.java implements 3-way string quicksort.

Q+A

Exercises

  1. Frequency counts. Read in a list of strings and print out their frequency counts. Algorithm: read strings into an array, 3-way radix quicksort them, and compute their frequency counts. Bonus speedup: compute the counts during the 3-way partitioning. Disadvantage: uses space to store all the strings. Alternate: TST.
  2. Sorting uniformly distributed data. Given N random real numbers from [0, 1], consider the following algorithm for sorting them: Partition [0, 1] into N equally spaced sub-intervals. Rearrange (ala cumulative counts) the N elements so that each element is in its appropriate bucket. Insertion sort the elements in each bucket (or equivalently, just insertion sort the whole file). That is, MSD radix sort for one level, then cutoff to insertion sort. [Try to do in-place?] Solution: It will take O(N) time in total on average. Let n_i be the number of elements in bucket i. The expected time to insertion sort all of the buckets is O(n) since E[sum_i (n_i)^2] <= 2n.
  3. Given an array of N decimal integers of different lengths, describe how to sort them in O(N + K) time, where K is the total number of digits overall all the N integers.
  4. American flag sort. (in-place key-indexed counting) Given an array with N distinct values between 0 and R-1, rearrange them in ascending order in linear time and with O(R) extra space. Leads to an (essentially) in-place string sort.

    Hint:: compute the count[] array, which tells you where the keys need to go. Scan over the input array. Take the first key, find the bin in which it belong, and swap it into place (updating the corresponding count[] entry). Repeat with the second key, but be careful to skip over keys already known to be where they belong.

Web Exercises

  1. 2-sum. Given an array a[] of N 64-bit integers and a target value T, determine whether there are two distinct integers i and j such that a[i] + a[j] equals T. Your algorithm should run in linear time in the worst case.

    Solution. Radix sort the array in linear time. Scan a pointer i from left to right and a pointer j from right to left: consider a[i] + a[j]. If it is bigger than T, advance the j pointer; if it is smaller than T, advance the i pointer; if it is equal to T, we have found the desired indices.

    Note that the array of integers can be radix sorted in linear time and constant extra space using the advanced radix sorting algorithm of Franceschini, Muthukrishnan, and Patrascu.

  2. Binary search in a sorted array of strings. Implement a version of binary search for sorted arrays of strings that keeps track of how many characters are known to be in common between the query string and the lo and hi endpoints. Use this information to avoid character compares during binary search. Compare the performance of this algorithm with a version that calls compareTo(). (The compareTo() approach has the advantage that it doesn't need to make calls to charAt() becasue it is implemented as an instance method of the String data type.)