2.5   Sorting Applications



Sorting algorithms and priority queues are widely used in a broad variety of applications. Our purpose in this section is to briefly survey some of these applications.

Sorting various types of data.

Our implementations sort arrays of Comparable objects. This Java convention allows us to use Java's callback mechanism to sort arrays of objects of any type that implements the Comparable interface.

Which sorting algorithm should I use?

Knowing which algorithm is best possible depends heavily on details of the application and implementation, but we have studied some general-purpose methods that can be nearly as effective as the best possible for a wide variety of applications. The following table is a general guide that summarizes the important characteristics of the sort algorithms that we have studied in this chapter.

Performance characteristics of sorting algorithms

Property. Quicksort is the fastest general-purpose sort.

In most practical situations, quicksort is the method of choice. If stability is important and space is available, mergesort might be best. In some performance-critical applications, the focus may be on just sorting numbers, so it is reasonable to avoid the costs of using references and sort primitive types instead.

Reductions.

The idea that we can use sorting algorithms to solve other problems is an example of a basic technique in algorithm design known as reduction. A reduction is a situation where an algorithm developed for one problem is used to solve another. We begin with a few elementary examples for sorting.

A brief survey of sorting applications.


Exercises

  1. Consider the following implementation of the compareTo() method for String. How does the third line help with efficiency?
    public int compareTo(String t) {
       String s = this;
       if (s == t) return 0;  // this line
       int n = Math.min(s.length(), t.length());
       for (int i = 0; i < n; i++) {
          if      (s.charAt(i) < t.charAt(i)) return -1;
          else if (s.charAt(i) > t.charAt(i)) return +1;
       }
       return s.length() - t.length();
    }
    

    Solution: it avoid directly comparing individual characters if s and t are references to the same string.

  2. Criticize the following implementation of a class intended to represent customer account balances. Why is compareTo() a flawed implementation of the Comparable interface?
    public class Customer implements Comparable<Customer> {
        private String name;
        private double balance;
        
        public int compareTo(Customer that) {
            if (this.balance < that.balance - 0.005) return -1;
            if (this.balance > that.balance + 0.005) return +1; 
            return 0;
       }
    }
    

    Solution: it violates the Comparable contract. It is possible that a.compareTo(b) and b.compareTo(c) are both 0, but a.compareTo(c) is positive (or negative).

  3. Explain why selection sort is not stable.

    Solution. It exchanges nonadjacent elements. On the example below, the first B gets swapped to the right of the second B.

    Selection sort is unstable

  4. Write a program Frequency.java that reads strings from standard input and prints the number of times each string occurs, in descending order of frequency.

Creative Problems

  1. Scheduling. Write a program SPT.java that reads job names and processing times from standard input and prints a schedule that minimizes average completion time, as described in the text.
  2. Load balancing. Write a program LPT.java that takes an integer M as a command-line argument, reads N job names and processing times from standard input and prints a scheduling assignment the jobs to M processors that approximately minimizes the time when the last job completes, as described in the text.

    Remark. The resulting solution is guaranteed to be within 33% of the best possible (actually 4/3 - 1/(3N)).

  3. Sort by reverse domain. Write a data type Domain.java that represents domain names, including an appropriate compareTo() method where the natural order is in order of the reverse domain name. For example, the reverse domain of cs.princeton.edu is edu.princeton.cs. This is useful for web log analysis. Write a client that reads domain names from standard input and prints the reverse domains in sorted order.
  4. Spam campaign. To initiate an illegal spam campaign, you have a list of email addresses from various domains (the part of the email address that follows the @ symbol). To better forge the return addresses, you want to send the email from another user at the same domain. For example, you might want to forge an email from wayne@princeton.edu to rs@princeton.edu. How would you process the email list to make this an efficient task?

    Solution. First sort by reverse domain.

  5. Unbiased election. In order to thwart bias against candidates whose names appear toward the end of the alphabet, California sorted the candidates appearing on its 2003 gubernatorial ballot by using the following order:
    R W Q O J M V A H B S G Z X N T C I E K U P D Y F L
    

    Create a data type California.java where this is the natural order. Write a client that sorts strings according to this ordering. Assume that each string is comprised solely of uppercase letters.

  6. Kendall tau distance. Write a program KendallTau.java that computes the Kendall tau distance between two permutations in linearithmic time.
  7. Stable priority queue. Develop a stable priority-queue implementation StableMinPQ.java (which returns duplicate keys in the same order in which they were inserted).
  8. Points in the plane. Write three static static comparators for the Point2D.java data type, one that compares points by their x coordinate, one that compares them by their y coordinate, and one that compares them by their distance from the origin. Write two non-static comparators for the Point2D data type, one that compares them by their distance to a specified point and one that compares them by their polar angle with respect to a specified point.
  9. Interval 1D data type. Write three static comparators for Interval1D.java, one that compares intervals by their left endpoing, one that compares intervals by their right endpoint, and one that compares intervals by their length.
  10. Sort files by name. Write a program FileSorter.java that takes the name of a directory as a command line input and prints out all of the files in the current directory, sorted by filename. Hint: use the java.io.File data type.
  11. Boerner's theorem. True or false: If you sort each column of a matrix, then sort each row, the columns are still sorted. Justify your answer.

    Answer. True.

  12. Distinct values. Write a program Distinct.java that takes integers M, N, and T as command-line arguments, then uses the code given in the text to perform T trials of the following experiment: Generate N random int values between 0 and M–1 and count the number of distinct values generated. Run your program for T = 10 and N = 10^3, 10^4, 10^5, and 10^6, with M = 1/2 N, N, and 2N. Probability theory says that the number of distinct values should be about M(1 – e^(-alpha)), where alpha = N/M— print a table to help your confirm that your experiments validate this formula.

Web Exercises

  1. Counter data type. Modify Counter.java so that it implements the Comparable interface, comparing counters by tally.
  2. Grade data type. Write a program Grade.java to represent a data type for grades (A, B+, etc.). It should implement the Comparable interface using the natural ordering on grades by GPA.
  3. Student data type. Write an data type Student.java that represents a student in a college course. Each student should have a login (String), a section number (integer), and a grade (Grade).
  4. Case insensitive order. Write a code fragment to read in a sequence of strings and sort them in ascending order, ignoring case.
    String[] a = new String[N];
    for (int i = 0; i < N. i++) {
       a[i] = StdIn.readString();
    }
    Arrays.sort(a, String.CASE_INSENSITIVE_ORDER);
    

  5. Case insensitive comparator. Implement your own version of the comparator String.CASE_INSENSITIVE_ORDER.
    public class CaseInsensitive implements Comparator<String> {
       public int compare(String a, String b) {
          return a.compareToIgnoreCase(b);
       }
    }
    
  6. Descending order string comparator. Implement a comparator that sorts string in descending order instead of ascending order.
    public class Descending implements Comparator<String> {
       public int compare(String a, String b) {
          return b.compareToIgnoreCase(a);
       }
    }
    
    Alternatively, you can use Collections.reverseOrder(). It returns a Comparator that imposes the reverse of the natural ordering of objects that implement the Comparable interface.

  7. Sorting strings from non-English alphabets. Write a program to sort strings according to non-English alphabets, for accents, umlauts, and pre-composed character like ch in Spanish.

    Hint: Use Java's java.text.Collator API. For example in UNICODE, Rico occurs lexicographically before Réal, but in French, Réal occurs first.

    import java.util.Arrays;
    import java.text.Collator;
    ...
    Arrays.sort(words, Collator.getInstance(Locale.FRENCH));
    
  8. Smith's rule. The following problem arises in supply chain management. You have a bunch of jobs to schedule on a single machine. (Give example.) Job j requires p[j] units of processing time. Job j has a positive weight w[j] which represents its relative importance - think of it as the inventory cost of storing the raw materials for job j for 1 unit of time. If job j finishes being processed at time t, then it costs t * w[j] dollars. The goal is to sequence the jobs so as to minimize the sum of the weighted completion times of each job. Write a program SmithsRule.java that reads in a command line parameter N and a list of N jobs specified by their processing time p[j] and their weight w[j], and output an optimal sequence in which to process their jobs. Hint: Use Smith's rule: schedule the jobs in order of their ratio of processing time to weight. This greedy rule turns out to be optimal.
  9. Rhyming words. For your poetry class, you would like to tabulate a list of rhyming words. A crude way to accomplish this task is as follows:
    • Read in a dictionary of words into an array of strings.
    • Reverse the letters in each word, e.g., confound becomes dnuofnoc.
    • Sort the resulting array of words.
    • Reverse the letters in each word back to their original state.

    Now the word confound will be next to words like astound and compound. Write a program Rhymer.java that reads in a sequence of words from standard input and prints them out in the order specified above.

    Now repeat, but use a customized Comparator that sorts lexicographically from right-to-left.

  10. Mode. Give an O(N log N) algorithm for computing the mode (value that occurs most frequently) of a sequence of N integers.
  11. Closest 1d pair. Given a sequence of N real numbers, find the pair of integers that are closest in value. Give a O(N log N) algorithm.
  12. Farthest 1d pair. Given a sequence of N real numbers, find the pair of integers that are farthest apart in value. Give a O(N) algorithm.
  13. Sorting with many duplicates. Suppose you have a sequence of N elements, with at most log N distinct ones. Describe how to sort them in O(N log log N) time.
  14. Nearly sorted. Given an array of N elements, each which is at most k positions from its target position, devise an algorithm that sorts in O(N log k) time.
  15. Sorting a linked list. Given a singly linked list of N elements, how could you sort it in guaranteed O(N log N) time, stably, and with O(1) extra space?
  16. Goofysort (Jim Huggins). Argue that Goofy.java sorts the array in ascending order. What is the best-case running time of as a function of the number of items to be sorted N? What is the worst-case running time of as a function of the number of items to be sorted N?
  17. Feel-good interval. Given an array of N nonnegative integers (representing a person's emotional value on each day), the happiness in an interval is the sum of the values in that interval multiplied by the smallest integer in that interval. Design an O(N log N) divide-and-conquer algorithm to find the happiest interval.

    Solution. Here's a mergesort style solution.

    • Divide the elements in the middle: a[l..m-1], a[m], a[m+1..r]
    • Recursively compute the optimal interval entirely in the left half
    • Recursively compute the optimal interval entirely in the right half
    • Compute the optimal interval containing a[m]
    • Return the best of the three intervals
    The key step for efficiency is computing the optimal interval containing a[m] in linear time. Here's a greedy solution: If the optimal interval containing a[m] contains one element, it is simply a[m]. If it contains more than one element, it must contain the larger of a[m-1] and a[m+1], so add this to the interval. Repeat, etc. Return the best interval of any size constructed by this process.
  18. Equality detector. Suppose that you have N elements and you want to determine if at least N/2 are equal. Assume the only operation on the elements you can perform is equality testing. Design an algorithm that performs O(N log N) equality tests to find a representative element if it exists. Hint: divide-and-conquer. Note: can also do in O(N) tests.
  19. Maxima. Given a set of n points in the plane, point (xi, yi) dominates (xj, yj) if xi > xj and yi > yj. A maxima is a point that is not dominated by any other point in the set. Devise an O(n log n) algorithm to find all maxima. Application: on x-axis is space efficiency, on y-axis is time efficiency. Maxima are useful algorithms. Hint: sort in ascending order according to x-coordinate; scan from right to left, recording the highest y-value seen so far, and mark these as maxima.
  20. Min and max. Given an array of N elements, find the min and max using as few compares as possible. Brute force: find the max (N-1 compares), then find the min of the remaining elements (N-2 compares).

    Solution 1. Divide and conquer: find min and max in each half (2T(N/2) compares), return min of 2 and max of 2 (2 compares). T(1) = 0, T(2) = 1, T(N) = 2T(N/2) + 2. Recurrence solution: T(N) = ceil(3N/2) - 2.

    Solution 2. Divide the elements into pairs and compare two elements in each pair. Put the smallest elements in A and the largest in B. If n is odd, put element n in both A and B. This requires floor(n/2) comparisons. Now directly compute the minimum in A (ceil(n/2) - 1 comparisons) and the maximum in B (ceil(N/2) - 1) comparisons. [In fact, this is best possible.]

  21. Sorting by reversals. [ Mihai Patrascu] Given an array a[1..N], sort using the following type operation: pick two indices i and j and reverse the elements in a[i..j]. This operation costs j-i+1. Goal: O(N log^2 N).

    Hint: first sort (by reversals) a sequence of N 0s and 1s in O(N log N) as follows: Divide the array into two pieces; sort each half (by reversals) recursively. The array now looks like 000001111111110000011111. To complete the sort, reverse the middle two sequences.

    To sort a sequence of N elements, use a quicksort style algorithm. Choose a partitioning element at random. Identify all the elements that need to go on the left side with 0s and the right side with 1s. Move them into position using the previous 0/1 algorithm.

  22. L1 norm. There are N circuit elements in the plane. You need to run a special wire (parallel to the x-axis) across the circuit. Each circuit element must be connected to the special wire. Where should you put the special wire? Hint: median minimizes L1 norm.
  23. Median given two sorted arrays. Given two sorted arrays of size N1 and N2, find the median of all elements in O(log N) time where N = N1 + N2. Or find the kth overall largest in O(log k).
  24. Three nearby numbers in an array. Given a floating-point array a[], design a linearithmic algorithm to find three distinct integers i, j, and k such that |a[i] - a[j]| + |a[j] - a[k]| + |a[k] - a[i]| is minimum.

    Hint: if a[i] <= a[j] <= a[k], then |a[i] - a[j]| + |a[j] - a[k]| + |a[k] - a[i]| = 2 (a[k] - a[i]).

  25. Three nearby numbers in three arrays. Given three floating-point arrays a[], b[], and c[], design a linearithmic algorithm to find three integers i, j, and k such that |a[i] - b[j]| + |b[j] - c[k]| + |c[k] - a[i]| is minimum.
  26. Minimum dot product. Given two vectors of the same length, find a permutation of the two vectors such that the dot product of the two vectors is as small as possible.
  27. Two-sum. Given an array of N integers, design a linearithmic algorithm to find a pair of integers whose sum is closest to zero.

    Solution: sort by absolute value—the best pair is now adjacent.

  28. 3-sum in quadratic time. The 3-sum problem is to find, in an array of integers, the triple whose sum is closest to zero. Describe an algorithm for this problem that uses linear space and quadratic time.

    Hint: solve the following subproblem. Given a sorted list of N integers and a target integer x, determine in linear time the two whose sum is closest to x.

  29. Bandwidth. Given intervals with bandwidth requirements, find the maximum bandwidth requirement (and the interval for which that maximum is required).

    Solution. Sort the intervals by start time; insert the intervals into PQ in this order, but using the ending time as the key. Before inserting the next interval, compare its start time to ending time of the minimum interval on the PQ: if it is greater, delete the minimum interval on the PQ. Always keep track of the cumulative bandwidth on the PQ.

  30. Time stamps. Given N time stamps when file is requested from web server, find largest interval of time at which no file arrives. Solution: sort by time stamp. Scan sorted list to identify maximum gap. (Same as idle time.)
  31. Ticket ranges. Given a list of ticket seats of the form A1, A2, A11, A10, B7, B9, B8, B3, find the largest non-empty block of adjacent seats, e.g., A3-A9. (Same as idle time.)
  32. Decimal dominant. Given an array with N comparable keys, design an algorithm to check if there is a value that appears more than N/10 times. Your algorithm should run in expected linear time.

    Solution. Use quickselect to find the N/10th largest value; check if it is a dominant; if not, recur in the subarray with 9N/10 values.

    Alternatively, use 9 counters.

  33. Local min and max. Given N distinct comparable items, rearrange them so that each internal item is either greater than both items right before and after it or less than both items right before and after it.

    Hint: sort and interleave the first and second halves.

  34. h-index. Given an array of N positive integers, its h-index is the largest integer h such that there are at least h entries in the array greater than or equal to h. Design an algorithm to compute the h-index of an array.

    Hint: median or quicksort-like partitioning and divide-and-conquer.

  35. Stable selection sort. What modifications do you need to do to make selection sort stable?

    Solution: first, when finding the minimum remaining key, always choose the leftmost entry; second, instead of moving the minimum key to the front with one exchange, move all elements to its left that are large one position to the right.