Sorting

Sorting #

Sorting a collection (of values or objects) is needed in a variety of scenarios.

Sorting criterion #

In order to sort a collection of values (resp. objects), we need a sorting criterion, i.e. a way to compare two values (resp. objects).

Some data types come with a natural way to compare two values. For instance, two real numbers can be compared w.r.t. $\le$.

Observation. Some data type have several natural sorting criteria. For instance, strings may be sorted alphabetically from left to right (in English, Hindi, Russian, etc.) or from right to left (in Arabic, Hebrew, Persian, etc.).

Question. What about objects, i.e. which sorting criterion can be used to sort a collection of people, cities, sets, lists, trees, etc.?

Answer. Any total preorder (i.e. a total, reflexive and transitive binary relation) can be used to sort a collection.

Which of the following are total preorders (i.e. can be used as a sorting criterion)?

  1. Compare two real numbers w.r.t. to their absolute value.
  2. Compare two people w.r.t. to their age.
  3. Compare two people w.r.t. to their age and size, i.e.

$\qquad \qquad p_1 \preceq p_2$ iff $\big(p_1.{\text{age}} \le p_2.{\text{age}}$ and $p_1.{\text{size}} \le p_2.{\text{size}}\big)$.

  1. Compare two people w.r.t. to their age or size, i.e.

$\qquad \qquad p_1 \preceq p_2$ iff $\big(p_1.{\text{age}} \le p_2.{\text{age}}$ or $p_1.{\text{size}} \le p_2.{\text{size}}\big)$.

  1. Compare two people w.r.t. to their age, and then their size (if they have the same age).
  2. Compare two points w.r.t. to their X coordinate, and then their Y coordinate (if they have the same X coordinate).
  3. Compare two trees w.r.t. to their number of nodes.
  4. Compare two sets w.r.t. to set inclusion.
  5. Compare two sets of characters by first sorting each set (by alphabetical order), and then comparing the two resulting sequences lexicographically.
    1. is a partial preorder,
    1. is not transitive, and
    1. is a partial order.

The others are total preorders.

To see why 4. is not transitive, consider the example:

Person Age Height
Alice 24 175
Bob 25 160
Carol 21 170

Properties of sorting algorithms #

Sorting algorithms have been extensively studied. We will not cover them this semester (with one exception), because this is part of another course of the bachelor.

We only highlight some of their properties.

Stability #

Definition. A sorting algorithm is stable if it preserves the initial order (in the input collection) of two elements that are equivalent w.r.t. the sorting criterion.

Example. Consider an array [u1, u2] of type Unit[], where (the objects referenced by) u1 and u2 both have health 2. And let us sort the elements of this array by health.

  • A stable sorting algorithm outputs [u1, u2].
  • A non-stable sorting algorithm may output [u1, u2] or [u2, u1].

In place #

Definition. A sorting algorithm is in place if it does not use additional data structures.

in Java #

Comparing #

Java provides several native ways to define a sorting criterion (a.k.a. total preorder). We highlight here two of them.

Comparable #

A class T can implement the interface Comparable<T>.

Example. We can create a class City that implements Comparable<City> as follows:

public class City implements Comparable<City> {
    ...
}

The interface Comparable<T> specifies a single method

int compareTo (T otherObject);

This method should define a total preorder $\preceq$ over instances of T, as follows.

Let $o_1$ be the current object (i.e. the instance used to call the method), and let $o_2$ be the other object (i.e. the argument of the method compareTo). Then this method should return:

  • a negative integer if $o_1 \prec_o o_2$ (i.e. if $o_1 \preceq_o o_2$ and $o_2 \not\preceq_o o_1$),
  • 0 if $o_1 =_o o_2$ (i.e. if $o_1 \preceq_o o_2$ and $o_2 \preceq_o o_1$),
  • a positive integer otherwise.

Example (continued). Here is an implementation of the method compareTo where cities are compared by zip code:

public class City implements Comparable<City> {

    public String name;
    public int zipCode;

    public City(String name, int zipCode) {
        this.name = name;
        this.zipCode = zipCode;
    }

    @Override
    public int compareTo(City otherCity) {
        return zipCode - otherCity.zipCode;
    }
}

Hint. Some native Java classes already implement Comparable, in the expected way. For instance Integer, String (left to right alphabetical order), etc.

Hint. Comparator methods are also available for primitive types (like int, or bool). For instance, Integer has a static method compare(int x, int y) that behaves like compareTo. So the method compareTo in the example above can be simplified as follows:

public class City implements Comparable<City> {
  ...

    @Override
    public int compareTo(City otherCity) {
        return Integer.compare(zipCode, otherCity.zipCode);
    }
}

Warning. The method compareTo is implicitly used by some of Java’s native data structures (e.g. TreeSet). So it is usually recommended to implement compareTo in such a way that it “complies” with equals (and hashCode). This means that

$\qquad \qquad $o1.compareTo(o2) should return 0 iff o1.equals(o2) returns true.

If you want to use a sorting criterion that does not satisfy this constraint (or if you want to use alternative sorting criteria for the same class), then we recommend using a Comparator instead (explained below).

Comparator #

A Comparator in Java is intuitively a total preorder.

The Java interface Comparator<T> specifies one method that must be implemented by each instance, with signature:

int compare(T o1, T o2)

The return value (negative integer, 0 or positive integer) has the same meaning as the one of Comparable.compareTo.

Example (continued). We can create another comparator for our class City, which uses the city’ name rather than zip code.

public class CityNameComparator implements Comparator<City> {

    @Override
    public int compare(City c1, City c2) {
        return c1.name.compareTo(c2.name);
    }
}

Note. Observe that in this example, we used the instance method compareTo (described above) implemented in the class String.

The interface Comparator also provides convenient (default) methods. For instance, the method thenComparing returns the lexicographic product of two comparators.

Sorting #

Java provides several methods to sort an array or collection. We highlight here a few of them.

Sorting an array #

For an array of type T[], the static method Arrays.sort(T[] array) can be called as follows:

Example.

int[] integers = new integer[]{2,1,3,2};
Arrays.sort(integers)

// Outputs [1,2,2,3]
System.out.println(Arrays.toString(array));

This method sorts the array using the so-called “natural ordering” for type T. The “natural ordering” is:

  • the one expected for primitive types (like int), boxed types (like Integer) and a few other native types (like String or Date); a list can be found here,
  • the one defined by T.compareTo if T implements Comparable,
  • underspecified if T is a class that does not implement Comparable.

The method Arrays.sort(T[] array, Comparator<? super T> c) is similar, but it uses the input comparator to sort the array.

Example(continued).

City trento =  new City("Trento", 38121);
City bologna =  new City("Bologna", 40100);
City[] cities = new City[]{trento, bologna};

// After this, the array 'cities' contains [bologna, trento]
Arrays.sort(cities, new CityNameComparator());

Property. Both methods guarantee stable sorting. For instance:

City trento1 =  new City("Trento", 38122);
City bologna =  new City("Bologna", 40100);
City trento2 =  new City("Trento", 38121);
City[] cities = new City[]{trento1, bologna, trento2};

// After this, the array 'cities' contains [bologna, trento1, trento2]
Arrays.sort(cities, new CityNameComparator());

Note. For an array with primitive type (like int[]), these two methods use (a version of) the Quicksort algorithm, even though Quicksort is not stable. Stability is irrelevant for these types: for instance, permuting the elements of the array [5,5] yields the same array.

However, for an array with (arbitrary) reference type (like City[]), these methods use (a version of) the MergeSort algorithm, which is stable.

Sorting a list #

The class Collections provides a static method Collections.sort(List<T> list), whose behavior is analogous to Arrays.sort. In particular, it also guarantees stable sorting. Like Arrays.sort, it comes in two flavors (with and without comparator).

Example(continued).

City bologna =  new City("Bologna", 40100);
City trento =  new City("Trento", 38122);
List<City> cities = new LinkedList();
cities.add(bologna);
cities.add(trento);

// After this, the list 'cities' contains [trento, bologna],
// due to the way 'City' implements 'compareTo'.
Collections.sort(cities);

// After this, the list 'cities' contains [bologna, trento],
Collections.sort(cities, new CityNameComparator());

The interface List also provides a default method sort that takes a comparator as argument. So in the above example, we could have used:

cities.sort(new CityNameComparator());

instead of

Collections.sort(cities, new CityNameComparator());

Usage #

Consider the following class:

public class Hero {

    String name;
    int health;

    public Hero(String name, int health){
        this.name = name;
        this.health = health;
    }
}

Write a Java method void printHeroesOcc(Hero[] heroes) that takes as input an array of type Hero[], and prints the number of occurrences of each hero in this array, where two heroes are considered identical if they have the same name and health.

Constraint. For this exercise, you cannot use an associative array (a.k.a. Java Map).

Create a comparator (compare first by health, then by name):

public class HeroComparator implements Comparator<Hero> {

    @Override
    public int compare(Hero h1, Hero h2) {
        int healthComparison = Integer.compare(h1.health, h2.health);
        return healthComparison != 0 ?
            healthComparison :
            h1.name.compareTo(h2.name);
    }

Sort the array and iterate over it to count the number of occurrences of each hero:

void printHeroesOcc(Hero[] heroes) {
    if (heroes.length == 0) {
        return;
    }
    Comparator<Hero> comparator = new HeroComparator();
    // sort the input array w.r.t. to the comparator
    Arrays.sort(heroes, comparator);
    // keeps track of the last hero seen so far
    Hero previousHero = heroes[0];
    // occurrences of the last hero seen so far
    int occ = 1;
    // iterate over the array, starting fom the second hero
    for (int i = 1; i < heroes.length; i++) {
        // if the current hero and the previous one have the same
        // name and amount of health
        if (comparator.compare(previousHero, heroes[i]) == 0) {
            occ++;
        } else {
            printHero(previousHero, occ);
            previousHero = heroes[i];
            occ = 1;
        }
    }
    printHero(previousHero, occ);
}

private void printHero(Hero hero, int occ) {
    System.out.println(hero.name + "," + hero.health + ": " + occ);
}

Note. The solution to the exercise above relies on sorting the input array. But the same problem could be solved with an associative array, as follows:

  • override equals and hashCode in the class Hero,
  • compute a Map<Hero, Integer> that maps each hero to its number of occurrences in the input array,
  • iterates over the entries of this map and print them.

When it comes to running time, the latter solution is more efficient on average (assuming a hash map), but less efficient in the worst case.