Sorting #
Sorting a collection (of values or objects) is needed in a variety of scenarios.
Sorting criterion #
In order to sort a collection of values (resp. objects), we need a sorting criterion, i.e. a way to compare two values (resp. objects).
Some data types come with a natural way to compare two values. For instance, two real numbers can be compared w.r.t. $\le$.
Observation. Some data type have several natural sorting criteria. For instance, strings may be sorted alphabetically from left to right (in English, Hindi, Russian, etc.) or from right to left (in Arabic, Hebrew, Persian, etc.).
Question. What about objects, i.e. which sorting criterion can be used to sort a collection of people, cities, sets, lists, trees, etc.?
Answer. Any total preorder (i.e. a total, reflexive and transitive binary relation) can be used to sort a collection.
Which of the following are total preorders (i.e. can be used as a sorting criterion)?
- Compare two real numbers w.r.t. to their absolute value.
- Compare two people w.r.t. to their age.
- Compare two people w.r.t. to their age and size, i.e.
$\qquad \qquad p_1 \preceq p_2$ iff $\big(p_1.{\text{age}} \le p_2.{\text{age}}$ and $p_1.{\text{size}} \le p_2.{\text{size}}\big)$.
- Compare two people w.r.t. to their age or size, i.e.
$\qquad \qquad p_1 \preceq p_2$ iff $\big(p_1.{\text{age}} \le p_2.{\text{age}}$ or $p_1.{\text{size}} \le p_2.{\text{size}}\big)$.
- Compare two people w.r.t. to their age, and then their size (if they have the same age).
- Compare two points w.r.t. to their X coordinate, and then their Y coordinate (if they have the same X coordinate).
- Compare two trees w.r.t. to their number of nodes.
- Compare two sets w.r.t. to set inclusion.
- Compare two sets of characters by first sorting each set (by alphabetical order), and then comparing the two resulting sequences lexicographically.
-
- is a partial preorder,
-
- is not transitive, and
-
- is a partial order.
The others are total preorders.
To see why 4. is not transitive, consider the example:
Person | Age | Height |
---|---|---|
Alice | 24 | 175 |
Bob | 25 | 160 |
Carol | 21 | 170 |
Properties of sorting algorithms #
Sorting algorithms have been extensively studied. We will not cover them this semester (with one exception), because this is part of another course of the bachelor.
We only highlight some of their properties.
Stability #
Definition. A sorting algorithm is stable if it preserves the initial order (in the input collection) of two elements that are equivalent w.r.t. the sorting criterion.
Example. Consider an array
[u1, u2]
of typeUnit[]
, where (the objects referenced by)u1
andu2
both have health2
. And let us sort the elements of this array by health.
- A stable sorting algorithm outputs
[u1, u2]
.- A non-stable sorting algorithm may output
[u1, u2]
or[u2, u1]
.
In place #
Definition. A sorting algorithm is in place if it does not use additional data structures.
in Java #
Comparing #
Java provides several native ways to define a sorting criterion (a.k.a. total preorder). We highlight here two of them.
Comparable #
A class T
can implement the interface Comparable<T>
.
Example. We can create a class
City
that implementsComparable<City>
as follows:public class City implements Comparable<City> { ... }
The interface Comparable<T>
specifies a single method
int compareTo (T otherObject);
This method should define a total preorder $\preceq$ over instances of T
, as follows.
Let $o_1$ be the current object (i.e. the instance used to call the method), and let $o_2$ be the other object (i.e. the argument of the method compareTo
).
Then this method should return:
- a negative integer if $o_1 \prec_o o_2$ (i.e. if $o_1 \preceq_o o_2$ and $o_2 \not\preceq_o o_1$),
0
if $o_1 =_o o_2$ (i.e. if $o_1 \preceq_o o_2$ and $o_2 \preceq_o o_1$),- a positive integer otherwise.
Example (continued). Here is an implementation of the method
compareTo
where cities are compared by zip code:public class City implements Comparable<City> { public String name; public int zipCode; public City(String name, int zipCode) { this.name = name; this.zipCode = zipCode; } @Override public int compareTo(City otherCity) { return zipCode - otherCity.zipCode; } }
Hint. Some native Java classes already implement
Comparable
, in the expected way. For instanceInteger
,String
(left to right alphabetical order), etc.
Hint. Comparator methods are also available for primitive types (like
int
, orbool
). For instance,Integer
has a static methodcompare(int x, int y)
that behaves likecompareTo
. So the methodcompareTo
in the example above can be simplified as follows:public class City implements Comparable<City> { ... @Override public int compareTo(City otherCity) { return Integer.compare(zipCode, otherCity.zipCode); } }
Warning. The method
compareTo
is implicitly used by some of Java’s native data structures (e.g.TreeSet
). So it is usually recommended to implementcompareTo
in such a way that it “complies” withequals
(andhashCode
). This means that$\qquad \qquad $
o1.compareTo(o2)
should return0
iffo1.equals(o2)
returnstrue
.If you want to use a sorting criterion that does not satisfy this constraint (or if you want to use alternative sorting criteria for the same class), then we recommend using a
Comparator
instead (explained below).
Comparator #
A Comparator
in Java is intuitively a total preorder.
The Java interface Comparator<T>
specifies one method that must be implemented by each instance, with signature:
int compare(T o1, T o2)
The return value (negative integer, 0
or positive integer) has the same meaning as the one of Comparable.compareTo
.
Example (continued). We can create another comparator for our class
City
, which uses the city’ name rather than zip code.public class CityNameComparator implements Comparator<City> { @Override public int compare(City c1, City c2) { return c1.name.compareTo(c2.name); } }
Note. Observe that in this example, we used the instance method
compareTo
(described above) implemented in the classString
.
The interface Comparator
also provides convenient (default) methods.
For instance, the method thenComparing returns the lexicographic product of two comparators.
Sorting #
Java provides several methods to sort an array or collection. We highlight here a few of them.
Sorting an array #
For an array of type T[]
, the static method Arrays.sort(T[] array)
can be called as follows:
Example.
int[] integers = new integer[]{2,1,3,2}; Arrays.sort(integers) // Outputs [1,2,2,3] System.out.println(Arrays.toString(array));
This method sorts the array using the so-called “natural ordering” for type T
.
The “natural ordering” is:
- the one expected for primitive types (like
int
), boxed types (likeInteger
) and a few other native types (likeString
orDate
); a list can be found here, - the one defined by
T.compareTo
ifT
implementsComparable
, - underspecified if
T
is a class that does not implementComparable
.
The method Arrays.sort(T[] array, Comparator<? super T> c)
is similar, but it uses the input comparator to sort the array.
Example(continued).
City trento = new City("Trento", 38121); City bologna = new City("Bologna", 40100); City[] cities = new City[]{trento, bologna}; // After this, the array 'cities' contains [bologna, trento] Arrays.sort(cities, new CityNameComparator());
Property. Both methods guarantee stable sorting. For instance:
City trento1 = new City("Trento", 38122); City bologna = new City("Bologna", 40100); City trento2 = new City("Trento", 38121); City[] cities = new City[]{trento1, bologna, trento2}; // After this, the array 'cities' contains [bologna, trento1, trento2] Arrays.sort(cities, new CityNameComparator());
Note. For an array with primitive type (like
int[]
), these two methods use (a version of) the Quicksort algorithm, even though Quicksort is not stable. Stability is irrelevant for these types: for instance, permuting the elements of the array[5,5]
yields the same array.However, for an array with (arbitrary) reference type (like
City[]
), these methods use (a version of) the MergeSort algorithm, which is stable.
Sorting a list #
The class Collections
provides a static method Collections.sort(List<T> list)
, whose behavior is analogous to Arrays.sort
.
In particular, it also guarantees stable sorting.
Like Arrays.sort
, it comes in two flavors (with and without comparator).
Example(continued).
City bologna = new City("Bologna", 40100); City trento = new City("Trento", 38122); List<City> cities = new LinkedList(); cities.add(bologna); cities.add(trento); // After this, the list 'cities' contains [trento, bologna], // due to the way 'City' implements 'compareTo'. Collections.sort(cities); // After this, the list 'cities' contains [bologna, trento], Collections.sort(cities, new CityNameComparator());
The interface List
also provides a default method sort
that takes a comparator as argument.
So in the above example, we could have used:
cities.sort(new CityNameComparator());
instead of
Collections.sort(cities, new CityNameComparator());
Usage #
Consider the following class:
public class Hero {
String name;
int health;
public Hero(String name, int health){
this.name = name;
this.health = health;
}
}
Write a Java method void printHeroesOcc(Hero[] heroes)
that takes as input an array of type Hero[]
, and prints the number of occurrences of each hero in this array,
where two heroes are considered identical if they have the same name and health.
Constraint. For this exercise, you cannot use an associative array (a.k.a. Java Map
).
Create a comparator (compare first by health, then by name):
public class HeroComparator implements Comparator<Hero> {
@Override
public int compare(Hero h1, Hero h2) {
int healthComparison = Integer.compare(h1.health, h2.health);
return healthComparison != 0 ?
healthComparison :
h1.name.compareTo(h2.name);
}
Sort the array and iterate over it to count the number of occurrences of each hero:
void printHeroesOcc(Hero[] heroes) {
if (heroes.length == 0) {
return;
}
Comparator<Hero> comparator = new HeroComparator();
// sort the input array w.r.t. to the comparator
Arrays.sort(heroes, comparator);
// keeps track of the last hero seen so far
Hero previousHero = heroes[0];
// occurrences of the last hero seen so far
int occ = 1;
// iterate over the array, starting fom the second hero
for (int i = 1; i < heroes.length; i++) {
// if the current hero and the previous one have the same
// name and amount of health
if (comparator.compare(previousHero, heroes[i]) == 0) {
occ++;
} else {
printHero(previousHero, occ);
previousHero = heroes[i];
occ = 1;
}
}
printHero(previousHero, occ);
}
private void printHero(Hero hero, int occ) {
System.out.println(hero.name + "," + hero.health + ": " + occ);
}
Note. The solution to the exercise above relies on sorting the input array. But the same problem could be solved with an associative array, as follows:
- override
equals
andhashCode
in the classHero
,- compute a
Map<Hero, Integer>
that maps each hero to its number of occurrences in the input array,- iterates over the entries of this map and print them.
When it comes to running time, the latter solution is more efficient on average (assuming a hash map), but less efficient in the worst case.