Grouping

Classifying elements into groups based on some criteria is a very common operation. An example is classifying CDs into groups according to the number of tracks on them (a contrived criterion, but it illustrates the point). Such an operation can be accomplished by the collector returned by the groupingBy() method. The method is passed a classifier function that is used to classify the elements into different groups. The result of the operation is a classification map whose entries are the different groups into which the elements have been classified. The key in a map entry is the result of applying the classifier function to an element; typically it is a value extracted from some property of the element, such as the number of tracks on the CD. The value associated with a key comprises those elements that belong to the same group. The operation is analogous to the group-by operation in databases.

There are three versions of the groupingBy() method, each providing more control over the grouping operation than the previous one.

static <T,K> Collector<T,?,Map<K,List<T>>> groupingBy(
       Function<? super T,? extends K> classifier)
static <T,K,A,D> Collector<T,?,Map<K,D>> groupingBy(
       Function<? super T,? extends K> classifier,
       Collector<? super T,A,D>        downstream)
static <T,K,D,A,M extends Map<K,D>> Collector<T,?,M> groupingBy(
       Function<? super T,? extends K> classifier,
       Supplier<M>                     mapSupplier,
       Collector<? super T,A,D>        downstream)

The Collector returned by the groupingBy() methods implements a group-by operation on input elements to create a classification map.
The classifier function maps elements of type T to keys of some type K. These keys determine the groups in the classification map.

The collector returned by the single-argument method produces a classification map of type Map<K, List<T>>. The keys in this map are the results from applying the specified classifier function to the input elements. The input elements that map to the same key are accumulated into a List by the default downstream collector Collectors.toList().

The two-argument method accepts a downstream collector in addition to the classifier function. The collector returned by the method is composed with the specified downstream collector, which performs a reduction operation on the input elements that map to the same key. The downstream collector operates on elements of type T and produces a result of type D, and this result is the value associated with the key of type K. The composed collector thus results in a classification map of type Map<K, D>.

The three-argument method additionally accepts a map supplier as its second parameter. The supplier creates an empty classification map of type M that is used by the composed collector. The result is a classification map of type M whose key and value types are K and D, respectively.
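The three overloads described above can be compared side by side in a minimal sketch. The class name GroupingByOverloads, its method names, and the list of words are hypothetical and are not taken from the text; the sketch groups strings by length only to show how the type parameters T, K, D, and M are bound in each case.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.stream.Collectors;

class GroupingByOverloads {
    // Hypothetical sample data, chosen only for illustration.
    static final List<String> WORDS =
        List.of("map", "key", "list", "set", "group", "value");

    // One-argument form: T is String, K is Integer, and the values are
    // lists built by the default downstream collector.
    static Map<Integer, List<String>> byLength() {
        return WORDS.stream()
            .collect(Collectors.groupingBy(String::length));
    }

    // Two-argument form: the downstream collector toSet() makes D a Set<String>.
    static Map<Integer, Set<String>> byLengthAsSets() {
        return WORDS.stream()
            .collect(Collectors.groupingBy(String::length, Collectors.toSet()));
    }

    // Three-argument form: the map supplier fixes M as TreeMap<Integer, Long>,
    // so the caller can rely on TreeMap methods such as firstKey().
    static TreeMap<Integer, Long> countByLength() {
        return WORDS.stream()
            .collect(Collectors.groupingBy(String::length,
                                           TreeMap::new,
                                           Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(byLength());
        System.out.println(byLengthAsSets());
        System.out.println(countByLength());
    }
}
```

Declaring the result as TreeMap rather than Map is a design choice that only the three-argument form makes possible, since the other two forms do not say which Map implementation is returned.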

Figure 16.16 illustrates the groupingBy() operation by grouping CDs according to the number of tracks on them. The classifier function CD::noOfTracks extracts the number of tracks from a CD, and this value acts as the key in the classification map (Map<Integer, List<CD>>). Since the call to the groupingBy() method in Figure 16.16 does not specify a downstream collector, the default downstream collector Collectors.toList() is used to accumulate CDs that have the same number of tracks. The number of groups (that is, the number of distinct keys) is equal to the number of distinct values for the number of tracks on the CDs. Each distinct value for the number of tracks is associated with the list of CDs having that value as the number of tracks.

Figure 16.16 Grouping
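The grouping described for Figure 16.16 can be tried out with a small self-contained sketch. The record CD below is a hypothetical stand-in for the CD class used in the text, with the genre simplified to a String, and the list contents and their order are an assumption reconstructed from the sample output shown later in this section. The class and method names are likewise illustrative.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupCDsByTracks {
    // Hypothetical stand-in for the CD class in the text; only the
    // accessors used by the pipelines matter here.
    record CD(String artist, String title, int noOfTracks, int year, String genre) {}

    // Assumed sample data, mirroring the five CDs in the sample output.
    static final List<CD> CD_LIST = List.of(
        new CD("Jaav", "Java Jive", 8, 2017, "POP"),
        new CD("Jaav", "Java Jam", 6, 2017, "JAZZ"),
        new CD("Funkies", "Lambda Dancing", 10, 2018, "POP"),
        new CD("Genericos", "Keep on Erasing", 8, 2018, "JAZZ"),
        new CD("Genericos", "Hot Generics", 10, 2018, "JAZZ"));

    // Single-argument groupingBy(): the number of tracks is the key, and the
    // CDs with that number of tracks are accumulated into a list.
    static Map<Integer, List<CD>> groupByTracks() {
        return CD_LIST.stream()
            .collect(Collectors.groupingBy(CD::noOfTracks));
    }

    public static void main(String[] args) {
        groupByTracks().forEach((tracks, cds) ->
            System.out.println(tracks + " tracks: " + cds));
    }
}
```

With this data the map has one entry for each distinct number of tracks, and within each list the CDs keep the order in which they occur in the source list.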

The three stream pipelines below result in classification maps that group the CDs in the same way as the one in Figure 16.16. The call to the groupingBy() method at (2) specifies the default downstream collector explicitly, and is equivalent to the call in Figure 16.16.

Map<Integer, List<CD>> map22 = CD.cdList.stream()
    .collect(Collectors.groupingBy(CD::noOfTracks, Collectors.toList()));  // (2)

The call to the groupingBy() method at (3) specifies the supplier TreeMap::new so that a TreeMap<Integer, List<CD>> is used as the classification map.

Map<Integer, List<CD>> map33 = CD.cdList.stream()
    .collect(Collectors.groupingBy(CD::noOfTracks,                         // (3)
                                   TreeMap::new,
                                   Collectors.toList()));

The call to the groupingBy() method at (4) specifies the downstream collector Collectors.toSet() that uses a set to accumulate the CDs for a group.

Map<Integer, Set<CD>> map44 = CD.cdList.stream()
    .collect(Collectors.groupingBy(CD::noOfTracks, Collectors.toSet()));   // (4)

The classification maps created by the pipelines above will contain the three entries shown below, but only the groupingBy() method call at (3) can guarantee that the entries will be sorted in a TreeMap<Integer, List<CD>> according to the natural order for the Integer keys.

{
6=[<Jaav, “Java Jam”, 6, 2017, JAZZ>],
8=[<Jaav, “Java Jive”, 8, 2017, POP>,
   <Genericos, “Keep on Erasing”, 8, 2018, JAZZ>],
10=[<Funkies, “Lambda Dancing”, 10, 2018, POP>,
    <Genericos, “Hot Generics”, 10, 2018, JAZZ>]
}

In general, any collector can be passed as a downstream collector to the groupingBy() method. In the stream pipeline below, the map value in the classification map is a count of the number of CDs having the same number of tracks. The collector Collectors.counting() performs a functional reduction to count the CDs having the same number of tracks (p. 998).

Map<Integer, Long> map55 = CD.cdList.stream()
    .collect(Collectors.groupingBy(CD::noOfTracks, Collectors.counting()));
//{6=1, 8=2, 10=2}
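To illustrate that other collectors can serve as the downstream collector, the sketch below uses Collectors.mapping() to collect only the titles in each group, and a nested groupingBy() to classify each group further by genre. The record CD, the sample data, and the class and method names are hypothetical stand-ins rather than the definitions used in the text; in particular, the genre is simplified to a String.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class DownstreamCollectors {
    // Hypothetical stand-in for the CD class in the text.
    record CD(String artist, String title, int noOfTracks, int year, String genre) {}

    // Assumed sample data, mirroring the CDs in the sample output above.
    static final List<CD> CD_LIST = List.of(
        new CD("Jaav", "Java Jive", 8, 2017, "POP"),
        new CD("Jaav", "Java Jam", 6, 2017, "JAZZ"),
        new CD("Funkies", "Lambda Dancing", 10, 2018, "POP"),
        new CD("Genericos", "Keep on Erasing", 8, 2018, "JAZZ"),
        new CD("Genericos", "Hot Generics", 10, 2018, "JAZZ"));

    // mapping() adapts each CD to its title before the titles are
    // accumulated into a list for the group.
    static TreeMap<Integer, List<String>> titlesByTracks() {
        return CD_LIST.stream()
            .collect(Collectors.groupingBy(
                CD::noOfTracks,
                TreeMap::new,
                Collectors.mapping(CD::title, Collectors.toList())));
    }

    // A nested groupingBy() as the downstream collector yields a two-level
    // classification: number of tracks first, then genre, then a count.
    static Map<Integer, Map<String, Long>> countByTracksThenGenre() {
        return CD_LIST.stream()
            .collect(Collectors.groupingBy(
                CD::noOfTracks,
                Collectors.groupingBy(CD::genre, Collectors.counting())));
    }

    public static void main(String[] args) {
        System.out.println(titlesByTracks());
        System.out.println(countByTracksThenGenre());
    }
}
```

Because the downstream collector is itself a collector, such compositions can be nested to any depth, although deeply nested result types quickly become hard to read.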
