Multilevel Partitioning – Streams
By Stephen Trude / May 23, 2024 / Certifications of Oracle, Consumer Action on Stream Elements
Multilevel Partitioning
Like the groupingBy() method, the partitioningBy() operation can be used in multilevel classification. The downstream collector in a partitioningBy() operation can be created by another partitioningBy() operation, resulting in a multilevel partitioning operation—also known as a cascaded partitioning operation. The downstream collector can also be a groupingBy() operation.
In the stream pipeline below, the CDs are partitioned at (1) into two partitions: one for CDs that are pop music CDs, and one for those that are not. Within each partition, the CDs are then grouped by the year in which they were released. Note that the CDs released in a given year are accumulated into a List by the default downstream collector, Collectors.toList(), which the groupingBy() operation at (2) employs.
Map<Boolean, Map<Year, List<CD>>> map1 = CD.cdList.stream()
    .collect(Collectors.partitioningBy(CD::isPop,            // (1)
        Collectors.groupingBy(CD::year)));                   // (2)
Printing the contents of the resulting map would show the following two entries, not necessarily in this order.
{false={2017=[<Jaav, “Java Jam”, 6, 2017, JAZZ>],
        2018=[<Genericos, “Keep on Erasing”, 8, 2018, JAZZ>,
              <Genericos, “Hot Generics”, 10, 2018, JAZZ>]},
 true={2017=[<Jaav, “Java Jive”, 8, 2017, POP>],
       2018=[<Funkies, “Lambda Dancing”, 10, 2018, POP>]}}
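The text says the downstream collector of partitioningBy() can itself be another partitioningBy(), but it only shows partitioning followed by grouping. Below is a minimal, self-contained sketch of such a cascaded partitioning, written under stated assumptions. The CD record, the Genre enum and the CD_LIST field are hypothetical stand-ins for the book's CD class and CD.cdList. They are reconstructed only from the five CDs visible in the printed output. The 10-track threshold is an arbitrary choice for illustration. Collectors.mapping() is used only to collect titles instead of whole CDs.

```java
import java.time.Year;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class CascadedPartitioning {
  enum Genre { POP, JAZZ }

  // Hypothetical stand-in for the book's CD class; only the members used here.
  record CD(String artist, String title, int noOfTracks, Year year, Genre genre) {
    boolean isPop() { return genre == Genre.POP; }
  }

  // Assumed data set: the five CDs visible in the printed output above.
  static final List<CD> CD_LIST = List.of(
      new CD("Jaav", "Java Jive", 8, Year.of(2017), Genre.POP),
      new CD("Jaav", "Java Jam", 6, Year.of(2017), Genre.JAZZ),
      new CD("Funkies", "Lambda Dancing", 10, Year.of(2018), Genre.POP),
      new CD("Genericos", "Keep on Erasing", 8, Year.of(2018), Genre.JAZZ),
      new CD("Genericos", "Hot Generics", 10, Year.of(2018), Genre.JAZZ));

  // Outer partition: pop or not. Inner partition: 10 or more tracks or not.
  static Map<Boolean, Map<Boolean, List<String>>> popThenTrackCount() {
    return CD_LIST.stream()
        .collect(Collectors.partitioningBy(CD::isPop,
            Collectors.partitioningBy(cd -> cd.noOfTracks() >= 10,
                Collectors.mapping(CD::title, Collectors.toList()))));
  }

  public static void main(String[] args) {
    System.out.println(popThenTrackCount());
  }
}
```

Because each level is a partition, every inner map has both a true and a false key, even when one of its lists is empty.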
Filtering Adapter for Downstream Collectors
The filtering() method of the Collectors class encapsulates a predicate and a downstream collector to create an adapter for a filtering operation. (See also the filter() intermediate operation.)
static <T,A,R> Collector<T,?,R> filtering(
    Predicate<? super T> predicate,
    Collector<? super T,A,R> downstream)
Returns a Collector that applies the predicate to input elements of type T to determine which elements should be passed to the downstream collector. This downstream collector accumulates them into results of type R, where the type parameter A is the intermediate accumulation type of the downstream collector.
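filtering() returns an ordinary Collector, so it can also be passed directly to collect(). Used that way, the roles of the type parameters are easy to see. In the following sketch (the class and method names are illustrative), the first method has T bound to Integer, and the downstream summingInt() collector yields an R of Integer. The second has T bound to String and R bound to List<String>. In both cases the intermediate type A stays hidden behind the wildcard in the returned Collector<T,?,R>.

```java
import java.util.List;
import java.util.stream.Collectors;

public class FilteringAdapter {

  // T = Integer, R = Integer: only even numbers reach summingInt().
  static int sumOfEvens(List<Integer> numbers) {
    return numbers.stream()
        .collect(Collectors.filtering(n -> n % 2 == 0,         // predicate on T
                                      Collectors.summingInt(n -> n))); // downstream yields R
  }

  // T = String, R = List<String>: only words of length 3 or less reach toList().
  static List<String> shortWords(List<String> words) {
    return words.stream()
        .collect(Collectors.filtering(w -> w.length() <= 3, Collectors.toList()));
  }

  public static void main(String[] args) {
    System.out.println(sumOfEvens(List.of(1, 2, 3, 4, 5, 6)));
    System.out.println(shortWords(List.of("map", "filter", "sum", "reduce")));
  }
}
```

Used on its own like this, filtering() gives the same result as calling filter() before collect(). Its real value is as a downstream collector inside groupingBy() or partitioningBy().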
The following code uses the filtering() operation at (2) to group pop music CDs according to the number of tracks on them. The groupingBy() operation at (1) creates the groups based on the number of tracks on the CDs, but the filtering() operation only allows pop music CDs to pass downstream to be accumulated.
// Filtering downstream from grouping.
Map<Integer, List<CD>> grpByTracksFilterByPopCD = CD.cdList.stream()
    .collect(Collectors.groupingBy(CD::noOfTracks,                   // (1)
        Collectors.filtering(CD::isPop, Collectors.toList())));      // (2)
Printing the contents of the resulting map would show the entries below, not necessarily in this order. The output shows that there were one or more CDs with six tracks, but none of them was a pop music CD. Hence the list of CDs associated with key 6 is empty.
{6=[],
8=[<Jaav, “Java Jive”, 8, 2017, POP>],
10=[<Funkies, “Lambda Dancing”, 10, 2018, POP>]}
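The empty group for key 6 is not specific to toList(). The following minimal sketch uses the same kind of hypothetical CD record, with the five CDs from the printed output standing in for CD.cdList. It swaps in Collectors.counting() as the collector downstream of filtering(). The group for six-track CDs is still created, because groupingBy() sees a six-track CD, and its count is simply 0.

```java
import java.time.Year;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupThenCountPop {
  enum Genre { POP, JAZZ }

  // Hypothetical stand-in for the book's CD class; only the members used here.
  record CD(String artist, String title, int noOfTracks, Year year, Genre genre) {
    boolean isPop() { return genre == Genre.POP; }
  }

  // Assumed data set: the five CDs visible in the printed output above.
  static final List<CD> CD_LIST = List.of(
      new CD("Jaav", "Java Jive", 8, Year.of(2017), Genre.POP),
      new CD("Jaav", "Java Jam", 6, Year.of(2017), Genre.JAZZ),
      new CD("Funkies", "Lambda Dancing", 10, Year.of(2018), Genre.POP),
      new CD("Genericos", "Keep on Erasing", 8, Year.of(2018), Genre.JAZZ),
      new CD("Genericos", "Hot Generics", 10, Year.of(2018), Genre.JAZZ));

  // Groups are keyed by track count; only pop CDs are counted within each group.
  static Map<Integer, Long> popCountByTracks() {
    return CD_LIST.stream()
        .collect(Collectors.groupingBy(CD::noOfTracks,
            Collectors.filtering(CD::isPop, Collectors.counting())));
  }

  public static void main(String[] args) {
    System.out.println(popCountByTracks());
  }
}
```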
However, if we run the same query using the filter() intermediate stream operation at (1) prior to grouping, the contents of the result map are different, as shown below.
// Filtering before grouping.
Map<Integer, List<CD>> filterByPopCDGrpByTracks = CD.cdList.stream()
    .filter(CD::isPop)                                                  // (1)
    .collect(Collectors.groupingBy(CD::noOfTracks, Collectors.toList()));
The contents of the result map show that it contains only entries whose value is a non-empty list. This is not surprising: any CD that is not a pop music CD is discarded before grouping, so only pop music CDs are grouped, and no group is created for key 6.
{8=[<Jaav, “Java Jive”, 8, 2017, POP>],
10=[<Funkies, “Lambda Dancing”, 10, 2018, POP>]}
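The two queries can be compared directly. This sketch again uses a hypothetical CD record, with the five CDs from the printed output standing in for CD.cdList. It shows that the only difference between the results is key 6 with its empty list, so removing empty groups from the first map yields the second. The dropEmptyGroups() helper is hypothetical. It copies the map first, since groupingBy() makes no guarantee about the mutability of the map it returns.

```java
import java.time.Year;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class FilterBeforeOrAfterGrouping {
  enum Genre { POP, JAZZ }

  // Hypothetical stand-in for the book's CD class; only the members used here.
  record CD(String artist, String title, int noOfTracks, Year year, Genre genre) {
    boolean isPop() { return genre == Genre.POP; }
  }

  // Assumed data set: the five CDs visible in the printed output above.
  static final List<CD> CD_LIST = List.of(
      new CD("Jaav", "Java Jive", 8, Year.of(2017), Genre.POP),
      new CD("Jaav", "Java Jam", 6, Year.of(2017), Genre.JAZZ),
      new CD("Funkies", "Lambda Dancing", 10, Year.of(2018), Genre.POP),
      new CD("Genericos", "Keep on Erasing", 8, Year.of(2018), Genre.JAZZ),
      new CD("Genericos", "Hot Generics", 10, Year.of(2018), Genre.JAZZ));

  // Group first, then filter downstream: every track count gets a group.
  static Map<Integer, List<CD>> filterDownstream() {
    return CD_LIST.stream()
        .collect(Collectors.groupingBy(CD::noOfTracks,
            Collectors.filtering(CD::isPop, Collectors.toList())));
  }

  // Filter first, then group: only track counts of pop CDs get a group.
  static Map<Integer, List<CD>> filterFirst() {
    return CD_LIST.stream()
        .filter(CD::isPop)
        .collect(Collectors.groupingBy(CD::noOfTracks, Collectors.toList()));
  }

  // Hypothetical helper: returns a copy of the map without its empty groups.
  static Map<Integer, List<CD>> dropEmptyGroups(Map<Integer, List<CD>> map) {
    Map<Integer, List<CD>> copy = new HashMap<>(map);
    copy.values().removeIf(List::isEmpty);
    return copy;
  }

  public static void main(String[] args) {
    System.out.println(filterDownstream().keySet());
    System.out.println(filterFirst().keySet());
  }
}
```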
There are no surprises with partitioning, regardless of whether filtering is done before or after the partitioning, as partitioning always results in a map with two entries: one for the Boolean.TRUE key and one for the Boolean.FALSE key. The code below partitions CDs released in 2018 according to whether a CD is a pop music CD or not.
// Filtering downstream from partitioning.
Map<Boolean, List<CD>> partbyPopCDsFilterByYear = CD.cdList.stream()   // (1)
    .collect(Collectors.partitioningBy(CD::isPop,
        Collectors.filtering(cd -> cd.year().equals(Year.of(2018)),
            Collectors.toList())));
// Filtering before partitioning.
Map<Boolean, List<CD>> filterByYearPartbyPopCDs = CD.cdList.stream()   // (2)
    .filter(cd -> cd.year().equals(Year.of(2018)))
    .collect(Collectors.partitioningBy(CD::isPop, Collectors.toList()));
Both queries at (1) and (2) above will result in the same entries in the result map:
{false=[<Genericos, “Keep on Erasing”, 8, 2018, JAZZ>,
<Genericos, “Hot Generics”, 10, 2018, JAZZ>],
true=[<Funkies, “Lambda Dancing”, 10, 2018, POP>]}
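The following sketch checks both claims about partitioning. It makes the same assumptions as before: a hypothetical CD record and Genre enum, with the five CDs from the printed output standing in for CD.cdList. The year is passed in as a parameter, which the book's queries hard-code as 2018. First, the two partitioning queries produce equal maps. Second, even a year that matches no CD at all (2019 here, chosen arbitrarily) still yields a map with both a true and a false key, each mapped to an empty list.

```java
import java.time.Year;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PartitionFilterOrder {
  enum Genre { POP, JAZZ }

  // Hypothetical stand-in for the book's CD class; only the members used here.
  record CD(String artist, String title, int noOfTracks, Year year, Genre genre) {
    boolean isPop() { return genre == Genre.POP; }
  }

  // Assumed data set: the five CDs visible in the printed output above.
  static final List<CD> CD_LIST = List.of(
      new CD("Jaav", "Java Jive", 8, Year.of(2017), Genre.POP),
      new CD("Jaav", "Java Jam", 6, Year.of(2017), Genre.JAZZ),
      new CD("Funkies", "Lambda Dancing", 10, Year.of(2018), Genre.POP),
      new CD("Genericos", "Keep on Erasing", 8, Year.of(2018), Genre.JAZZ),
      new CD("Genericos", "Hot Generics", 10, Year.of(2018), Genre.JAZZ));

  // Partition first, then filter by year downstream.
  static Map<Boolean, List<CD>> filterDownstream(Year year) {
    return CD_LIST.stream()
        .collect(Collectors.partitioningBy(CD::isPop,
            Collectors.filtering(cd -> cd.year().equals(year), Collectors.toList())));
  }

  // Filter by year first, then partition.
  static Map<Boolean, List<CD>> filterFirst(Year year) {
    return CD_LIST.stream()
        .filter(cd -> cd.year().equals(year))
        .collect(Collectors.partitioningBy(CD::isPop, Collectors.toList()));
  }

  public static void main(String[] args) {
    System.out.println(filterDownstream(Year.of(2018)).equals(filterFirst(Year.of(2018))));
    System.out.println(filterFirst(Year.of(2019)));
  }
}
```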