Sunday, November 29, 2015

Java 8 Streams API: Grouping and Partitioning a Stream

This post shows how you can use the Collectors available in the Streams API to group elements of a stream with groupingBy and partition elements of a stream with partitioningBy.

Consider a stream of Employee objects, each with a name, city and number of sales, as shown in the table below:

+----------+------------+-----------------+
| Name     | City       | Number of Sales |
+----------+------------+-----------------+
| Alice    | London     | 200             |
| Bob      | London     | 150             |
| Charles  | New York   | 160             |
| Dorothy  | Hong Kong  | 190             |
+----------+------------+-----------------+
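
The Employee class itself is not shown in this post, so here is a minimal sketch of what the snippets assume. Only getCity and getNumSales appear in the code that follows. The constructor, getName, the toString that returns the name (chosen so that printed maps read like the ones shown later), and the SampleData helper are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical Employee class; only getCity() and getNumSales() are used in the post.
class Employee {
  private final String name;
  private final String city;
  private final int numSales;

  Employee(String name, String city, int numSales) {
    this.name = name;
    this.city = city;
    this.numSales = numSales;
  }

  String getName() { return name; }
  String getCity() { return city; }
  int getNumSales() { return numSales; }

  // Print just the name so that a map of employees reads like {London=[Alice, Bob]}.
  @Override
  public String toString() { return name; }
}

// Hypothetical helper that builds the four employees from the table above.
class SampleData {
  static List<Employee> employees() {
    return Arrays.asList(
        new Employee("Alice", "London", 200),
        new Employee("Bob", "London", 150),
        new Employee("Charles", "New York", 160),
        new Employee("Dorothy", "Hong Kong", 190));
  }
}
```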

Grouping

Let's start by grouping employees by city in imperative-style (pre-lambda) Java:

Map<String, List<Employee>> result = new HashMap<>();
for (Employee e : employees) {
  String city = e.getCity();
  List<Employee> empsInCity = result.get(city);
  if (empsInCity == null) {
    empsInCity = new ArrayList<>();
    result.put(city, empsInCity);
  }
  empsInCity.add(e);
}

You're probably familiar with writing code like this, and as you can see, it's a lot of code for such a simple task!

In Java 8, you can do the same thing with a single statement using a groupingBy collector (statically imported from java.util.stream.Collectors, as are the other collectors in this post), like this:

Map<String, List<Employee>> employeesByCity =
  employees.stream().collect(groupingBy(Employee::getCity));

This results in the following map (the order of the keys is not guaranteed):

{New York=[Charles], Hong Kong=[Dorothy], London=[Alice, Bob]}

It's also possible to count the number of employees in each city by passing a counting collector to the groupingBy collector. This second (downstream) collector performs a further reduction on all the elements of the stream that were classified into the same group.

Map<String, Long> numEmployeesByCity =
  employees.stream().collect(groupingBy(Employee::getCity, counting()));

The result is the following map:

{New York=1, Hong Kong=1, London=2}

Just as an aside, this is equivalent to the following SQL statement:

select city, count(*) from Employee group by city
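
To make the role of the second collector concrete, here is a minimal, self-contained sketch. It relies on the fact that the one-argument groupingBy is shorthand for passing toList() as the downstream collector, so counting() simply replaces that default. The DownstreamDemo class, its nested Employee stand-in, and the method names are assumptions for illustration, not code from this post.

```java
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.toList;

import java.util.Arrays;
import java.util.List;
import java.util.Map;

class DownstreamDemo {
  // Minimal stand-in for the post's Employee class (assumed shape).
  static class Employee {
    final String name;
    final String city;
    Employee(String name, String city) { this.name = name; this.city = city; }
    String getCity() { return city; }
    @Override public String toString() { return name; }
  }

  static List<Employee> sampleEmployees() {
    return Arrays.asList(
        new Employee("Alice", "London"), new Employee("Bob", "London"),
        new Employee("Charles", "New York"), new Employee("Dorothy", "Hong Kong"));
  }

  // Same result as groupingBy(Employee::getCity): toList() is the default downstream.
  static Map<String, List<Employee>> byCity(List<Employee> employees) {
    return employees.stream().collect(groupingBy(Employee::getCity, toList()));
  }

  // Swapping the downstream collector for counting() reduces each group to its size.
  static Map<String, Long> countByCity(List<Employee> employees) {
    return employees.stream().collect(groupingBy(Employee::getCity, counting()));
  }

  public static void main(String[] args) {
    System.out.println(byCity(sampleEmployees()));
    System.out.println(countByCity(sampleEmployees()));
  }
}
```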

Another example is calculating the average number of sales in each city, which can be done using the averagingInt collector in conjunction with the groupingBy collector:

Map<String, Double> avgSalesByCity =
  employees.stream().collect(groupingBy(Employee::getCity,
                               averagingInt(Employee::getNumSales)));

The result is the following map:

{New York=160.0, Hong Kong=190.0, London=175.0}

Partitioning

Partitioning is a special kind of grouping in which the resultant map is keyed by a Boolean, so it contains exactly two groups: one for true and one for false (either of which may be empty). For instance, if you want to find out who your best employees are, you can partition them into those who made more than a given number of sales (150 in this example) and those who didn't, using the partitioningBy collector:

Map<Boolean, List<Employee>> partitioned =
  employees.stream().collect(partitioningBy(e -> e.getNumSales() > 150));

This will produce the following result:

{false=[Bob], true=[Alice, Charles, Dorothy]}
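
As a rough sketch, the threshold can be pulled out into a parameter. The example also shows that the map returned by partitioningBy holds both keys even when one side has no elements. The PartitionDemo class, its nested Employee stand-in, and the partitionBySales helper with its minSales parameter are assumptions for illustration.

```java
import static java.util.stream.Collectors.partitioningBy;

import java.util.Arrays;
import java.util.List;
import java.util.Map;

class PartitionDemo {
  // Minimal stand-in for the post's Employee class (assumed shape).
  static class Employee {
    final String name;
    final int numSales;
    Employee(String name, int numSales) { this.name = name; this.numSales = numSales; }
    int getNumSales() { return numSales; }
    @Override public String toString() { return name; }
  }

  static List<Employee> sampleEmployees() {
    return Arrays.asList(
        new Employee("Alice", 200), new Employee("Bob", 150),
        new Employee("Charles", 160), new Employee("Dorothy", 190));
  }

  // Hypothetical helper: true = more than minSales sales, false = everyone else.
  static Map<Boolean, List<Employee>> partitionBySales(List<Employee> employees, int minSales) {
    return employees.stream().collect(partitioningBy(e -> e.getNumSales() > minSales));
  }

  public static void main(String[] args) {
    List<Employee> employees = sampleEmployees();
    System.out.println(partitionBySales(employees, 150));
    // Nobody exceeds 1000 sales, yet the true key is still present with an empty list.
    System.out.println(partitionBySales(employees, 1000).get(true).isEmpty());
  }
}
```

Because both keys are always present, calling get(true) or get(false) on the result does not need a null check.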

You can also combine partitioning and grouping by passing a groupingBy collector to the partitioningBy collector. For example, you could count the number of employees in each city within each partition:

Map<Boolean, Map<String, Long>> result =
  employees.stream().collect(partitioningBy(e -> e.getNumSales() > 150,
                               groupingBy(Employee::getCity, counting())));

This will produce a two-level Map:

{false={London=1}, true={New York=1, Hong Kong=1, London=1}}
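
Reading values back out of the two-level map takes two lookups: first the partition, then the city. The sketch below is one way to do that. The TwoLevelDemo class, its nested Employee stand-in, and the countByCityWithinPartition helper are assumptions for illustration. Note that a city with no employees in a partition is simply absent from the inner map, so the sketch uses getOrDefault for that case.

```java
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.partitioningBy;

import java.util.Arrays;
import java.util.List;
import java.util.Map;

class TwoLevelDemo {
  // Minimal stand-in for the post's Employee class (assumed shape).
  static class Employee {
    final String name;
    final String city;
    final int numSales;
    Employee(String name, String city, int numSales) {
      this.name = name;
      this.city = city;
      this.numSales = numSales;
    }
    String getCity() { return city; }
    int getNumSales() { return numSales; }
  }

  static List<Employee> sampleEmployees() {
    return Arrays.asList(
        new Employee("Alice", "London", 200), new Employee("Bob", "London", 150),
        new Employee("Charles", "New York", 160), new Employee("Dorothy", "Hong Kong", 190));
  }

  // Hypothetical helper: partition by sales threshold, then count per city in each partition.
  static Map<Boolean, Map<String, Long>> countByCityWithinPartition(
      List<Employee> employees, int minSales) {
    return employees.stream().collect(
        partitioningBy(e -> e.getNumSales() > minSales,
                       groupingBy(Employee::getCity, counting())));
  }

  public static void main(String[] args) {
    Map<Boolean, Map<String, Long>> result = countByCityWithinPartition(sampleEmployees(), 150);
    // Outer lookup selects the partition, inner lookup selects the city.
    long topSellersInLondon = result.get(true).get("London");
    // New York has nobody at or below the threshold, so the key is missing from the false map.
    long othersInNewYork = result.get(false).getOrDefault("New York", 0L);
    System.out.println(topSellersInLondon + " " + othersInNewYork);
  }
}
```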