In this article we will see how to aggregate data with Java 8 streams. We have a list of "Hits", where each hit contains information like geoId, clicks, impressions and siteId.

Using the Java 8 Collectors.groupingBy() collector we can aggregate data on any dimension like "geoId" or "siteId", or on a composite dimension like "geoId+siteId". The end result reveals the total of metrics like clicks and impressions for each geoId or siteId.
class Hit {
	private int geoId;
	private int clicks;
	private int impressions;
	private int siteId;

	// Getters, setters, constructors and toString()

}


Dummy data: The following dummy data contains impressions and clicks for geoId 1, 2 and siteId 1, 2. We will find the total clicks and impressions for each geoId and siteId.
		List<Hit> hits = new ArrayList<>();
		hits.add(new Hit(1, 10, 100, 1));
		hits.add(new Hit(2, 10, 100, 1));
		hits.add(new Hit(2, 10, 100, 2));
		hits.add(new Hit(1, 10, 100, 2));



1) Aggregate total clicks based on geoId

		// this will print total clicks by geoId
		Map<Integer, Integer> clicksByGeoId = hits.stream()
				.collect(Collectors.groupingBy(Hit::getGeoId, Collectors.summingInt(Hit::getClicks)));
		clicksByGeoId.forEach((geoId, clicks) -> System.out.println("GeoId:" + geoId + "- Clicks:" + clicks));
Output: The output of the above code will look something like this:


GeoId:1- Clicks:20
GeoId:2- Clicks:20
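The fragment above leaves out the class and the imports. As a rough, self-contained sketch of the same aggregation, the version below assumes a trimmed-down Hit class and passes TreeMap::new to groupingBy() so that the keys print in sorted order. The class name ClicksByGeoExample, the helper sampleHits() and the use of TreeMap are assumptions made for illustration, not part of the original code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class ClicksByGeoExample {

	// Trimmed-down stand-in for the article's Hit class (assumption: only the getters used here).
	static class Hit {
		private final int geoId;
		private final int clicks;
		private final int impressions;
		private final int siteId;

		Hit(int geoId, int clicks, int impressions, int siteId) {
			this.geoId = geoId;
			this.clicks = clicks;
			this.impressions = impressions;
			this.siteId = siteId;
		}

		int getGeoId() {
			return geoId;
		}

		int getClicks() {
			return clicks;
		}
	}

	// Same dummy data as in the article.
	static List<Hit> sampleHits() {
		return Arrays.asList(
				new Hit(1, 10, 100, 1),
				new Hit(2, 10, 100, 1),
				new Hit(2, 10, 100, 2),
				new Hit(1, 10, 100, 2));
	}

	// Sums clicks per geoId; the TreeMap keeps the geoIds sorted.
	static Map<Integer, Integer> clicksByGeoId(List<Hit> hits) {
		return hits.stream()
				.collect(Collectors.groupingBy(Hit::getGeoId, TreeMap::new, Collectors.summingInt(Hit::getClicks)));
	}

	public static void main(String[] args) {
		clicksByGeoId(sampleHits())
				.forEach((geoId, clicks) -> System.out.println("GeoId:" + geoId + "- Clicks:" + clicks));
	}
}
```

Without the map factory argument, groupingBy() makes no promise about the type or ordering of the returned map, which is why the output is described as "something like" the listing above.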


2) Aggregate total impressions based on siteId

		// this will print total impressions by siteId
		Map<Integer, Integer> impressionsBySiteId = hits.stream()
				.collect(Collectors.groupingBy(Hit::getSiteId, Collectors.summingInt(Hit::getImpressions)));
		impressionsBySiteId.forEach(
				(siteId, impressions) -> System.out.println("SiteId:" + siteId + "- Impressions:" + impressions));

Output: The output of the above code will look something like this:


SiteId:1- Impressions:200
SiteId:2- Impressions:200
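The introduction also mentions composite dimensions like "geoId+siteId". One possible way to do that, sketched here under a few assumptions, is to use a List of both ids as the grouping key, since lists compare by content. The class name CompositeKeyExample, the helper geoAndSite() and the extra fifth row in the sample data are hypothetical and only there to make the aggregation visible.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class CompositeKeyExample {

	// Trimmed-down stand-in for the article's Hit class (assumption: only the getters used here).
	static class Hit {
		private final int geoId;
		private final int clicks;
		private final int impressions;
		private final int siteId;

		Hit(int geoId, int clicks, int impressions, int siteId) {
			this.geoId = geoId;
			this.clicks = clicks;
			this.impressions = impressions;
			this.siteId = siteId;
		}

		int getGeoId() {
			return geoId;
		}

		int getSiteId() {
			return siteId;
		}

		int getClicks() {
			return clicks;
		}
	}

	// Hypothetical helper: builds the composite key [geoId, siteId].
	// Lists implement equals() and hashCode() by content, so equal pairs land in the same group.
	static List<Integer> geoAndSite(Hit hit) {
		return Arrays.asList(hit.getGeoId(), hit.getSiteId());
	}

	// Sums clicks for every distinct (geoId, siteId) pair.
	static Map<List<Integer>, Integer> clicksByGeoAndSite(List<Hit> hits) {
		return hits.stream()
				.collect(Collectors.groupingBy(CompositeKeyExample::geoAndSite, Collectors.summingInt(Hit::getClicks)));
	}

	// The article's four rows plus one made-up row that repeats the pair (1, 1).
	static List<Hit> sampleHits() {
		return Arrays.asList(
				new Hit(1, 10, 100, 1),
				new Hit(2, 10, 100, 1),
				new Hit(2, 10, 100, 2),
				new Hit(1, 10, 100, 2),
				new Hit(1, 5, 50, 1)); // hypothetical extra row
	}

	public static void main(String[] args) {
		clicksByGeoAndSite(sampleHits())
				.forEach((key, clicks) -> System.out.println("GeoId+SiteId:" + key + "- Clicks:" + clicks));
	}
}
```

A small dedicated key class with equals() and hashCode() would read better in real code; the List key is used here only to keep the sketch short.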


3) Aggregate total impressions and clicks based on siteId

To aggregate data on a dimension and get more than one metric in the result, a method that merges another record into the current one needs to be added to the data POJO, as shown below:
	public Hit add(Hit record) {
		// Dimensions
		this.geoId = record.geoId;
		this.siteId = record.siteId;

		// Metrics are accumulated, not overwritten
		this.clicks += record.clicks;
		this.impressions += record.impressions;

		return this;
	}
"Hit.java" will now look something like this:
class Hit {
	private int geoId;
	private int clicks;
	private int impressions;
	private int siteId;

	// No-arg constructor, needed so that Hit::new can create an empty accumulator
	public Hit() {
		super();
	}

	public Hit add(Hit record) {
		// Dimensions
		this.geoId = record.geoId;
		this.siteId = record.siteId;

		// Metrics
		this.clicks += record.clicks;
		this.impressions += record.impressions;

		return this;
	}

	// Getters and Setters

	public Hit(int geoId, int clicks, int impressions, int siteId) {
		super();
		this.geoId = geoId;
		this.clicks = clicks;
		this.impressions = impressions;
		this.siteId = siteId;
	}

	@Override
	public String toString() {
		return "Hit [geoId=" + geoId + ", clicks=" + clicks + ", impressions=" + impressions + ", siteId=" + siteId
				+ "]";
	}

}
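Aggregate data based on siteId, to get total clicks and impressions.
		// Hit::new supplies an empty Hit; Hit::add is used as both accumulator and combiner
		Collector<Hit, Hit, Hit> agg = Collector.of(Hit::new, Hit::add, Hit::add);

		// this will print total impressions and clicks based on siteId
		Map<Integer, Hit> dataBySite = hits.stream().collect(Collectors.groupingBy(Hit::getSiteId, agg));
		dataBySite.forEach((siteId, hit) -> System.out.println("siteId:" + siteId + "- hit:" + hit));

Output: The output of the above code will look something like this:


siteId:1- hit:Hit [geoId=2, clicks=20, impressions=200, siteId=1]
siteId:2- hit:Hit [geoId=1, clicks=20, impressions=200, siteId=2]

Note that the geoId in each result is simply the geoId of the last record merged into that group, because add() overwrites the dimensions. Only siteId, clicks and impressions are meaningful here.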
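Because add() copies the dimensions of whichever record was merged last, the aggregated Hit also carries a value for the dimension that was not grouped on. One way to avoid that, sketched here as an alternative and not as the approach used in this article, is to accumulate into a separate class that holds only the metrics. The names TotalsExample, Totals, accept() and merge() are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class TotalsExample {

	// Trimmed-down stand-in for the article's Hit class (assumption: only the getters used here).
	static class Hit {
		private final int geoId;
		private final int clicks;
		private final int impressions;
		private final int siteId;

		Hit(int geoId, int clicks, int impressions, int siteId) {
			this.geoId = geoId;
			this.clicks = clicks;
			this.impressions = impressions;
			this.siteId = siteId;
		}

		int getSiteId() {
			return siteId;
		}

		int getClicks() {
			return clicks;
		}

		int getImpressions() {
			return impressions;
		}
	}

	// Hypothetical accumulator that holds metrics only, no dimensions.
	static class Totals {
		int clicks;
		int impressions;

		// Accumulator step: fold one Hit into the running totals.
		void accept(Hit hit) {
			clicks += hit.getClicks();
			impressions += hit.getImpressions();
		}

		// Combiner step: merge two partial totals (needed when the stream runs in parallel).
		Totals merge(Totals other) {
			clicks += other.clicks;
			impressions += other.impressions;
			return this;
		}

		@Override
		public String toString() {
			return "Totals [clicks=" + clicks + ", impressions=" + impressions + "]";
		}
	}

	// Groups by siteId and sums both metrics; the TreeMap keeps the siteIds sorted.
	static Map<Integer, Totals> totalsBySiteId(List<Hit> hits) {
		// Supplier, accumulator and combiner, in that order.
		Collector<Hit, Totals, Totals> agg = Collector.of(Totals::new, Totals::accept, Totals::merge);
		return hits.stream().collect(Collectors.groupingBy(Hit::getSiteId, TreeMap::new, agg));
	}

	// Same dummy data as in the article.
	static List<Hit> sampleHits() {
		return Arrays.asList(
				new Hit(1, 10, 100, 1),
				new Hit(2, 10, 100, 1),
				new Hit(2, 10, 100, 2),
				new Hit(1, 10, 100, 2));
	}

	public static void main(String[] args) {
		totalsBySiteId(sampleHits())
				.forEach((siteId, totals) -> System.out.println("siteId:" + siteId + "- totals:" + totals));
	}
}
```

The trade-off is one extra class, in exchange for results that contain nothing beyond the grouping key and the summed metrics.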

  • By Techburps.com
  • Oct 6, 2018
  • Java 8