Kafka Streams Aggregate API Tutorial

This tutorial, the third and final tutorial in the quick-start series, will show how to define the API of an aggregate, how to reference that API from another aggregate, and how to interact with parts of a system that predate Creek.

ProTip: An aggregate is simply a logical grouping of services that, together, provide some business function via a defined API. That is, an aggregate is a level of abstraction above a single microservice. This is also known as a Bounded Context in DDD nomenclature.

The tutorial will extend the Basic Kafka Streams tutorial. As a quick recap, that tutorial defined a handle-occurrence-service which consumed data from the twitter.tweet.text Kafka topic, searched each Tweet for Twitter handles, and output any occurrences encountered to the twitter.handle.usage Kafka topic.
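As a refresher on the core logic being recapped, the handle search can be sketched in plain Java. This is a minimal illustration, not the service's actual implementation: the class name HandleExtractor and the regular expression are assumptions made for this example, and the pattern is deliberately simplistic (it would also match the tail of an email address, for instance).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Creek or the tutorial code.
final class HandleExtractor {

    // Assumption: a handle is '@' followed by one or more letters, digits or underscores.
    private static final Pattern HANDLE = Pattern.compile("@\\w+");

    private HandleExtractor() {}

    // Returns every handle found in the tweet text, in order of appearance.
    static List<String> extract(final String tweetText) {
        final List<String> handles = new ArrayList<>();
        final Matcher matcher = HANDLE.matcher(tweetText);
        while (matcher.find()) {
            handles.add(matcher.group());
        }
        return handles;
    }

    public static void main(final String[] args) {
        System.out.println(extract("Thanks @alice and @bob_99!"));
    }
}
```

In the real service, each occurrence found this way would be written to the twitter.handle.usage topic rather than printed.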

In the original tutorial, the handle-occurrence-service declared it owned the twitter.tweet.text input topic. This kept the tutorial self-contained, but was noted as unusual. In this tutorial, ownership of that topic will move to an imagined upstream Twitter ingestion system.

ProTip: The concept of topic ownership defines which service, or aggregate, and hence which team within an organisation, is responsible for a topic, its configuration, and the data it contains.

The ingestion system, which, for the sake of example, doesn’t use Creek, will be defined as the ingestion aggregate. This allows the tutorial to demonstrate how to integrate with non-Creek systems.

Features covered

The key features this tutorial is designed to highlight are:

How to declare the API of a Creek aggregate. If you wish to jump straight to this, see the Creek aggregate API section.
How to interop with parts of the system that predate or don’t use Creek. If you wish to jump straight to this, see the Non-Creek aggregate API section.
Why defining aggregates is a powerful architectural pattern. If you wish to jump straight to this, see the Why Aggregates? section.

Prerequisites

The tutorial requires the following:

A GitHub account.
Git installed for source code control.
Docker Desktop installed for running containerised system tests.

Design

The design changes covered by this tutorial fall into two main tasks:

The first task is to define the public API of the tutorial’s own aggregate. The existing handle-occurrence-service will see its twitter.handle.usage output topic promoted to being part of the aggregate’s public API. The topic will be conceptually owned by the aggregate.

The second task is to define the public API of the ingestion aggregate. As noted above, this aggregate doesn’t use Creek. Maybe the system predates Creek, or maybe it’s managed by an unenlightened team that doesn’t use Creek… yet.

If the aggregate did use Creek, it would define its own public API. As it doesn’t, the tutorial will show how to codify its API, allowing services to more easily interact with it.

The ingestion aggregate will define a single twitter.tweet.text output topic, as consumed by the existing handle-occurrence-service.

The handle-occurrence-service will be updated to make use of these two aggregate APIs.
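The ownership split described above can be modelled in a few lines of plain Java. The types below (Topic, AggregateApi, ServiceWiring) are hypothetical stand-ins invented for this sketch and are not Creek's descriptor API; the aggregate name "tutorial-aggregate" is likewise a placeholder, as only the ingestion aggregate is named here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the design; these are NOT Creek types.
final class AggregateDesign {

    // The upstream, non-Creek aggregate exposes the tweet text topic.
    static final AggregateApi INGESTION =
            AggregateApi.of("ingestion", "twitter.tweet.text");

    // Placeholder name for the tutorial's own aggregate.
    static final AggregateApi TUTORIAL =
            AggregateApi.of("tutorial-aggregate", "twitter.handle.usage");

    // The service consumes from one aggregate's API and produces to the other's.
    static final ServiceWiring HANDLE_OCCURRENCE = new ServiceWiring(
            "handle-occurrence-service", INGESTION.outputs(), TUTORIAL.outputs());

    private AggregateDesign() {}

    public static void main(final String[] args) {
        HANDLE_OCCURRENCE.inputs().forEach(t ->
                System.out.println("consumes " + t.name() + " (owner: " + t.owningAggregate() + ")"));
        HANDLE_OCCURRENCE.outputs().forEach(t ->
                System.out.println("produces " + t.name() + " (owner: " + t.owningAggregate() + ")"));
    }
}

// A topic and the aggregate responsible for it.
record Topic(String name, String owningAggregate) {}

// An aggregate's public API: the topics it exposes, all owned by that aggregate.
record AggregateApi(String name, List<Topic> outputs) {
    static AggregateApi of(final String name, final String... topicNames) {
        final List<Topic> outputs = Arrays.stream(topicNames)
                .map(topicName -> new Topic(topicName, name))
                .toList();
        return new AggregateApi(name, outputs);
    }
}

// The topics a service reads from and writes to.
record ServiceWiring(String service, List<Topic> inputs, List<Topic> outputs) {}
```

The point of the sketch is that the service itself owns neither topic: its input belongs to the ingestion aggregate, and its output belongs to the aggregate the service is part of.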

The target architecture looks like:

Complete solution

The completed tutorial can be viewed on GitHub.
