Basic Kafka Streams demo

This tutorial leads you through building a simple Kafka Streams-based microservice that consumes a Kafka topic containing tweets, extracts Twitter handles, e.g. @elonmusk, from the tweet text, and produces usage counts to another topic.

Note: This is a deliberately simplistic service, allowing the tutorial to focus on demonstrating Creek’s core features.

Features covered

By the end of this tutorial you should know:

How to bootstrap a new repo from the aggregate-template repository.
How to add new microservices to an aggregate repository.
How to define a service descriptor: metadata about the API of a service, i.e. its input and output topics.
How to obtain a Kafka topic’s serde, for use in a Kafka Streams topology.
How to build and execute a Kafka Streams topology, using Creek.
How to write black-box system tests of the service’s Docker image.
How to write unit tests of the service’s topology.
How to debug a service, running in a Docker container, when things aren’t working as expected.
How to capture code-coverage metrics.
How Creek enables all of the above to accelerate microservice development, allowing you to focus on business logic, not boilerplate.

Prerequisites

The tutorial requires the following:

A GitHub account.
Git installed for source code control.
Docker Desktop installed for running containerised system tests.
(Optional) IntelliJ IDE installed for code development.
(Optional) AttachMe IntelliJ plugin installed for debugging containerised services.

Design

To keep things simple, this example design assumes an upstream gateway service is consuming tweets from the Twitter API and producing records to a Kafka topic named twitter.tweet.text. The produced records have the tweet id in the key and the tweet text in the value.

In a normal system, the upstream gateway service would likely own its twitter.tweet.text output topic. To keep this tutorial self-contained, the tutorial’s service will own its twitter.tweet.text input topic.

ProTip: The concept of topic ownership defines which service / aggregate, and hence team within an organisation, is responsible for the topic, its configuration, and the data it contains.

The service will search each tweet for Twitter handles, e.g. @BarackObama. For each handle, the service will produce a record to the twitter.handle.usage Kafka topic. The produced records have the Twitter handle in the key and the number of occurrences in the value.
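The handle extraction described above can be sketched in plain Java, independent of Kafka Streams or Creek. This is only an illustrative sketch, not the tutorial's actual implementation: the class name HandleExtractor, the method countHandles, and the regular expression used to recognise a handle are all assumptions made for this example. Each entry in the returned map corresponds to one record the service would produce to twitter.handle.usage, with the handle as the key and the occurrence count as the value.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, named for illustration only.
public final class HandleExtractor {

    // Assumption: a handle is '@' followed by letters, digits or underscores,
    // and must not be preceded by such a character (so e-mail addresses are skipped).
    private static final Pattern HANDLE =
            Pattern.compile("(?<![A-Za-z0-9_])@[A-Za-z0-9_]+");

    private HandleExtractor() {}

    // Returns each handle found in the tweet text, mapped to its number of occurrences.
    // Handles are kept in the order they first appear and are not case-normalised.
    public static Map<String, Integer> countHandles(final String tweetText) {
        final Map<String, Integer> counts = new LinkedHashMap<>();
        final Matcher matcher = HANDLE.matcher(tweetText);
        while (matcher.find()) {
            counts.merge(matcher.group(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(final String[] args) {
        // Prints one line per distinct handle, e.g. "@elonmusk -> 2".
        countHandles("Hi @BarackObama and @elonmusk, cc @elonmusk")
                .forEach((handle, count) -> System.out.println(handle + " -> " + count));
    }
}
```

In a Kafka Streams topology, logic like this would typically sit inside a flat-map style step that turns one input record (tweet id, tweet text) into zero or more output records (handle, count).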

Complete solution

The completed tutorial can be viewed on GitHub.