Kafka Streams

Kafka Streams is a stream processing library that transforms data in real time. It builds on the horizontal scalability and fault tolerance of the Kafka cluster, and it supports exactly-once semantics. It is a true stream processing engine that analyzes and transforms the data stored in Kafka topics record by record. A Kafka Streams application is written as a standard Java application that reads its input from Kafka topics and writes its output back to Kafka topics. It requires no separate processing cluster and suits applications of any size. It can be used in real-time critical applications such as fraud monitoring and alerting.

What makes Kafka Streams different from others?

Four Kafka stream listed

Renowned Companies using Kafka Streams

Kafka Streams is making its place in the market because of its simplicity and power. It has already been adopted by well-known companies such as Pinterest, trivago, Doodle, and The New York Times. These companies use Kafka Streams for business use cases such as distributing content to readers in real time, predicting infrastructure for advertising, and sending alert notifications to customers in real time.

Kafka Streams Terminologies

Five Kafka stream terminologies listed

KStreams versus KTables

You can read the records from a Kafka topic either as a KStream or as a KTable.
A KStream is an abstraction over an unbounded stream of records coming from a Kafka topic. It behaves much like a log: each incoming record is appended to it. A KTable behaves like a database table: it supports upsert and delete operations, with the record key acting as the primary key.

KStreams versus KTables snapshot

The two abstractions treat the same incoming record differently:

• If the record key already exists, a KStream simply appends the record, whereas a KTable updates the existing record.
• If the record has a null value, a KStream simply appends it, whereas a KTable deletes the existing record with the same key, if there is one.
• If the record key is null, a KStream simply appends the record, whereas a KTable ignores it.
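The three rules above can be sketched in plain Java. This is only a model of the semantics built on java.util collections; it does not use the Kafka Streams API, and the class name, the helper methods, and the sample records are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the rules above; it does not use the Kafka Streams API.
class StreamVsTableModel {

    // SimpleEntry is used because it allows null keys and null values.
    static Map.Entry<String, String> rec(String key, String value) {
        return new SimpleEntry<>(key, value);
    }

    // KStream view: every record is appended, whatever its key or value.
    static List<Map.Entry<String, String>> asStream(List<Map.Entry<String, String>> records) {
        return new ArrayList<>(records);
    }

    // KTable view: upsert by key, delete on null value, ignore null keys.
    static Map<String, String> asTable(List<Map.Entry<String, String>> records) {
        Map<String, String> table = new LinkedHashMap<>();
        for (Map.Entry<String, String> r : records) {
            if (r.getKey() == null) {
                continue;                            // null key: record is ignored
            }
            if (r.getValue() == null) {
                table.remove(r.getKey());            // null value: existing row is deleted
            } else {
                table.put(r.getKey(), r.getValue()); // new key inserts, existing key updates
            }
        }
        return table;
    }

    // Hypothetical sample input, used only for illustration.
    static List<Map.Entry<String, String>> sample() {
        List<Map.Entry<String, String>> records = new ArrayList<>();
        records.add(rec("alice", "London"));
        records.add(rec("bob", "Melbourne"));
        records.add(rec("alice", "Las Vegas")); // key already exists
        records.add(rec("bob", null));          // null value
        records.add(rec(null, "Paris"));        // null key
        return records;
    }

    public static void main(String[] args) {
        System.out.println("stream: " + asStream(sample()));
        System.out.println("table:  " + asTable(sample()));
    }
}
```

Feeding the same five records to both views makes the difference visible: the stream keeps every record, while the table ends up holding only the latest surviving value per key.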

Reading from Kafka topics

You can read a Kafka topic as a KStream, a KTable, or a GlobalKTable.

Reading from Kafka topics snapshot

Writing to Kafka Topics

Using the KStream API, you can write transformed KStreams or KTables back to Kafka topics. There are two APIs for this: the “to” API writes the transformed records to a Kafka topic and returns nothing, whereas the “through” API writes the records to the topic and returns the resulting stream so that processing can continue. Note that “through” has been deprecated since Kafka 2.6.

Writing to Kafka Topics snapshot

Processing Data in KStreams/KTables

MapValues

The MapValues transformation is supported by both KStreams and KTables. It works only on the value part of the record: it takes one value and produces exactly one value. Because it cannot change the key, it does not trigger repartitioning. In the example below, we transform the values of the records to uppercase.

MapValue snapshot

Map

The Map transformation is supported only by KStreams. It can work on both the key and the value of the record. It produces exactly one record for each incoming record, and because it may change the key, it marks the stream for repartitioning, which takes place when a key-based operation such as a join or an aggregation follows. In the example below, we transform the values of the records to uppercase.

Map snapshot
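Why MapValues avoids repartitioning while Map does not can be sketched in plain Java. The block below is a model only and does not use the Kafka Streams API; the class, the method names, and the sample records are hypothetical, and partitionFor is a simplified stand-in (Kafka's default partitioner hashes the serialized key with murmur2, not String.hashCode()).

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java model; it does not use the Kafka Streams API.
class MapVsMapValuesModel {

    static Map.Entry<String, String> rec(String key, String value) {
        return new SimpleEntry<>(key, value);
    }

    // Simplified stand-in for a partitioner (assumption: not Kafka's real hash).
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // MapValues-style step: the key is untouched, only the value changes.
    static List<Map.Entry<String, String>> upperCaseValues(List<Map.Entry<String, String>> records) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (Map.Entry<String, String> r : records) {
            out.add(rec(r.getKey(), r.getValue().toUpperCase()));
        }
        return out;
    }

    // Map-style step: both the key and the value may change.
    static List<Map.Entry<String, String>> upperCaseKeyAndValue(List<Map.Entry<String, String>> records) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (Map.Entry<String, String> r : records) {
            out.add(rec(r.getKey().toUpperCase(), r.getValue().toUpperCase()));
        }
        return out;
    }

    // Hypothetical sample input, used only for illustration.
    static List<Map.Entry<String, String>> sample() {
        List<Map.Entry<String, String>> records = new ArrayList<>();
        records.add(rec("a", "london"));
        records.add(rec("b", "melbourne"));
        return records;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> r : upperCaseValues(sample())) {
            System.out.println("mapValues " + r + " -> partition " + partitionFor(r.getKey(), 3));
        }
        for (Map.Entry<String, String> r : upperCaseKeyAndValue(sample())) {
            System.out.println("map       " + r + " -> partition " + partitionFor(r.getKey(), 3));
        }
    }
}
```

Since the partition is derived from the key, a step that leaves the key alone cannot move a record to another partition, whereas a step that rewrites the key can.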

Filter

As the name suggests, the Filter API keeps or drops records based on a condition. It works with both KStreams and KTables and returns the filtered result. Each record produces zero or one output record, depending on the condition passed to the Filter API. In the example below, we keep only those records whose value is greater than 2.

Filter snapshot

FlatMap

The FlatMap transformation is very similar to Map, but it can produce zero or more output records from a single input record, whereas Map produces exactly one output record for every input record. This operation is supported only by KStreams, and it triggers data repartitioning.

FlatMap snapshot
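The difference in cardinality between Filter (zero or one output per record) and FlatMap (zero or more outputs per record) can be sketched in plain Java. This is a model of the behavior, not Kafka Streams code; the class, the method names, the word-splitting example, and the sample data are all hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java model; it does not use the Kafka Streams API.
class FilterFlatMapModel {

    // Filter-style step: each record yields zero or one output record.
    static List<Map.Entry<String, Integer>> keepGreaterThanTwo(List<Map.Entry<String, Integer>> records) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (Map.Entry<String, Integer> r : records) {
            if (r.getValue() > 2) {
                out.add(r);
            }
        }
        return out;
    }

    // FlatMap-style step: each record yields zero or more output records,
    // here one (word, 1) record per word of the value.
    static List<Map.Entry<String, Integer>> oneRecordPerWord(List<Map.Entry<String, String>> records) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (Map.Entry<String, String> r : records) {
            for (String word : r.getValue().trim().split("\\s+")) {
                if (!word.isEmpty()) {          // an empty value produces no output
                    out.add(new SimpleEntry<>(word, 1));
                }
            }
        }
        return out;
    }

    // Hypothetical sample inputs, used only for illustration.
    static List<Map.Entry<String, Integer>> numbers() {
        List<Map.Entry<String, Integer>> records = new ArrayList<>();
        records.add(new SimpleEntry<>("a", 1));
        records.add(new SimpleEntry<>("b", 3));
        records.add(new SimpleEntry<>("c", 5));
        return records;
    }

    static List<Map.Entry<String, String>> sentences() {
        List<Map.Entry<String, String>> records = new ArrayList<>();
        records.add(new SimpleEntry<>("k1", "hello kafka streams"));
        records.add(new SimpleEntry<>("k2", ""));
        return records;
    }

    public static void main(String[] args) {
        System.out.println("filter:  " + keepGreaterThanTwo(numbers()));
        System.out.println("flatMap: " + oneRecordPerWord(sentences()));
    }
}
```

Note that the FlatMap-style step also replaces the key (the word becomes the key), which is the reason the real operation is treated as key-changing.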

Branch

Using the Branch transformation, we can split an incoming KStream into multiple KStreams based on a set of conditions. In the example below, we split the incoming stream based on the value of the record, which is a city name. If a record belongs to London, it lands in KStream1; if it belongs to Las Vegas, it lands in KStream2; and if it belongs to Melbourne, it lands in KStream3.

Branch snapshot

Branch snapshot
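The routing performed by Branch can be sketched in plain Java. The block models the behavior documented for branching in Kafka Streams: predicates are evaluated in order, a record goes to the first branch whose predicate matches, and a record that matches no predicate is dropped. It does not use the Kafka Streams API, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Plain-Java model; it does not use the Kafka Streams API.
class BranchModel {

    // Returns one output list per predicate, in the same order as the predicates.
    static List<List<String>> branch(List<String> values, List<Predicate<String>> predicates) {
        List<List<String>> branches = new ArrayList<>();
        for (int i = 0; i < predicates.size(); i++) {
            branches.add(new ArrayList<>());
        }
        for (String value : values) {
            for (int i = 0; i < predicates.size(); i++) {
                if (predicates.get(i).test(value)) {
                    branches.get(i).add(value);
                    break; // first matching predicate wins
                }
            }
            // A value that matched no predicate is dropped.
        }
        return branches;
    }

    // The three city conditions from the text above.
    static List<List<String>> splitByCity(List<String> cities) {
        List<Predicate<String>> predicates = new ArrayList<>();
        predicates.add(city -> "London".equals(city));
        predicates.add(city -> "Las Vegas".equals(city));
        predicates.add(city -> "Melbourne".equals(city));
        return branch(cities, predicates);
    }

    public static void main(String[] args) {
        List<String> cities = new ArrayList<>();
        cities.add("London");
        cities.add("Paris");      // matches nothing and is dropped
        cities.add("Melbourne");
        cities.add("Las Vegas");
        System.out.println(splitByCity(cities));
    }
}
```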

SelectKey

The SelectKey transformation assigns a new key to each record, replacing the existing one. In the operation below, we assign the first two letters of the existing key as the new key.

SelectKey snapshot
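The re-keying described above can be sketched in plain Java. This is a model rather than Kafka Streams code; the class, the method names, and the sample records are hypothetical, and keeping keys shorter than two characters unchanged is an assumption made here so that the sketch never fails on short keys.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java model; it does not use the Kafka Streams API.
class SelectKeyModel {

    // New key = first two letters of the old key (short keys are kept as they are).
    static String firstTwoLetters(String key) {
        return key.length() <= 2 ? key : key.substring(0, 2);
    }

    // SelectKey-style step: the key is replaced, the value is left untouched.
    static List<Map.Entry<String, String>> selectKey(List<Map.Entry<String, String>> records) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (Map.Entry<String, String> r : records) {
            out.add(new SimpleEntry<>(firstTwoLetters(r.getKey()), r.getValue()));
        }
        return out;
    }

    // Hypothetical sample input, used only for illustration.
    static List<Map.Entry<String, String>> sample() {
        List<Map.Entry<String, String>> records = new ArrayList<>();
        records.add(new SimpleEntry<>("london", "sunny"));
        records.add(new SimpleEntry<>("melbourne", "rainy"));
        records.add(new SimpleEntry<>("x", "unknown"));
        return records;
    }

    public static void main(String[] args) {
        System.out.println(selectKey(sample()));
    }
}
```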

KStreams/KTables Duality

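KStreams and KTables are two views of the same data and can be converted into each other. A KStream can be seen as the changelog of a KTable: replaying every record of the stream rebuilds the table. A KTable can be seen as a snapshot of a KStream at a point in time, holding the latest value for each key. We can easily convert a KTable into a KStream and a KStream into a KTable using the code below.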

KStreams KTables Duality snapshot
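The duality can also be sketched in plain Java: a table that records every update yields a stream (its changelog), and replaying that stream rebuilds the table. This is a model only and does not use the Kafka Streams API; the class and its methods are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model; it does not use the Kafka Streams API.
class DualityModel {

    private final Map<String, Integer> rows = new LinkedHashMap<>();
    private final List<Map.Entry<String, Integer>> changelog = new ArrayList<>();

    // Every update changes the table and is also appended to the changelog.
    void upsert(String key, Integer value) {
        rows.put(key, value);
        changelog.add(new SimpleEntry<>(key, value));
    }

    // Table as stream: the sequence of changes made to the table.
    List<Map.Entry<String, Integer>> toStream() {
        return new ArrayList<>(changelog);
    }

    // Current contents of the table.
    Map<String, Integer> snapshot() {
        return new LinkedHashMap<>(rows);
    }

    // Stream as table: replay the records, keeping the latest value per key.
    static Map<String, Integer> toTable(List<Map.Entry<String, Integer>> stream) {
        Map<String, Integer> table = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> r : stream) {
            table.put(r.getKey(), r.getValue());
        }
        return table;
    }

    public static void main(String[] args) {
        DualityModel table = new DualityModel();
        table.upsert("alice", 1);
        table.upsert("bob", 2);
        table.upsert("alice", 3);
        System.out.println("stream:   " + table.toStream());
        System.out.println("rebuilt:  " + toTable(table.toStream()));
        System.out.println("snapshot: " + table.snapshot());
    }
}
```

Replaying only a prefix of the stream gives the table as it looked at that earlier point in time, which is the "snapshot" reading of the duality.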

Writing Data to External Systems

Although it is possible to write the transformed records to an external system directly from Kafka Streams (at the end of the day, these applications are simply Java programs, and anything is doable with them), the recommended approach for writing to an external system is to use Kafka Connect.

Exactly Once Semantics

Kafka now supports exactly-once semantics, the most desirable processing guarantee and a requirement in many business use cases such as banking systems. In Kafka Streams, you only have to set the property “processing.guarantee” to “exactly_once” to make the processing exactly-once. The guarantee holds only within the Kafka ecosystem, that is, when the application reads its input from Kafka topics and writes its output to Kafka topics.
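A minimal sketch of that configuration follows. The property keys are written as plain strings so that the block compiles without the Kafka dependency; in a real application the resulting Properties object would be passed to the Kafka Streams client together with the topology. The class name, the application id, and the broker address are hypothetical.

```java
import java.util.Properties;

// Builds the configuration only; it does not start a Kafka Streams application.
class ExactlyOnceConfig {

    static Properties build() {
        Properties props = new Properties();
        // Hypothetical application id and broker address, for illustration only.
        props.put("application.id", "fraud-alerts-app");
        props.put("bootstrap.servers", "localhost:9092");
        // The setting described above. Kafka 3.0 and later deprecate this value
        // in favor of "exactly_once_v2", which requires brokers on 2.5 or newer.
        props.put("processing.guarantee", "exactly_once");
        return props;
    }

    public static void main(String[] args) {
        build().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```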

Conclusion

All real-time processing frameworks are still evolving at a steady pace, with new APIs appearing regularly, but Kafka Streams undoubtedly has an edge over the others because of its integration with Kafka, its exactly-once semantics, its simplicity, and its real-time processing power.

Author: Navdeep Kaur

 
