Kafka Streams is a stream processing library that transforms data in real time. It uses a Kafka cluster to its full potential by leveraging horizontal scalability, fault tolerance, and exactly-once semantics. It is a true stream processing engine that analyzes and transforms the data stored in Kafka topics record by record. Kafka Streams applications are written as standard Java applications that consume input from Kafka topics and write their output back to Kafka topics. It requires no separate processing cluster and works for applications of any size. It can be used in time-critical applications such as fraud monitoring and alerting.
What makes Kafka Streams different from others?
Renowned Companies using Kafka Streams
Kafka Streams is making its place in the market because of its simplicity and power. It has already been adopted by leading companies such as Pinterest, trivago, Doodle, and The New York Times for their specific business use cases, such as distributing content to readers in real time, predicting infrastructure demand for advertising, and sending alert notifications to customers in real time.
Kafka Streams Terminologies
KStreams versus KTables
You can read the records from Kafka topics either as KStreams or KTables.
A KStream is an abstraction over an unbounded stream of records coming from a Kafka topic. It is very similar to a log: each incoming record is appended to the stream. A KTable, in contrast, behaves like a table that supports upsert and delete operations keyed on the record's primary key.
The two abstractions handle the same incoming record differently:
• If the record key already exists, a KStream simply appends the record, whereas a KTable updates the existing record for that key.
• If a record has a null value, a KStream simply appends it to the existing stream, whereas a KTable deletes the existing record with the same key (the null value acts as a tombstone).
• If the record key is null, a KStream simply appends the record, whereas a KTable ignores it.
Reading from Kafka topics
You can read a topic from Kafka as KStreams or KTables or GlobalKTables.
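A minimal sketch of the three read abstractions, assuming hypothetical topic names and String keys and values throughout:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

public class ReadExample {
    public static void buildTopology(StreamsBuilder builder) {
        // KStream: an append-only record stream
        KStream<String, String> stream =
                builder.stream("orders-topic", Consumed.with(Serdes.String(), Serdes.String()));

        // KTable: a changelog view (upserts/deletes by key), partitioned across instances
        KTable<String, String> table =
                builder.table("customers-topic", Consumed.with(Serdes.String(), Serdes.String()));

        // GlobalKTable: like a KTable, but fully replicated to every application instance
        GlobalKTable<String, String> globalTable =
                builder.globalTable("countries-topic", Consumed.with(Serdes.String(), Serdes.String()));
    }
}
```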
Writing to Kafka Topics
Using the Kafka Streams API, you can write transformed KStreams or KTables back to Kafka topics. There are two methods for doing this: "to" writes the transformed KStream/KTable to a Kafka topic without returning anything, whereas "through" writes to the topic and also returns the resulting stream so that processing can continue.
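A sketch of both write paths, with hypothetical topic names (note that in newer Kafka Streams releases "through" has been deprecated in favor of "repartition"):

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

public class WriteExample {
    public static void buildTopology(StreamsBuilder builder) {
        KStream<String, String> transformed = builder.stream("input-topic");

        // "to" is terminal: it writes the stream to the topic and returns nothing
        transformed.to("output-topic");

        // "through" writes to the topic and hands the stream back for further processing
        KStream<String, String> continued = transformed.through("intermediate-topic");
        continued.to("final-topic");
    }
}
```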
Processing Data in KStreams/KTables
The mapValues transformation is supported by both KStreams and KTables. It operates only on the value part of the record: it takes one value and produces exactly one value. Because it does not change the key, it does not trigger a repartition. In the below example, we transform the values of the records to uppercase.
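A sketch of the uppercase example, with the transformation pulled out as a helper (topic names are hypothetical):

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

public class MapValuesExample {
    // One value in, exactly one value out; the key is untouched
    static String toUpper(String value) {
        return value.toUpperCase();
    }

    public static void buildTopology(StreamsBuilder builder) {
        KStream<String, String> words = builder.stream("words-topic");
        // mapValues changes only the value, so no repartition is triggered
        words.mapValues(MapValuesExample::toUpper).to("uppercased-topic");
    }
}
```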
The map transformation is supported only by KStreams. It can operate on both the key and the value of the record, and it produces exactly one output record for each incoming record. Because the key may change, it triggers a repartition. In the below example, we transform both the key and the value of each record to uppercase.
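A sketch of map changing both key and value, again with hypothetical topic names:

```java
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

public class MapExample {
    // map may change both key and value; here both are upper-cased
    static KeyValue<String, String> upperCase(String key, String value) {
        return KeyValue.pair(key.toUpperCase(), value.toUpperCase());
    }

    public static void buildTopology(StreamsBuilder builder) {
        KStream<String, String> words = builder.stream("words-topic");
        // Because the key can change, map marks the stream for repartitioning
        words.map(MapExample::upperCase).to("uppercased-topic");
    }
}
```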
As the name suggests, the filter transformation is used to drop records based on a condition. It works with both KStreams and KTables, returning a filtered stream or table. Each record produces zero or one output record, depending on whether it satisfies the predicate. In the below example, we keep only those records whose value is greater than 2.
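A sketch of that predicate, assuming a stream with Long values and hypothetical topic names:

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

public class FilterExample {
    // The predicate: keep only records whose value is greater than 2
    static boolean greaterThanTwo(String key, Long value) {
        return value > 2;
    }

    public static void buildTopology(StreamsBuilder builder) {
        KStream<String, Long> counts = builder.stream("counts-topic");
        // Each record yields zero or one output record
        counts.filter(FilterExample::greaterThanTwo).to("big-counts-topic");
    }
}
```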
The flatMap transformation is very similar to map, but it can produce zero or more output records from a single input record, whereas map produces exactly one output for every input. This operation is supported only by KStreams, and it triggers a repartition.
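As an illustration (not from the original article), a common flatMap use case is splitting a sentence into one record per word; topic names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

public class FlatMapExample {
    // One sentence in, zero or more (word, 1) records out
    static List<KeyValue<String, Integer>> splitIntoWords(String key, String sentence) {
        List<KeyValue<String, Integer>> records = new ArrayList<>();
        for (String word : sentence.split("\\s+")) {
            if (!word.isEmpty()) {
                records.add(KeyValue.pair(word, 1));
            }
        }
        return records;
    }

    public static void buildTopology(StreamsBuilder builder) {
        KStream<String, String> sentences = builder.stream("sentences-topic");
        // flatMap may change the key, so it triggers a repartition
        sentences.flatMap(FlatMapExample::splitIntoWords).to("words-topic");
    }
}
```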
Using the branch transformation, we can split an incoming KStream into multiple KStreams based on conditions. In the below example, we split the incoming stream based on the record value, which is a city name: records for 'London' land in KStream1, records for 'Las Vegas' land in KStream2, and records for 'Melbourne' land in KStream3.
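A sketch of that split, with hypothetical topic names (newer Kafka Streams releases replace "branch" with the "split"/"Branched" API, but the idea is the same):

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

public class BranchExample {
    @SuppressWarnings("unchecked")
    public static void buildTopology(StreamsBuilder builder) {
        KStream<String, String> cities = builder.stream("cities-topic");
        // Each record goes to the first branch whose predicate matches
        KStream<String, String>[] branches = cities.branch(
                (key, value) -> "London".equals(value),     // branches[0] = KStream1
                (key, value) -> "Las Vegas".equals(value),  // branches[1] = KStream2
                (key, value) -> "Melbourne".equals(value)); // branches[2] = KStream3
        branches[0].to("london-topic");
        branches[1].to("las-vegas-topic");
        branches[2].to("melbourne-topic");
    }
}
```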
The selectKey transformation is used to change the existing key of the record: it assigns a new key to each record. In the below example, we assign the first two letters of the existing key as the new key.
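A sketch of that re-keying, with hypothetical topic names:

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

public class SelectKeyExample {
    // New key = first two letters of the old key (guarding against short keys)
    static String firstTwoLetters(String key, String value) {
        return key.substring(0, Math.min(2, key.length()));
    }

    public static void buildTopology(StreamsBuilder builder) {
        KStream<String, String> cities = builder.stream("cities-topic");
        // Changing the key marks the stream for repartitioning
        cities.selectKey(SelectKeyExample::firstTwoLetters).to("rekeyed-topic");
    }
}
```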
KStreams and KTables are interchangeable: a KStream can be viewed as the changelog of a KTable, and a KTable can be considered a snapshot of a KStream at a point in time. We can easily convert between the two using the code below.
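A sketch of both conversions, with hypothetical topic names (the groupByKey/reduce idiom shown here keeps the latest value per key; newer releases also offer a direct toTable method):

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

public class ConvertExample {
    // Reducer that keeps the most recent value per key
    static String latest(String oldValue, String newValue) {
        return newValue;
    }

    public static void buildTopology(StreamsBuilder builder) {
        KTable<String, String> table = builder.table("customers-topic");
        KStream<String, String> stream = builder.stream("orders-topic");

        // KTable -> KStream: emit the table's changelog as a stream
        KStream<String, String> tableAsStream = table.toStream();

        // KStream -> KTable: group by key and keep the latest value per key
        KTable<String, String> streamAsTable =
                stream.groupByKey().reduce(ConvertExample::latest);
    }
}
```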
Writing Data to External Systems
Although it is possible to write transformed records to an external system directly from a Kafka Streams application (at the end of the day, it is simply a Java program, so anything is doable), the recommended approach for writing to external systems is to use the Kafka Connect API.
Exactly Once Semantics
It is now possible to have exactly-once semantics in Kafka, which is the most desirable delivery guarantee in many business use cases such as banking systems. In Kafka Streams, you just have to set the property "processing.guarantee" to "exactly_once" and processing becomes exactly-once. Note that this guarantee holds only within the Kafka ecosystem, that is, when both input and output are Kafka topics.
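A minimal configuration sketch; the application id and broker address are placeholders:

```java
import java.util.Properties;

public class ExactlyOnceConfig {
    public static Properties streamsProps() {
        Properties props = new Properties();
        props.put("application.id", "fraud-monitoring-app"); // hypothetical app id
        props.put("bootstrap.servers", "localhost:9092");    // hypothetical broker address
        // A single property switches the whole topology to exactly-once processing
        props.put("processing.guarantee", "exactly_once");
        return props;
    }
}
```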
Real-time processing frameworks are still evolving at a constant pace, with new APIs appearing every day, but Kafka Streams undoubtedly has an edge over others because of its tight integration with Kafka, exactly-once semantics, simplicity, and real-time processing power.
Author: Navdeep Kaur