October 28, 2022 08:14 pm GMT

What does 'batching' mean when we're talking about Apache Kafka?

Today I learned that when you hear the word 'batch' in the context of Apache Kafka, it can mean one of two things:

  1. A reference to batch-only data processing systems. Batch-only systems process data in a bounded way: each job has a start time and an end time. Whether the data is grouped into large batches or micro-batches, each batch is processed all at once. That's in contrast to the continuous data streaming that Apache Kafka enables, in which data is processed in event-sized pieces.

  2. Within the data streaming context, there's something called producer batching. It's a bit of a misnomer because it's not really related to the batch-only data processing systems. A Kafka producer, the client that publishes records to the Kafka cluster, groups records bound for the same partition into batches before sending them, which increases throughput. Batching is not compression, although compression, when enabled, is applied to whole batches and benefits from larger ones. Producer batching only changes how records travel over the network; the data is still handled continuously and in event-sized pieces, so it doesn't mean the same thing as batch-only data processing.
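To make producer batching concrete, here is a minimal sketch in plain Python. It is a toy model, not the real Kafka client: the class `ToyBatchingProducer` and its methods are hypothetical names invented for illustration. The only detail borrowed from Kafka is the idea behind the producer setting `batch.size` (a per-partition byte limit, 16384 by default). The real producer also uses `linger.ms` to decide how long to wait for a batch to fill, which this sketch leaves out.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBatchingProducer:
    """Toy model of producer batching (hypothetical, not the Kafka client API)."""

    # Mirrors the idea of Kafka's `batch.size`: a per-partition byte limit.
    batch_size_bytes: int = 16384
    # Each entry is (partition, [records]) and stands for one network request.
    sent_batches: list = field(default_factory=list)
    # Records accumulated per partition that have not been sent yet.
    _pending: dict = field(default_factory=dict)

    def send(self, partition: int, value: bytes) -> None:
        batch = self._pending.setdefault(partition, [])
        pending_bytes = sum(len(v) for v in batch)
        # If this record would overflow the current batch, ship the batch first.
        if batch and pending_bytes + len(value) > self.batch_size_bytes:
            self._flush_partition(partition)
            batch = self._pending.setdefault(partition, [])
        batch.append(value)

    def _flush_partition(self, partition: int) -> None:
        batch = self._pending.pop(partition, [])
        if batch:
            self.sent_batches.append((partition, batch))

    def flush(self) -> None:
        # Send whatever is still waiting, one batch per partition.
        for partition in list(self._pending):
            self._flush_partition(partition)


# Five 4-byte events sent to one partition with a deliberately tiny limit.
producer = ToyBatchingProducer(batch_size_bytes=10)
for i in range(5):
    producer.send(partition=0, value=b"evt%d" % i)
producer.flush()

print([len(records) for _, records in producer.sent_batches])  # [2, 2, 1]
```

Every event is still an individual record, and a consumer would still read them one by one in order. The batching only reduces the number of requests, which is why it has nothing to do with bounded, batch-only processing.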

In conclusion, 'batching' means, in a very general way, 'grouping stuff together'. But 'producer batching' and 'batch-only data processing systems' share the term only superficially: the first is a way of sending records efficiently within a streaming system, and the second is a way of processing bounded sets of data.


Original Link: https://dev.to/cerchie/what-does-batching-mean-when-were-talking-about-apache-kafka-293a

