In the last decade, organisations have become reliant on multiple systems and applications to fulfil their business needs. To work effectively, these systems and applications must be able to communicate with each other in a secure and efficient way. Messaging frameworks have become a critical part of the big data stack for these data-driven organisations, although it is difficult to choose which platform will suit their needs.
There are currently three types of messaging frameworks:
- Messaging Queue Frameworks – The traditional message queue paradigm, which is to be used only when there is a fixed end-to-end messaging system to support it.
- Distributed Messaging Pub-Sub Frameworks – Publish–subscribe is a sibling of the message queue paradigm. This pattern provides greater network scalability and a more dynamic network topology, at the cost of reduced flexibility to modify the publisher and the structure of the published data.
- Distributed Stream Processing Frameworks – Stream processing frameworks are runtime libraries which help developers write code to process streaming data, without dealing with lower level streaming mechanics.
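To make the difference between the first two paradigms concrete, here is a minimal in-memory sketch. The class names `PointToPointQueue` and `PubSubTopic` are hypothetical and written only for this illustration; they are not the API of any of the platforms discussed here.

```python
from collections import deque


class PointToPointQueue:
    """Toy message queue: each message is delivered to exactly one consumer."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Removing the message means no other consumer will ever see it.
        return self._messages.popleft() if self._messages else None


class PubSubTopic:
    """Toy publish-subscribe topic: every subscriber gets its own copy."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, message):
        for callback in self._subscribers:
            callback(message)


queue = PointToPointQueue()
queue.send("order-1")
first_read = queue.receive()   # a consumer takes the message
second_read = queue.receive()  # nothing is left for a second consumer

billing_inbox, audit_inbox = [], []
topic = PubSubTopic()
topic.subscribe(billing_inbox.append)
topic.subscribe(audit_inbox.append)
topic.publish("order-1")       # both subscribers receive the same message
```

In the queue, a message is gone once one consumer has taken it. In the topic, adding a new subscriber requires no change to the publisher, which is where the more dynamic topology comes from.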
In this blog we give an in-depth overview of these three types of messaging frameworks and a comparison of the specific platforms available in today’s market.
Messaging Queue Frameworks
ActiveMQ / RabbitMQ / ZeroMQ / RocketMQ
- These are earlier, traditional messaging systems that emphasise queuing over streaming.
- They are built on point-to-point messaging models.
- They are recommended only when there is a fixed end-to-end communication system.
Distributed Messaging Pub-Sub Frameworks
Apache Kafka
- Apache Kafka is a mature, stable, distributed and scalable publish-subscribe data streaming platform. It is built on a simple model of producers and consumers, distributed brokers, message topics, append-only logs and distributed partitions.
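The topic, partition and append-only log ideas can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not Kafka's client API or its real partitioner: `ToyTopic`, `produce` and `consume` are hypothetical names, and the `crc32` hash is an arbitrary choice for the example.

```python
import zlib


class ToyTopic:
    """In-memory stand-in for a topic split into append-only partitions."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        # Hash the key so records with the same key land in the same partition.
        # crc32 is an arbitrary choice for this sketch.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[partition]
        log.append((key, value))          # append only; nothing is overwritten
        return partition, len(log) - 1    # (partition, offset) of the new record

    def consume(self, partition, offset):
        # Reading does not remove anything; the consumer just tracks its offset.
        return self.partitions[partition][offset:]


topic = ToyTopic(num_partitions=3)
p1, o1 = topic.produce("user-42", "login")
p2, o2 = topic.produce("user-42", "purchase")

records = topic.consume(p1, 0)
replayed = topic.consume(p1, 0)  # a second read from offset 0 sees the same data
```

Because reads never delete records, several consumers can read the same partition independently, each keeping its own offset, and partitions can be spread across brokers to scale out.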
Apache Pulsar
- Like Kafka, Apache Pulsar is an open-source, distributed and scalable pub-sub messaging system, originally created at Yahoo and now part of the Apache Software Foundation.
Distributed Stream Processing Frameworks
Apache Samza
- Apache Samza is a distributed and scalable real-time stream processing framework. Samza allows you to build stateful applications that process data in real time from multiple sources, including Apache Kafka.
Apache Flink
- Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink has been designed to run in all common cluster environments, perform computations at in-memory speed and at any scale.
Apache Spark
- Apache Spark is a unified analytics engine for large-scale data processing. It achieves high performance for both batch and streaming data, using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine.
Apache Storm
- Apache Storm is an open-source, distributed, real-time computation system. Apache Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing.
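As a rough illustration of what "stateful" processing over a bounded or unbounded stream means, the sketch below counts events per key in fixed (tumbling) time windows using only the Python standard library. The function name `tumbling_window_counts` and the sample click data are assumptions made for this example. The frameworks above provide this kind of operation through their own APIs, together with distribution and fault tolerance, which this sketch does not attempt.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Yield (window_start, counts) for each window of an event stream.

    `events` is any iterable of (timestamp_seconds, key) pairs in timestamp
    order; it may be a finite list (bounded) or an endless generator (unbounded).
    """
    current_window = None
    counts = defaultdict(int)          # state carried between events
    for timestamp, key in events:
        window_start = timestamp - (timestamp % window_seconds)
        if current_window is not None and window_start != current_window:
            yield current_window, dict(counts)   # emit the finished window
            counts.clear()
        current_window = window_start
        counts[key] += 1
    if current_window is not None:
        yield current_window, dict(counts)       # flush the last window


clicks = [(0, "home"), (3, "cart"), (7, "home"), (12, "home"), (14, "cart")]
results = list(tumbling_window_counts(clicks, window_seconds=10))
```

The developer only writes the per-window logic. A real framework would also handle partitioning the stream across machines, checkpointing the `counts` state and dealing with late or out-of-order events.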
Recommendations
- Messaging Queue Frameworks - ActiveMQ / RabbitMQ / ZeroMQ / RocketMQ
  - These should be chosen only when there is a fixed point-to-point communication system with a standard messaging format.
  - These are not primarily designed to be distributed and scalable.
- Distributed Messaging Pub-Sub Frameworks - Kafka / Pulsar
  - Pub-sub frameworks are the most suitable for current data streaming challenges.
  - Kafka is the more popular of the two, backed by a large community and by partner support across multiple technology providers.
  - Its architecture is simple, flexible, scalable, highly available and fault-tolerant.
- Distributed Stream Processing Frameworks - Spark / Samza / Flink / Storm
  - Stream processing is offered as an add-on feature by distributed big data processing frameworks.
  - Apache Spark is the more popular and proven option, with support from multiple partners across data platforms.
  - It offers simple, distributed in-memory processing.
Conclusions:
Distributed messaging broker platforms such as Kafka are actively evolving in the market as the central nervous system connecting data platforms and data engines of any type.
If you would like to find out how to bring best practice to your Kafka deployment and optimise the performance and scalability of your Kafka clusters, give us a call on +44 (0)203 475 7980 or email us at Salesforce@coforge.com