🎉 🎉. RAPIDMINER 9.8 IS OUT!!! 🎉 🎉
RapidMiner 9.8 continues to innovate in data science collaboration, connectivity and governance
[New Extension] Kafka Connector
This extension contains operators to work with the Apache Kafka message broker.
It adds two operators to interact with a specific Kafka topic:
Read messages from a topic, either old messages in a batch, or collect new published messages
Write data from an example set as new messages into a topic on a Kafka cluster
Both operators collect the data batch wise and work best in combination with either background execution or a deployment on AI Hub or with a Real-Time-Scoring agent.To ease the access to existing Kafka servers, the extension supports a new Connection Object to store and share connection details.
Use Case examples
- Read past messages from a topic
With the "Read Kafka Topic" operator and the polling strategy set to earliest, you can retrieve past messages stored under a given topic. This can be used to analyze and potentially build a model on historic data.
- Read new messages from a topic
With the "Read Kafka Topic" operator and the polling strategy set to latest, you can retrieve newly published messages from a topic. There is an option to either wait until a specified number of messages are retrieved or to wait a fixed amount of time and return all messages published during that time. This can be used in a deployment to apply a learned model on fresh data or check for specific events.
- Write messages to a topic
With the "Write Kafka Topic" operator, any example set data can be converted into a message and be pushed to a given topic. Each example or row is converted to a message (either as simple string or in a JSON format) and send to the message broker. There is an option to either send all messages in a bulk or in a specified time interval. This can be used for example to publish scored or transformed data back into a Kafka stream.
- Train on historic data and score on new messages
With a combination of read and write it's easy to build a model that is trained on past messages that are stored on the cluster and then periodically retrieve new messages and apply the model on them. The scored data can then even be published again into a new topic.
This is an early version of the extension, so be aware of missing features. Feedback is much appreciated.
The extension itself only functions as a message consumer or producer. There is no shipped Kafka cluster with the extension, but it's also not needed for simple reading or writing operations.