[New Extension] Kafka Connector

David_A Administrator, Moderator, Employee-RapidMiner, RMResearcher, Member Posts: 297 RM Research
edited February 2021 in Knowledge Base

Summary

This extension contains operators to work with the Apache Kafka message broker.

It adds two operators to interact with a specific Kafka topic:

  • Read messages from a topic, either retrieving old messages in a batch or collecting newly published messages

  • Write data from an example set as new messages into a topic on a Kafka cluster

Both operators collect data batch-wise and work best in combination with background execution, a deployment on AI Hub, or a Real-Time Scoring agent.

To ease access to existing Kafka servers, the extension supports a new Connection Object to store and share connection details.
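
For orientation, these are the kind of client settings such a connection typically has to carry. This is only an illustration with placeholder values (broker addresses, security protocol); the actual fields of the extension's Connection Object may be named and grouped differently:

```java
import java.util.Properties;

// Illustrative only: typical Kafka client settings a connection has to provide.
// Host names and the security protocol are placeholders, not values from the extension.
Properties props = new Properties();
props.put("bootstrap.servers", "broker1:9092,broker2:9092"); // cluster entry points
props.put("security.protocol", "PLAINTEXT");                 // or SASL_SSL etc., depending on the cluster
props.put("key.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");
```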


Use Case Examples

  1. Read past messages from a topic
    With the "Read Kafka Topic" operator and the polling strategy set to earliest, you can retrieve past messages stored under a given topic. This can be used to analyze and potentially build a model on historic data.


  2. Read new messages from a topic
    With the "Read Kafka Topic" operator and the polling strategy set to latest, you can retrieve newly published messages from a topic. There is an option to either wait until a specified number of messages are retrieved or to wait a fixed amount of time and return all messages published during that time. This can be used in a deployment to apply a learned model on fresh data or check for specific events.



  3. Write messages to a topic
    With the "Write Kafka Topic" operator, any example set data can be converted into a message and be pushed to a given topic. Each example or row is converted to a message (either as simple string or in a JSON format) and send to the message broker. There is an option to either send all messages in a bulk or in a specified time interval. This can be used for example to publish scored or transformed data back into a Kafka stream.



  4. Train on historic data and score on new messages
    By combining the read and write operators, it's easy to train a model on past messages stored on the cluster and then periodically retrieve new messages and apply the model to them. The scored data can even be published back into a new topic (a sketch of this loop follows the producer example below).
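
The operators hide the client details, but conceptually use cases 1 and 2 correspond to a plain Kafka consumer whose offset-reset strategy is set to earliest or latest. The following is a minimal sketch with the standard Kafka Java client, not the extension's own code; the broker address, group id and topic name are placeholders:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class ReadTopicSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("group.id", "rm-demo");                 // placeholder group id
        // "earliest" replays old messages (use case 1), "latest" only sees new ones (use case 2)
        // for a consumer group without committed offsets
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
            // Poll for a fixed amount of time and collect whatever is available in that window
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(10));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
        }
    }
}
```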

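Use case 3 is, in Kafka terms, a plain producer: each row of the example set becomes one message value, either as a string or as JSON. Again a minimal sketch with placeholder broker and topic names, independent of the extension's implementation:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.List;
import java.util.Properties;

public class WriteTopicSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        // Each element stands in for one example/row, already rendered as JSON
        List<String> rows = List.of(
                "{\"id\": 1, \"value\": 0.42}",
                "{\"id\": 2, \"value\": 0.87}");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (String row : rows) {
                producer.send(new ProducerRecord<>("scored-topic", row)); // placeholder topic
            }
            producer.flush(); // push everything out in one bulk
        }
    }
}
```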

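And use case 4 simply chains the two: periodically poll for new messages, apply the previously trained model, and publish the results to a second topic. In the sketch below, score() is a hypothetical stand-in for applying the model, and the broker and topic names are again placeholders:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class ScoreLoopSketch {

    // Hypothetical stand-in for applying the previously trained model to one message.
    static String score(String message) {
        return "{\"input\": " + message + ", \"score\": 0.0}";
    }

    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        consumerProps.put("group.id", "rm-scoring");              // placeholder group id
        consumerProps.put("auto.offset.reset", "latest");         // only score new messages
        consumerProps.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(Collections.singletonList("input-topic")); // placeholder topic
            while (true) {
                // Collect whatever arrived in the last interval, score it, publish the result
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(30));
                for (ConsumerRecord<String, String> record : records) {
                    producer.send(new ProducerRecord<>("scored-topic", score(record.value())));
                }
                producer.flush();
            }
        }
    }
}
```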
Disclaimer

This is an early version of the extension, so some features may still be missing. Feedback is much appreciated.

The extension itself acts only as a message consumer or producer. No Kafka cluster is shipped with the extension, and none needs to be set up locally for simple read or write operations; you only need the connection details of an existing cluster.

