How to split a text into several pieces?
I want to split a text into several pieces for retrieval-augmented generation with the Generative Models Extension.
I have checked the document at https://docs.rapidminer.com/latest/studio/generative-ai/#retrieval-augmented-generation
but I don't know how to reproduce the process. Can someone provide the process? I have also tried the Text Processing extension with "Create Document" and "Window Document". However, "Window Document" returns "no elements in this collection". Any help? Thanks.
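To illustrate the goal (splitting a text into overlapping pieces for RAG), here is a minimal plain-Python sketch outside RapidMiner. It is not how the "Window Document" operator is implemented; the function name `window_chunks` and the `size`/`overlap` values are hypothetical choices for illustration.

```python
def window_chunks(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into chunks of `size` words; consecutive chunks share `overlap` words.

    Hypothetical helper for illustration, not a RapidMiner API.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()  # split on any run of whitespace
    step = size - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this window already reached the end
            break
    return chunks


print(window_chunks("a b c d e f g", size=3, overlap=1))
# ['a b c', 'c d e', 'e f g']
```

The overlap keeps a bit of shared context between neighbouring pieces, so a sentence cut at a chunk boundary still appears whole in at least one chunk when the chunk is large enough.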
Regards
Frank
Answers
I'd recommend looking into the Text Analysis course on the RapidMiner Academy, as it gives a nice overview of how you can load and manipulate text data.
To split up text, I would generally start with the Tokenize operator inside a Process Documents operator. It splits each document wherever a given pattern (such as a regular expression) matches, which for me usually ends up being whitespace. Also, make sure beforehand that you set the column's data type to Text and convert the data with a Data to Documents operator.
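As a rough analogue of tokenizing on a pattern, here is a short Python sketch using the standard `re` module. It only mimics the idea of splitting on a regular expression; it is not the Tokenize operator itself, and the `tokenize` name and default pattern are assumptions for illustration.

```python
import re


def tokenize(text: str, pattern: str = r"\s+") -> list[str]:
    """Split text on every match of `pattern` and drop empty strings.

    Hypothetical helper; the default pattern splits on whitespace.
    """
    return [tok for tok in re.split(pattern, text) if tok]


print(tokenize("  Retrieval augmented\ngeneration "))
# ['Retrieval', 'augmented', 'generation']
print(tokenize("RAG, in 2024!", r"[^A-Za-z]+"))  # split on non-letters instead
# ['RAG', 'in']
```

Changing the pattern changes what counts as a token boundary, which is the same choice you make when configuring how a document is tokenized.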
Hope this makes sense.
Best,
Roland