"text input from a single text file using text plugin"
Hi,
I am new to the text plugin and am trying to do some text clustering using RapidMiner with it. I have all the text in one file, in which each line needs to be treated as a separate document. I tried using SplitSegmenter, but since a new file is created for every line, the space used is blowing up, which will hamper scalability.
Can someone suggest a way I can cluster the different lines in the same text file so I don't have to create different files?
Appreciate your response
Regards
Angshu
Answers
This is possible. You have to use a little trick: load the file using the CSVExampleSource operator. Configure the operator so that only one column is created from the file. To do so, specify a text that never occurs in the field as the column separation regular expression. Then insert a Nominal2String operator to change the value type to string. After this, using StringTextInput, you can transform the texts into word vectors for clustering. To simplify your life, I append a sample process:
Greetings,
Sebastian
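For readers who want to see the idea outside RapidMiner, here is a minimal Python sketch of the same approach: read one file, treat every non-empty line as its own document, and build TF-IDF word vectors without writing any intermediate files. This is not the sample process mentioned above and not how the RapidMiner operators are implemented. The function names (read_line_documents, tokenize, tfidf_vectors), the simple tokenizer, and the unsmoothed TF-IDF weighting are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def read_line_documents(path):
    """Return one document per non-empty line of the file at `path`."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (illustrative tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    """Return (vocabulary, vectors): a sorted term list and one TF-IDF vector per document."""
    token_lists = [tokenize(doc) for doc in documents]
    vocabulary = sorted({tok for tokens in token_lists for tok in tokens})
    # Document frequency: the number of documents in which each term occurs.
    doc_freq = Counter(tok for tokens in token_lists for tok in set(tokens))
    n_docs = len(documents)
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for lines without tokens
        vectors.append([
            (counts[term] / total) * math.log(n_docs / doc_freq[term])
            for term in vocabulary
        ])
    return vocabulary, vectors
```

Calling read_line_documents on the single input file and passing the result to tfidf_vectors yields one vector per line, which can then be handed to whatever clustering method you prefer.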
Just to add to what Sebastian was saying: in GUI form, you can use the following operator flow.
1. ExampleSource - configure your input: the delimiter (tab or comma), the format of the input fields (nominal, string, etc.), and the type of each variable (label for the dependent variable, attribute for independent variables, id for keys). Then save it in an attribute file.
2. StringTextInput - for generating word vectors. For further info, visit http://kmandcomputing.blogspot.com/2008/06/opinion-mining-with-rapidminer-quick.html
I faced the same problem, and the flow described above helped.
Thanks,
Ram
Best Regards
Angshu