Which operator?
Hello,
I am considering using RapidMiner for a piece of PhD research on web forums and I'm feeling my way around the program.
What I want to do is use RapidMiner to test a large data set drawn from web forum databases to see three things:
a) how often certain phrases that I am interested in appear;
b) whether this reduces over time (depending on the date of posting in the forum);
c) and whether references to these phrases are favourable.
My dataset consists of several CSV files, each with 7 columns and thousands of rows. Each row contains the details of one forum posting and the complete text of that posting, meaning that the "Message" field can be hundreds of words long. The columns are: "MessageID" "ThreadID" "ThreadName" "MemberID" "MemberName" "P_Date" "Message".
My question is, which operator should I use to load this kind of CSV that would allow me to use all seven columns?
I am using both RapidMiner 4.6 and 5 to see which is easier to learn, and would appreciate any guidance members have on this.
Answers
I would recommend RapidMiner 5.0. It not only lowers the learning curve a lot, but also has more advanced text processing capabilities.
You can load your data with the Read CSV operator or simply import it using the wizard (File / Import Data). After this you will be able to use the Process Documents from Data operator of the Text Processing Extension to analyse each text individually. By default this operator generates texts from all attributes of type text, so you might need to change the type of the attribute that stores the message text with the Nominal to Text operator.
Greetings,
Sebastian
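For anyone who wants to sanity-check the numbers outside RapidMiner, goals (a) and (b) from the question can be roughly sketched in plain Python. This is only an illustration, not how the RapidMiner operators work internally. It assumes the CSV has a header row with the seven column names from the question, that "P_Date" holds ISO-style dates (YYYY-MM-DD), and that a simple case-insensitive substring count is good enough. The function name, the sample rows, and the phrase are all made up for the example.

```python
import csv
import io
from collections import Counter

# Column names as listed in the question.
COLUMNS = ["MessageID", "ThreadID", "ThreadName", "MemberID",
           "MemberName", "P_Date", "Message"]


def count_phrase_by_month(csv_file, phrase):
    """Count case-insensitive occurrences of `phrase` in Message, per YYYY-MM."""
    reader = csv.DictReader(csv_file)
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    counts = Counter()
    needle = phrase.lower()
    for row in reader:
        # Assumption: P_Date starts with YYYY-MM (ISO format).
        month = row["P_Date"][:7]
        counts[month] += row["Message"].lower().count(needle)
    return dict(counts)


# Hypothetical sample data; quoted fields may contain commas.
SAMPLE = '''MessageID,ThreadID,ThreadName,MemberID,MemberName,P_Date,Message
1,10,Intro,100,alice,2009-01-15,"I love open source, and open source loves me."
2,10,Intro,101,bob,2009-01-20,"Nothing relevant here."
3,11,Tools,100,alice,2009-02-03,"Open source tools are great."
'''

result = count_phrase_by_month(io.StringIO(SAMPLE), "open source")
print(result)
```

To run it on a real export, open the file with `open(path, newline="", encoding="utf-8")` and pass the file object in place of the `io.StringIO` sample. Whether a mention is favourable (goal c) is a sentiment question and is not covered by this sketch.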