How to do text classification using only Java code
Hi all,
I am new to the RapidMiner community. I'm planning to use RapidMiner for text classification, and I want to develop a small demo system in pure Java (that is, without writing an XML process file) in order to get familiar with the RapidMiner source code. I tried RapidMiner 5.0 first, but since there isn't enough documentation or sample code for 5.0, I decided to use 4.6 instead. Unfortunately, I still don't know how to do this using only Java code.
I have two problems:
1: Which operator can transform all the original messages stored in a particular folder into a single file containing the word vectors (feature vectors)? I know about the Text Processing plugin, but I'm not sure how to use it to read the original files from Java code alone. Could anybody show me how to do that?
2: What is the easiest way to train on the feature vectors using only Java code? Is there any sample code that shows how to read a feature vector file and generate a model file (as one would with Weka)?
I know these are basic questions; I just have no idea how to do this. I would really appreciate it if somebody could give me some sample code (for RapidMiner 4.6) that shows how the whole process works. Thanks.
Answers
If there was not enough for you in RM 5.00 on text mining, then good luck with 4.6, which is no longer supported by RM staff on this forum. If you want to write your own extensions, Seb has written a guide you can pay for, but that is only for version 5. Weka is actually supported by Pentaho, who run a forum for that.
Is there any sample Java code for that? Can RapidMiner 4.6 use the Weka 5.0 plugin? I have no idea how to implement them.
Thanks, again.
I suppose you should first make yourself familiar with RapidMiner and the Text Processing Extension before actually thinking of integrating it. Your questions are not a matter of coding but of using RapidMiner itself. For everything after that, there is the White Paper on writing extensions, which will give you a good understanding of how RapidMiner works under the hood, and the API documentation, which gives you the details.
You can also book webinars that show you how to use text mining.
If you decide to use RapidMiner 4.x: good luck. I know the code of the former Text Plugin; you will need it. For most of its parts, it was easier to rewrite it from scratch than to revise it.
Greetings,
Sebastian
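For reference, the word-vector step asked about in question 1 can be illustrated without any RapidMiner classes. The following is only a minimal plain-Java sketch of a bag-of-words (term count) representation. It is not the RapidMiner or Text Processing plugin API, and the class and method names (`WordVectorSketch`, `tokenize`, `buildVocabulary`, `toVector`) are made up for illustration. It assumes the messages have already been read into strings; reading them from a folder is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical helper class; none of these names come from RapidMiner.
public class WordVectorSketch {

    // Lower-case the text and split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Collect the distinct tokens of all documents in sorted order.
    // The position of a token in this list is its index in every vector.
    static List<String> buildVocabulary(List<String> documents) {
        TreeSet<String> vocabulary = new TreeSet<>();
        for (String document : documents) {
            vocabulary.addAll(tokenize(document));
        }
        return new ArrayList<>(vocabulary);
    }

    // Count how often each vocabulary term occurs in one document.
    // Tokens that are not in the vocabulary are ignored.
    static int[] toVector(String document, List<String> vocabulary) {
        int[] counts = new int[vocabulary.size()];
        for (String token : tokenize(document)) {
            int index = vocabulary.indexOf(token);
            if (index >= 0) {
                counts[index]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumption: in a real setup these strings would be file contents.
        List<String> messages = Arrays.asList(
                "Cheap pills, cheap offers",
                "Meeting moved to Monday");
        List<String> vocabulary = buildVocabulary(messages);
        System.out.println(vocabulary);
        for (String message : messages) {
            System.out.println(Arrays.toString(toVector(message, vocabulary)));
        }
    }
}
```

Each message becomes one row of counts over a shared vocabulary, which is the kind of table a learner is then trained on. The Text Processing extension covers this step with far more options (stemming, stop words, TF-IDF weighting), so the sketch is only meant to make the concept concrete.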