"StopWordFilterFile" doesn't work

Posted by IngoRM (RM Founder)
Original message from SourceForge forum at http://sourceforge.net/forum/forum.php?thread_id=2039566&forum_id=390413

Hi,

I want to use the "StopWordFilterFile"-operator in a Java-application to filter terms based on a list in an external file. 
My code looks like the following:


=====================================================

...
StopWordFilterFile stopfilter = new StopWordFilterFile(new FileReader(new File(Constants.STOPWORDS_PATH + filename)), false);
config.setConfigurationRule(WVTConfiguration.STEP_WORDFILTER, new WVTConfigurationFact(stopfilter));
...

=====================================================


However, when I call this constructor to set the stopword file and the case-sensitivity flag, which should be available in version 4.1 of RapidMiner, I only get the following error message:

=====================================================
cannot find symbol
[javac] symbol : constructor StopWordFilterFile(java.io.FileReader,boolean)
[javac] location: class edu.udo.cs.wvtool.generic.wordfilter.StopWordFilterFile
[javac] StopWordFilterFile stopfilter = new StopWordFilterFile(new FileReader(new File(Constants.STOPWORDS_PATH + filename)), false);
=====================================================


Also, the method "setMinNumChars" is no longer provided:

=====================================================
((AbstractStopWordFilter) filter).setMinNumChars(1);
=====================================================

I need this method because my application should also give the user the option to use no stopword filter at all. If no WVTConfiguration.STEP_WORDFILTER is set, a default stopword filter is executed which filters all 3-character words, and that isn't what I want.

Hope someone can help me.

Greetings,
Mary-Anne


Answer by Ingo Mierswa:

Hello Mary-Anne,

I must admit that I don't see any problem with the line

StopWordFilterFile stopfilter = new StopWordFilterFile(new FileReader(new File(Constants.STOPWORDS_PATH + filename)), false);

The constructor is still there, so this should not be a problem. However, the cause might be that we changed the way text processing should be performed: inner operators should now be used instead of the one big WVToolOperator from previous releases. The single big operator is now deprecated and will no longer be supported in future versions. Maybe this is the reason for the problem.
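
If the compiler reports a missing constructor that should exist, it may be worth checking which version of the class is actually on the classpath. The following is only a minimal sketch using standard Java reflection; it is not part of RapidMiner or the WVTool, and the names ConstructorCheck and publicConstructors are made up for illustration. It prints where a class was loaded from and which public constructors it offers.

```java
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ConstructorCheck {

    /**
     * Returns the public constructor signatures of the named class,
     * as loaded from the current classpath (hypothetical helper).
     */
    public static List<String> publicConstructors(String className) {
        Class<?> cls;
        try {
            cls = Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Class not on classpath: " + className, e);
        }
        List<String> signatures = new ArrayList<>();
        for (Constructor<?> c : cls.getConstructors()) {
            // Join the fully qualified parameter type names with commas
            String params = Arrays.stream(c.getParameterTypes())
                    .map(Class::getName)
                    .collect(Collectors.joining(","));
            signatures.add(cls.getSimpleName() + "(" + params + ")");
        }
        return signatures;
    }

    public static void main(String[] args) throws Exception {
        // Pass the class to inspect as the first argument, for example
        // edu.udo.cs.wvtool.generic.wordfilter.StopWordFilterFile.
        // Falls back to a JDK class so the sketch runs without extra jars.
        String name = args.length > 0 ? args[0] : "java.io.FileReader";
        Class<?> cls = Class.forName(name);
        // Jar or directory the class came from (null for JDK classes)
        System.out.println("Loaded from: " + cls.getProtectionDomain().getCodeSource());
        publicConstructors(name).forEach(System.out::println);
    }
}
```

Run with the same classpath as the failing build. If the list does not contain a (java.io.Reader or java.io.FileReader, boolean) variant, an older or different jar is probably being picked up.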


Instead of using the method "setMinNumChars" you could now use the operator "TokenLengthFilter".
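
To illustrate what a token length filter does conceptually, here is a small sketch in plain Java. It is not the RapidMiner "TokenLengthFilter" operator and does not use any RapidMiner or WVTool API; the class TokenLengthSketch and the method filterByLength are hypothetical names chosen for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class TokenLengthSketch {

    /**
     * Keeps only tokens whose length lies in [minChars, maxChars].
     * Hypothetical helper for illustration, not RapidMiner API.
     */
    public static List<String> filterByLength(List<String> tokens, int minChars, int maxChars) {
        return tokens.stream()
                .filter(t -> t.length() >= minChars && t.length() <= maxChars)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("a", "to", "the", "miner", "rapid");
        // A minimum of 1 keeps every non-empty token, so short words survive
        System.out.println(filterByLength(tokens, 1, Integer.MAX_VALUE));
        // prints [a, to, the, miner, rapid]

        // A minimum of 4 drops all tokens of three characters or fewer
        System.out.println(filterByLength(tokens, 4, Integer.MAX_VALUE));
        // prints [miner, rapid]
    }
}
```

A minimum length of 1 corresponds to what setMinNumChars(1) was used for in the question: no token is removed just because it is short.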

I would really recommend redefining your text processing process in the GUI using inner operators (have a look at the samples delivered together with the text plugin) and changing your program according to this new architecture. It is actually more convenient and more powerful now.

Cheers,
Ingo