"custom word weight in vector"

mirt · September 2012

I'm using the 'Process Documents from Files' operator and I want to use a word weight different from the standard ones (TF-IDF, binary term occurrences, etc.). What is the best way to achieve this? Only by using the API?

MariusHelf · September 2012

Hi,

yes, if you want to use a different word vector creation algorithm, you have to implement it yourself. If you are only interested in the word vector and don't want to pass it to other operators, you can probably calculate it inside RapidMiner: after using the WordList to Data operator, you can use any operators like Aggregate etc. on the word vector and on the processed documents.
If you need a proper WordVector object to pass to the next Process Documents operator, you will have to program it in Java, or in Groovy with the help of the Execute Script operator.
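
To make the "implement it yourself" part a bit more concrete, here is a minimal sketch of a custom weighting in plain Java. It does not use the RapidMiner or Text Processing API at all; the class name `CustomWordWeight`, its methods, and the formula `(1 + ln tf) * ln(N / df)` are assumptions chosen purely for illustration, and you would swap in whatever weight you actually need. Wiring the result into a real WordVector object is not shown.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of RapidMiner: computes a custom term weight
// for already tokenized documents.
class CustomWordWeight {

    /** Counts how often each term occurs in one tokenized document. */
    static Map<String, Integer> termCounts(List<String> tokens) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    /** Counts in how many documents each term occurs at least once. */
    static Map<String, Integer> documentFrequencies(List<List<String>> docs) {
        Map<String, Integer> df = new LinkedHashMap<>();
        for (List<String> doc : docs) {
            for (String term : termCounts(doc).keySet()) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    /** Illustrative weight (an assumption): (1 + ln tf) * ln(N / df). */
    static double weight(int tf, int df, int numDocs) {
        if (tf <= 0 || df <= 0) {
            return 0.0;
        }
        return (1.0 + Math.log(tf)) * Math.log((double) numDocs / df);
    }

    /** Builds the term -> weight map for a single document. */
    static Map<String, Double> weightVector(List<String> doc,
                                            Map<String, Integer> df,
                                            int numDocs) {
        Map<String, Double> vector = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : termCounts(doc).entrySet()) {
            int docFreq = df.getOrDefault(e.getKey(), 0);
            vector.put(e.getKey(), weight(e.getValue(), docFreq, numDocs));
        }
        return vector;
    }

    public static void main(String[] args) {
        // Two tiny made-up documents, already split into tokens.
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("rapid", "miner", "text", "text"),
                Arrays.asList("text", "mining"));
        Map<String, Integer> df = documentFrequencies(docs);
        for (List<String> doc : docs) {
            System.out.println(weightVector(doc, df, docs.size()));
        }
    }
}
```

With this particular formula, a term that occurs in every document gets weight 0, which may or may not be what you want from a custom scheme.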

Best, Marius

mirt · September 2012

Thank you.

mirt · October 2012

Kind of a stupid question ._. I have RapidMiner running properly from Eclipse, and the 'Process Documents from Files' operator works (which is where I expect the tf-idf calculation to happen). But I can't find this operator in the code! I found TFIDFFilter, but it doesn't seem to be involved in the process. The 'src/com/rapidminer/operator/text' directory is empty. I only found some mentions of this operator in an XML documentation file.

MariusHelf · October 2012

Hi,

did you also download the code of the text processing extension?
Then you can simply search OperatorsTextProcessing.xml for the operator name with underscores, e.g. process_document_from_file, and you will see an entry with the class name - in this case it's com.rapidminer.operator.text.io.FileDocumentInputOperator.
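
If you would rather do that lookup in code than by hand, something along these lines could work. This is only a sketch: the class `OperatorClassLookup` is hypothetical, and it assumes each operator entry looks like `<operator><key>...</key><class>...</class></operator>`, so check the element names against your copy of the file. The sample XML in `main` is made up for the demo and is not the real file content.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: maps an operator key to its implementing class name.
class OperatorClassLookup {

    /** Returns the class name registered for the given key, or null if absent. */
    static String findOperatorClass(Document doc, String operatorKey) {
        NodeList operators = doc.getElementsByTagName("operator");
        for (int i = 0; i < operators.getLength(); i++) {
            Element operator = (Element) operators.item(i);
            if (operatorKey.equals(childText(operator, "key"))) {
                return childText(operator, "class");
            }
        }
        return null;
    }

    /** Trimmed text of the first descendant with the given tag, or null. */
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    /** Parses an XML string; a real run would parse the file from disk instead. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample that follows the assumed layout.
        String sample =
                "<operators><group key=\"text\"><operator>"
              + "<key>process_document_from_file</key>"
              + "<class>com.rapidminer.operator.text.io.FileDocumentInputOperator</class>"
              + "</operator></group></operators>";
        System.out.println(findOperatorClass(parse(sample), "process_document_from_file"));
    }
}
```

A plain text search in Eclipse gets you the same answer, of course; the code version is mostly useful if you want to list several operators at once.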

Best, Marius

mirt · October 2012

Yes! I downloaded the plugins svn repo and found it. The file TFIDF.java is what I was looking for. But it seems the only place where this file exists is the directory I just got from svn. However, vector creation in Unuk with Eclipse worked before I downloaded it. So I conclude that it somehow takes the compiled plugins from the non-development version of RM that I used before. The question is how to force Unuk to use the plugins from svn.

MariusHelf · October 2012

Hey,

you have to open the "Ant" view in Eclipse, drag the build.xml from the extension into the view and double-click the install target.
This will create a jar file for the text processing extension and copy it into the libs folder of RapidMiner. If that file is present, it will have priority over the pre-installed plugin versions.

Before building the extension, you have to drag the build.xml of RapidMiner_Unuk into the same view and double-click "createJar".
