"custom word weight in vector"

mirt Member Posts: 4 Contributor I
edited June 2019 in Help
I'm using the 'Process Documents from Files' operator and I want to use a word weight other than the standard ones (TF-IDF, binary term occurrences, etc.). What is the best way to achieve this? Is it only possible through the API?

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi,

    Yes, if you want to use a different word vector creation algorithm, you have to implement it yourself. If you are only interested in the word vector and don't need to pass it to other operators, you can probably calculate it inside RapidMiner: after the WordList to Data operator, you can apply operators such as Aggregate to the word vector and to the processed documents.
    If you need a proper WordVector object to pass to the next Process Documents operator, you will have to program it in Java, or in Groovy with the help of the Execute Script operator.

    Best, Marius
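
To make the "calculate it yourself" idea concrete, here is a minimal sketch in plain Python of a custom term weight computed from per-document term counts and document frequencies, the same quantities you would aggregate after WordList to Data. It does not use any RapidMiner API. The function name `custom_weights` and the formula (log-scaled term frequency times a smoothed inverse document frequency) are assumptions chosen for illustration only; substitute whatever weighting you actually need.

```python
import math
from collections import Counter


def custom_weights(docs):
    """Return one {term: weight} dict per tokenized document.

    Illustrative weight: (1 + ln(tf)) * ln(1 + N / df).
    This formula is an assumption, not a RapidMiner built-in.
    """
    n_docs = len(docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts for this document
        vectors.append({
            term: (1.0 + math.log(count)) * math.log(1.0 + n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


# Toy corpus of already-tokenized documents (hypothetical data).
docs = [["data", "mining", "data"], ["text", "mining"], ["text", "vector"]]
vectors = custom_weights(docs)
```

Each entry of `vectors` maps the terms of one document to their weights. Terms that do not occur in a document are simply absent, which corresponds to a zero in the word vector.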
  • mirt Member Posts: 4 Contributor I
    Thank you.
  • mirt Member Posts: 4 Contributor I
    Kind of a stupid question ._. I got RapidMiner running in Eclipse properly, and the 'Process Documents from Files' operator works (which is where I expect the TF-IDF calculation to happen). But I can't find this operator in the code! I found TFIDFFilter, but it doesn't seem to be involved in the process. The 'src/com/rapidminer/operator/text' directory is empty. I only found some mentions of this operator in an XML documentation file.
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi,

    Did you also download the code of the text processing extension?
    If so, you can simply search OperatorsTextProcessing.xml for the operator name with underscores, e.g. process_document_from_file, and you will see an entry with the class name. In this case it's com.rapidminer.operator.text.io.FileDocumentInputOperator.

    Best, Marius
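
The lookup described above can also be scripted. The sketch below, in plain Python, scans an operator descriptor for an entry whose key matches and returns its class name. The XML layout (`<operator>` elements with `<key>` and `<class>` children) and the `group` attribute are assumptions about how OperatorsTextProcessing.xml is structured; check them against the real file before relying on this. The helper name `find_operator_class` is hypothetical.

```python
import xml.etree.ElementTree as ET


def find_operator_class(xml_text, operator_key):
    """Return the class name registered for operator_key, or None if absent."""
    root = ET.fromstring(xml_text)
    # Assumed layout: each <operator> has a <key> and a <class> child.
    for element in root.iter("operator"):
        if element.findtext("key", default="").strip() == operator_key:
            cls = element.findtext("class")
            return cls.strip() if cls else None
    return None


# Hypothetical excerpt mirroring the entry mentioned in the thread.
SAMPLE = """<operators>
  <group key="text_processing">
    <operator>
      <key>process_document_from_file</key>
      <class>com.rapidminer.operator.text.io.FileDocumentInputOperator</class>
    </operator>
  </group>
</operators>"""

result = find_operator_class(SAMPLE, "process_document_from_file")
```

For the real descriptor, read the file contents from your checkout of the extension and pass them in place of `SAMPLE`.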
  • mirt Member Posts: 4 Contributor I
    Yes! I downloaded the plugins SVN repo and found it. The file TFIDF.java is what I was looking for. But it seems the only place where this file exists is the directory I just checked out from SVN. However, vector creation in Unuk under Eclipse worked before I downloaded it. So I conclude that it somehow takes the compiled plugins from the non-development version of RM that I used before. The question is how to force Unuk to use the plugins from SVN.
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hey,

    You have to open the "Ant" view in Eclipse, drag the build.xml from the extension into the view and double-click the "install" target.
    This will create a jar file for the text processing extension and copy it into the libs folder of RapidMiner. If that file is present, it takes priority over the pre-installed plugin versions.

    Before building the extension, you have to drag the build.xml of RapidMiner_Unuk into the same view and double-click the "createJar" target.