
"Filter Stopwords (Dictionary): how to connect the dictionary"

Krystyna Member Posts: 2 Contributor I
edited June 2019 in Help
Hello everybody,

I have RapidMiner 5.2.0003, where the Filter Stopwords (Dictionary) module differs from the previous version. I can no longer manage to connect the file with stopwords. Earlier I just selected the txt file. Now there is an input-file parameter. I tried to use Retrieve, Read from... etc., but it doesn't work. Could you please advise?

Thanks a lot!

My best
Krystyna

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    RapidMiner 5.2.3 was released more than 7 months ago. Please update both RapidMiner and the Text Processing extension to the latest version. If the problem still occurs, please give a detailed problem description with an example process, as described in the post linked in my signature.

    Best, Marius
  • Krystyna Member Posts: 2 Contributor I
    Hi Marius,

    My software is updated. In the video tutorials I have only seen examples for the older version, where the Filter Stopwords (Dictionary) module had a different structure. This is my process:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.003">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" breakpoints="after" class="process" compatibility="5.2.003" expanded="true" name="Process">
        <process expanded="true" height="476" width="547">
          <operator activated="true" class="retrieve" compatibility="5.2.003" expanded="true" height="60" name="Retrieve" width="90" x="45" y="75">
            <parameter key="repository_entry" value="Nachfrager 2012-07_Lexikon"/>
          </operator>
          <operator activated="true" class="text:process_document_from_data" compatibility="5.2.004" expanded="true" height="76" name="Process Documents from Data" width="90" x="179" y="210">
            <parameter key="keep_text" value="true"/>
            <parameter key="prunde_below_percent" value="2.0"/>
            <parameter key="prune_above_percent" value="100.0"/>
            <list key="specify_weights"/>
            <process expanded="true" height="763" width="785">
              <operator activated="true" class="text:transform_cases" compatibility="5.2.004" expanded="true" height="60" name="Transform Cases" width="90" x="112" y="120"/>
              <operator activated="true" class="text:tokenize" compatibility="5.2.004" expanded="true" height="60" name="Tokenize" width="90" x="246" y="210"/>
              <operator activated="true" class="text:stem_german" compatibility="5.2.004" expanded="true" height="60" name="Stem (German)" width="90" x="380" y="300"/>
              <connect from_port="document" to_op="Transform Cases" to_port="document"/>
              <connect from_op="Transform Cases" from_port="document" to_op="Tokenize" to_port="document"/>
              <connect from_op="Tokenize" from_port="document" to_op="Stem (German)" to_port="document"/>
              <connect from_op="Stem (German)" from_port="document" to_port="document 1"/>
              <portSpacing port="source_document" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Process Documents from Data" to_port="example set"/>
          <connect from_op="Process Documents from Data" from_port="example set" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Your example process does not help much, since it does not even contain the Filter Stopwords operator (that's what we call "modules" in RapidMiner: "operators"). However, if you disconnect the file input port, the option to select a text file will reappear. The file input port is meant to be used together with the Open File operator, which can also read from web resources and thus makes operators that rely on file input more flexible. But as I said, just disconnect the port to get the old behaviour back.

    Happy Mining!
    ~Marius
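
    For illustration, the "disconnect the port and select a text file" variant might look roughly like the fragment below when added to the inner process of Process Documents from Data in the example above. This is only a sketch: the operator class text:filter_stopwords_dictionary and the parameter key file are assumptions about how the Text Processing extension serializes this operator, and the file path is a hypothetical placeholder. The two connect lines would replace the existing connection from Tokenize to Stem (German).

    ```xml
    <!-- Sketch only: class name and "file" key are assumed, path is a placeholder -->
    <operator activated="true" class="text:filter_stopwords_dictionary" compatibility="5.2.004" expanded="true" name="Filter Stopwords (Dictionary)">
      <!-- With the file input port left unconnected, the dictionary is set as a parameter -->
      <parameter key="file" value="C:\path\to\stopwords.txt"/>
    </operator>
    <!-- Route tokens through the stopword filter before stemming -->
    <connect from_op="Tokenize" from_port="document" to_op="Filter Stopwords (Dictionary)" to_port="document"/>
    <connect from_op="Filter Stopwords (Dictionary)" from_port="document" to_op="Stem (German)" to_port="document"/>
    ```

    The simplest way to get the exact XML for your installed version is to add the operator in the GUI, pick the file, and look at the XML view of the process.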