
Textmining with Excel Source

mario_playing_w Member Posts: 5 Contributor II
edited September 2019 in Help
Hi,

I am trying to build my first text mining process on an Excel example set. The data source contains around 5,000 rows, each consisting of a label and a text comment.

If I run my process on a subset of around 300 rows, everything works fine; if I use the whole dataset, the following error occurs:

Process failed!
The setup does not seem to contain any obvious errors, but you should check the log messages or activate the debug mode in the settings...

Setting breakpoints shows that the problem lies within the StringTextInput operator chain:

  <operator name="Nominal2String" class="Nominal2String">
  </operator>
  <operator name="StringTextInput" class="StringTextInput" expanded="yes">
      <parameter key="default_content_language" value="german"/>
      <list key="namespaces">
      </list>
      <operator name="StringTokenizer (2)" class="StringTokenizer">
      </operator>
      <operator name="ToLowerCaseConverter (2)" class="ToLowerCaseConverter">
      </operator>
      <operator name="GermanStopwordFilter" class="GermanStopwordFilter">
      </operator>
      <operator name="TokenLengthFilter" class="TokenLengthFilter">
          <parameter key="max_chars" value="40"/>
      </operator>
      <operator name="GermanStemmer" class="GermanStemmer">
      </operator>
  </operator>
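For readers unfamiliar with the 4.x operator chain, here is a rough plain-Python mirror of the same steps: tokenize, lowercase, drop German stopwords, drop tokens longer than `max_chars`, then stem. It only illustrates what each operator does and is not RapidMiner code. The stopword list is a hypothetical stub, the tokenizer rule is simplified, and stemming is a pluggable function (identity by default) because the Snowball German stemmer is not in the Python standard library.

```python
import re

# Hypothetical, heavily abbreviated stopword list for illustration only;
# RapidMiner's GermanStopwordFilter uses its own, much longer list.
GERMAN_STOPWORDS = {"der", "die", "das", "und", "ist", "nicht", "ein", "eine", "zu"}


def preprocess(text, max_chars=40, stem=lambda token: token):
    """Mirror the inner operators of the StringTextInput chain on one comment."""
    # StringTokenizer (simplified): keep runs of Unicode word characters, so umlauts and ß survive
    tokens = re.findall(r"\w+", text)
    # ToLowerCaseConverter
    tokens = [token.lower() for token in tokens]
    # GermanStopwordFilter (stub list above)
    tokens = [token for token in tokens if token not in GERMAN_STOPWORDS]
    # TokenLengthFilter: the process sets max_chars=40; any minimum length is ignored here
    tokens = [token for token in tokens if len(token) <= max_chars]
    # GermanStemmer placeholder: identity unless a real stemmer is passed in
    return [stem(token) for token in tokens]
```

For example, `preprocess("Das ist ein Test und kein Fehler")` returns `['test', 'kein', 'fehler']`. A real stemmer, such as NLTK's `SnowballStemmer("german").stem`, could be passed as `stem`.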

Did I miss something? What can I do to prevent this error? For the whole dataset, the log only says the following:

G Mar 3, 2010 11:01:33 AM: [Fatal] NullPointerException occured in 1st application of StringTextInput (StringTextInput)
G Mar 3, 2010 11:01:33 AM: [Fatal] Process failed: operator cannot be executed. Check the log messages...
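The thread never identifies the cause of this NullPointerException. Since the process runs on a 300-row subset but fails on the full 5,000 rows, one assumption worth checking is that some rows have an empty or missing comment. A minimal sketch, assuming the sheet has been saved as CSV with a hypothetical `text` column holding the comments, that lists such rows:

```python
import csv
import io

# Tiny hypothetical sample: header plus four comments, three of them problematic.
SAMPLE = "label,text\npos,gut\nneg,\nneg\npos,  \n"


def find_suspicious_rows(stream, text_column="text"):
    """Return (record_number, reason) for every row whose comment is missing or blank.

    Record numbers count the header as record 1, so they match spreadsheet row
    numbers as long as no comment contains an embedded line break and no line
    is completely empty (csv.DictReader skips those).
    """
    problems = []
    for number, row in enumerate(csv.DictReader(stream), start=2):
        value = row.get(text_column)
        if value is None:
            # Row has fewer fields than the header: DictReader fills the gap with None
            problems.append((number, "missing"))
        elif not value.strip():
            # Cell exists but is empty or holds only whitespace
            problems.append((number, "blank"))
    return problems


if __name__ == "__main__":
    print(find_suspicious_rows(io.StringIO(SAMPLE)))
```

On a real export, the function would be called on a file opened with `newline=""` and a suitable encoding; the file name is up to you.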


Thanks in advance for any hints,

Mario

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Mario,
    I would suggest switching to RapidMiner 5.0. The new Text Processing Extension will make your work a lot easier; for example, this process would work there :)
    Unfortunately, we could not maintain process compatibility with the old Text Mining Plugin of 4.x, so if you switch later, you will have to rebuild your processes completely. Starting directly with 5.0 would really make things easier.

    Greetings,
      Sebastian
  • mario_playing_w Member Posts: 5 Contributor II
    Hi Sebastian,

    Thank you for your reply. RapidMiner 5 seems far more complicated than 4 was, since the processes aren't as easy to build. It won't even let me connect the data flow inside the subprocess correctly, even though I've got a text input. I probably have to play around with the tool first before coming back to text mining.

    The funny thing was that the first error I hit was nearly the same as in version 4. ::)

    I'll come back with something as soon as version 5 likes me.

    Mario
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Mario,
    It might surprise you, but you are the first one to say that process design is more complicated in 5.0 than in 4.x. You can bet I am surprised. I thought it would be more natural to drag and drop operators onto the process canvas and connect wires between input and output ports?

    Greetings,
      Sebastian
  • mario_playing_w Member Posts: 5 Contributor II
    Hi Sebastian,

    Probably there's a strong correlation with the fact that I had training on RapidMiner 4 and not on 5. ;)

    In the meantime, I got at least some results and a working process. Could you tell me how I can export the distribution table of a Naive Bayes classification? I tried the report and Write CSV functionality, but they just throw several errors or produce an empty file. :(

    Thanks,

    Mario
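The thread leaves the distribution-table question unanswered. As a hedged sketch of what such a table contains: assuming the word-vector attributes are numerical (for example TF-IDF weights), Naive Bayes typically models each attribute per class with a normal distribution summarized by a mean and a standard deviation. The code below computes that table from scratch and serializes it as CSV. It is not RapidMiner's own export; the column names and the population-variance estimator are choices made for illustration.

```python
import csv
import io
import math
from collections import defaultdict


def distribution_table(examples, labels):
    """Per-class mean and standard deviation of every numerical attribute.

    examples: list of dicts mapping attribute name -> float (e.g. TF-IDF weight of a word)
    labels:   list of class labels, parallel to examples
    Returns (attribute, class, mean, std_dev) tuples, sorted by class, then attribute.
    """
    by_class = defaultdict(list)
    for example, label in zip(examples, labels):
        by_class[label].append(example)
    attributes = sorted({name for example in examples for name in example})
    table = []
    for label in sorted(by_class):
        rows = by_class[label]
        for name in attributes:
            # A word absent from a document's vector is treated as weight 0.0
            values = [row.get(name, 0.0) for row in rows]
            mean = sum(values) / len(values)
            # Population variance; RapidMiner's estimator may differ
            variance = sum((v - mean) ** 2 for v in values) / len(values)
            table.append((name, label, mean, math.sqrt(variance)))
    return table


def table_to_csv(table):
    """Serialize the table as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["attribute", "class", "mean", "std_dev"])
    writer.writerows(table)
    return buffer.getvalue()
```

To save the result, the returned string can be written to a file opened with `newline=""`, so the CSV line endings are not translated twice.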