
"Text Input from DB in RM-5"

klerxklerx Member Posts: 4 Contributor I
edited June 2019 in Help
Hi

I tried to import text fields from a MySQL DB, in order to calculate word vectors, with the following process chain:

Read Database
Nominal to Text
Data to Documents
Tokenize

and I got the failure message "Expected Document but received IOObjectCollection".

Is there a mistake in my chain?

I tried more or less every other combination of text processing operators, but I was not able to calculate the word vectors.

With the old plugin I used the StringTextInput operator, but as mentioned in another post, this operator is deprecated in RM-5.

Has anyone managed this with RM-5?

bw joachim

Answers

  • landland RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Joachim.
    You need to include a Process Documents operator, which processes single documents. In your case, where the data is in a text attribute of an example set, you must choose the Process Documents from Data operator. All Process Documents operators are super operators, meaning they have a subprocess. You must put the Tokenize operator into this subprocess.
    The Data to Documents operator just generates Documents from an ExampleSet. This may be needed for other purposes, but not in this case. If you already have a collection of Documents (which you can recognize by the doubled line on the document output port), you can process it with the Process Documents operator.

    Greetings,
      Sebastian
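
For readers who think in code, the nesting described above can be sketched outside RapidMiner: each row's text is treated as one document, the tokenizer runs once per document (the role of the subprocess), and the results are collected into word vectors. This is only a rough Python analogy; the sample rows, the non-letter tokenizer, and the function names are assumptions for illustration and are not RapidMiner APIs.

```python
import re
from collections import Counter

# Hypothetical rows, standing in for the text column read from the database.
rows = ["RapidMiner reads text", "Text mining needs tokens"]

def tokenize(text):
    """Per-document step: split on non-letters and lowercase the tokens."""
    return [t.lower() for t in re.split(r"[^A-Za-z]+", text) if t]

def process_documents(texts):
    """Run the per-document step on each text, then build term-count vectors."""
    counts = [Counter(tokenize(t)) for t in texts]
    vocabulary = sorted(set().union(*counts))
    vectors = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, vectors

vocabulary, vectors = process_documents(rows)
print(vocabulary)
print(vectors)
```

The point of the sketch is the shape: the tokenizer never sees the whole collection, only one document at a time, which is why Tokenize belongs inside the subprocess rather than directly after Data to Documents.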
  • klerxklerx Member Posts: 4 Contributor I


    After a few hours of frustrating searching, I discovered that you can access the subprocess by double-clicking on the super operator ;-) (maybe this is helpful for others ...)

    Now it works ...

    Thank you for your help ...

    bw Joachim
  • IngoRMIngoRM Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Glad you found it  ;)

    Without nested processes, RapidMiner is only worth half as much  :D

    The documentation - which of course also covers this - is on its way. Until then, the video tutorials at

    http://rapid-i.com/content/view/189/198/

    might be useful. There you can see how you can access subprocesses (among other nice features...).

    Cheers,
    Ingo
  • guitarslingerguitarslinger Member Posts: 12 Contributor II
    Hi there,

    I am facing an issue on a related topic:

    I use a Retrieve operator to get a column with text out of a MySQL DB, and have connected it to a "Process Documents from Files" operator.

    Here I get the error "The example set must contain at least one text attribute".

    I set an alias in the SQL query when building the repository entry for the DB, naming it "text", and I set the field type in MySQL to "text" as well, but I still can't manage to get it connected.

    What am I doing wrong?

    Thanks in advance for your help!

    Regards GS
  • landland RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    You must change the type of the attribute that contains the text to "text". Use the Nominal to Text operator on this attribute.

    And I guess you mean Process Documents from Data instead of Process Documents from Files? Otherwise you cannot use the ExampleSet at all.

    Greetings,
      Sebastian
  • guitarslingerguitarslinger Member Posts: 12 Contributor II
    Hi, it worked!

    Thank you very much. Such incredible software you have created!
  • guitarslingerguitarslinger Member Posts: 12 Contributor II
    Hi, me again:

    I am now trying to create a word list as the result of my process, showing the occurrence and the frequency of the tokenized terms in the texts coming from the database.

    But I can't manage to get the columns "occurrence" and "frequency" in the resulting word list, as I have seen in the tutorial video on text mining.
    The only difference seems to be that in the video the text is loaded from various documents, whereas I load the texts from a database, convert them to text, and then process them.


    Thanks in advance for your help
  • landland RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    Which version of RapidMiner and the Text Processing Extension do you use? If I remember correctly, this feature was added in one of the update releases of the final 5.0.

    Greetings,
      Sebastian
  • guitarslingerguitarslinger Member Posts: 12 Contributor II
    RapidMiner 5.0.3
    Text Ext: 5.0.2

    Thx GS
  • landland RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    Are there columns named Total Occurrences and Document Occurrences? These are the renamed columns from the tutorial. "occurrence" and "frequency" aren't very meaningful, so we decided to rename them.

    Greetings,
      Sebastian
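
To make the difference between the two renamed columns concrete, here is a small Python sketch. It assumes, going only by the column names, that Total Occurrences counts every appearance of a term across all documents and that Document Occurrences counts the documents containing the term at least once. The sample documents and the `word_list` helper are made up for illustration and are not part of RapidMiner.

```python
from collections import Counter

# Hypothetical tokenized documents (not taken from the thread).
documents = [
    ["data", "mining", "data"],
    ["text", "mining"],
    ["data", "text", "text", "text"],
]

def word_list(docs):
    """Return {term: (total_occurrences, document_occurrences)}."""
    total = Counter()    # every appearance of a term, across all documents
    in_docs = Counter()  # number of documents containing the term at least once
    for tokens in docs:
        total.update(tokens)
        in_docs.update(set(tokens))  # a set counts each term once per document
    return {term: (total[term], in_docs[term]) for term in sorted(total)}

for term, (total, in_docs) in word_list(documents).items():
    print(f"{term}: total={total}, documents={in_docs}")
```

Under that assumption, "text" appears four times in total but in only two documents, which is the kind of distinction the old names "occurrence" and "frequency" did not make obvious.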