"Text Input from DB in RM-5"

klerx · March 2010

Hi

I tried to import text fields from a MySQL DB in order to calculate word vectors, using the following process chain:

Read Database
Nominal to Text
Data to Documents
Tokenize

and I got the failure message "Expected Document but received IOObjectCollection."

Is there a mistake in my chain?

I tried more or less every other combination of text processing operators, but I was not able to calculate the word vector.

With the old plugin I used the StringTextInput operator, but as mentioned in another post, this operator is deprecated in RM-5.

Did anyone manage this with RM-5?

bw joachim

land · March 2010

Hi Joachim.
you need to include a Process Documents operator, which processes each document individually. In your case, where the data sits in a text attribute of an example set, you must choose the Process Documents from Data operator. All Process Documents operators are super operators, meaning they contain a subprocess. You must put the Tokenize operator into this subprocess.
Data to Documents only generates documents from an example set. That can be useful for other purposes, but it is not needed in this particular case. If you already have a collection of documents (which you can recognize by the doubled line at the document output port), you can process it with the Process Documents operator.
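
As a rough analogy only (this is plain Python, not RapidMiner's API), the nesting described above can be sketched as an outer loop that takes one text per row and hands it to the inner steps. The names `process_documents_from_data` and `tokenize`, the sample rows, and the rule of splitting on non-letter characters are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Split a text on runs of non-letter characters (assumed tokenization rule)."""
    return [tok for tok in re.split(r"[^A-Za-z]+", text) if tok]


def process_documents_from_data(rows, text_column, inner_steps):
    """Run the inner steps on each row's text separately and count the terms.

    The outer function plays the role of the super operator; `inner_steps`
    plays the role of its subprocess.
    """
    vectors = []
    for row in rows:
        tokens = row[text_column]      # one document per row
        for step in inner_steps:       # e.g. [tokenize]
            tokens = step(tokens)
        vectors.append(Counter(tokens))  # term counts for this document
    return vectors


# Hypothetical rows, standing in for the result of a database query.
rows = [
    {"id": 1, "text": "Text mining with a database"},
    {"id": 2, "text": "Mining text, again and again"},
]

vectors = process_documents_from_data(rows, "text", [tokenize])
print(vectors[1]["again"])  # prints 2
```

The point of the sketch is only the structure: the tokenizer never sees the whole collection, it sees one document at a time, which is why it has to live inside the subprocess rather than after Data to Documents.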

Greetings,
Sebastian

klerx · March 2010

After a few hours of frustrating searching, I discovered that you can access the subprocess by double-clicking on the super operator ;-) (maybe this is helpful for others ...)

now it works ...

Thank you for your help, ...

bw Joachim

IngoRM · March 2010

Glad you found it

Without nested processes, RapidMiner is only worth half as much

The documentation - which of course also covers this - is on its way. Until then, the video tutorials at

http://rapid-i.com/content/view/189/198/

might be useful. There you can see how you can access subprocesses (among other nice features...).

Cheers,
Ingo

guitarslinger · April 2010

Hi there,

I am facing an issue on a related topic:

I use a Retrieve operator to get a column of text out of a MySQL DB, and I have connected it to a "Process Documents from Files" operator.

Here I get the error "The example set must contain at least one text attribute"

I set an alias in the SQL query when building the repository entry for the DB, naming the column "text", and I set the field type in MySQL to "text" as well, but I still can't get it connected.

What am I doing wrong?

Thanks in advance for your help!

Regards GS

land · April 2010

Hi,
you must change the type of the attribute that contains the text to "text". To do that, apply the Nominal to Text operator to this attribute.

And I guess you mean Process Documents from Data rather than Process Documents from Files? Otherwise you cannot use the ExampleSet at all.

Greetings,
Sebastian

guitarslinger · April 2010

Hi, worked!

Thank you very much, such an incredible software you created!

guitarslinger · April 2010

Hi, me again:

I am now trying to create a word list as the result of my process, showing the occurrence and the frequency of the tokenized terms in the texts coming from the database.

But I can't get the columns "occurrence" and "frequency" to appear in the resulting word list, as I have seen in the tutorial video on text mining.
The only difference seems to be that in the video the text is loaded from various documents, whereas I load the texts from a database, convert them to text, and then process them.

Thanks in advance for your help

land · April 2010

Hi,
which version of RapidMiner and of the Text Processing Extension are you using? If I remember correctly, this feature was added in one of the update releases of the final 5.0 version.

Greetings,
Sebastian

guitarslinger · April 2010

RapidMiner 5.0.3
Text Ext: 5.0.2

Thx GS

land · April 2010

Hi,
are there columns named Total Occurrences and Document Occurrences? These are the renamed columns from the tutorial. "occurrence" and "frequency" weren't very meaningful names, so we decided to rename them.
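
To make the two column names concrete, here is a minimal Python sketch of what they presumably mean. It assumes that Total Occurrences counts every appearance of a term across all documents and that Document Occurrences counts the number of documents containing the term at least once. The function `word_list` and the sample tokens are hypothetical and are not taken from RapidMiner.

```python
from collections import Counter


def word_list(tokenized_docs):
    """Return {term: (total_occurrences, document_occurrences)}."""
    total = Counter()    # every appearance of a term, across all documents
    in_docs = Counter()  # number of documents that contain the term
    for tokens in tokenized_docs:
        total.update(tokens)
        in_docs.update(set(tokens))  # count each term once per document
    return {term: (total[term], in_docs[term]) for term in total}


# Hypothetical, already tokenized documents.
docs = [
    ["text", "mining", "text"],
    ["mining", "database"],
    ["text"],
]

wl = word_list(docs)
print(wl["text"])    # prints (3, 2)
print(wl["mining"])  # prints (2, 2)
```

Here "text" appears three times in total but in only two of the three documents, which is the distinction the two columns are meant to express under the assumptions stated above.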

Greetings,
Sebastian
