"Text Input from DB in RM-5"
Hi
I tried to import text fields from a MySQL DB in order to calculate word vectors, using the following process chain:
Read Database
Nominal to Text
Data to Documents
Tokenize
and I got the failure message "Expected Document but received IOObjectCollection".
Is there a mistake in my chain?
I tried more or less every other combination of text processing operators, but I was not able to calculate the word vector.
With the old plugin I used the StringTextInput operator, but as mentioned in another post this operator is deprecated in RM-5.
Did anyone manage this with RM-5?
bw joachim
Answers
You need to include a Process Documents operator, which processes the documents one at a time. In your case, since the data is in a text attribute of an example set, you must choose the Process Documents from Data operator. All Process Documents operators are super operators that have a subprocess. You must put the Tokenize operator into this subprocess.
The Data to Documents operator just generates Documents from an Example Set. This might be needed for other purposes, but not in this particular case. If you already have a collection of Documents (which you can recognize by the double line on the document output port), you can process it with the Process Documents operator.
Greetings,
Sebastian
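For readers who think more easily in code, the idea behind the answer above can be sketched outside RapidMiner: an outer step walks over the rows of a table, and an inner step (here a tokenizer) is applied to each text separately before the word vectors are built. This is only an illustrative Python analogy, not RapidMiner's implementation. The names `tokenize` and `process_documents`, the sample rows, and the choice to split on non-letter characters and to use raw term counts are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    # Inner step: split on any run of non-letter characters and
    # drop the empty strings that re.split can leave at the edges.
    return [token for token in re.split(r"[^A-Za-z]+", text) if token]


def process_documents(rows):
    # Outer step: apply the inner step to every row's text on its own,
    # then build one count vector per document over a shared vocabulary.
    token_lists = [tokenize(text) for text in rows]
    vocabulary = sorted({t for tokens in token_lists for t in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors


# Hypothetical rows standing in for the text column read from the database.
rows = ["the cat sat", "the dog sat down"]
vocabulary, vectors = process_documents(rows)
print(vocabulary)
print(vectors)
```

The point of the sketch is the nesting: the tokenizer never sees the whole collection, only one text at a time, which is why it belongs inside the subprocess rather than directly after Data to Documents.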
After a few hours of frustrating search, I discovered that you can access the subprocess by double-clicking on the super operator ;-) (maybe this is helpful for others ...)
now it works ...
Thank you for your help, ...
bw Joachim
Without nesting processes, RapidMiner is only worth half as much
The documentation - which of course also covers this - is on its way. Until then, the video tutorials at
http://rapid-i.com/content/view/189/198/
might be useful. There you can see how you can access subprocesses (among other nice features...).
Cheers,
Ingo
I am facing an issue on a related topic:
I use a Retrieve operator to get a column with text out of a MySQL DB, and have it connected to a "Process Documents from Files" operator.
Here I get the error "The example set must contain at least one text attribute".
I set an alias in the SQL query when building the repository entry for the DB, naming it "text", and I set the field type in MySQL to "text" as well, but I still can't manage to get it connected.
What am I doing wrong?
Thanks in advance for your help!
Regards GS
You must change the type of the attribute that contains the text to "text". Use the Nominal to Text operator on this attribute.
And I guess you mean Process Documents from Data instead of Process Documents from Files? Otherwise you cannot use the ExampleSet at all.
Greetings,
Sebastian
Thank you very much, what incredible software you have created!
I am now trying to create a word list as the result of my process, showing the occurrence and the frequency of the tokenized terms in the texts coming from the database.
But I can't manage to get the columns "occurrence" and "frequency" in the resulting word list, as I have seen in the tutorial video on text mining.
The only difference seems to be that in the video the text is loaded from various documents, whereas I load the texts from a database, convert them to text, and then process them.
Thanks in advance for your help
Which version of RapidMiner and the Text Processing Extension do you use? If I remember correctly, this feature was added in one of the update releases of the final 5.0.
Greetings,
Sebastian
Text Ext: 5.0.2
Thx GS
Are there the columns Total Occurrences and Document Occurrences? These are the renamed columns from the tutorial. "occurrence" and "frequency" aren't very meaningful names, so we decided to rename them.
Greetings,
Sebastian
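To make the two column names concrete, here is a small Python sketch of what such a word list could contain. It reads "Total Occurrences" as the number of times a term appears across all texts and "Document Occurrences" as the number of texts that contain the term at least once. That reading is an assumption based on the column names, the function name `word_list` and the sample tokens are hypothetical, and none of this is RapidMiner's own code.

```python
from collections import Counter


def word_list(token_lists):
    total_occurrences = Counter()
    document_occurrences = Counter()
    for tokens in token_lists:
        # Every appearance of a term counts towards the total.
        total_occurrences.update(tokens)
        # Each document counts at most once per term.
        document_occurrences.update(set(tokens))
    return {
        term: (total_occurrences[term], document_occurrences[term])
        for term in sorted(total_occurrences)
    }


# Hypothetical, already tokenized texts (one inner list per database row).
token_lists = [["the", "cat", "the"], ["the", "dog"]]
words = word_list(token_lists)
for term, (total, docs) in words.items():
    print(term, total, docs)
```

In this example "the" appears three times in total but in only two documents, which is exactly the distinction the two columns are meant to show.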