"text mining Excel file"
rdmckinney
Member Posts: 15 Maven
I didn't find my topic with a search, so please redirect me if this has been discussed elsewhere. I have an Excel file with comments from members, and I want to mine the comments as if each member/record is a document. I can get the Excel file into RapidMiner easily with ExcelExampleSource, but when I connect that to TextInput I get an error message: "Error in: TextInput (TextInput) The attribute 'text_source' does not exist. The example set does not contain an attribute with the given name." What should be my next step after the ExcelExampleSource?
Thanks!
Roger D. McKinney
Answers
Use the [tt]StringTextInput[/tt] operator instead of the [tt]TextInput[/tt] operator.
Kind regards,
Tobias
Thanks!
By the way, the [tt]Nominal2String[/tt] operator converts nominal columns to string columns. That way, you can load the texts directly from the Excel file.
Kind regards,
Tobias
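For readers following along, the two suggestions above can be combined into one process. This is only a rough sketch, not a tested setup: the operator class names are the ones mentioned in this thread, but the [tt]excel_file[/tt] parameter key and the file name [tt]comments.xls[/tt] are assumptions made for illustration, so check them against the parameters your RapidMiner version actually shows.

```xml
<operator name="Root" class="Process" expanded="yes">
    <!-- Assumption: "excel_file" is the parameter key for the workbook path;
         "comments.xls" is a hypothetical placeholder file name. -->
    <operator name="ExcelExampleSource" class="ExcelExampleSource">
        <parameter key="excel_file" value="comments.xls"/>
    </operator>
    <!-- Converts the nominal comment column into a string column. -->
    <operator name="Nominal2String" class="Nominal2String">
    </operator>
    <!-- Reads the text from the string attribute of each row,
         so every member/record is handled as one document. -->
    <operator name="StringTextInput" class="StringTextInput" expanded="yes">
        <parameter key="default_content_language" value="english"/>
        <operator name="StringTokenizer" class="StringTokenizer">
        </operator>
    </operator>
</operator>
```

The idea is that [tt]StringTextInput[/tt] takes its text from the example set itself, so no 'text_source' attribute pointing to external files should be needed.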
I'm making progress. I am running the following process, and so far it has taken 43 minutes. Is that normal?
<operator name="Root" class="Process" expanded="yes">
<description text="#ylt#h3#ygt#Specifying texts by an example set#ylt#/h3#ygt##ylt#p#ygt#Using the parameter list or the wizard are simple methods for setting up the directories from which the text documents are read. Sometimes, however, a more flexible solution is needed. If, for instance, your text documents have different types of encoding or are written in different languages, you might wish to provide this information for each input directory separately.#ylt#/p#ygt# #ylt#p#ygt#You can do this by using an example set that contains one row for each input directory and corresponding attributes for source, encoding, type and class. If such an example set is provided, the texts in the parameter list are ignored.#ylt#/p#ygt#"/>
    <operator name="ExampleSource" class="ExampleSource">
        <parameter key="attributes" value="C:\Documents and Settings\rkenney\My Documents\RapidMiner\TextExamples\MSSComments.aml"/>
    </operator>
    <operator name="StringTextInput" class="StringTextInput" expanded="yes">
        <parameter key="remove_original_attributes" value="true"/>
        <parameter key="default_content_language" value="english"/>
        <list key="namespaces">
        </list>
        <operator name="StringTokenizer" class="StringTokenizer">
        </operator>
        <operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
        </operator>
        <operator name="TokenLengthFilter" class="TokenLengthFilter">
            <parameter key="min_chars" value="3"/>
        </operator>
        <operator name="PorterStemmer" class="PorterStemmer">
        </operator>
    </operator>
    <operator name="EMClustering" class="EMClustering">
        <parameter key="k" value="5"/>
    </operator>
</operator>
G May 1, 2009 11:50:55 AM: [Fatal] Process failed: operator cannot be executed (6). Check the log messages...
Here's my code:
<operator name="Root" class="Process" expanded="yes">
<description text="#ylt#h3#ygt#Specifying texts by an example set#ylt#/h3#ygt##ylt#p#ygt#Using the parameter list or the wizard are simple methods for setting up the directories from which the text documents are read. Sometimes, however, a more flexible solution is needed. If, for instance, your text documents have different types of encoding or are written in different languages, you might wish to provide this information for each input directory separately.#ylt#/p#ygt# #ylt#p#ygt#You can do this by using an example set that contains one row for each input directory and corresponding attributes for source, encoding, type and class. If such an example set is provided, the texts in the parameter list are ignored.#ylt#/p#ygt#"/>
    <operator name="ExampleSource" class="ExampleSource">
        <parameter key="attributes" value="C:\Documents and Settings\rkenney\My Documents\RapidMiner\TextExamples\MSSComments.aml"/>
    </operator>
    <operator name="StringTextInput" class="StringTextInput" expanded="yes">
        <parameter key="remove_original_attributes" value="true"/>
        <parameter key="default_content_language" value="english"/>
        <list key="namespaces">
        </list>
        <operator name="StringTokenizer" class="StringTokenizer">
        </operator>
        <operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
        </operator>
        <operator name="TokenLengthFilter" class="TokenLengthFilter">
            <parameter key="min_chars" value="3"/>
        </operator>
        <operator name="PorterStemmer" class="PorterStemmer">
        </operator>
    </operator>
    <operator name="GHA" class="GHA">
        <parameter key="number_of_components" value="6"/>
        <parameter key="number_of_iterations" value="100"/>
    </operator>