"newbie: Excel to text"

shilaski Member Posts: 8 Contributor II
edited May 2019 in Help
Here is my project scope. I have an Excel spreadsheet of warranty claims with around 9100 entries. One of the columns in the spreadsheet holds comments, where a tech writes what was wrong with the vehicle. These comments are what I want to text mine.

I have figured out how to load the sheet and run it through the filter so that I am concentrating only on the data I am interested in. Now I am guessing that I need to use the text plugin to create word vectors (please tell me if I am wrong). It appears that the TextInput operator expects its input to come from a directory. My question is how to correctly feed my data into the TextInput operator. Of course I could be completely wrong... maybe there is a better way to do this?

Here is what I have:

<operator name="Root" class="Process" expanded="yes">
    <operator name="ExcelExampleSource" class="ExcelExampleSource">
        <parameter key="excel_file" value="C:\Documents and Settings\shilaski.TRICO29\Desktop\50percentWarranty\Conceptsheet_Claims_short.xls"/>
        <parameter key="first_row_as_names" value="true"/>
    </operator>
    <operator name="AttributeFilter" class="AttributeFilter">
        <parameter key="condition_class" value="attribute_name_filter"/>
        <parameter key="parameter_string" value="comments"/>
    </operator>
    <operator name="Nominal2String" class="Nominal2String">
    </operator>
    <operator name="TextInput" class="TextInput" expanded="yes">
        <parameter key="create_text_visualizer" value="true"/>
        <parameter key="id_attribute_type" value="long"/>
        <parameter key="use_content_attributes" value="true"/>
        <operator name="StringTokenizer" class="StringTokenizer">
        </operator>
    </operator>
</operator>

Answers

  • TobiasMalbrecht Moderator, Employee-RapidMiner, Member Posts: 295 RM Product Management
    Hi Stacy,

    in principle you are right. You simply have to use the StringTextInput operator instead of TextInput. The former loads the texts from strings in an already present example set. The latter loads the texts directly from files.

    Hope that helps,
    regards,
    Tobias
  • shilaski Member Posts: 8 Contributor II
    Alright, here is where I am at:

    <operator name="Root" class="Process" expanded="yes">
        <description text="#ylt#h3#ygt#Finding important terms#ylt#/h3#ygt##ylt#p#ygt#This experiments shows how to find terms that are characteristic for a set of texts#ylt#/p#ygt#. #ylt#p#ygt##ylt#b#ygt#Hint:#ylt#/b#ygt#In the interactive keyword selection, click on weight to sort the terms by their relevance to the class specified in the CorpusBasedWeighting operator.#ylt#/p#ygt#"/>
        <operator name="ExcelExampleSource" class="ExcelExampleSource">
            <parameter key="datamanagement" value="long_array"/>
            <parameter key="excel_file" value="C:\Documents and Settings\shilaski.TRICO29\Desktop\50percentWarranty\Conceptsheet_Claims_short.xls"/>
        </operator>
        <operator name="AttributeFilter" class="AttributeFilter">
            <parameter key="condition_class" value="attribute_name_filter"/>
            <parameter key="parameter_string" value="comments"/>
        </operator>
        <operator name="Nominal2String" class="Nominal2String">
        </operator>
        <operator name="StringTextInput" class="StringTextInput" expanded="yes">
            <parameter key="default_content_language" value="english"/>
            <parameter key="vector_creation" value="TermOccurrences"/>
            <operator name="StringTokenizer" class="StringTokenizer">
            </operator>
            <operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
            </operator>
            <operator name="TokenLengthFilter" class="TokenLengthFilter">
                <parameter key="min_chars" value="3"/>
            </operator>
            <operator name="PorterStemmer" class="PorterStemmer">
            </operator>
        </operator>
        <operator name="CorpusBasedWeighting" class="CorpusBasedWeighting">
            <parameter key="class_to_characterize" value="graphics"/>
        </operator>
        <operator name="InteractiveAttributeWeighting" class="InteractiveAttributeWeighting">
        </operator>
    </operator>

    The problem now is that I keep getting an error:

    Error in: StringTextInput (StringTextInput) The input example set does not contain any attributes with value type string. Some operators require example sets with attributes of a specific value type. Please refer to the documentation of the used operators for further details.
  • shilaski Member Posts: 8 Contributor II
    Figured it out. Somehow I missed calling out the parameter for which column I wanted. I had it called out before, but I suppose I should have troubleshot it before posting to the forums.

    Thanks
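
    For anyone who hits the same error: the post above does not say which parameter was missing, so what follows is only a guess from comparing the two listings in this thread. The first listing sets first_row_as_names on ExcelExampleSource and the second one does not. Assuming the column names sit in the first row of the sheet, the name filter on "comments" may find nothing to keep without that setting, which would leave StringTextInput with no string attribute to read. A minimal sketch of the process with the parameter restored, using only operator and parameter names that already appear in the listings above:

    ```xml
    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExcelExampleSource" class="ExcelExampleSource">
            <parameter key="excel_file" value="C:\Documents and Settings\shilaski.TRICO29\Desktop\50percentWarranty\Conceptsheet_Claims_short.xls"/>
            <!-- Assumption: this is the setting that went missing in the second listing. -->
            <parameter key="first_row_as_names" value="true"/>
        </operator>
        <!-- Keep only the attribute named "comments" (requires the header row to supply names). -->
        <operator name="AttributeFilter" class="AttributeFilter">
            <parameter key="condition_class" value="attribute_name_filter"/>
            <parameter key="parameter_string" value="comments"/>
        </operator>
        <!-- Convert the remaining nominal attribute to string so StringTextInput accepts it. -->
        <operator name="Nominal2String" class="Nominal2String">
        </operator>
        <operator name="StringTextInput" class="StringTextInput" expanded="yes">
            <parameter key="default_content_language" value="english"/>
            <parameter key="vector_creation" value="TermOccurrences"/>
            <operator name="StringTokenizer" class="StringTokenizer">
            </operator>
        </operator>
    </operator>
    ```

    If the error persists, placing a breakpoint after AttributeFilter and checking whether the example set still contains a "comments" attribute should show whether the filter is the step that drops it.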