
[SOLVED] Read 'Transaction' file problem

wujiang Member Posts: 12 Contributor II
edited November 2018 in Help
Hello,

I want to read a file like this:

a b c d e f g h i j k l m n o p q r s t u v w x y z aa bb cc dd
30 31 32
33 34 35
36 37 38 39 40 41 42 43 44 45 46
38 39 47 48
38 39 48 49 50 51 52 53 54 55 56 57 58
32 41 59 60 61 62
3 39 48
63 64 65 66 67 68
32 69
48 70 71 72

I use the 'Read CSV' operator and split on ' ' (a single space),

so I get a dataset like

30 31 32  ? ?  ? ? ?....
33 34 35  ? ?  ? ....
36 37 38 39 40 41 42 43 44 45 46
...

Once I click 'OK', RapidMiner shows me an error message:

Message: An attribute 2 was specified for column 2, but this column does not exist in input data.

What I want is an integer array for this data. How can I deal with it?

Thanks in advance.

Answers

  • fras Member Posts: 93 Contributor II
    Should do the job:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.0.003">
      <context>
        <input/>
        <output/>
        <macros>
          <macro>
            <key>input</key>
            <value>http://pastebin.com/raw.php?i=3guyR6H5</value>
          </macro>
        </macros>
      </context>
      <operator activated="true" class="process" compatibility="6.0.003" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="read_csv" compatibility="6.0.003" expanded="true" height="60" name="Read CSV" width="90" x="45" y="75">
            <parameter key="csv_file" value="C:\Users\fras\AppData\Local\Temp\rm_file_6786169822975474295.dump"/>
            <parameter key="column_separators" value="'\n&quot;"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations"/>
            <parameter key="encoding" value="windows-1252"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="att1.true.polynominal.attribute"/>
            </list>
          </operator>
          <operator activated="true" class="split" compatibility="6.0.003" expanded="true" height="76" name="Split" width="90" x="246" y="75">
            <parameter key="split_pattern" value="\s"/>
          </operator>
          <operator activated="true" class="rename_by_example_values" compatibility="6.0.003" expanded="true" height="76" name="Rename by Example Values" width="90" x="447" y="75"/>
          <connect from_op="Read CSV" from_port="output" to_op="Split" to_port="example set input"/>
          <connect from_op="Split" from_port="example set output" to_op="Rename by Example Values" to_port="example set input"/>
          <connect from_op="Rename by Example Values" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
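
As far as the XML shows, the process works in three steps: Read CSV uses a column separator that does not appear to occur in the data, so each line arrives as one nominal attribute; Split then cuts that attribute on whitespace; Rename by Example Values takes the attribute names from a data row. The same idea can be sketched outside RapidMiner in Python. This is only an illustration under assumptions: the function names `split_lines` and `rename_by_first_row` and the shortened sample are hypothetical, and `None` stands in for RapidMiner's '?' missing value.

```python
import re

# Shortened, hypothetical sample in the same shape as the file in the question.
raw = """a b c d e f
30 31 32
33 34 35
36 37 38 39 40 41
"""


def split_lines(text):
    """Read every non-empty line as one string, then split it on whitespace.

    Short rows are padded with None so the result is rectangular.
    The process above uses the pattern \\s; \\s+ is used here so that
    repeated spaces do not produce empty cells.
    """
    rows = [re.split(r"\s+", line.strip()) for line in text.splitlines() if line.strip()]
    width = max(len(row) for row in rows)
    return [row + [None] * (width - len(row)) for row in rows]


def rename_by_first_row(rows):
    """Use the first row as column names and return (names, remaining rows)."""
    return rows[0], rows[1:]


header, data = rename_by_first_row(split_lines(raw))
print(header)
for row in data:
    print(row)
```

Reading the whole line first avoids the problem in the question, where the column count was fixed before the longer rows were seen.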

  • wujiang Member Posts: 12 Contributor II
    Thanks fras,

    It works, but the "Rename by Example Values" operator shows an error like:
    The exampleset must contain at least 1 examples with parameter "row_number" set to "1". Even though the error appears, your process works :)

    I have not been using RM for long, and this is the first time I have used RM's XML. Could you explain your answer a little?

    I have to work with the output of an example set (the output data is the input data for my own operator), and I can't use '?' values. Do you have any idea how to solve this problem? I tried to replace '?' with '-1', but "Replace Missing Values" does not recognize the '?'.

    Now I use 'Nominal to Numerical' to transform "String" to "Integer", but what I get is "0 0 0 0 0 0 0 0" and "111....".

    Am I using the wrong operator to generate the Integer type?
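
The two conversions mentioned above behave differently, which may explain the zeros and ones: encoding a nominal column turns each distinct string into a 0/1 indicator, while parsing reads the digits as a number. The Python sketch below illustrates the difference and a -1 fill for missing cells. It is not RapidMiner code; `to_int_rows` and `dummy_code` are hypothetical helper names, and `None` again stands in for '?'. Inside RapidMiner, the Parse Numbers operator is probably the closer match for the parsing step, though nothing in this thread confirms that.

```python
def to_int_rows(rows, fill=-1):
    """Parse every cell as an integer; missing cells (None) become `fill`."""
    return [[fill if cell is None else int(cell) for cell in row] for row in rows]


def dummy_code(column):
    """Encode a nominal column as 0/1 indicators, one per distinct value.

    This mimics what a nominal-to-numerical encoding does: the digits
    inside the strings are never interpreted as numbers.
    """
    categories = sorted(set(column))
    return [[1 if value == category else 0 for category in categories] for value in column]


# Hypothetical rows after splitting, with None where the file had no value.
rows = [["30", "31", "32", None], ["36", "37", "38", "39"]]

parsed = to_int_rows(rows)
encoded = dummy_code(["30", "36", "30"])

print(parsed)
print(encoded)
```

Filling after the split, rather than before it, keeps the fill value from being confused with real data in the file.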