[SOLVED] Nominal To Number Operator

maccten Member Posts: 28 Contributor II
edited November 2018 in Help
Hi all,

I'm trying to prepare data so that I can run a clustering algorithm on it.
The dataset has many nominal attributes, and I want to convert them to numbers so that the clustering algorithm works correctly.

I have used the Nominal to Numerical operator, but I am having problems with the dummy values it produces when replacing the nominal values with numbers.

What I would like is something like the table below, where each number represents one value.
I can't get this working at present. Can anyone help me out? It's a bit of a show stopper.

Old Value      Converted Value
CH              1
IE              2
CH              1
DE              3
IL              4

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    You should use the Nominal to Numerical operator with coding_type set to "dummy coding".

    Best regards,
    Marius
  • maccten Member Posts: 28 Contributor II
    Hi Marius

    Thank you for the quick response

    This is what I was doing originally. I had a Read Database operator linked to a Select Attributes operator, which selected a column that had only 3 possible nominal values.
    I then connected the Nominal to Numerical operator.

    What I was expecting was

    value    Converted Value
    Value_1          1
    Value_2          2
    Value_2          2

    What I got instead was
               Value_1  Value_2
    row 1      1        0
    row 2      0        1
    row 3      0        1

    This looks more like what I would expect from a Nominal to Binominal operator.

    Again, thanks for your time.
    It is a very frustrating problem.
  • awchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    Hello

    I reckon the coding type should be "unique integers", as in the following process:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="generate_nominal_data" compatibility="5.3.008" expanded="true" height="60" name="Generate Nominal Data" width="90" x="112" y="75"/>
          <operator activated="true" class="nominal_to_numerical" compatibility="5.3.008" expanded="true" height="94" name="Nominal to Numerical" width="90" x="112" y="165">
            <parameter key="coding_type" value="unique integers"/>
            <list key="comparison_groups"/>
          </operator>
          <connect from_op="Generate Nominal Data" from_port="output" to_op="Nominal to Numerical" to_port="example set input"/>
          <connect from_op="Nominal to Numerical" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    regards

    Andrew
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    To get the result described above, the coding type must indeed be unique integers. However, for clustering that is usually not a good choice, because unique integers imply an ordering of the values. Imagine you have the nominal values red, green and blue. If you assign red=1, green=2 and blue=3, it implies that blue is three times as much as red, and that a "blue" instance is further away from a "red" instance than from a "green" instance. That's usually not desired.
    Dummy coding overcomes this by giving each value its own 0/1 attribute, so no ordering is implied. It is the method of choice if you want to apply clustering, linear regression or any other algorithm that works only on numerical values.

    Best regards,
    Marius
  • awchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    Marius is absolutely correct, of course.

    regards

    Andrew
  • maccten Member Posts: 28 Contributor II
    Hi Marius,

    What you say makes a lot of sense. It appears I was heading down a road that would certainly have given me poor output from the model.

    Thank you very much for your time
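
For anyone who wants to see the point about distances in numbers, here is a minimal sketch in plain Python (it is not RapidMiner code). It encodes the CH/IE/DE/IL column from the question both ways and compares the pairwise distances a clustering algorithm would see. The helper names `unique_integers` and `dummy_coding` are made up for this illustration. The 1-based numbering follows the table in the question rather than whatever the operator actually emits, and one 0/1 column per value is assumed, as in the output posted above.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+

# The column from the original question.
values = ["CH", "IE", "CH", "DE", "IL"]


def unique_integers(column):
    """Hypothetical helper: number distinct values 1, 2, 3, ... by first appearance."""
    codes = {}
    for value in column:
        codes.setdefault(value, len(codes) + 1)
    return codes


def dummy_coding(column):
    """Hypothetical helper: map each distinct value to a 0/1 indicator vector."""
    categories = list(dict.fromkeys(column))  # distinct values, original order
    return {
        value: tuple(1 if value == other else 0 for other in categories)
        for value in categories
    }


int_codes = unique_integers(values)
dummy_codes = dummy_coding(values)

# Pairwise distances between the distinct values under each encoding.
int_distances = {
    (a, b): abs(int_codes[a] - int_codes[b])
    for a, b in combinations(int_codes, 2)
}
dummy_distances = {
    (a, b): dist(dummy_codes[a], dummy_codes[b])
    for a, b in combinations(dummy_codes, 2)
}

for pair in int_distances:
    print(pair, "integer:", int_distances[pair],
          "dummy:", round(dummy_distances[pair], 3))
```

With unique integers, CH ends up three units from IL but only one unit from IE, purely because of the order in which the values were numbered. With dummy coding, every pair of distinct values is the same distance apart (the square root of 2), which is why it is the safer choice for distance-based methods such as clustering.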
