
"handling ordinal data and attribute weighting"

yogafire Member Posts: 43 Contributor II
edited June 2019 in Help
Hello,

My first question is about handling ordinal data.
I have a dataset, described like this:

ID --> ID of each patient --> nominal
Age --> age of each patient --> integer
Test_i (i=1,...,4) --> score/grade of each test --> ordinal [values range from 1 to 6; a higher grade is more severe]
Class --> {cancer, not cancer} --> class of this dataset

My question is how to handle the Test_i attributes, which hold ordinal data. Can I simply treat those attributes as integers, should I treat them as weights, or is there another way?


My second question is about attribute weighting.
On that dataset I use multiple attribute weighting techniques, but I found that one of the 5 attributes gets a weight of 0 after normalizing the weights, or a value far smaller than the others (e.g. the weight of this attribute is 0.00xx, while the others are about >= 0.2xxx). Can I simply ignore that attribute?

Thank you very much for your reply.

Best regards,
Dimas Yogatama

Answers

  • IngoRM Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    My question is how to handle the Test_i attributes, which hold ordinal data. Can I simply treat those attributes as integers, should I treat them as weights, or is there another way?

    The answer depends a bit on the learning scheme you intend to (or have to) use. For example, let's say you want to model your problem with a linear regression scheme. In that case I would suggest reading the data as nominal (or transforming it to nominal), transforming the columns to binominal (Test_1 = 1 with values "true" and "false", Test_1 = 2 with values "true" and "false"...), and then to numerical (leading to 0 and 1). Now the learning scheme can handle the fact that the values are ordered by assigning specific weights to each attribute, and hence to each of the ordered values. The same argument would of course also apply if the data is not ordered but simply nominal  ;)
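
As a rough sketch of the nominal -> binominal -> numerical conversion described above (plain Python, not RapidMiner's own operators), each ordinal test score can be expanded into 0/1 indicator columns. The helper names `to_indicators` and `encode_row` and the example row are made up for illustration; only the grade range 1 to 6 and the `Test_i` column names come from the question.

```python
# Grade range as stated in the question (1 = least severe, 6 = most severe).
GRADES = [1, 2, 3, 4, 5, 6]


def to_indicators(name, grade, grades=GRADES):
    """Expand one ordinal value into 0/1 columns such as 'Test_1 = 2'."""
    if grade not in grades:
        raise ValueError(f"unexpected grade {grade!r} for {name}")
    return {f"{name} = {g}": int(grade == g) for g in grades}


def encode_row(row, test_columns):
    """Replace each ordinal test column of a row with its indicator columns."""
    # Keep every attribute that is not an ordinal test column unchanged.
    encoded = {k: v for k, v in row.items() if k not in test_columns}
    for col in test_columns:
        encoded.update(to_indicators(col, row[col]))
    return encoded


# Made-up example row, for illustration only.
row = {"Age": 54, "Test_1": 2, "Test_2": 6}
encoded = encode_row(row, ["Test_1", "Test_2"])
print(encoded)
```

A linear model fitted on these columns learns one weight per grade of each test, so it does not have to assume that the step from grade 1 to 2 matters as much as the step from 5 to 6.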

    Can I simply ignore that attribute?
    If the performance does not drop on an independent test set: yes.
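
One way to run that check, sketched in plain Python under some assumptions: the learner is a toy 1-nearest-neighbour stand-in for whatever learning scheme is actually used, the data and the attribute name `Noise` are invented, and `can_ignore` is a hypothetical helper, not a RapidMiner operator.

```python
def fit_1nn(rows, labels):
    """Toy 1-nearest-neighbour learner; a stand-in for any learning scheme."""
    def predict(row):
        def dist(other):
            return sum((row[k] - other[k]) ** 2 for k in row)
        best = min(range(len(rows)), key=lambda i: dist(rows[i]))
        return labels[best]
    return predict


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def drop_attribute(rows, name):
    """Return copies of the rows without the attribute `name`."""
    return [{k: v for k, v in r.items() if k != name} for r in rows]


def can_ignore(fit, train, y_train, test, y_test, name, tolerance=0.0):
    """True if removing `name` does not lower accuracy on the held-out test set."""
    full = accuracy(fit(train, y_train), test, y_test)
    reduced_model = fit(drop_attribute(train, name), y_train)
    reduced = accuracy(reduced_model, drop_attribute(test, name), y_test)
    return reduced >= full - tolerance


# Invented toy data: Test_1 separates the classes, Noise does not.
train = [{"Test_1": 1, "Noise": 3}, {"Test_1": 6, "Noise": 3},
         {"Test_1": 2, "Noise": 4}, {"Test_1": 5, "Noise": 4}]
y_train = ["not cancer", "cancer", "not cancer", "cancer"]
test = [{"Test_1": 1, "Noise": 4}, {"Test_1": 6, "Noise": 4}]
y_test = ["not cancer", "cancer"]

drop_noise_ok = can_ignore(fit_1nn, train, y_train, test, y_test, "Noise")
drop_test1_ok = can_ignore(fit_1nn, train, y_train, test, y_test, "Test_1")
print(drop_noise_ok, drop_test1_ok)
```

The important part is that the test rows are never used for fitting or for computing the attribute weights; otherwise the comparison is too optimistic.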

    Cheers,
    Ingo