What should be the variable type for: ratio variables and ordinal variables?

Curious Member Posts: 12 Learner I
1) e.g. percentage of population
2) customer rating (doesn't setting the type to polynominal lead to loss of information of hierarchy?)

Answers

  • SGolbert RapidMiner Certified Analyst, Member Posts: 344 Unicorn
    Hi,

    I'm happy to see you posting again, but I would like to ask you to be a bit more specific in your posts. We need context and, whenever possible, sample data. If you have a sample process, that would be a great way to get the help started!

    Regards,
    Sebastian
  • Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Without sample data it is hard to know for sure, but a priori a percentage or ratio should have the attribute data type numeric/real, and a rating should probably be converted to numeric/integer (this may require a transformation on your part first if the raw data stores it as labeled text).

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
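The conversion of a text rating to an integer mentioned above can be sketched outside RapidMiner as follows. This is only a minimal illustration in plain Python: the label names in `RATING_ORDER` and the helper `encode_rating` are hypothetical, since the thread does not show the actual rating values.

```python
# Hypothetical rating labels, listed from lowest to highest.
# The real labels depend on the raw data, which the thread does not show.
RATING_ORDER = ["poor", "fair", "good", "very good", "excellent"]

# Map each label to its rank (1 = lowest) so the ordering is preserved.
RATING_TO_INT = {label: rank for rank, label in enumerate(RATING_ORDER, start=1)}


def encode_rating(label: str) -> int:
    """Return the integer rank for a rating label (case-insensitive)."""
    key = label.strip().lower()
    if key not in RATING_TO_INT:
        raise ValueError(f"unknown rating label: {label!r}")
    return RATING_TO_INT[key]


raw_ratings = ["Good", "poor", "Excellent", "fair"]
encoded = [encode_rating(r) for r in raw_ratings]
print(encoded)  # [3, 1, 5, 2]
```

Once encoded this way, the values can be compared and sorted like any other integers, which is the hierarchy information that a purely nominal type would not carry.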
  • kypexin RapidMiner Certified Analyst, Member Posts: 291 Unicorn
    Hi @Curious
    customer rating (doesn't setting the type to polynominal lead to loss of information of hierarchy?)
    I would suggest following a simple rule: if you can directly compare or order values on a 'bigger-smaller' scale, then the variable should be of a numerical type. That is most likely the case with a rating, since one rating can be higher than another. Also keep in mind that the variable type matters when the variable is the target (label) you are trying to predict. With a numerical label you face a regression problem, while with a polynominal label it is a classification problem, which means different models should be used.
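The point about the label type deciding the kind of problem can be illustrated with a small plain-Python sketch. The function `task_for_label` is a hypothetical helper written for this example, not part of RapidMiner; it only mirrors the rule of thumb from the post.

```python
def task_for_label(values):
    """Suggest 'regression' for numeric labels, 'classification' otherwise.

    Booleans are excluded on purpose: in Python they are a subclass of int,
    but as a label they behave like a two-class nominal value.
    """
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "regression" if is_numeric else "classification"


# A rating stored as integers is treated as a numeric label.
print(task_for_label([1, 4, 5, 3]))             # regression
# The same rating stored as text is treated as a nominal label.
print(task_for_label(["low", "high", "high"]))  # classification
```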
  • Curious Member Posts: 12 Learner I
    For instance, if we have variables: 

    % of lower status of the population (with values e.g. 12.43) 
    proportion of non-retail business acres per town (with values e.g. 7.87)
    average number of rooms per house (with values e.g. 6.575)
    index of accessibility to highways (with values e.g. 1, 2, 3, 4, 5)

    Thank you!
  • kypexin RapidMiner Certified Analyst, Member Posts: 291 Unicorn
    Hi @Curious

    Here all the variables are logically numerical: the first three are real and the last one is integer.
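As a rough check of that classification, the example values from the question can be run through a simple rule: a column whose values are all whole numbers is integer, anything else is real. This is a minimal sketch in plain Python; the column names and the helper `numeric_type` are made up for illustration, and only the sample values come from the thread.

```python
# Sample values quoted in the question; the column names are hypothetical.
samples = {
    "lower_status_pct": [12.43],
    "non_retail_acres_proportion": [7.87],
    "avg_rooms_per_house": [6.575],
    "highway_access_index": [1, 2, 3, 4, 5],
}


def numeric_type(values):
    """Return 'integer' if every value is a whole number, else 'real'."""
    return "integer" if all(float(v).is_integer() for v in values) else "real"


types = {name: numeric_type(vals) for name, vals in samples.items()}
for name, inferred in types.items():
    print(f"{name}: {inferred}")
```

A rule like this only looks at the stored values. Whether an integer-coded index should be treated as numeric or as a set of categories still depends on what the numbers mean.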