NORMALIZE FOR AN ATTRIBUTE THAT TAKES A VALUE EITHER 0 OR 1
Hello, everybody!
I've recently started using RapidMiner and teaching myself data mining and analysis.
While I was testing an example data set taken from a questionnaire, I spotted an attribute that takes the value either 0 or 1, while all the other attributes take values in the range 1 to 5. I cannot exclude any of the attributes from my analysis, so I'm thinking of normalizing. What would you suggest I do?
I tried the range method from 0.0 to 1.0 for all attributes, but is that right, considering that my disputed attribute doesn't take values anywhere between 0 and 1 but is EITHER 0 OR 1?
Best Answers
MartinLiebig, Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor
Hi,
sorry, but what do you expect here? The range normalization makes sure that the smallest value in your data set is 0 and the biggest one is 1. So if you come in with an attribute which is only 0 and 1, it can only map it to 0 and 1.
Best,
Martin
- Sr. Director Data Solutions, Altair RapidMiner -
Dortmund, Germany
BalazsBarany, Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert
Hi @Maria_L,
you don't need to normalize the data for most machine learning algorithms. But of course you can do it even for these.
You can easily normalize the data to a range of 0 to 1. As Martin wrote, the 0/1 attribute won't change its values, and the others will be on the same magnitude (for a 1 to 5 scale: 0, 0.25, 0.5, 0.75, 1). This might help you understand your models better.
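A minimal sketch of that range transformation in plain Python, outside RapidMiner. The function name `min_max` and the two sample columns are made up for illustration. It shows that a 0/1 attribute comes out unchanged, while a 1 to 5 attribute is mapped onto steps of 0.25.

```python
def min_max(values):
    """Rescale values linearly so the observed minimum maps to 0.0
    and the observed maximum maps to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale a constant attribute")
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical questionnaire columns (not from the original data set).
yes_no = [0, 1, 1, 0, 1]   # yes/no question coded as 1/0
likert = [1, 2, 3, 4, 5]   # question answered on a 1-to-5 scale

yes_no_scaled = min_max(yes_no)
likert_scaled = min_max(likert)

print(yes_no_scaled)   # the 0/1 attribute keeps its values
print(likert_scaled)   # the 1-to-5 attribute now lies between 0 and 1
```

After this step both attributes share the same 0 to 1 magnitude, which is the point of normalizing them together.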
Of course you could do the range transformation on the 0/1 attribute and just multiply by 5.
Be careful when normalizing. You might have an attribute where nobody gave the "extreme" answers (0?, 1, 5). It would then be changed in a different way: its actual maximum, for example, would become 1, and so on. So you would change the scale of this one attribute compared to the others.
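This caveat can be seen on a small made-up column in which nobody answered 1 or 5. The sketch below is plain Python, not a RapidMiner process, and the helper `rescale` and the sample answers are assumptions for illustration. It compares rescaling by the observed minimum and maximum with rescaling by the fixed bounds of the questionnaire scale (1 and 5).

```python
def rescale(values, lo, hi):
    """Map lo to 0.0 and hi to 1.0, linearly."""
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical answers on a 1-to-5 scale where the extremes never occur.
answers = [2, 3, 4, 3, 2]

# Using the observed min/max stretches 2..4 over the whole 0..1 range.
by_observed = rescale(answers, min(answers), max(answers))

# Using the known scale bounds keeps this attribute comparable to the others.
by_scale = rescale(answers, 1, 5)

print(by_observed)
print(by_scale)
```

With the observed bounds, an answer of 4 becomes 1.0, as if it were the top of the scale. With the fixed bounds it stays at 0.75, the same value a 4 would get in any other 1 to 5 attribute.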
Regards,
Balázs
Answers
Thanks for replying, but I'm not sure I understood your answer.
My attribute takes 0 or 1, meaning that in reality it refers to a question in a questionnaire that is answered EITHER YES (1) OR NO (0). All the other questions in the same questionnaire take answers in the range 1 to 5.
So I'm wondering if there's any logic in transforming the YES OR NO question, which is either 1 or 0, to the range 1 to 5.
Between those two options, I prefer to normalize all the other questions (1 to 5) to the range 0.0 to 1.0.
What do you think?
Thank you all in advance!
Thank you for your kind reply! Good manners are always the best attributes!
I've been in this field for less than a week and I really want to learn. Also, my background isn't a mathematical one.
So, if I understood correctly, you suggest that I could apply the range transformation to just that single attribute, to the range 1 to 5?
I also have another question. For another paper, I have to extract some association rules from a data set. I'm asked to publish the 10 most powerful ones associated with one particular attribute. So I'm running a process, and it turns out that there are only two association rules which refer strictly to this attribute, e.g.
[X = '(2.5-inf)'] --> [Y = '(2.5-inf)'], [Z = '(2.5-inf)'] (confidence: 0.9)
All the other results came out like [X = '(2.5-inf)'], [Y = '(2.5-inf)'] --> [Z = '(2.5-inf)'] (confidence: 0.9)
So my question is: when the attribute in question is X, should I include in my result list only rules like the first example above, or am I also allowed to list the others in which X appears in combination with other attributes?
Thank you in advance!
just go over your attribute list, check the minimum and maximum values, and decide accordingly, using the criteria I listed.
The most important thing is to understand which attributes have to be changed and how. Afterwards, check that the transformation matches your expectations: e.g. min is 0, max is 1, etc.
You can change the filter parameters to get more association rules, ordered by confidence for example.
It depends on the question or the problem you're trying to solve whether only the X => rule is relevant or the X, Y => rule as well. What is relevant in the real world? How often is the X, Y rule seen? It's your decision which rule is more important.
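The two readings of "rules associated with X" can be sketched in plain Python. Everything here is hypothetical: the rule list, its dictionary layout, and the helper `top_rules` are made up for illustration and are not RapidMiner's output format. The strict reading keeps rules whose premise is X alone, and the loose reading keeps every rule with X anywhere in the premise.

```python
# Hypothetical rules; only the first two mirror the shapes quoted in the thread.
rules = [
    {"premise": {"X"}, "conclusion": {"Y", "Z"}, "confidence": 0.90},
    {"premise": {"X", "Y"}, "conclusion": {"Z"}, "confidence": 0.90},
    {"premise": {"Y"}, "conclusion": {"Z"}, "confidence": 0.85},
    {"premise": {"Z"}, "conclusion": {"X"}, "confidence": 0.80},
]

def top_rules(rules, attribute, strict, n=10):
    """Return up to n rules involving `attribute` in the premise,
    ordered by descending confidence.

    strict=True  keeps rules whose premise is exactly {attribute}.
    strict=False keeps rules whose premise contains attribute."""
    if strict:
        keep = [r for r in rules if r["premise"] == {attribute}]
    else:
        keep = [r for r in rules if attribute in r["premise"]]
    return sorted(keep, key=lambda r: r["confidence"], reverse=True)[:n]

only_x = top_rules(rules, "X", strict=True)
with_x = top_rules(rules, "X", strict=False)

print(len(only_x), "rule(s) with X alone in the premise")
print(len(with_x), "rule(s) with X somewhere in the premise")
```

Which list to report is still the judgment call described above. The code only makes the two candidate lists explicit.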
Regards,
Balázs