Low/Medium/High dataset to predict values from another dataset
njhelloworld
I have a Nitrogen attribute with the nominal values Low, Medium, and High:
Nitrogen
Low
Medium
Low
Low
Medium
High
In another dataset I have the numeric equivalent of each level: Low = 0-15, Medium = 15-30, High = 30+. I also have a SoilPh attribute with numeric values; SoilPh is the basis for whether Nitrogen is Low, Medium, or High, so the Nitrogen levels depend on SoilPh. Now I want to predict the numerical values of my Nitrogen attribute based on the ranges in the other dataset (Low: 0-15, Medium: 15-30, High: 30+). Is this possible? I am a newbie to RapidMiner and data mining, so I hope you can help.
Answers
Hi @njhelloworld,
To better understand, can you please share your two datasets?
Regards,
Lionel
@lionelderkrikor
Here is my data:
My first data file, named nitrogen-Cleaned, has Nitrogen values of Low, Medium, and High. I want to determine their specific numerical equivalents using the other Excel file, named nitrogen, which contains the equivalent ranges. Is this possible? Can the approach in this video be applied? https://www.youtube.com/watch?v=EKK8X-1oaH8 Thanks for any help.
Hi @njhelloworld,
A few points:
1 - As mentioned in the YouTube video, the Nominal to Numerical operator converts nominal attributes into numerical attributes, for cases where you use algorithms that do not work with nominal attributes.
For example, in your case this operator (with Unique Integers selected) will transform Low, Medium, High into 0, 1, 2.
But if I understand correctly, that is not what you want to do.
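To make point 1 concrete, here is a minimal Python sketch of what a unique-integer encoding does to your Nitrogen column. It is not RapidMiner's code; it assumes, as described in this thread, that each distinct value gets the next integer in order of first appearance. The function name `unique_integers` is hypothetical.

```python
def unique_integers(values):
    """Assign each distinct nominal value an integer, in order of first appearance."""
    codes = {}
    encoded = []
    for v in values:
        if v not in codes:
            codes[v] = len(codes)  # next unused integer, starting at 0
        encoded.append(codes[v])
    return encoded, codes


nitrogen = ["Low", "Medium", "Low", "Low", "Medium", "High"]
encoded, codes = unique_integers(nitrogen)
print(codes)    # {'Low': 0, 'Medium': 1, 'High': 2}
print(encoded)  # [0, 1, 0, 0, 1, 2]
```

The output is just a label code: 0, 1, 2 say nothing about the actual nitrogen amount, which is why this operator does not answer the original question.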
2 - In your nitrogen-Cleaned file, it is impossible to recover the original numerical nitrogen value of a specific observation. You only know that it lay between 0 and 15 if Nitrogen = Low, between 15 and 30 if Nitrogen = Medium, or between 31 and 46 if Nitrogen = High.
However, it is possible to associate each value (Low, Medium, High) with a representative numeric value, for example the midpoint of its range, that is:
Low -> 7.5 units (or any value in the range [0,15])
Medium -> 22.5 units (or any value in the range [15,30])
High -> 38.5 units (or any value in the range [31,46])
To perform this association task, you can use the Map operator.
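Outside RapidMiner, the same association can be sketched in a few lines of Python. This is a minimal illustration of the idea behind a Map step, not the Map operator itself; the `RANGES` table is a hypothetical stand-in for the "nitrogen" file, and it assumes High spans [31, 46] as in the answer. The mapped numbers are representative estimates, not the original measurements.

```python
# Hypothetical range table standing in for the "nitrogen" file.
# High is assumed to span [31, 46], as in the answer.
RANGES = {"Low": (0, 15), "Medium": (15, 30), "High": (31, 46)}


def midpoint_map(ranges):
    """Build a nominal -> numeric lookup using the midpoint of each range."""
    return {label: (lo + hi) / 2 for label, (lo, hi) in ranges.items()}


def apply_map(values, mapping):
    """Replace each nominal value with its mapped number."""
    return [mapping[v] for v in values]


mapping = midpoint_map(RANGES)
nitrogen = ["Low", "Medium", "Low", "Low", "Medium", "High"]
print(mapping)                       # {'Low': 7.5, 'Medium': 22.5, 'High': 38.5}
print(apply_map(nitrogen, mapping))  # [7.5, 22.5, 7.5, 7.5, 22.5, 38.5]
```

Any other value inside each range could be used instead of the midpoint; the midpoint is simply a neutral choice when nothing else is known.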
I hope it helps,
Regards,
Lionel
@lionelderkrikor @njhelloworld I want to give a warning about using the Nominal to Numerical operator: use the default "dummy coding." If you use unique integers, you are implying an order. For example, if your training set had Nitrogen, Carbon, Oxygen, it would convert them to 1, 2, 3. Likewise, if your scoring data had Oxygen, Nitrogen, Carbon, it would also convert them in order, to 1, 2, 3. This could cause bad predictions on the scoring set, because the model only sees 1, 2, 3 and cannot tell that the same number now stands for a different element.
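The pitfall above can be reproduced with a small Python sketch. It is an illustration of the two coding types, not RapidMiner's implementation; it assumes unique integers are numbered by first appearance (1-based, matching the example) and that dummy coding uses a fixed category list taken from the training data. Both function names are hypothetical.

```python
def unique_integers(values):
    """1-based integer codes in order of first appearance (as in the example)."""
    codes = {}
    for v in values:
        codes.setdefault(v, len(codes) + 1)
    return [codes[v] for v in values]


def dummy_coding(values, categories):
    """One 0/1 column per category; the category list is fixed up front."""
    return [[1 if v == c else 0 for c in categories] for v in values]


train = ["Nitrogen", "Carbon", "Oxygen"]
score = ["Oxygen", "Nitrogen", "Carbon"]
categories = sorted(set(train))  # ['Carbon', 'Nitrogen', 'Oxygen']

# Unique integers: both sets become [1, 2, 3], so Oxygen is 3 in training
# but 1 in scoring.
print(unique_integers(train))  # [1, 2, 3]
print(unique_integers(score))  # [1, 2, 3]

# Dummy coding: each element keeps the same 0/1 vector in both sets.
print(dummy_coding(train, categories))  # [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
print(dummy_coding(score, categories))  # [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
```

Because the dummy columns are tied to category names rather than to the order in which values happen to appear, the encoding stays consistent between training and scoring data.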
Ok, thanks @Thomas_Ott. I understand.
I will apply your advice.
That explains why 80% of RapidMiner users choose "dummy coding" (according to RapidMiner's statistics) when they use Nominal to Numerical.
Regards,
Lionel