"Is weight by Information Gain the right operator for me?"
mohammadreza
Member Posts: 23 Contributor II
Hi all,
I am using the "Weight by Information Gain" operator to select the most predictive attributes from a data set with 218,000 attributes and 60,000 examples. (This is the example set produced by RapidMiner text processing.)
The process has been running for 4 days so far on a PC with 32 GB of RAM, and it is still not finished. I am afraid this is not the right operator for my problem. Could you please tell me whether I have done something wrong?
BTW, as far as I understand, the computational complexity of calculating information gain should be proportional to "number of attributes" * "number of examples", which in my case is 218000 * 60000 calculations. Do you think this is intractable on a PC? If so, I would appreciate it if you could propose an alternative solution.
Thanks in advance
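For scale, the product mentioned above can be worked out directly. This is only a back-of-the-envelope sketch: the 8 bytes per cell is an assumption (dense double-precision storage), and text-processing output is often held sparsely, so the real memory footprint may be quite different.

```python
# Size of the example set described in the question.
n_attributes = 218_000
n_examples = 60_000

# One cell per (example, attribute) pair.
cells = n_attributes * n_examples

# ASSUMPTION: every cell stored as a dense 8-byte float.
dense_bytes = cells * 8

print(f"{cells:,} cells")                                   # 13,080,000,000 cells
print(f"{dense_bytes / 1e9:.1f} GB as dense 8-byte floats")  # 104.6 GB as dense 8-byte floats
```

Under that dense-storage assumption the table alone would not fit in 32 GB of RAM.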
Answers
200,000 attributes is really a lot. Even in text mining you usually have fewer.
You might want to batch it: work on a subset of the attributes at a time, write the weights to a file, and use them afterwards. Taking a sample might also be a good solution. Don't forget to use Materialize Data after Select Attributes.
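The batching idea can be sketched outside RapidMiner in plain Python. This is a minimal illustration, not RapidMiner's implementation: the function names (`entropy`, `information_gain`, `weights_in_batches`, `write_weights`) are hypothetical, and each attribute is treated as binary (term present or absent), which is an assumption about the text-processing output.

```python
import csv
import io
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(column, labels):
    """Information gain of one attribute, split into present (> 0) vs absent."""
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value > 0, []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def weights_in_batches(rows, labels, batch_size):
    """Yield one list of (attribute_index, weight) pairs per batch of columns."""
    n_attributes = len(rows[0])
    for start in range(0, n_attributes, batch_size):
        stop = min(start + batch_size, n_attributes)
        batch = []
        for j in range(start, stop):
            column = [row[j] for row in rows]
            batch.append((j, information_gain(column, labels)))
        yield batch


def write_weights(rows, labels, batch_size, out):
    """Write each batch to `out` as soon as it is done, so progress is kept."""
    writer = csv.writer(out)
    writer.writerow(["attribute", "weight"])
    for batch in weights_in_batches(rows, labels, batch_size):
        writer.writerows(batch)


# Toy data: 4 examples, 3 binary attributes; attribute 0 matches the label.
rows = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]]
labels = ["pos", "pos", "neg", "neg"]

weights = dict(w for batch in weights_in_batches(rows, labels, 2) for w in batch)

buffer = io.StringIO()  # stands in for a real file opened with open(path, "w", newline="")
write_weights(rows, labels, 2, buffer)
```

Because every batch is written out as soon as it finishes, an interrupted run keeps the weights already computed, and the top-weighted attributes can be picked from the file afterwards.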
Cheers,
Martin
Dortmund, Germany
Could you please explain what materialized data is?
Thanks again
In RapidMiner, an example set is usually held in memory only once. When you select attributes, the others are not deleted, just deselected. To get a real copy in memory, you need to use the Materialize Data operator.
This is usually not needed. But in this special case you want to be sure you have an example set without those attributes, so I would recommend using it.
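As a rough analogy only, the difference between deselecting and materializing can be sketched in plain Python. The classes `ExampleTable` and `SelectedView` are hypothetical and are not RapidMiner's internals; they just show a view that shares the original data versus a real copy that holds only the selected columns.

```python
class ExampleTable:
    """Hypothetical in-memory table: attribute names plus a list of rows."""

    def __init__(self, names, rows):
        self.names = list(names)
        self.rows = [list(r) for r in rows]


class SelectedView:
    """Hypothetical view: remembers which columns are selected, copies nothing."""

    def __init__(self, table, keep):
        self.table = table
        self.keep = [table.names.index(n) for n in keep]

    def row(self, i):
        # Values are read from the shared underlying table on every access.
        return [self.table.rows[i][j] for j in self.keep]

    def materialize(self):
        # Build a new, independent table holding only the selected columns.
        names = [self.table.names[j] for j in self.keep]
        rows = [self.row(i) for i in range(len(self.table.rows))]
        return ExampleTable(names, rows)


full = ExampleTable(["a", "b", "c"], [[1, 2, 3], [4, 5, 6]])
view = SelectedView(full, ["a", "c"])   # "b" is deselected, but still in memory
materialized = view.materialize()       # real copy without "b"

full.rows[0][0] = 99                    # change the original data
print(view.row(0))                      # [99, 3]  (the view follows the original)
print(materialized.rows[0])             # [1, 3]   (the copy is independent)
```

In this sketch the view still drags the full table along, while the materialized copy carries only the kept columns.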
Cheers,
Martin
Dortmund, Germany
Thanks in advance,