Parallel Processing Extension
Hi,
I have a data set that contains about 4000 attributes.
I'm attempting to cluster the attributes using K-NN and then feed the result into a decision tree to see whether it can classify the labels derived from the clustering.
This is all embedded in an Optimize Parameters operator, which changes the value of k on each run (between 2 and 10 in three steps). The aim is to get the accuracy of the decision tree as high as possible.
I have installed the Parallel Processing Extension on my computer and was wondering whether there is anything special I have to do for it to spread the processing across the cores on the PC.
I have not been able to get any results from the experiment because it takes up a huge amount of resources (almost 20 GB of memory).
Thanks for your time
Answers
You need to use the Loop Parameters (Parallel) operator. However, your memory consumption will multiply with the number of threads you use.
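To get a feeling for what that means with the numbers from the question, here is a rough back-of-the-envelope sketch in Python. It assumes the worst case, where every thread holds its own full working copy of the data, so memory scales linearly with the thread count. The function name is made up for illustration, and the 20 GB figure is the one reported above.

```python
def estimated_peak_memory_gb(per_run_gb, threads):
    """Rough upper bound, assuming each thread keeps its own working copy."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return per_run_gb * threads


# About 20 GB for a single run, as reported in the question.
for threads in (1, 2, 4, 8):
    print(threads, "threads:", estimated_peak_memory_gb(20, threads), "GB")
```

Real usage may be lower if parts of the data are shared between threads, but it is worth checking this estimate against the available RAM before raising the thread count.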
Anyway, decision trees are a really bad choice for very wide data, i.e. data with many attributes. They are also unstable and can change completely when the underlying data changes only marginally. Instead, normalize your data and train e.g. a linear SVM on it (provided that you have numerical data with two classes). By inspecting the weights that the SVM assigns to the attributes, you get a much better idea of which attributes have which impact.
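Outside of RapidMiner, the same idea (normalize, train a linear SVM, rank attributes by weight) can be sketched in plain Python. This is only an illustration under several assumptions: the data is synthetic, the attribute names `att0` to `att4` are made up, and the SVM is a minimal hinge-loss model trained with stochastic gradient descent, not the SVM operator RapidMiner ships.

```python
import random


def zscore_columns(rows):
    """Normalize every column to zero mean and unit variance."""
    n, n_cols = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(n_cols)]
    stds = []
    for j in range(n_cols):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        stds.append(var ** 0.5 or 1.0)  # constant columns: avoid dividing by 0
    return [[(r[j] - means[j]) / stds[j] for j in range(n_cols)] for r in rows]


def train_linear_svm(rows, labels, lam=0.01, lr=0.01, epochs=50, seed=0):
    """L2-regularized hinge loss, minimized by SGD. Labels must be -1 or +1."""
    rng = random.Random(seed)
    w, b = [0.0] * len(rows[0]), 0.0
    order = list(range(len(rows)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = rows[i], labels[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # The regularization term always shrinks the weights a little.
            w = [wj - lr * lam * wj for wj in w]
            if margin < 1:  # example is misclassified or inside the margin
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b


def make_demo_data(n=200, seed=42):
    """Synthetic two-class data: att0 is strongly informative, att1 weakly,
    att2 to att4 are pure noise (att2 on a much larger scale)."""
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n):
        y = 1 if i % 2 == 0 else -1
        rows.append([
            2.0 * y + rng.gauss(0, 1),
            0.5 * y + rng.gauss(0, 1),
            1000.0 * rng.gauss(0, 1),
            rng.gauss(0, 1),
            rng.gauss(0, 1),
        ])
        labels.append(y)
    return rows, labels


names = ["att0", "att1", "att2", "att3", "att4"]  # hypothetical names
rows, labels = make_demo_data()
weights, bias = train_linear_svm(zscore_columns(rows), labels)

# Rank attributes by absolute weight, largest impact first.
ranked = sorted(zip(names, weights), key=lambda nw: abs(nw[1]), reverse=True)
for name, weight in ranked:
    print(f"{name}: {weight:+.3f}")
```

The normalization step matters here: without it, the large scale of `att2` would distort the weights and make them impossible to compare across attributes.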
Happy Mining!
~Marius