"How to collect each performance of a backward elimination"
I am using RapidMiner for my data mining research, and I used backward elimination for feature (attribute) selection. I was wondering how to set up the process so that it records the performance of each round of the backward elimination. For example: feature set one (A, B, C, D, E, F), performance one (…); feature set two (A, B, C, D, E), performance two (…); ….
I am currently processing a data table with 21 features and 157,000 items. A brute-force feature selection simply overloads my computer's memory. I would like to find the best combination and also plot a graph that shows which feature combinations perform poorly and which perform well.
Thanks in advance for your kind support.
Answers
You can use the logging mechanism to log the results of all rounds and then plot them with the usual plotters of RapidMiner.
Here's a process that will illustrate how this works.
Greetings,
Sebastian
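For readers without the process at hand, the idea of logging one row per elimination round can be sketched outside RapidMiner in plain Python. This is only an illustration under stated assumptions, not the RapidMiner operator: `backward_elimination_log` and `toy_score` are hypothetical names, the toy scoring function stands in for a real cross-validated performance, and the loop runs down to a single feature so that every round appears in the log instead of stopping at the first drop in performance.

```python
from typing import Callable, List, Sequence, Tuple

LogRow = Tuple[int, Tuple[str, ...], float]


def backward_elimination_log(
    features: Sequence[str],
    evaluate: Callable[[Tuple[str, ...]], float],
) -> List[LogRow]:
    """Greedy backward elimination that records (round, feature set, performance)."""
    current = tuple(features)
    log: List[LogRow] = [(0, current, evaluate(current))]  # round 0: all features
    round_no = 0
    while len(current) > 1:
        round_no += 1
        # Try removing each remaining feature in turn.
        candidates = [tuple(f for f in current if f != drop) for drop in current]
        scored = [(evaluate(c), c) for c in candidates]
        # Keep the reduced set with the highest performance (first one on ties).
        best_score, best_set = max(scored, key=lambda pair: pair[0])
        log.append((round_no, best_set, best_score))
        current = best_set
    return log


def toy_score(feature_set: Tuple[str, ...]) -> float:
    """Hypothetical performance: A and C help, every feature costs a little."""
    useful = {"A": 0.2, "C": 0.1}
    gain = sum(useful.get(f, 0.0) for f in feature_set)
    return round(0.5 + gain - 0.01 * len(feature_set), 4)


log = backward_elimination_log(list("ABCDEF"), toy_score)
for round_no, feature_set, performance in log:
    print(round_no, ",".join(feature_set), performance)

# The best row of the log is the feature set to keep.
best_round = max(log, key=lambda row: row[2])
```

Each printed row corresponds to one "feature set, performance" pair from the question, so the rows can be written to a table and plotted as performance against round number or number of features.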
Thanks for your quick reply. Is there a way to load the XML code so that RapidMiner shows the process? Thanks for your support.
John
I cannot find the "Process" operator in my RapidMiner. I have a Process Control group, which contains loops etc. I am not sure we have the same version; I am using 5.0.003.
John
Indeed, we don't have the same version. The latest version is 5.0.006, and I would suggest updating if possible. But this isn't the reason for the missing "Process" operator: it cannot be added by users, since it represents the complete process and is added automatically when a new process is created.
If you want to use my posted process, copy it from here and paste it into the XML View of RapidMiner. After you press the Apply button, the process will be reconstructed from this XML fragment.
Greetings,
Sebastian
It worked, thanks a lot. May I ask another question: how can I set up automatic sampling within an x-fold cross-validation?
For example, a data set contains label X (6,000 items) and label Y (500 items). A 10-fold cross-validation splits the data into folds of 650 items each; we use 9 folds for training and 1 fold for testing. Within each fold of the training set, we want to balance label X and label Y.
For example, fold 1 has label Y (50) and label X (600), so we sample 50 of the label X items in fold 1, which gives a sampled fold 1 with label Y (50) and label X (50); the same is done for the other 8 training folds. Then we use the 9 sampled folds for training and the 1 unbalanced fold for testing. The experiment rotates the training and test sets through all 10 folds and collects the final performance.
Thanks for your kind support.
Best Regards
John Quest
Well, this seems to be rather difficult without coding, but it should be possible. You could build your own small XValidation using only operators. I will outline the steps here, but it's definitely beyond the scope of this free support forum to build it for you:
1. Generate a new attribute that will distribute the examples over the folds
2. Loop over each value of this attribute
2.1 Copy the data set and filter it according to the current value of the previously generated fold attribute: one set containing the matching examples, the other containing the non-matching ones.
2.2 Learn the model on the non-matching set.
2.3 Apply it to the matching set.
2.4 Measure the performance and store it together with the fold number.
3. Average all performance measurements
This way it could be achieved. Alternatively, you could ask for a quote for such an extension of the XValidation and donate it to the general functionality.
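To make the steps above concrete, here is a minimal sketch in plain Python rather than RapidMiner operators. It adds the per-fold balancing from the question (each training fold is undersampled to its smallest class) and leaves the test fold unbalanced. Everything named here is an assumption for illustration: the function names, the stratified fold assignment, the scaled-down toy data (600 X and 50 Y items with one numeric attribute), and the nearest-class-mean learner that stands in for a real model.

```python
import random
from collections import defaultdict
from statistics import mean


def assign_folds(labels, k, rng):
    """Step 1: generate a fold id per example, spreading each label over all folds."""
    by_label = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_label[lab].append(idx)
    fold_of = [0] * len(labels)
    for idxs in by_label.values():
        rng.shuffle(idxs)
        for pos, idx in enumerate(idxs):
            fold_of[idx] = pos % k
    return fold_of


def undersample(indices, labels, rng):
    """Downsample every class among `indices` to the size of the smallest class."""
    by_label = defaultdict(list)
    for idx in indices:
        by_label[labels[idx]].append(idx)
    n = min(len(members) for members in by_label.values())
    kept = []
    for members in by_label.values():
        kept.extend(rng.sample(members, n))
    return kept


def balanced_cross_validation(values, labels, k, train, predict, seed=0):
    rng = random.Random(seed)
    fold_of = assign_folds(labels, k, rng)
    scores = []
    for test_fold in range(k):  # step 2: loop over each fold value
        # Step 2.1: matching examples form the (unbalanced) test set.
        test_idx = [i for i, f in enumerate(fold_of) if f == test_fold]
        train_idx = []
        for fold in range(k):
            if fold != test_fold:
                members = [i for i, f in enumerate(fold_of) if f == fold]
                train_idx.extend(undersample(members, labels, rng))  # balance per fold
        # Step 2.2: learn on the balanced non-matching examples.
        model = train([values[i] for i in train_idx], [labels[i] for i in train_idx])
        # Steps 2.3 and 2.4: apply to the test fold and record its accuracy.
        hits = [1.0 if predict(model, values[i]) == labels[i] else 0.0 for i in test_idx]
        scores.append(mean(hits))
    return scores, mean(scores)  # step 3: average over all folds


def train_centroid(xs, ys):
    """Hypothetical stand-in learner: remember the mean attribute value per label."""
    grouped = defaultdict(list)
    for x, lab in zip(xs, ys):
        grouped[lab].append(x)
    return {lab: mean(vals) for lab, vals in grouped.items()}


def predict_centroid(model, x):
    return min(model, key=lambda lab: abs(x - model[lab]))


data_rng = random.Random(1)
values = [data_rng.gauss(0.0, 1.0) for _ in range(600)]
values += [data_rng.gauss(5.0, 1.0) for _ in range(50)]
labels = ["X"] * 600 + ["Y"] * 50

fold_scores, average = balanced_cross_validation(
    values, labels, 10, train_centroid, predict_centroid
)
print(fold_scores, average)
```

With this toy data each fold holds 60 X and 5 Y items, so each training fold shrinks to 5 + 5 items while the test fold keeps all 65. Accuracy is used here only for brevity; on an unbalanced test fold a per-class measure may be more informative.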
Greetings,
Sebastian