Saving the results of long-running operators within a single process
Hello everyone,
I have a large data set with about 10,000 examples and about 40 attributes. All attributes are numeric (real and integer). I used the "Weight by SVM" operator to weight the attributes and then the "Select by Weight" operator to continue with the top ten attributes. Now I want to extend the process to predict the label attribute, so I have to try different operators like Decision Tree and so on. The problem is that every time I run the process, "Weight by SVM" needs about 20 minutes, so I spend a lot of time waiting whenever I rerun the process from the beginning.
Now the question: what is the best way to save the results of the "Weight by SVM" operator? For now I only want to change the operators that come after "Weight by SVM" and "Select by Weight", so the attributes selected for prediction always stay the same.
My solution at the moment: I select the attributes and store the reduced data set in one process, and in another process I retrieve the reduced data set and try to predict the label there.
Is it somehow possible to put all the operations in one process without waiting so long while "Weight by SVM" is running? Is there a cache or something similar?
Thank you very much.
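Configured to keep the top ten attributes, the "Select by Weight" step boils down to ranking attributes by their weight and keeping the k highest. A minimal Python sketch of that idea, assuming the weights are already available as a name-to-weight mapping; `select_top_k` and `project` are hypothetical helpers, not RapidMiner's API, and the sketch ignores operator options such as how ties or negative weights are treated:

```python
from typing import Dict, List


def select_top_k(weights: Dict[str, float], k: int = 10) -> List[str]:
    """Return the names of the k attributes with the largest weights (hypothetical helper)."""
    # Sort (name, weight) pairs by weight, highest first; ties keep insertion order.
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def project(rows: List[Dict[str, float]], attributes: List[str]) -> List[Dict[str, float]]:
    """Keep only the selected attributes in every example (hypothetical helper)."""
    return [{name: row[name] for name in attributes} for row in rows]


if __name__ == "__main__":
    weights = {"a1": 0.9, "a2": 0.1, "a3": 0.5}
    print(select_top_k(weights, k=2))
```

Because the selected attribute list depends only on the weights, computing the weights once and reusing them is enough to keep the selection fixed while the downstream learners change.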
Best regards
Moritz
Best Answers
varunm1 Member Posts: 1,207 Unicorn
Hello @MoWei,
I understand: you want to keep the process as it is, but have it automatically skip the enabled "Weight by SVM" operator if it has already run.
I don't think this is possible within the same process. As far as I know, enabling and disabling the operator, as you are currently doing, is the way to skip an operator in the current process.
I think you are already aware of creating multiple processes (one for data preprocessing and another for running models). This is similar to your second image, but split into two processes: one stores its output and the other retrieves it.
@yyhuang is there any way of skipping an operator while it's enabled?
Regards,
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing
yyhuang Administrator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 364 RM Data Scientist
If the operator is enabled but placed in a branch of a "Select Subprocess" operator, it can be skipped.
You can manually select the branch ID: either a branch that does nothing or one with the "Weight by" operator enabled.
HTH.
Answers
There are multiple ways to deal with this:
1. You can build everything in the same process using the Subprocess operator. Connect the output of Select by Weight to pick the attributes, and feed the data with these attributes into training and testing different predictive algorithms. Each algorithm goes into its own subprocess. You can use the Store operator to store each model's performance results in the RapidMiner repository. In the image below, I am training six models in a single process with the help of the Subprocess operator. Inside these subprocesses are the relevant operators such as cross-validation, model, performance, store, etc.
2. You can store the Weight by SVM results in the repository with the Store operator and then access them in the different processes you are building. You don't need to run Weight by SVM multiple times because the results are stored in the repository; you just drag them into the current process and use them in Select by Weight.
Hope this helps; let me know if you need more information.
Varun
Hey @varunm1
Thank you for your answer.
To 1: I know that I can use the "Subprocess" operator, but every time I click "Run", all the subprocesses run too, don't they? Then "Weight by SVM" also runs, even if I put it in a subprocess.
To 2: Yes, that is what I do at the moment, but I wanted to know whether there is a way to do this without the "Store" and "Retrieve" operators.
The following picture shows what I want, but without "Weight by SVM" running every time I click "Run".
The following picture shows what I do at the moment. Whenever I change something in the "preprocessing subprocess", I run the process once with the "Weight by SVM" and "Store Weight by SVM" operators enabled. The next time I run the process, I use the "Retrieve" operator, which gives me the "Weight by SVM" results. That works, but I thought there might be a nicer way. I don't want more processes (at the moment); I want to see everything in one process. Hopefully you understand what I mean.
But thank you for now. If you have an idea for solving my problem, that would be nice, but the way I do it at the moment is also okay.
Best regards
Moritz
Perfect, that's what I've been looking for. And it's pretty easy with the "Select Subprocess" operator; I could have figured it out myself.
The following two screenshots show how I do it now:
Operator "Select Subprocess":
So I just have to set the "Select which" parameter of the "Select Subprocess" operator to "1" to run the "Weight by SVM" operator and store the new results, or to "2" to use the old results. Perfect!
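RapidMiner processes are built in the GUI rather than in code, so the following is only a Python analogy of the two-branch setup described above: branch 1 runs the slow weighting step and stores its result, branch 2 retrieves the stored result without recomputing. The function `get_weights`, its `select_which` argument (named after the "Select which" parameter), and the JSON cache file are hypothetical stand-ins for the Select Subprocess, Store and Retrieve operators, not RapidMiner's API:

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict

Weights = Dict[str, float]


def get_weights(select_which: int, cache_path: Path,
                compute: Callable[[], Weights]) -> Weights:
    """Hypothetical analogy of a two-branch Select Subprocess.

    1 = run the expensive weighting step and store the result (like Store),
    2 = reuse the previously stored result (like Retrieve).
    """
    if select_which == 1:
        weights = compute()                          # slow step, e.g. the ~20-minute SVM weighting
        cache_path.write_text(json.dumps(weights))   # persist for later runs
        return weights
    if select_which == 2:
        return json.loads(cache_path.read_text())    # no recomputation
    raise ValueError("select_which must be 1 or 2")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "svm_weights.json"        # hypothetical cache location
        fresh = get_weights(1, path, lambda: {"a1": 0.9, "a2": 0.1})
        reused = get_weights(2, path, lambda: {})    # compute is ignored on branch 2
        print(fresh == reused)
```

As with the parameter switch in the process, the downside is manual: after changing the preprocessing, someone has to pick branch 1 again, otherwise the stored weights are stale.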
Best regards
Moritz