Beginner's question
Hi, in my college we have a project about data mining, and the tool we use is RapidMiner. Since I'm new to RapidMiner, I have a question for you. My process looks like this:
Root
  ExampleSource
  FeatureSelection
    XValidation (number_of_validations = 10)
      MetaCost
        DecisionTree
      OperatorChain
        ModelApplier
        ClassificationPerformance
I figured that model building happens in iterations and that the model we get at the end is the one with the best results. When the process is finished, it shows me a PerformanceVector in the form of a confusion matrix. The question is: is that confusion matrix for the last model, or for the best model?
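In RapidMiner 4.x process XML, that operator tree would look roughly like the sketch below; the data file name and the parameter values are placeholders, not the actual values from the project:

<operator name="Root" class="Process" expanded="yes">
  <operator name="ExampleSource" class="ExampleSource">
    <!-- placeholder: attribute description file of the actual data set -->
    <parameter key="attributes" value="mydata.aml"/>
  </operator>
  <operator name="FeatureSelection" class="FeatureSelection" expanded="yes">
    <!-- the inner XValidation estimates the performance of each candidate attribute set -->
    <operator name="XValidation" class="XValidation" expanded="yes">
      <parameter key="number_of_validations" value="10"/>
      <!-- training side: cost-sensitive wrapper around the learner
           (MetaCost's cost matrix parameter is omitted here) -->
      <operator name="MetaCost" class="MetaCost" expanded="yes">
        <operator name="DecisionTree" class="DecisionTree"/>
      </operator>
      <!-- test side: apply the fold model and measure its performance -->
      <operator name="OperatorChain" class="OperatorChain" expanded="yes">
        <operator name="ModelApplier" class="ModelApplier"/>
        <operator name="ClassificationPerformance" class="ClassificationPerformance">
          <parameter key="accuracy" value="true"/>
        </operator>
      </operator>
    </operator>
  </operator>
</operator>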
Answers
The answer is: the last model. But since the FeatureSelection stops when no more improvement can be achieved (see the description of FeatureSelection in tutorial.pdf, or select the operator and press F1), the last model is also the best one found: the greedy search stops as soon as adding or removing another attribute no longer improves the cross-validated performance, which can represent a local maximum.
See another example in <your-rm-workspace>\sample\05_Features\10_ForwardSelection.xml.
regards,
Steffen
The reason I asked this is: when I save the model (which is the result of the given process), load it in another process, and apply it to the same data set that was used in the first process, the confusion matrix produced by ClassificationPerformance is different from the one in the first process. Why is that?
Your posted setup, like the example I mentioned, does not produce a model. It just produces AttributeWeights. So to gain comparable results you have to use a process like the one sketched below. I said "comparable", not "the same", because to gain exactly the same results you have to ensure that the data is split by XValidation in exactly the same way as in the last iteration of FeatureSelection. You can achieve this by setting the parameter local_random_seed to a value > 0 (in both the FeatureSelection process and the process below). But I do not know why this should matter.
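A minimal sketch of such a process, assuming the AttributeWeights from the FeatureSelection run were saved to a file beforehand (e.g. with an AttributeWeightsWriter); operator and parameter names follow RapidMiner 4.x and the file names are placeholders:

<operator name="Root" class="Process" expanded="yes">
  <operator name="ExampleSource" class="ExampleSource">
    <parameter key="attributes" value="mydata.aml"/>
  </operator>
  <!-- load the weights produced by the FeatureSelection run -->
  <operator name="AttributeWeightsLoader" class="AttributeWeightsLoader">
    <parameter key="attribute_weights_file" value="myweights.wgt"/>
  </operator>
  <!-- keep only the attributes that FeatureSelection selected (weight > 0) -->
  <operator name="AttributeWeightSelection" class="AttributeWeightSelection">
    <parameter key="weight" value="0.0"/>
    <parameter key="weight_relation" value="greater"/>
  </operator>
  <operator name="XValidation" class="XValidation" expanded="yes">
    <parameter key="number_of_validations" value="10"/>
    <!-- fixed seed so that the splits are reproducible across processes -->
    <parameter key="local_random_seed" value="1"/>
    <operator name="MetaCost" class="MetaCost" expanded="yes">
      <operator name="DecisionTree" class="DecisionTree"/>
    </operator>
    <operator name="OperatorChain" class="OperatorChain" expanded="yes">
      <operator name="ModelApplier" class="ModelApplier"/>
      <operator name="ClassificationPerformance" class="ClassificationPerformance">
        <parameter key="accuracy" value="true"/>
      </operator>
    </operator>
  </operator>
</operator>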
If your process does produce a model, or if I have misunderstood anything else, please post it here. Otherwise I am restricted to guessing ...
Hope this was helpful
regards,
Steffen
So do you save the model in every step of XValidation, or only the final model (by setting the related parameter)? Whichever is the case, make sure that you have understood XValidation and/or read the documentation of the RapidMiner implementation (select the operator and press F1).
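To illustrate the difference: a ModelWriter placed after the learner (inside an OperatorChain on the training side) would overwrite the model file in every one of the ten folds, so the saved model would only be the last fold's model. The sketch below assumes that "the related parameter" is XValidation's create_complete_model in RapidMiner 4.x, which instead makes XValidation train one final model on the complete data set; please verify the parameter name via the F1 help:

<operator name="XValidation" class="XValidation" expanded="yes">
  <parameter key="number_of_validations" value="10"/>
  <!-- assumption: with this set, XValidation additionally learns one
       model on the complete data set after the ten folds are done -->
  <parameter key="create_complete_model" value="true"/>
  <operator name="MetaCost" class="MetaCost" expanded="yes">
    <operator name="DecisionTree" class="DecisionTree"/>
  </operator>
  <operator name="OperatorChain" class="OperatorChain" expanded="yes">
    <operator name="ModelApplier" class="ModelApplier"/>
    <operator name="ClassificationPerformance" class="ClassificationPerformance"/>
  </operator>
</operator>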