Beginner's question
Hi, in my college we have a project about data mining, and the tool we use is RapidMiner. Since I'm new to RapidMiner, I have a question for you. My process looks like this:
Root
  ExampleSource
  FeatureSelection
    XValidation (number_of_validations = 10)
      MetaCost
        DecisionTree
      OperatorChain
        ModelApplier
        ClassificationPerformance
I figured that model building happens in iterations and that the model we get at the end is the one with the best results. When the process is finished, it shows me a PerformanceVector in the form of a confusion matrix. The question is: is that confusion matrix for the last model, or for the best model?
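In RapidMiner 4.x process XML, that operator tree would look roughly like the sketch below; the data file name and the parameter values are placeholders, not the actual values from the project:

<operator name="Root" class="Process" expanded="yes">
  <operator name="ExampleSource" class="ExampleSource">
    <!-- placeholder: attribute description file of the actual data set -->
    <parameter key="attributes" value="mydata.aml"/>
  </operator>
  <operator name="FeatureSelection" class="FeatureSelection" expanded="yes">
    <!-- the inner XValidation estimates the performance of each candidate attribute set -->
    <operator name="XValidation" class="XValidation" expanded="yes">
      <parameter key="number_of_validations" value="10"/>
      <!-- training side: cost-sensitive wrapper around the learner
           (MetaCost's cost matrix parameter is omitted here) -->
      <operator name="MetaCost" class="MetaCost" expanded="yes">
        <operator name="DecisionTree" class="DecisionTree"/>
      </operator>
      <!-- test side: apply the fold model and measure its performance -->
      <operator name="OperatorChain" class="OperatorChain" expanded="yes">
        <operator name="ModelApplier" class="ModelApplier"/>
        <operator name="ClassificationPerformance" class="ClassificationPerformance">
          <parameter key="accuracy" value="true"/>
        </operator>
      </operator>
    </operator>
  </operator>
</operator>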
Answers
The answer is: the last model. But since the FeatureSelection stops when no more improvement can be achieved (see the description of FeatureSelection in tutorial.pdf, or select the operator and press F1), the last model is also the best one found: the greedy search stops as soon as adding or removing another attribute no longer improves the cross-validated performance, which can represent a local maximum.
See another example in <your-rm-workspace>\sample\05_Features\10_ForwardSelection.xml.
regards,
Steffen
The reason I asked this is: when I save the model (which is the result of the given process), load it in another process, and apply it to the same data set that was used in the first process, the confusion matrix produced by ClassificationPerformance is different from the one in the first process. Why is that?
Your posted setup, like the example I mentioned, does not produce a model. It just produces AttributeWeights. So to gain comparable results you have to use a process like the one sketched below. I said "comparable", not "the same", because to gain exactly the same results you have to ensure that the data is split by XValidation in exactly the same way as in the last iteration of FeatureSelection. You can achieve this by setting the parameter local_random_seed to a value > 0 (in both the FeatureSelection process and the process below). But I do not know why this should matter.
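A minimal sketch of such a process, assuming the AttributeWeights from the FeatureSelection run were saved to a file beforehand (e.g. with an AttributeWeightsWriter); operator and parameter names follow RapidMiner 4.x and the file names are placeholders:

<operator name="Root" class="Process" expanded="yes">
  <operator name="ExampleSource" class="ExampleSource">
    <parameter key="attributes" value="mydata.aml"/>
  </operator>
  <!-- load the weights produced by the FeatureSelection run -->
  <operator name="AttributeWeightsLoader" class="AttributeWeightsLoader">
    <parameter key="attribute_weights_file" value="myweights.wgt"/>
  </operator>
  <!-- keep only the attributes that FeatureSelection selected (weight > 0) -->
  <operator name="AttributeWeightSelection" class="AttributeWeightSelection">
    <parameter key="weight" value="0.0"/>
    <parameter key="weight_relation" value="greater"/>
  </operator>
  <operator name="XValidation" class="XValidation" expanded="yes">
    <parameter key="number_of_validations" value="10"/>
    <!-- fixed seed so that the splits are reproducible across processes -->
    <parameter key="local_random_seed" value="1"/>
    <operator name="MetaCost" class="MetaCost" expanded="yes">
      <operator name="DecisionTree" class="DecisionTree"/>
    </operator>
    <operator name="OperatorChain" class="OperatorChain" expanded="yes">
      <operator name="ModelApplier" class="ModelApplier"/>
      <operator name="ClassificationPerformance" class="ClassificationPerformance">
        <parameter key="accuracy" value="true"/>
      </operator>
    </operator>
  </operator>
</operator>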
If your process does produce a model, or if I have misunderstood anything else, please post it here. Otherwise I am restricted to guessing ...
Hope this was helpful
regards,
Steffen
So do you save the model in every step of XValidation, or only the final model (by setting the related parameter)? Whichever is the case, make sure that you have understood XValidation and/or read the documentation of the RapidMiner implementation (select the operator and press F1).
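To illustrate the difference: a ModelWriter placed after the learner (inside an OperatorChain on the training side) would overwrite the model file in every one of the ten folds, so the saved model would only be the last fold's model. The sketch below assumes that "the related parameter" is XValidation's create_complete_model in RapidMiner 4.x, which instead makes XValidation train one final model on the complete data set; please verify the parameter name via the F1 help:

<operator name="XValidation" class="XValidation" expanded="yes">
  <parameter key="number_of_validations" value="10"/>
  <!-- assumption: with this set, XValidation additionally learns one
       model on the complete data set after the ten folds are done -->
  <parameter key="create_complete_model" value="true"/>
  <operator name="MetaCost" class="MetaCost" expanded="yes">
    <operator name="DecisionTree" class="DecisionTree"/>
  </operator>
  <operator name="OperatorChain" class="OperatorChain" expanded="yes">
    <operator name="ModelApplier" class="ModelApplier"/>
    <operator name="ClassificationPerformance" class="ClassificationPerformance"/>
  </operator>
</operator>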