
"Decision Tree Model Not Visible"

hgwelec Member Posts: 31 Maven
edited May 2019 in Help
Hello again,



I am trying to use a decision tree learner for a problem. If I run the stream with just the input file node and the decision tree learner, the resulting decision tree is shown fine. However, when I run the following stream (essentially, I perform cross-validation), I cannot see the resulting tree (and hence the resulting model). Here is the setup:

<operator name="Root" class="Process" expanded="yes">
    <operator name="CSVExampleSource" class="CSVExampleSource">
        <parameter key="filename" value="D:\MyDocumentsr\kvltrain.csv"/>
        <parameter key="label_name" value="zkvl"/>
    </operator>
    <operator name="FeatureNameFilter" class="FeatureNameFilter">
        <parameter key="skip_features_with_name" value="(Age|Profession)"/>
    </operator>
    <operator name="XValidation" class="XValidation" expanded="yes">
        <operator name="DecisionTree" class="DecisionTree">
            <parameter key="keep_example_set" value="true"/>
        </operator>
        <operator name="OperatorChain" class="OperatorChain" expanded="yes">
            <operator name="ModelApplier" class="ModelApplier">
                <list key="application_parameters">
                </list>
            </operator>
            <operator name="ClassificationPerformance" class="ClassificationPerformance">
                <parameter key="absolute_error" value="true"/>
                <parameter key="accuracy" value="true"/>
                <list key="class_weights">
                </list>
                <parameter key="classification_error" value="true"/>
                <parameter key="normalized_absolute_error" value="true"/>
                <parameter key="root_mean_squared_error" value="true"/>
                <parameter key="root_relative_squared_error" value="true"/>
            </operator>
        </operator>
    </operator>
    <operator name="ProcessLog" class="ProcessLog">
        <parameter key="filename" value="D:\Programs\Rapid-I\rm_workspace\logger.log"/>
        <list key="log">
          <parameter key="accuracy" value="operator.CSVExampleSource.value.null"/>
        </list>
    </operator>
    <operator name="GnuplotWriter" class="GnuplotWriter">
        <parameter key="additional_parameters" value="set grid"/>
        <parameter key="name" value="ProcessLog"/>
        <parameter key="output_file" value="D:\Programs\Rapid-I\rm_workspace\log.gnu"/>
        <parameter key="values" value="accuracy"/>
        <parameter key="x_axis" value="accuracy"/>
    </operator>
</operator>



Any idea as to why this is happening?



Thanks,


Harry

Answers

  • hgwelec Member Posts: 31 Maven
    OK, I found out what happened: the model gets consumed (?) in the first operator of the cross-validation. However, if I save the model first and then read it at the end of the process chain, the decision tree shows fine.

    Here is the setup:

    <operator name="Root" class="Process" expanded="yes">
        <operator name="CSVExampleSource" class="CSVExampleSource">
            <parameter key="filename" value="D:\MyDocuments\kvltrain.csv"/>
            <parameter key="label_name" value="zkvl"/>
        </operator>
        <operator name="FeatureNameFilter" class="FeatureNameFilter">
            <parameter key="skip_features_with_name" value="(Age|Profession)"/>
        </operator>
        <operator name="XValidation" class="XValidation" expanded="yes">
            <parameter key="number_of_validations" value="3"/>
            <operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
                <operator name="DecisionTree" class="DecisionTree">
                    <parameter key="keep_example_set" value="true"/>
                </operator>
                <operator name="ModelWriter" class="ModelWriter">
                    <parameter key="model_file" value="D:\Programs\Rapid-I\rm_workspace\model.mod"/>
                </operator>
            </operator>
            <operator name="OperatorChain" class="OperatorChain" expanded="yes">
                <operator name="ModelApplier" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="ClassificationPerformance" class="ClassificationPerformance">
                    <parameter key="absolute_error" value="true"/>
                    <parameter key="accuracy" value="true"/>
                    <list key="class_weights">
                    </list>
                    <parameter key="classification_error" value="true"/>
                    <parameter key="normalized_absolute_error" value="true"/>
                    <parameter key="root_mean_squared_error" value="true"/>
                    <parameter key="root_relative_squared_error" value="true"/>
                </operator>
            </operator>
        </operator>
        <operator name="ProcessLog" class="ProcessLog">
            <parameter key="filename" value="D:\Programs\Rapid-I\rm_workspace\logger.log"/>
            <list key="log">
              <parameter key="accuracy" value="operator.CSVExampleSource.value.null"/>
            </list>
        </operator>
        <operator name="GnuplotWriter" class="GnuplotWriter">
            <parameter key="additional_parameters" value="set grid"/>
            <parameter key="name" value="ProcessLog"/>
            <parameter key="output_file" value="D:\Programs\Rapid-I\rm_workspace\log.gnu"/>
            <parameter key="values" value="accuracy"/>
            <parameter key="x_axis" value="accuracy"/>
        </operator>
        <operator name="ModelLoader" class="ModelLoader">
            <parameter key="model_file" value="D:\Programs\Rapid-I\rm_workspace\model.mod"/>
        </operator>
    </operator>
  • TobiasMalbrecht Moderator, Employee-RapidMiner, Member Posts: 295 RM Product Management
    Hi,
    hgwelec wrote:

    Ok found out what happened : The Model gets consumed (?) in the first operator of cross validation. However if i save the model first and then read it at the end of the process chain, the decision tree shows fine :

    You are right that a decision tree is shown, but it is probably not the decision tree you want to look at. XValidation is a kind of loop: in each iteration it learns a model (by applying the DecisionTree learner) on one portion of the data and tests its performance on the complementary portion, and the chosen portion differs from iteration to iteration. Hence, if you save the model inside the XValidation operator, you always save a model that was learned on only a portion of the data. If you want to learn the complete model in addition to estimating the learning performance, simply turn on the parameter learn_complete_model in the parameters of the XValidation operator. It will then apply the learner once more on the complete data set and finally output the resulting model. If you compare that model to the one you wrote out during the cross-validation, you will probably observe a difference between them.
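
    For reference, the suggestion above could look roughly like this inside the process. This is only a sketch: the parameter key learn_complete_model is taken as written in this reply and has not been checked against a particular RapidMiner version, so look it up in the XValidation parameter list of your installation. The rest of the fragment is copied from the setup posted earlier in this thread, with the ModelWriter removed and the performance parameters shortened.

    ```xml
    <operator name="XValidation" class="XValidation" expanded="yes">
        <parameter key="number_of_validations" value="3"/>
        <!-- Assumption: key name as quoted in the reply above; verify it in your version -->
        <parameter key="learn_complete_model" value="true"/>
        <!-- First inner operator: the learner, trained on the training part of each fold -->
        <operator name="DecisionTree" class="DecisionTree">
            <parameter key="keep_example_set" value="true"/>
        </operator>
        <!-- Second inner operator: applies the fold model and measures performance -->
        <operator name="OperatorChain" class="OperatorChain" expanded="yes">
            <operator name="ModelApplier" class="ModelApplier">
                <list key="application_parameters">
                </list>
            </operator>
            <operator name="ClassificationPerformance" class="ClassificationPerformance">
                <parameter key="accuracy" value="true"/>
                <parameter key="classification_error" value="true"/>
            </operator>
        </operator>
    </operator>
    ```

    With this option on, the model that XValidation outputs should be the one learned on the complete data set, so the ModelWriter/ModelLoader workaround is not needed just to view the tree.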

    Regards,
    Tobias

  • hgwelec Member Posts: 31 Maven
    Hello Tobias,


    First, thanks for your reply. After some experimentation, I found out about the learn_complete_model option right after I sent my first reply, which I posted in the spirit of "we share our knowledge with the community"  :)

    Your reply puts things in order. Thanks again!