K-Means cluster validation performance at zero percent??? What am I doing wrong?

shredlegend88 · October 2016

I was finally able to produce a confusion/classification matrix to evaluate my model. However, it shows zero percent predictive accuracy and I cannot figure out why.

My process:

Cleaned data

Chose variables based on correlation to target variable (bankruptcy)

Normalized all variables

Dropped in X-Validation Operator

- Contains a K-Means (k=2) cluster model in the training section

- Contains Apply Model and a Performance (Classification) operator with "accuracy" as the main criterion

The variable Bankruptcy is marked as nominal Prediction.

Accuracy is zero!!!

This assignment is due tonight and so far I am not able to evaluate my model performance.

I have attached my data and process

Thomas_Ott · November 2016

A few things: you are attempting to cluster, which is an unsupervised method. Cross Validation is used for supervised training, where you need a training label.

So I'm not sure what you want to do: cluster, or do supervised training?

If you want to do supervised training with a Cross Validation operator, you shouldn't use the Clustering algorithm. Since the values in Bankruptcy are numerical, you could use a Linear Regression algorithm to evaluate. You will need a Set Role operator to set the Bankruptcy column as a label.

If you just want to run a cluster analysis, you don't need a Set Role operator; you can run it without the label.
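On the zero accuracy itself, here is one plausible explanation, sketched in Python rather than as a RapidMiner process. It assumes the cluster model writes IDs such as cluster_0 and cluster_1 as its "prediction" while the Bankruptcy column holds values like 0 and 1. I have not confirmed this is what happened in the attached process, and the values below are made up for illustration. Under that assumption no prediction can ever equal a label, so accuracy is zero even when the clusters separate the classes perfectly.

```python
def accuracy(labels, predictions):
    """Fraction of positions where the prediction equals the label."""
    matches = sum(1 for y, p in zip(labels, predictions) if y == p)
    return matches / len(labels)

# Hypothetical data: true Bankruptcy labels and the IDs from a k=2 clustering.
bankruptcy = ["0", "1", "1", "0", "1"]
cluster_ids = ["cluster_0", "cluster_1", "cluster_1", "cluster_0", "cluster_1"]

# The clusters line up with the classes one-to-one, but the strings never match.
zero_accuracy = accuracy(bankruptcy, cluster_ids)
print(zero_accuracy)
```

A classification performance measure only makes sense once the cluster IDs have been translated into the label's own values, or once a supervised learner is used instead.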

<?xml version="1.0" encoding="UTF-8"?><process version="7.2.003">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.2.003" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="7.2.003" expanded="true" height="68" name="Retrieve cf_spending_bankruptcy (2)" width="90" x="45" y="34">
        <parameter key="repository_entry" value="../data/cf_spending_bankruptcy"/>
      </operator>
      <operator activated="true" class="subprocess" compatibility="7.2.003" expanded="true" height="82" name="Data Cleanse" width="90" x="179" y="34">
        <process expanded="true">
          <operator activated="true" class="replace_missing_values" compatibility="7.2.003" expanded="true" height="103" name="Replace w/ AVG" width="90" x="45" y="34">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="Salary|PerTransportation|PerHousing|PerHealth|PerGrocery|PerEntertainment|PerApparel|No_Dep|Education|Credit Cards|Age"/>
            <list key="columns"/>
          </operator>
          <operator activated="true" class="replace_missing_values" compatibility="7.2.003" expanded="true" height="103" name="Replace w/Zero" width="90" x="179" y="34">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="Marital|Gender"/>
            <parameter key="default" value="zero"/>
            <list key="columns"/>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="7.2.003" expanded="true" height="103" name="Remove Missing IV" width="90" x="313" y="34">
            <list key="filters_list">
              <parameter key="filters_entry_key" value="Bankruptcy.is_not_missing."/>
            </list>
          </operator>
          <connect from_port="in 1" to_op="Replace w/ AVG" to_port="example set input"/>
          <connect from_op="Replace w/ AVG" from_port="example set output" to_op="Replace w/Zero" to_port="example set input"/>
          <connect from_op="Replace w/Zero" from_port="example set output" to_op="Remove Missing IV" to_port="example set input"/>
          <connect from_op="Remove Missing IV" from_port="example set output" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="source_in 2" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="7.2.003" expanded="true" height="82" name="Select Attributes" width="90" x="313" y="34">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="Age|Credit Cards|Education|Gender|PerApparel|PerGrocery|Salary|Bankruptcy|cluster"/>
      </operator>
      <operator activated="true" class="normalize" compatibility="7.2.003" expanded="true" height="103" name="Normalize" width="90" x="447" y="34"/>
      <operator activated="true" class="multiply" compatibility="7.2.003" expanded="true" height="103" name="Multiply" width="90" x="380" y="187"/>
      <operator activated="true" class="sample" compatibility="7.2.003" expanded="true" height="82" name="Sample" width="90" x="514" y="238">
        <list key="sample_size_per_class"/>
        <list key="sample_ratio_per_class"/>
        <list key="sample_probability_per_class"/>
      </operator>
      <operator activated="true" class="x_means" compatibility="7.2.003" expanded="true" height="82" name="X-Means" width="90" x="648" y="187"/>
      <operator activated="true" class="set_role" compatibility="7.2.003" expanded="true" height="82" name="Set Role" width="90" x="648" y="34">
        <parameter key="attribute_name" value="Bankruptcy"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="x_validation" compatibility="7.2.003" expanded="true" height="124" name="Validation" width="90" x="782" y="34">
        <process expanded="true">
          <operator activated="true" class="linear_regression" compatibility="7.2.003" expanded="true" height="103" name="Linear Regression" width="90" x="228" y="34"/>
          <connect from_port="training" to_op="Linear Regression" to_port="training set"/>
          <connect from_op="Linear Regression" from_port="model" to_port="model"/>
          <portSpacing port="source_training" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true">
          <operator activated="true" class="apply_model" compatibility="7.2.003" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_regression" compatibility="7.2.003" expanded="true" height="82" name="Performance" width="90" x="296" y="34"/>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_averagable 1" spacing="0"/>
          <portSpacing port="sink_averagable 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="data_to_similarity" compatibility="7.2.003" expanded="true" height="82" name="Data to Similarity" width="90" x="782" y="289"/>
      <operator activated="true" class="cluster_density_performance" compatibility="7.2.003" expanded="true" height="124" name="Performance (2)" width="90" x="916" y="187"/>
      <connect from_op="Retrieve cf_spending_bankruptcy (2)" from_port="output" to_op="Data Cleanse" to_port="in 1"/>
      <connect from_op="Data Cleanse" from_port="out 1" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Normalize" to_port="example set input"/>
      <connect from_op="Normalize" from_port="example set output" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Sample" to_port="example set input"/>
      <connect from_op="Sample" from_port="example set output" to_op="X-Means" to_port="example set"/>
      <connect from_op="X-Means" from_port="cluster model" to_op="Performance (2)" to_port="cluster model"/>
      <connect from_op="X-Means" from_port="clustered set" to_op="Data to Similarity" to_port="example set"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Validation" to_port="training"/>
      <connect from_op="Validation" from_port="model" to_port="result 1"/>
      <connect from_op="Validation" from_port="averagable 1" to_port="result 2"/>
      <connect from_op="Data to Similarity" from_port="similarity" to_op="Performance (2)" to_port="distance measure"/>
      <connect from_op="Data to Similarity" from_port="example set" to_op="Performance (2)" to_port="example set"/>
      <connect from_op="Performance (2)" from_port="example set" to_port="result 4"/>
      <connect from_op="Performance (2)" from_port="performance vector" to_port="result 3"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
      <portSpacing port="sink_result 5" spacing="0"/>
    </process>
  </operator>
</process>

Thomas_Ott · November 2016

And the Samples directory has a cluster model being converted to a classification model. Maybe that's what you're trying to do?

Check out //Samples/07_Clustering/06_ClusterClassificationWithEvaluation.
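The general idea of turning a clustering into a classifier can be sketched in Python as follows. This is a generic majority-vote mapping, not a transcription of what the sample process does internally. The function name map_clusters_to_classes and the data are hypothetical.

```python
from collections import Counter, defaultdict

def map_clusters_to_classes(cluster_ids, labels):
    """Assign each cluster the label that occurs most often among its members."""
    counts = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, labels):
        counts[cluster][label] += 1
    return {cluster: ctr.most_common(1)[0][0] for cluster, ctr in counts.items()}

def accuracy(labels, predictions):
    """Fraction of positions where the prediction equals the label."""
    matches = sum(1 for y, p in zip(labels, predictions) if y == p)
    return matches / len(labels)

# Hypothetical data: cluster IDs from a k=2 clustering and the true labels.
cluster_ids = ["cluster_0", "cluster_0", "cluster_1", "cluster_1", "cluster_1", "cluster_0"]
bankruptcy = ["0", "0", "1", "1", "0", "0"]

mapping = map_clusters_to_classes(cluster_ids, bankruptcy)
predictions = [mapping[c] for c in cluster_ids]
mapped_accuracy = accuracy(bankruptcy, predictions)
print(mapping, mapped_accuracy)
```

For brevity the mapping here is learned and scored on the same rows, which gives an optimistic number. Inside a cross-validation you would learn the mapping on the training fold and apply it to the test fold.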
