Performance Classifier for Naive Bayes and Decision Tree -- getting an error
I have built a Naive Bayes and a Decision Tree model and have one column set as the label in the training data so I can predict the outcome. I connected the Performance (Classification) operator and keep getting an error that says InputSet does not have a label attribute. I set the column to label using the Set Role operator. What classifier should I be using -- or what do I need to do to the data?
Best Answer
rfuentealba RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn
Hi @melissa_heinric,
OK, let's go from the general to the particular: when you measure performance, you basically want to know how often your trained model found the truth. For this, you need labeled data as input for both the Decision Tree and the Apply Model operators, because the Performance operator you are using just reads two columns: one with the label role and another with the prediction role. Since you are not passing labeled data to the Apply Model operator, the Performance operator is telling you that it cannot measure performance.
A few days ago I wrote an answer on how to perform Split Validation, Cross Validation and the kind of validation you are trying to do, which I call DIY Validation. I believe the entire thread is a good source of information for you. Since you are learning, you might want to experiment with both the Split Validation and the Cross Validation operators to see how they differ. Be aware that these are super-operators, which contain other operators inside them. Here is your process with Split Validation:
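To make the problem concrete, here is a short excerpt of the connections in the process you posted, with comments on what each step does to the label. It is only an annotated excerpt, not a complete process, and the comments are my reading of your XML:

```xml
<!-- Read CSV loads Students-Testing-unlabeled.csv, which has no CORBETTERYOrN column -->
<connect from_op="Read CSV" from_port="output" to_op="Set Role (2)" to_port="example set input"/>
<!-- Set Role (2) only marks ROWID as id, so no attribute carries the label role -->
<connect from_op="Set Role (2)" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
<!-- Apply Model adds a prediction column, but there is still no label to compare it with -->
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
```

Performance needs both the label and the prediction in the same example set, which is why it has to be fed from data that was labeled before scoring.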
<?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_csv" compatibility="8.2.000" expanded="true" height="68" name="Read CSV" width="90" x="112" y="238">
<parameter key="csv_file" value="\\itfs1.asu.edu\EM\USERS\msbooth\Grad Certificate\HED 606\Kaggle\Students-Testing-unlabeled.csv"/>
<parameter key="column_separators" value=","/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<parameter key="encoding" value="windows-1252"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="ROWID.true.integer.attribute"/>
<parameter key="1" value="CREDITHOURS.true.integer.attribute"/>
<parameter key="2" value="REGDAYSBEFORE.true.integer.attribute"/>
<parameter key="3" value="MODALITY.true.integer.attribute"/>
<parameter key="4" value="SEMESTER.true.polynominal.attribute"/>
<parameter key="5" value="COURSETYPE.true.polynominal.attribute"/>
<parameter key="6" value="COURSELENGTH.true.integer.attribute"/>
<parameter key="7" value="DEPARTMENT.true.polynominal.attribute"/>
<parameter key="8" value="TOTALENROLLMENT.true.integer.attribute"/>
<parameter key="9" value="RESIDENCY.true.integer.attribute"/>
<parameter key="10" value="ACADEMICLEVEL.true.polynominal.attribute"/>
<parameter key="11" value="CUMULATIVEGPA.true.real.attribute"/>
<parameter key="12" value="AGEATCOURSESTART.true.integer.attribute"/>
<parameter key="13" value="ETHNICITY.true.polynominal.attribute"/>
<parameter key="14" value="CITIZENSHIP.true.integer.attribute"/>
<parameter key="15" value="MARITALSTATUS.true.polynominal.attribute"/>
<parameter key="16" value="GENDER.true.integer.attribute"/>
<parameter key="17" value="WORKAUTHORIZATION.true.integer.attribute"/>
<parameter key="18" value="HIGHESTEDUCATION.true.polynominal.attribute"/>
<parameter key="19" value="SEMESTERCREDITS.true.integer.attribute"/>
<parameter key="20" value="SEMESTERCOURSES.true.integer.attribute"/>
</list>
</operator>
<operator activated="true" class="set_role" compatibility="8.2.000" expanded="true" height="82" name="Set Role (2)" width="90" x="246" y="238">
<parameter key="attribute_name" value="ROWID"/>
<parameter key="target_role" value="id"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="read_csv" compatibility="8.2.000" expanded="true" height="68" name="Read CSV (2)" width="90" x="112" y="34">
<parameter key="csv_file" value="\\itfs1.asu.edu\EM\USERS\msbooth\Grad Certificate\HED 606\Kaggle\Non Numberic Of Students_Training - Copy.csv"/>
<parameter key="column_separators" value=","/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<parameter key="encoding" value="windows-1252"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="ROWID.true.integer.attribute"/>
<parameter key="1" value="CREDITHOURS.true.integer.attribute"/>
<parameter key="2" value="REGDAYSBEFORE.true.integer.attribute"/>
<parameter key="3" value="MODALITY.true.integer.attribute"/>
<parameter key="4" value="SEMESTER.true.polynominal.attribute"/>
<parameter key="5" value="COURSETYPE.true.polynominal.attribute"/>
<parameter key="6" value="COURSELENGTH.true.integer.attribute"/>
<parameter key="7" value="DEPARTMENT.true.polynominal.attribute"/>
<parameter key="8" value="TOTALENROLLMENT.true.integer.attribute"/>
<parameter key="9" value="RESIDENCY.true.integer.attribute"/>
<parameter key="10" value="ACADEMICLEVEL.true.polynominal.attribute"/>
<parameter key="11" value="CUMULATIVEGPA.true.real.attribute"/>
<parameter key="12" value="AGEATCOURSESTART.true.integer.attribute"/>
<parameter key="13" value="ETHNICITY.true.polynominal.attribute"/>
<parameter key="14" value="CITIZENSHIP.true.integer.attribute"/>
<parameter key="15" value="MARITALSTATUS.true.polynominal.attribute"/>
<parameter key="16" value="GENDER.true.integer.attribute"/>
<parameter key="17" value="WORKAUTHORIZATION.true.integer.attribute"/>
<parameter key="18" value="HIGHESTEDUCATION.true.polynominal.attribute"/>
<parameter key="19" value="SEMESTERCREDITS.true.integer.attribute"/>
<parameter key="20" value="SEMESTERCOURSES.true.integer.attribute"/>
<parameter key="21" value="CORBETTERYOrN.true.polynominal.attribute"/>
</list>
</operator>
<operator activated="true" class="set_role" compatibility="8.2.000" expanded="true" height="82" name="Set Role" width="90" x="246" y="34">
<parameter key="attribute_name" value="CORBETTERYOrN"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="split_validation" compatibility="8.2.000" expanded="true" height="124" name="Validation" width="90" x="380" y="34">
<process expanded="true">
<operator activated="true" class="concurrency:parallel_decision_tree" compatibility="8.2.000" expanded="true" height="103" name="Decision Tree" width="90" x="45" y="34"/>
<connect from_port="training" to_op="Decision Tree" to_port="training set"/>
<connect from_op="Decision Tree" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="8.2.000" expanded="true" height="82" name="Apply Model (2)" width="90" x="112" y="34">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance_classification" compatibility="8.2.000" expanded="true" height="82" name="Performance" width="90" x="246" y="34">
<list key="class_weights"/>
</operator>
<connect from_port="model" to_op="Apply Model (2)" to_port="model"/>
<connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/>
<connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="apply_model" compatibility="8.2.000" expanded="true" height="82" name="Apply Model" width="90" x="514" y="238">
<list key="application_parameters"/>
</operator>
<connect from_op="Read CSV" from_port="output" to_op="Set Role (2)" to_port="example set input"/>
<connect from_op="Set Role (2)" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Read CSV (2)" from_port="output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Validation" to_port="training"/>
<connect from_op="Validation" from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
If you don't have a lot of data (e.g. dozens of MB), I recommend using Cross Validation. Be aware that it is not easy on the amount of RAM it consumes.
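As a rough sketch of the Cross Validation variant, the Validation operator in the process above could be swapped for something like the fragment below. The operator class `concurrency:cross_validation`, the inner port names (`training set`, `performance 1`) and the `number_of_folds` parameter are written from memory of RapidMiner 8.x and should be treated as assumptions; the safest way to get the exact XML is to drag the Cross Validation operator into your own process and look at what Studio generates. Layout attributes and portSpacing lines are left out for brevity.

```xml
<!-- Sketch only: replaces the split_validation operator named "Validation" -->
<operator activated="true" class="concurrency:cross_validation" compatibility="8.2.000" expanded="true" name="Cross Validation">
  <!-- assumed parameter: number of folds the labeled training data is split into -->
  <parameter key="number_of_folds" value="10"/>
  <!-- training subprocess: learn a model on the training folds -->
  <process expanded="true">
    <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="8.2.000" expanded="true" name="Decision Tree"/>
    <connect from_port="training set" to_op="Decision Tree" to_port="training set"/>
    <connect from_op="Decision Tree" from_port="model" to_port="model"/>
  </process>
  <!-- testing subprocess: score the held-out fold, which still has its label -->
  <process expanded="true">
    <operator activated="true" class="apply_model" compatibility="8.2.000" expanded="true" name="Apply Model (2)">
      <list key="application_parameters"/>
    </operator>
    <operator activated="true" class="performance_classification" compatibility="8.2.000" expanded="true" name="Performance">
      <list key="class_weights"/>
    </operator>
    <connect from_port="model" to_op="Apply Model (2)" to_port="model"/>
    <connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/>
    <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
    <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
  </process>
</operator>
```

If you try this, the outer connection from Set Role would also need to target the Cross Validation operator's input port (I believe it is called `example set` rather than `training`), and the connection to the final Apply Model would start from `Cross Validation` instead of `Validation`.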
Another small issue: check the Set Role operator connected to the Decision Tree. It assigns the label role first in the Parameters view and then assigns it again inside the additional roles list. Remove the label entry in the list, and everything will be fine.
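For reference, this is roughly what that Set Role operator could look like after the cleanup, based on the XML you posted. Keeping the ROWID entry in the list is my own assumption, on the reasoning that only the label was assigned twice; drop it if you do not want ROWID treated as an id in the training data.

```xml
<operator activated="true" class="set_role" compatibility="8.2.000" expanded="true" height="82" name="Set Role" width="90" x="246" y="34">
  <!-- the label is assigned once, through the main parameters -->
  <parameter key="attribute_name" value="CORBETTERYOrN"/>
  <parameter key="target_role" value="label"/>
  <list key="set_additional_roles">
    <!-- the duplicate label entry for CORBETTERYOrN was removed; ROWID stays as id -->
    <parameter key="ROWID" value="id"/>
  </list>
</operator>
```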
Hope it helps,
Answers
Hello, Melissa:
Would you mind sharing your XML process with us? That way we can see what is not working. If you need help sharing XML processes, please read this article.
All the best,
Thanks! Please see below -- will this work?
<?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_csv" compatibility="8.2.000" expanded="true" height="68" name="Read CSV" width="90" x="112" y="187">
<parameter key="csv_file" value="\\itfs1.asu.edu\EM\USERS\msbooth\Grad Certificate\HED 606\Kaggle\Students-Testing-unlabeled.csv"/>
<parameter key="column_separators" value=","/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<parameter key="encoding" value="windows-1252"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="ROWID.true.integer.attribute"/>
<parameter key="1" value="CREDITHOURS.true.integer.attribute"/>
<parameter key="2" value="REGDAYSBEFORE.true.integer.attribute"/>
<parameter key="3" value="MODALITY.true.integer.attribute"/>
<parameter key="4" value="SEMESTER.true.polynominal.attribute"/>
<parameter key="5" value="COURSETYPE.true.polynominal.attribute"/>
<parameter key="6" value="COURSELENGTH.true.integer.attribute"/>
<parameter key="7" value="DEPARTMENT.true.polynominal.attribute"/>
<parameter key="8" value="TOTALENROLLMENT.true.integer.attribute"/>
<parameter key="9" value="RESIDENCY.true.integer.attribute"/>
<parameter key="10" value="ACADEMICLEVEL.true.polynominal.attribute"/>
<parameter key="11" value="CUMULATIVEGPA.true.real.attribute"/>
<parameter key="12" value="AGEATCOURSESTART.true.integer.attribute"/>
<parameter key="13" value="ETHNICITY.true.polynominal.attribute"/>
<parameter key="14" value="CITIZENSHIP.true.integer.attribute"/>
<parameter key="15" value="MARITALSTATUS.true.polynominal.attribute"/>
<parameter key="16" value="GENDER.true.integer.attribute"/>
<parameter key="17" value="WORKAUTHORIZATION.true.integer.attribute"/>
<parameter key="18" value="HIGHESTEDUCATION.true.polynominal.attribute"/>
<parameter key="19" value="SEMESTERCREDITS.true.integer.attribute"/>
<parameter key="20" value="SEMESTERCOURSES.true.integer.attribute"/>
</list>
</operator>
<operator activated="true" class="set_role" compatibility="8.2.000" expanded="true" height="82" name="Set Role (2)" width="90" x="313" y="187">
<parameter key="attribute_name" value="ROWID"/>
<parameter key="target_role" value="id"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="read_csv" compatibility="8.2.000" expanded="true" height="68" name="Read CSV (2)" width="90" x="112" y="34">
<parameter key="csv_file" value="\\itfs1.asu.edu\EM\USERS\msbooth\Grad Certificate\HED 606\Kaggle\Non Numberic Of Students_Training - Copy.csv"/>
<parameter key="column_separators" value=","/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<parameter key="encoding" value="windows-1252"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="ROWID.true.integer.attribute"/>
<parameter key="1" value="CREDITHOURS.true.integer.attribute"/>
<parameter key="2" value="REGDAYSBEFORE.true.integer.attribute"/>
<parameter key="3" value="MODALITY.true.integer.attribute"/>
<parameter key="4" value="SEMESTER.true.polynominal.attribute"/>
<parameter key="5" value="COURSETYPE.true.polynominal.attribute"/>
<parameter key="6" value="COURSELENGTH.true.integer.attribute"/>
<parameter key="7" value="DEPARTMENT.true.polynominal.attribute"/>
<parameter key="8" value="TOTALENROLLMENT.true.integer.attribute"/>
<parameter key="9" value="RESIDENCY.true.integer.attribute"/>
<parameter key="10" value="ACADEMICLEVEL.true.polynominal.attribute"/>
<parameter key="11" value="CUMULATIVEGPA.true.real.attribute"/>
<parameter key="12" value="AGEATCOURSESTART.true.integer.attribute"/>
<parameter key="13" value="ETHNICITY.true.polynominal.attribute"/>
<parameter key="14" value="CITIZENSHIP.true.integer.attribute"/>
<parameter key="15" value="MARITALSTATUS.true.polynominal.attribute"/>
<parameter key="16" value="GENDER.true.integer.attribute"/>
<parameter key="17" value="WORKAUTHORIZATION.true.integer.attribute"/>
<parameter key="18" value="HIGHESTEDUCATION.true.polynominal.attribute"/>
<parameter key="19" value="SEMESTERCREDITS.true.integer.attribute"/>
<parameter key="20" value="SEMESTERCOURSES.true.integer.attribute"/>
<parameter key="21" value="CORBETTERYOrN.true.polynominal.attribute"/>
</list>
</operator>
<operator activated="true" class="set_role" compatibility="8.2.000" expanded="true" height="82" name="Set Role" width="90" x="246" y="34">
<parameter key="attribute_name" value="CORBETTERYOrN"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles">
<parameter key="CORBETTERYOrN" value="label"/>
<parameter key="ROWID" value="id"/>
</list>
</operator>
<operator activated="true" class="concurrency:parallel_decision_tree" compatibility="8.2.000" expanded="true" height="103" name="Decision Tree" width="90" x="380" y="34">
<parameter key="criterion" value="information_gain"/>
</operator>
<operator activated="true" class="apply_model" compatibility="8.2.000" expanded="true" height="82" name="Apply Model" width="90" x="581" y="136">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance_classification" compatibility="8.2.000" expanded="true" height="82" name="Performance" width="90" x="648" y="34">
<list key="class_weights"/>
</operator>
<connect from_op="Read CSV" from_port="output" to_op="Set Role (2)" to_port="example set input"/>
<connect from_op="Set Role (2)" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Read CSV (2)" from_port="output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Decision Tree" to_port="training set"/>
<connect from_op="Decision Tree" from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Apply Model" from_port="model" to_port="result 2"/>
<connect from_op="Performance" from_port="performance" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
I am also getting the same error when I try to use the Deep Learning operator -- so there must be something wrong with the input?
This helped a lot -- I was going about it completely wrong. I had learned about cross and split validation a few weeks ago in the course I'm taking but hadn't put the pieces together when moving on to applying the models.
Thanks!