Label the class
If there is more than one target class and I want to label more than one of them, what should I do? There are four target variables: Hinselmann, Schiller, Cytology, and Biopsy. Please help.
Best Answer
lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
Hi @talhamahboob95,
First, if you are new to RapidMiner, you can start learning the software by watching the training videos.
You have 4 target variables, so you need to create four modelling/validation (sub)processes.
Consider the target variable "Biopsy":
When we simply apply a Decision Tree to your data, we get an accuracy of 93 % but a recall (for the class Biopsy = 1) of 0 %.
This often happens with unbalanced data, which is the case for your data.
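To make the accuracy/recall gap concrete, here is a minimal sketch in plain Python. The label counts (93 negatives, 7 positives) are hypothetical, chosen only to mirror the 93 % figure above; they are not taken from the dataset. A model that always predicts the majority class scores high accuracy but never finds a positive case.

```python
def accuracy_and_recall(actual, predicted, positive=1):
    """Return (accuracy, recall for the `positive` class)."""
    pairs = list(zip(actual, predicted))
    correct = sum(a == p for a, p in pairs)
    true_pos = sum(a == positive and p == positive for a, p in pairs)
    actual_pos = sum(a == positive for a in actual)
    accuracy = correct / len(actual)
    recall = true_pos / actual_pos if actual_pos else 0.0
    return accuracy, recall


# Hypothetical counts, for illustration only (not the real dataset).
actual = [0] * 93 + [1] * 7
# A model that always predicts the majority class (Biopsy = 0).
predicted = [0] * 100

acc, rec = accuracy_and_recall(actual, predicted)
print(f"accuracy = {acc:.2f}, recall(Biopsy = 1) = {rec:.2f}")
```

This is why accuracy alone is a misleading measure when one class is rare.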
I suppose that in your case you want to predict Biopsy = 1 as reliably as possible; if so, you can sample your data to balance the classes.
After preprocessing your data this way, the overall accuracy is lower than in the first case (53 %), but the recall (for the class Biopsy = 1) is significantly better (33 %).
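The balancing step can be sketched outside RapidMiner as follows. This is only an illustration of the idea behind the Sample operator with a fixed size per class, not its actual implementation; the function name `balanced_sample` and the toy row counts are hypothetical, while the value 55 per class comes from the process shared in this post.

```python
import random
from collections import defaultdict


def balanced_sample(rows, label_key, per_class, seed=0):
    """Draw at most `per_class` rows for each label value, without replacement."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    sample = []
    for group in by_label.values():
        k = min(per_class, len(group))
        sample.extend(rng.sample(group, k))
    rng.shuffle(sample)  # avoid leaving the rows grouped by class
    return sample


# Hypothetical toy data: many negatives, few positives.
rows = [{"Biopsy": 0} for _ in range(800)] + [{"Biopsy": 1} for _ in range(55)]
balanced = balanced_sample(rows, "Biopsy", per_class=55)
counts = {label: sum(r["Biopsy"] == label for r in balanced) for label in (0, 1)}
print(counts)
```

With both classes equally represented, the tree can no longer reach a high score by ignoring the rare class, which is consistent with the drop in accuracy and the gain in recall reported above.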
You can, of course, tune the parameters of the operators in this process to improve its performance.
You can now apply this methodology to the 3 other target variables.
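Judging from the process XML in this post, switching to another target seems to touch three operators: Select Attributes (which drops the other three targets), Set Role, and Numerical to Polynominal. The sketch below lists the values to enter for each target. The dictionary keys are hypothetical labels of the form operator.parameter, and the column is spelled "Citology" as in the dataset. The per-class sample size of 55 may also need adjusting for the other targets.

```python
TARGETS = ["Hinselmann", "Schiller", "Citology", "Biopsy"]


def settings_for(target, targets=TARGETS):
    """Return the parameter values to set when modelling `target`."""
    if target not in targets:
        raise ValueError(f"unknown target: {target}")
    # The other targets are removed so they cannot be used as predictors.
    others = sorted(t for t in targets if t != target)
    return {
        "select_attributes.attributes": "|".join(others),  # with invert_selection = true
        "set_role.attribute_name": target,
        "numerical_to_polynominal.attribute": target,
    }


for target in TARGETS:
    print(target, settings_for(target))
```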
I hope it helps,
Regards,
Lionel
NB: The process:
<?xml version="1.0" encoding="UTF-8"?><process version="9.0.000-BETA">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.0.000-BETA" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_csv" compatibility="9.0.000-BETA" expanded="true" height="68" name="Read CSV" width="90" x="45" y="85">
<parameter key="csv_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Multiples_modelling_cancer\risk_factors_cervical_cancer.csv"/>
<parameter key="column_separators" value=","/>
<parameter key="skip_comments" value="true"/>
<parameter key="date_format" value="MMM d, yyyy h:mm:ss a z"/>
<list key="annotations"/>
<parameter key="encoding" value="windows-1252"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="Age.true.integer.attribute"/>
<parameter key="1" value="Number of sexual partners.true.polynominal.attribute"/>
<parameter key="2" value="First sexual intercourse.true.polynominal.attribute"/>
<parameter key="3" value="Num of pregnancies.true.polynominal.attribute"/>
<parameter key="4" value="Smokes.true.polynominal.attribute"/>
<parameter key="5" value="Smokes (years).true.polynominal.attribute"/>
<parameter key="6" value="Smokes (packs/year).true.polynominal.attribute"/>
<parameter key="7" value="Hormonal Contraceptives.true.polynominal.attribute"/>
<parameter key="8" value="Hormonal Contraceptives (years).true.polynominal.attribute"/>
<parameter key="9" value="IUD.true.polynominal.attribute"/>
<parameter key="10" value="IUD (years).true.polynominal.attribute"/>
<parameter key="11" value="STDs.true.polynominal.attribute"/>
<parameter key="12" value="STDs (number).true.polynominal.attribute"/>
<parameter key="13" value="STDs:condylomatosis.true.polynominal.attribute"/>
<parameter key="14" value="STDs:cervical condylomatosis.true.polynominal.attribute"/>
<parameter key="15" value="STDs:vaginal condylomatosis.true.polynominal.attribute"/>
<parameter key="16" value="STDs:vulvo-perineal condylomatosis.true.polynominal.attribute"/>
<parameter key="17" value="STDs:syphilis.true.polynominal.attribute"/>
<parameter key="18" value="STDs:pelvic inflammatory disease.true.polynominal.attribute"/>
<parameter key="19" value="STDs:genital herpes.true.polynominal.attribute"/>
<parameter key="20" value="STDs:molluscum contagiosum.true.polynominal.attribute"/>
<parameter key="21" value="STDs:AIDS.true.polynominal.attribute"/>
<parameter key="22" value="STDs:HIV.true.polynominal.attribute"/>
<parameter key="23" value="STDs:Hepatitis B.true.polynominal.attribute"/>
<parameter key="24" value="STDs:HPV.true.polynominal.attribute"/>
<parameter key="25" value="STDs: Number of diagnosis.true.integer.attribute"/>
<parameter key="26" value="STDs: Time since first diagnosis.true.polynominal.attribute"/>
<parameter key="27" value="STDs: Time since last diagnosis.true.polynominal.attribute"/>
<parameter key="28" value="Dx:Cancer.true.integer.attribute"/>
<parameter key="29" value="Dx:CIN.true.integer.attribute"/>
<parameter key="30" value="Dx:HPV.true.integer.attribute"/>
<parameter key="31" value="Dx.true.integer.attribute"/>
<parameter key="32" value="Hinselmann.true.integer.attribute"/>
<parameter key="33" value="Schiller.true.integer.attribute"/>
<parameter key="34" value="Citology.true.integer.attribute"/>
<parameter key="35" value="Biopsy.true.integer.attribute"/>
</list>
<parameter key="read_not_matching_values_as_missings" value="false"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="9.0.000-BETA" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="85">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="Citology|Hinselmann|Schiller"/>
<parameter key="invert_selection" value="true"/>
</operator>
<operator activated="true" class="set_role" compatibility="9.0.000-BETA" expanded="true" height="82" name="Set Role" width="90" x="313" y="85">
<parameter key="attribute_name" value="Biopsy"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="numerical_to_polynominal" compatibility="9.0.000-BETA" expanded="true" height="82" name="Numerical to Polynominal" width="90" x="447" y="85">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="Biopsy"/>
<parameter key="include_special_attributes" value="true"/>
</operator>
<operator activated="true" class="sample" compatibility="9.0.000-BETA" expanded="true" height="82" name="Sample" width="90" x="581" y="85">
<parameter key="balance_data" value="true"/>
<parameter key="sample_size" value="-1"/>
<list key="sample_size_per_class">
<parameter key="0" value="55"/>
<parameter key="1" value="55"/>
</list>
<list key="sample_ratio_per_class"/>
<list key="sample_probability_per_class"/>
</operator>
<operator activated="true" class="concurrency:cross_validation" compatibility="9.0.000-BETA" expanded="true" height="145" name="Cross Validation" width="90" x="715" y="85">
<process expanded="true">
<operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.0.000-BETA" expanded="true" height="103" name="Decision Tree" width="90" x="179" y="34"/>
<connect from_port="training set" to_op="Decision Tree" to_port="training set"/>
<connect from_op="Decision Tree" from_port="model" to_port="model"/>
<portSpacing port="source_training set" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="9.0.000-BETA" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance_classification" compatibility="9.0.000-BETA" expanded="true" height="82" name="Performance" width="90" x="246" y="34">
<parameter key="weighted_mean_recall" value="true"/>
<list key="class_weights"/>
</operator>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="performance 1"/>
<connect from_op="Performance" from_port="example set" to_port="test set results"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_test set results" spacing="0"/>
<portSpacing port="sink_performance 1" spacing="0"/>
<portSpacing port="sink_performance 2" spacing="0"/>
</process>
</operator>
<connect from_op="Read CSV" from_port="output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Numerical to Polynominal" to_port="example set input"/>
<connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Sample" to_port="example set input"/>
<connect from_op="Sample" from_port="example set output" to_op="Cross Validation" to_port="example set"/>
<connect from_op="Cross Validation" from_port="model" to_port="result 2"/>
<connect from_op="Cross Validation" from_port="example set" to_port="result 1"/>
<connect from_op="Cross Validation" from_port="test result set" to_port="result 3"/>
<connect from_op="Cross Validation" from_port="performance 1" to_port="result 4"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
<portSpacing port="sink_result 5" spacing="0"/>
</process>
</operator>
</process>
Answers
Hi @talhamahboob95,
I have difficulty understanding your problem. Can you explain more precisely what you want to do?
It is best to give an example of what you have and what you want to obtain.
If you have one, can you share your process and your dataset?
Regards,
Lionel
I want to label the last four attributes, and I am also sending you the dataset.
Hello Lionel,
Thanks for your very clear explanation. I have reproduced your XML file and it works fine.
I have a question. You write: "You can now apply this methodology to the 3 other target variables." Does this mean that the user should change the attribute name in the Set Role operator each time, to Hinselmann, Schiller, and Cytology in turn?
Maerkli
Hi @Maerkli,
Exactly, you are totally right.
Regards,
Lionel