Converting a nominal to binominal and setting a binominal target

amitd Member, University Professor Posts: 49 Maven

I am encountering a seemingly trivial issue and would appreciate some pointers. I am analyzing the churn dataset (WA_Fn-UseC_-Telco-Customer-Churn.csv) from the IBM sample datasets website. The operators are arranged in the following sequence:

Read CSV > Nominal to Binominal > Numerical to Binominal > Set Role > Split Validation (which contains the model, apply, and performance operators). In both the Nominal to Binominal and Numerical to Binominal operators, the "include special attributes" option is checked, although the label role is set later anyway.

The Read CSV operator reads the Churn attribute as polynominal. So, in the Nominal to Binominal operator, I selected it to be transformed into a binominal type along with a few other attributes. The conversion works fine (tested with a breakpoint). However, the Set Role operator does NOT list Churn in the attribute dropdown, so it cannot be assigned the label role.

I also tried placing the Set Role operator before the type transformation operators, but that does not work either. In that case, the Validation operator shows a warning (Input example set must have a special attribute label). Note that for the Nominal to Binominal and Numerical to Binominal operators, the "include special attributes" option is checked.

The pipeline works fine if I just proceed by keeping Churn as polynominal. However, my goal is to use the Performance (AUPRC) operator in the Operator Toolbox, which only works with a binominal label. 

I would appreciate any help.

Fixed and Released · Last Updated

RM-3998

Comments

  • tftemme Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member Posts: 164 RM Research

    Hi @amitdeokar,

     

    First, so that I can give you the best advice, please post the XML of your process. Without it, it is hard to guess what exactly the problem is.

    You can get the XML of your process by adding the XML view to your RapidMiner Studio (Menu View->Show Panel->XML).

     

    Your problem seems to be that the meta data is missing or incorrect at some point in the process. Without looking at the process, I can suggest two solutions.

     

    1. Split your process into two processes. The first reads the data and contains the transformation operators; use the Store operator to store the resulting ExampleSet in the repository. The second process retrieves the data set from the repository. The meta data should then be set correctly, so the Set Role operator knows the attributes.
    2. The dropdown list of the Set Role operator only suggests attributes for which the meta data is known. If you are sure that the attribute is there, you can type its name in even though it is not in the list. You will probably need to ignore the warning and run the process anyway.
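
    As a rough sketch of option 1, the two processes could look something like the following. This is a trimmed illustration, not a tested process: the CSV path and the repository location //Local Repository/data/churn_prepared are placeholders, layout attributes are omitted, and most Read CSV parameters are left out because they depend on your file. The operator keys store and retrieve and the parameter key repository_entry are what I would expect to see in the XML view; please check them against your own Studio version.

    ```xml
    <process version="9.0.002">
      <operator activated="true" class="process" compatibility="9.0.002" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="read_csv" compatibility="9.0.002" expanded="true" name="Read CSV">
            <!-- placeholder path; keep the remaining Read CSV parameters from your own process -->
            <parameter key="csv_file" value="/path/to/WA_Fn-UseC_-Telco-Customer-Churn.csv"/>
          </operator>
          <operator activated="true" class="nominal_to_binominal" compatibility="9.0.002" expanded="true" name="Nominal to Binominal">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Churn"/>
          </operator>
          <operator activated="true" class="store" compatibility="9.0.002" expanded="true" name="Store">
            <!-- placeholder repository location -->
            <parameter key="repository_entry" value="//Local Repository/data/churn_prepared"/>
          </operator>
          <connect from_op="Read CSV" from_port="output" to_op="Nominal to Binominal" to_port="example set input"/>
          <connect from_op="Nominal to Binominal" from_port="example set output" to_op="Store" to_port="input"/>
          <connect from_op="Store" from_port="through" to_port="result 1"/>
        </process>
      </operator>
    </process>
    ```

    The second process then starts from the stored ExampleSet, whose meta data (including the nominal values) is known before the process runs:

    ```xml
    <process version="9.0.002">
      <operator activated="true" class="process" compatibility="9.0.002" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.0.002" expanded="true" name="Retrieve">
            <!-- same placeholder repository location as in the first process -->
            <parameter key="repository_entry" value="//Local Repository/data/churn_prepared"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="9.0.002" expanded="true" name="Set Role">
            <parameter key="attribute_name" value="Churn"/>
            <parameter key="target_role" value="label"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Set Role" to_port="example set input"/>
          <!-- in the full process, connect Set Role to the training port of the Validation operator instead -->
          <connect from_op="Set Role" from_port="example set output" to_port="result 1"/>
        </process>
      </operator>
    </process>
    ```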

    Hope this helps, and happy mining!

    Fabian

  • amitd Member, University Professor Posts: 49 Maven
    <?xml version="1.0" encoding="UTF-8"?><process version="9.0.002">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="9.0.002" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_csv" compatibility="9.0.002" expanded="true" height="68" name="Read CSV" width="90" x="45" y="34">
    <parameter key="csv_file" value="/Users/amit_deokar/Dropbox (University of Massachusetts Lowell)/Teaching/MIST.4060 F18/DataFiles/Telco Customer Churn IBM/WA_Fn-UseC_-Telco-Customer-Churn.csv"/>
    <parameter key="column_separators" value=","/>
    <parameter key="skip_comments" value="true"/>
    <parameter key="date_format" value="MMM d, yyyy h:mm:ss a z"/>
    <list key="annotations"/>
    <parameter key="encoding" value="UTF-8"/>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="customerID.true.polynominal.attribute"/>
    <parameter key="1" value="gender.true.polynominal.attribute"/>
    <parameter key="2" value="SeniorCitizen.true.integer.attribute"/>
    <parameter key="3" value="Partner.true.polynominal.attribute"/>
    <parameter key="4" value="Dependents.true.polynominal.attribute"/>
    <parameter key="5" value="tenure.true.integer.attribute"/>
    <parameter key="6" value="PhoneService.true.polynominal.attribute"/>
    <parameter key="7" value="MultipleLines.true.polynominal.attribute"/>
    <parameter key="8" value="InternetService.true.polynominal.attribute"/>
    <parameter key="9" value="OnlineSecurity.true.polynominal.attribute"/>
    <parameter key="10" value="OnlineBackup.true.polynominal.attribute"/>
    <parameter key="11" value="DeviceProtection.true.polynominal.attribute"/>
    <parameter key="12" value="TechSupport.true.polynominal.attribute"/>
    <parameter key="13" value="StreamingTV.true.polynominal.attribute"/>
    <parameter key="14" value="StreamingMovies.true.polynominal.attribute"/>
    <parameter key="15" value="Contract.true.polynominal.attribute"/>
    <parameter key="16" value="PaperlessBilling.true.polynominal.attribute"/>
    <parameter key="17" value="PaymentMethod.true.polynominal.attribute"/>
    <parameter key="18" value="MonthlyCharges.true.real.attribute"/>
    <parameter key="19" value="TotalCharges.true.real.attribute"/>
    <parameter key="20" value="Churn.true.polynominal.attribute"/>
    </list>
    <parameter key="read_not_matching_values_as_missings" value="false"/>
    </operator>
    <operator activated="true" class="set_role" compatibility="9.0.002" expanded="true" height="82" name="Set Role" width="90" x="179" y="34">
    <parameter key="attribute_name" value="Churn"/>
    <parameter key="target_role" value="label"/>
    <list key="set_additional_roles">
    <parameter key="customerID" value="id"/>
    </list>
    </operator>
    <operator activated="true" class="nominal_to_binominal" compatibility="9.0.002" expanded="true" height="103" name="Nominal to Binominal" width="90" x="313" y="34">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attribute" value="Churn"/>
    <parameter key="attributes" value="Dependents|PaperlessBilling|Partner|PhoneService|gender|Churn"/>
    <parameter key="include_special_attributes" value="true"/>
    </operator>
    <operator activated="true" class="numerical_to_binominal" compatibility="9.0.002" expanded="true" height="82" name="Numerical to Binominal" width="90" x="447" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="SeniorCitizen"/>
    <parameter key="include_special_attributes" value="true"/>
    </operator>
    <operator activated="true" class="split_validation" compatibility="9.0.002" expanded="true" height="124" name="Validation" width="90" x="581" y="34">
    <parameter key="split_ratio" value="0.9"/>
    <parameter key="sampling_type" value="stratified sampling"/>
    <process expanded="true">
    <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.0.002" expanded="true" height="103" name="Decision Tree" width="90" x="112" y="34">
    <parameter key="maximal_depth" value="5"/>
    <parameter key="confidence" value="0.25"/>
    <parameter key="minimal_leaf_size" value="10"/>
    </operator>
    <connect from_port="training" to_op="Decision Tree" to_port="training set"/>
    <connect from_op="Decision Tree" from_port="model" to_port="model"/>
    <connect from_op="Decision Tree" from_port="weights" to_port="through 1"/>
    <portSpacing port="source_training" spacing="0"/>
    <portSpacing port="sink_model" spacing="0"/>
    <portSpacing port="sink_through 1" spacing="0"/>
    <portSpacing port="sink_through 2" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="apply_model" compatibility="9.0.002" expanded="true" height="82" name="Apply Model" width="90" x="112" y="34">
    <list key="application_parameters"/>
    </operator>
    <operator activated="true" class="performance_classification" compatibility="9.0.002" expanded="true" height="82" name="Performance" width="90" x="246" y="34">
    <list key="class_weights"/>
    </operator>
    <connect from_port="model" to_op="Apply Model" to_port="model"/>
    <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
    <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
    <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
    <portSpacing port="source_model" spacing="0"/>
    <portSpacing port="source_test set" spacing="0"/>
    <portSpacing port="source_through 1" spacing="0"/>
    <portSpacing port="source_through 2" spacing="0"/>
    <portSpacing port="sink_averagable 1" spacing="0"/>
    <portSpacing port="sink_averagable 2" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Read CSV" from_port="output" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Nominal to Binominal" to_port="example set input"/>
    <connect from_op="Nominal to Binominal" from_port="example set output" to_op="Numerical to Binominal" to_port="example set input"/>
    <connect from_op="Numerical to Binominal" from_port="example set output" to_op="Validation" to_port="training"/>
    <connect from_op="Validation" from_port="model" to_port="result 1"/>
    <connect from_op="Validation" from_port="training" to_port="result 2"/>
    <connect from_op="Validation" from_port="averagable 1" to_port="result 3"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    <portSpacing port="sink_result 4" spacing="0"/>
    </process>
    </operator>
    </process>
  • amitd Member, University Professor Posts: 49 Maven

    1. I have posted the XML for the process in the case where "Set Role" is used prior to data type transformation.

    2. If I put "Set Role" after the data type transformation, I can use a brute-force approach: type in the name of the label attribute and make the assignment. It seems to work, as you suggested. However, the warnings still persist. I don't know the reason for the warnings. Shouldn't the tool be able to handle this?
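
    For reference, the reordered variant from point 2 would differ from the posted XML roughly as sketched below. This is only a fragment of the inner process, not a complete file: the Set Role operator itself is unchanged (with Churn typed in as attribute_name), and only the connections between the top-level operators are rewired so that Set Role comes after both transformations. The port names are the ones that appear in the posted process.

    ```xml
    <operator activated="true" class="set_role" compatibility="9.0.002" expanded="true" name="Set Role">
      <!-- typed in by hand, because the dropdown does not offer Churn at this point -->
      <parameter key="attribute_name" value="Churn"/>
      <parameter key="target_role" value="label"/>
      <list key="set_additional_roles">
        <parameter key="customerID" value="id"/>
      </list>
    </operator>
    <!-- transformations first, Set Role last, then Validation -->
    <connect from_op="Read CSV" from_port="output" to_op="Nominal to Binominal" to_port="example set input"/>
    <connect from_op="Nominal to Binominal" from_port="example set output" to_op="Numerical to Binominal" to_port="example set input"/>
    <connect from_op="Numerical to Binominal" from_port="example set output" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Validation" to_port="training"/>
    ```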

  • tftemme Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member Posts: 164 RM Research
    Solution Accepted

    Hi @amitdeokar,

     

    This is indeed a bug in the meta data propagation* (see below for a general explanation of meta data) of the operator. The problem is that the Read CSV operator does not know in advance which values the Churn attribute can have. When you hover over the output port of the Read CSV operator, you see that the range of the 'Churn' attribute is 'unknown' (this is the case for all attributes, because the operator does not know the ranges before reading the file).

    You can see that the Set Role operator propagates the meta data correctly (the role of the 'Churn' attribute is set to 'label'), but the Nominal to Binominal operator has a bug in its meta data propagation when the values are unknown. You can see that the attribute is no longer in the meta data at the output port of the operator.

     

    I will file a bug report for this. For now, I would suggest going with my second proposed solution. It is always a good idea to separate reading and general preprocessing from the actual analysis. You don't need to read your input data from disk every time. The available meta data is much more precise (because RM also stores more meta data about ExampleSets, including for example the values of a nominal attribute). Your project is better structured, and so on.

     

    Hope this explains the problem.

    Fabian

     

    *Meta data is all the information that is available to RapidMiner without actually running the process. You can see it by hovering over the ports. The meta data is also used in the parameters, for example to provide lists of attributes and similar options. As it is not always possible to know all necessary meta data in advance, only warnings are displayed if, for example, an attribute is missing in the meta data. The process can be run nevertheless (this is what you mean by brute force).

  • amitd Member, University Professor Posts: 49 Maven

    Thank you very much for clarifying this for me. I have a related issue with this process, but it's a new topic, so I'll post it separately.
