Converting a nominal to binominal and setting a binominal target
I am encountering a seemingly trivial issue and would appreciate some pointers. I am analyzing the churn dataset (WA_Fn UseC_ Telco Customer Churn.csv) from the IBM sample datasets website. The sequence of the operators is set as follows:
Read CSV > Nominal to Binominal > Numerical to Binominal > Set Role > Split Validation (internally containing the model, apply, and performance operators). In both Nominal to Binominal and Numerical to Binominal, the "include special attributes" option is checked, although the label role is set later anyway.
The Read CSV operator reads the Churn attribute as polynominal. So, in the Nominal to Binominal operator, I selected it to be transformed into a binominal type along with a few other variables. The conversion works fine (tested with a breakpoint). However, the Set Role operator does NOT list Churn in its attribute dropdown, so I cannot assign it the label role.
I also tried placing the Set Role operator before the type transformation operators, but that does not work either. In that case, the Validation operator throws a warning (Input example set must have a special attribute label). Note that for the Nominal to Binominal and Numerical to Binominal operators, the "include special attributes" option is checked.
The pipeline works fine if I just proceed by keeping Churn as polynominal. However, my goal is to use the Performance (AUPRC) operator in the Operator Toolbox, which only works with a binominal label.
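To illustrate why a two-valued label matters here, below is a minimal Python sketch (not RapidMiner code, and not necessarily how the Operator Toolbox computes its value). It maps a Yes/No column to a boolean label with a designated positive class and then computes average precision, one common summary of the precision-recall curve. The function names, the sample values, and the scores are all made up for illustration, and ties between scores are not treated specially.

```python
def to_binominal(values, positive):
    """Map a two-valued nominal column to booleans; reject anything else."""
    distinct = set(values)
    if len(distinct) != 2:
        raise ValueError(f"expected exactly 2 distinct values, got {sorted(distinct)}")
    if positive not in distinct:
        raise ValueError(f"positive class {positive!r} does not occur in the column")
    return [v == positive for v in values]


def average_precision(labels, scores):
    """Mean of the precision measured at each rank where a positive example occurs."""
    # Rank examples from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            total += hits / rank  # precision at this rank
    if hits == 0:
        raise ValueError("no positive examples, average precision is undefined")
    return total / hits


# Illustrative data only: five customers and made-up confidence scores for "Yes".
churn = ["Yes", "No", "No", "Yes", "No"]
scores = [0.9, 0.8, 0.3, 0.6, 0.1]

labels = to_binominal(churn, positive="Yes")
ap = average_precision(labels, scores)
print(labels, round(ap, 4))
```

With more than two label values there is no single positive class to rank against, which is why a measure of this kind needs the label to be binominal first.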
I would appreciate any help.
Comments
Hi @amitdeokar,
First, so that we can give you the best advice, please post the XML of your process. Without it, it is hard to guess what exactly the problem is.
You can get the XML of your process by adding the XML view to your RapidMiner Studio (Menu View->Show Panel->XML).
Your problem seems to be that the meta data is missing or incorrect at some point in the process. Without looking at the process, I can suggest two solutions.
Hope this helps, and happy mining
Fabian
1. I have posted the XML for the process in the case where "Set Role" is used prior to data type transformation.
2. If I put "Set Role" after the data type transformation, I can use a brute-force approach: type in the name of the label attribute manually and make the assignment. That seems to work, as you suggested. However, the warnings persist, and I don't know the reason for them. Shouldn't the tool be able to handle this?
Hi @amitdeokar,
This is indeed a bug in the meta data propagation* (see below for a general explanation of meta data) of the operator. The problem is that Read CSV does not know in advance which values the Churn attribute can have. When you hover over the output port of the Read CSV operator, you see that the range of the 'Churn' attribute is 'unknown' (this is in fact true for all attributes, because the operator does not know the ranges before reading).
You can see that the Set Role operator propagates the meta data correctly (the role of the 'Churn' attribute is set to 'label'), but the Nominal to Binominal operator has a bug in its meta data propagation when the values are unknown. You can see that the attribute is no longer present in the meta data at the output port of that operator.
I will file a bug report for this. For now, I would suggest going with my second proposed solution. It is always a good idea to split reading and general preprocessing from the actual analysis. You don't need to read your input data from disk every time. The available meta data is much more precise (because RM stores more meta data about ExampleSets, including, for example, the values of a nominal attribute). Your project is also better structured, and so on.
Hope this explains the problem
Fabian
*Meta data is all the information that is available to RapidMiner without actually running the process. You can see it by hovering over the ports. This meta data is also used in the parameters, for example to provide lists of attributes and similar options. As it is not always possible to know all necessary meta data in advance, only warnings are displayed if, for example, an attribute is missing in the meta data. The process can nevertheless be run (which is what you called the brute-force approach).
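As a rough illustration of the behaviour described above, here is a toy Python model of meta data propagation. It is only a sketch and has nothing to do with RapidMiner's actual implementation: the class, the function names, and the 'tenure' attribute are hypothetical. It shows how a conversion step that mishandles an unknown value range can make an attribute disappear from the list offered downstream, while the same step keeps the attribute when the values are already known (as they are for stored ExampleSets).

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class AttributeMeta:
    name: str
    value_type: str                          # e.g. "polynominal", "binominal", "integer"
    values: Optional[FrozenSet[str]] = None  # None = range unknown before the process runs
    role: str = "regular"


def propagate_dropping_unknown(meta: List[AttributeMeta]) -> List[AttributeMeta]:
    """Toy propagation that loses polynominal attributes whose values are unknown."""
    out = []
    for attr in meta:
        if attr.value_type != "polynominal":
            out.append(attr)
        elif attr.values is None:
            continue  # modelled bug: the attribute vanishes from the meta data
        elif len(attr.values) == 2:
            out.append(replace(attr, value_type="binominal"))
        else:
            out.append(attr)  # simplification: dummy coding is not modelled here
    return out


def dropdown_names(meta: List[AttributeMeta]) -> List[str]:
    """Names a downstream parameter dropdown could offer, based on meta data only."""
    return [attr.name for attr in meta]


# Meta data as it might look straight after reading a CSV: ranges unknown.
from_csv = [
    AttributeMeta("tenure", "integer"),
    AttributeMeta("Churn", "polynominal", None),
]
# Meta data as it might look for a stored data set: nominal values known.
from_repository = [
    AttributeMeta("tenure", "integer"),
    AttributeMeta("Churn", "polynominal", frozenset({"Yes", "No"})),
]

names_csv = dropdown_names(propagate_dropping_unknown(from_csv))
names_repo = dropdown_names(propagate_dropping_unknown(from_repository))
print(names_csv)   # 'Churn' is missing from this list
print(names_repo)  # 'Churn' is present in this list
```

In this model the data itself is never touched, which mirrors why the process still runs when the attribute name is typed in by hand: only the design-time information is incomplete.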
Thank you very much for clarifying this for me. I have a related issue with this process, but it's a new topic, so I'll post it separately.