How to connect the Set Role operator to the Apply Model operator

m_gholami1991 · August 2018

Hi,
I have two questions and would appreciate your guidance.

1. I have a dataset of 5000 samples that do not have labels. I also have a separate dataset of 100 labeled samples, none of which appear in the 5000-sample dataset. Is it valid to remove the labels from the 100 samples, cluster them with several clustering algorithms, then add the labels back and check how many samples each algorithm clustered correctly? And then, if the clustering accuracy is good, can we cluster the 5000 samples with the same algorithm?

2. I built the scenario from my first question in RapidMiner, but I do not know how to connect two of the operators. Does anyone know how to connect the Set Role operator and the Apply Model operator? I have attached the process file and hope you can help me.
The dataset is also available at the link below:

https://drive.google.com/drive/folders/1t2qEnc7K35IHKfDVvG2dqEHZ_lNHZBis

David_A · August 2018

Hi,

Regarding your first question:

Even if your approach is theoretically valid, there is no need to remove the label before clustering. If you set the role of your label attribute to "label", it will be ignored by the clustering algorithm.

But if you have this label, why not use a supervised learning algorithm to directly train a model that predicts it?

You can then apply this model to your second dataset (the one without labels). The only potential issue I see there is the training size.
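
As a rough illustration of the supervised route (train on the labeled samples, then apply the model to the unlabeled ones), here is a minimal Python sketch. It swaps in a nearest-centroid classifier for brevity instead of a decision tree, and the two columns (Age, Marital) and all values are made up for the example, not taken from the dataset.

```python
from collections import defaultdict
from math import dist


def fit_centroids(rows, labels):
    """Return {label: centroid} computed from labeled feature rows."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return {
        label: tuple(sum(col) / len(members) for col in zip(*members))
        for label, members in groups.items()
    }


def predict(centroids, row):
    """Assign the label whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], row))


# Hypothetical labeled training rows: (Age, Marital)
train_rows = [(25, 1), (30, 1), (60, 3), (65, 3)]
train_labels = ["One", "One", "Two", "Two"]
centroids = fit_centroids(train_rows, train_labels)

# Hypothetical unlabeled rows that receive a predicted label
unlabeled_rows = [(28, 1), (62, 3)]
predictions = [predict(centroids, row) for row in unlabeled_rows]
print(predictions)
```

With only a small labeled set, any such model should be validated (for example with cross validation) before its predictions on the large dataset are trusted.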

To connect the two operators, left-click on one of the ports you want to connect, then move to the other port and click again (see this tutorial video for an example: https://youtu.be/ophGqpUexKI?t=2m14s).

Best,
David

m_gholami1991 · August 2018

Hi,

About your first suggestion: a supervised learning algorithm such as Decision Tree requires a label column in its input. For the 5000 unlabeled samples, it is therefore not possible to use supervised algorithms. I want to measure the accuracy of a clustering algorithm such as K-means on the 170 labeled samples and then, if the accuracy is high, cluster the 5000 samples with the same algorithm.
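
The plan described here (run a clustering algorithm on the labeled samples, then reuse the same algorithm on the unlabeled ones) could look roughly like the sketch below. It is a plain-Python version of Lloyd's k-means with a deterministic seeding chosen only to keep the example reproducible. It is not RapidMiner's implementation, and all points are invented.

```python
from math import dist


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if a cluster is empty.
        new_centroids = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


def assign(centroids, point):
    """Index of the closest centroid for one point."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


# Hypothetical small labeled set (labels are held back for later evaluation)
labeled_points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
labeled_centroids = kmeans(labeled_points, k=2)
labeled_assignments = [assign(labeled_centroids, p) for p in labeled_points]

# Same algorithm and settings, run on the (hypothetical) unlabeled set
unlabeled_points = [(2.0, 1.0), (7.0, 9.0), (1.0, 2.0), (8.0, 8.0)]
unlabeled_centroids = kmeans(unlabeled_points, k=2)
unlabeled_assignments = [assign(unlabeled_centroids, p) for p in unlabeled_points]
```

Note that cluster indices from two separate runs are not guaranteed to refer to the same groups, so a mapping found on the labeled set does not automatically carry over to a new run.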

About your second suggestion: as you can see, the Apply Model operator needs a model as input. When I connect the exa port of the Set Role operator to the mod port of Apply Model, an error is shown. I need both operators, but I cannot connect them.

Best regards,

Mina

lionelderkrikor · August 2018

Hi @m_gholami1991, Hi @David_A,

Sorry, @m_gholami1991, I come with questions rather than answers:

I played with your data and built a "classic process" with a Decision Tree.

The resulting model is the following:

or, in another form:

If I understand correctly, the model is not able to predict (label = One). However:

When the model is applied to the training set (output of a Cross Validation), (label = One) is predicted by the model in some cases:

Another case that is not intuitive to me is the following:

According to the model, (label = Two) is predicted only if Marital > 2.5; however, there are cases where (label = Two) is predicted with Marital <= 2.5 (Marital = 2) and a confidence of 1:

Can you enlighten me on these cases?

Thank you for your answers,

Regards,

Lionel

NB: The process:

<?xml version="1.0" encoding="UTF-8"?><process version="9.0.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.0.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_excel" compatibility="9.0.001" expanded="true" height="68" name="Read Excel" width="90" x="179" y="85">
        <parameter key="excel_file" value="C:\Users\Lionel\Downloads\Dataset.xlsx"/>
        <list key="annotations"/>
        <parameter key="date_format" value="MMM d, yyyy h:mm:ss a z"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="NID.true.integer.attribute"/>
          <parameter key="1" value="LiveInCity.true.integer.attribute"/>
          <parameter key="2" value="Age.true.integer.attribute"/>
          <parameter key="3" value="Marital.true.integer.attribute"/>
          <parameter key="4" value="Sex.true.integer.attribute"/>
          <parameter key="5" value="Edu.true.integer.attribute"/>
          <parameter key="6" value="Priority.true.polynominal.attribute"/>
        </list>
        <parameter key="read_not_matching_values_as_missings" value="false"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="9.0.001" expanded="true" height="82" name="Set Role" width="90" x="313" y="85">
        <parameter key="attribute_name" value="Priority"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles">
          <parameter key="NID" value="id"/>
        </list>
      </operator>
      <operator activated="true" class="multiply" compatibility="9.0.001" expanded="true" height="145" name="Multiply" width="90" x="447" y="85"/>
      <operator activated="true" class="concurrency:cross_validation" compatibility="9.0.001" expanded="true" height="145" name="Cross Validation" width="90" x="447" y="340">
        <process expanded="true">
          <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.0.001" expanded="true" height="103" name="Decision Tree" width="90" x="179" y="34"/>
          <connect from_port="training set" to_op="Decision Tree" to_port="training set"/>
          <connect from_op="Decision Tree" from_port="model" to_port="model"/>
          <portSpacing port="source_training set" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true">
          <operator activated="true" class="apply_model" compatibility="9.0.001" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_classification" compatibility="9.0.001" expanded="true" height="82" name="Performance" width="90" x="179" y="34">
            <list key="class_weights"/>
          </operator>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
          <connect from_op="Performance" from_port="example set" to_port="test set results"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_test set results" spacing="0"/>
          <portSpacing port="sink_performance 1" spacing="0"/>
          <portSpacing port="sink_performance 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="9.0.001" expanded="true" height="82" name="Select Attributes" width="90" x="581" y="238">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="Sex|Marital|LiveInCity|Edu|Age"/>
      </operator>
      <operator activated="true" class="concurrency:k_means" compatibility="9.0.001" expanded="true" height="82" name="Clustering" width="90" x="715" y="238">
        <parameter key="k" value="3"/>
      </operator>
      <connect from_op="Read Excel" from_port="output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_port="result 1"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Multiply" from_port="output 3" to_port="result 4"/>
      <connect from_op="Multiply" from_port="output 4" to_op="Cross Validation" to_port="example set"/>
      <connect from_op="Cross Validation" from_port="model" to_port="result 5"/>
      <connect from_op="Cross Validation" from_port="test result set" to_port="result 6"/>
      <connect from_op="Cross Validation" from_port="performance 1" to_port="result 7"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Clustering" to_port="example set"/>
      <connect from_op="Clustering" from_port="cluster model" to_port="result 2"/>
      <connect from_op="Clustering" from_port="clustered set" to_port="result 3"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
      <portSpacing port="sink_result 5" spacing="0"/>
      <portSpacing port="sink_result 6" spacing="0"/>
      <portSpacing port="sink_result 7" spacing="0"/>
      <portSpacing port="sink_result 8" spacing="0"/>
    </process>
  </operator>
</process>

m_gholami1991 · August 2018

Hi @lionelderkrikor,

Thanks for your attention. Note, though, that this dataset is only a sample.

Please look at the picture; I will explain the different steps.

Step 1: The 100 labeled samples are read in (with the label column deleted). After normalization, clustering is performed on the attributes chosen by the Select by Weights operator.

Step 2: Four evaluation criteria are applied to each feature, and the features are then ranked according to their importance.

Step 3: Once clustering has finished, a new column is added that shows which cluster each sample belongs to. With the Map operator, we can then specify a match between the cluster names and the priorities. (The priorities are the labels that were originally given to the samples.) After that, we can use a decision tree to model the output. (Many people tell me that a decision tree is not needed at this stage and that using it is wrong.)

Step 4: The 100 samples are read in again with their labels and, with the help of the Apply Model operator, passed to the decision tree, so that the label column can be compared with the clustering results. The final accuracy is determined by the Performance operator.

My question relates to Step 3. Is using a decision tree wrong? And if the connection is wrong, which operator should be used?
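
The matching between cluster names and priorities in Step 3, and the comparison in Step 4, can be sketched outside RapidMiner as a majority vote: each cluster is mapped to the most frequent true label among its members, and accuracy is the share of samples whose mapped cluster label equals their true label (a measure often called cluster purity). The Python below is a minimal sketch under that assumption; the cluster names and priorities are invented.

```python
from collections import Counter, defaultdict


def map_clusters_to_labels(cluster_ids, labels):
    """Map each cluster to the most frequent true label among its members."""
    votes = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, labels):
        votes[cluster][label] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in votes.items()}


def clustering_accuracy(cluster_ids, labels):
    """Return the cluster-to-label mapping and the resulting accuracy."""
    mapping = map_clusters_to_labels(cluster_ids, labels)
    hits = sum(mapping[c] == l for c, l in zip(cluster_ids, labels))
    return mapping, hits / len(labels)


# Hypothetical clustering output and the held-back true priorities
clusters = ["cluster_0", "cluster_0", "cluster_1", "cluster_1",
            "cluster_1", "cluster_2", "cluster_2"]
priority = ["One", "One", "Two", "Two", "One", "Three", "Three"]

mapping, accuracy = clustering_accuracy(clusters, priority)
print(mapping)
print(round(accuracy, 3))
```

A majority vote may map two clusters to the same label, in which case some label never appears as a prediction; that is worth checking before reading too much into the accuracy value.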

lionelderkrikor · August 2018

Hi @m_gholami1991,

I have a question: how do you establish the correspondence between the cluster results (cluster_0, cluster_1, cluster_2) and the label values (Priority = One/Two/Three)?

To answer your question: a priori, I don't know whether using a Decision Tree is wrong. I recommend that you follow the "classic methodology", that is, perform a Cross Validation with several models and select the best-performing one...

Regards,

Lionel

lionelderkrikor · August 2018

Hi again @m_gholami1991,

OK, after reading your process again, I understand its "philosophy" and what you want to do (excuse me, but here in France it's late in the evening and I am less efficient...).

Indeed, you want to compare your clustering results with your labelled data, don't you? So there is no need for a Decision Tree.

You can take inspiration from this sample process:

<?xml version="1.0" encoding="UTF-8"?><process version="9.0.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.0.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_excel" compatibility="9.0.001" expanded="true" height="68" name="Read Excel" width="90" x="179" y="85">
        <parameter key="excel_file" value="C:\Users\Lionel\Downloads\Dataset.xlsx"/>
        <list key="annotations"/>
        <parameter key="date_format" value="MMM d, yyyy h:mm:ss a z"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="NID.true.integer.attribute"/>
          <parameter key="1" value="LiveInCity.true.integer.attribute"/>
          <parameter key="2" value="Age.true.integer.attribute"/>
          <parameter key="3" value="Marital.true.integer.attribute"/>
          <parameter key="4" value="Sex.true.integer.attribute"/>
          <parameter key="5" value="Edu.true.integer.attribute"/>
          <parameter key="6" value="Priority.true.polynominal.attribute"/>
        </list>
        <parameter key="read_not_matching_values_as_missings" value="false"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="9.0.001" expanded="true" height="82" name="Set Role" width="90" x="313" y="85">
        <parameter key="attribute_name" value="Priority"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles">
          <parameter key="NID" value="id"/>
        </list>
      </operator>
      <operator activated="true" class="multiply" compatibility="9.0.001" expanded="true" height="145" name="Multiply" width="90" x="447" y="85"/>
      <operator activated="false" class="concurrency:cross_validation" compatibility="9.0.001" expanded="true" height="145" name="Cross Validation" width="90" x="447" y="340">
        <process expanded="true">
          <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.0.001" expanded="true" height="103" name="Decision Tree" width="90" x="179" y="34"/>
          <connect from_port="training set" to_op="Decision Tree" to_port="training set"/>
          <connect from_op="Decision Tree" from_port="model" to_port="model"/>
          <portSpacing port="source_training set" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true">
          <operator activated="true" class="apply_model" compatibility="9.0.001" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_classification" compatibility="9.0.001" expanded="true" height="82" name="Performance" width="90" x="179" y="34">
            <list key="class_weights"/>
          </operator>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
          <connect from_op="Performance" from_port="example set" to_port="test set results"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_test set results" spacing="0"/>
          <portSpacing port="sink_performance 1" spacing="0"/>
          <portSpacing port="sink_performance 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="9.0.001" expanded="true" height="82" name="Select Attributes" width="90" x="581" y="238">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="Sex|Marital|LiveInCity|Edu|Age"/>
      </operator>
      <operator activated="true" class="concurrency:k_means" compatibility="9.0.001" expanded="true" height="82" name="Clustering" width="90" x="715" y="238">
        <parameter key="k" value="3"/>
      </operator>
      <connect from_op="Read Excel" from_port="output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_port="result 1"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Multiply" from_port="output 3" to_port="result 4"/>
      <connect from_op="Multiply" from_port="output 4" to_port="result 5"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Clustering" to_port="example set"/>
      <connect from_op="Clustering" from_port="cluster model" to_port="result 2"/>
      <connect from_op="Clustering" from_port="clustered set" to_port="result 3"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
      <portSpacing port="sink_result 5" spacing="0"/>
      <portSpacing port="sink_result 6" spacing="0"/>
    </process>
  </operator>
</process>

However, I would try to establish the correlations between the clustering results (cluster_0, cluster_1, cluster_2) and your labelled data (Priority = One/Two/Three) "manually" at the final step (if these correlations exist).

NB: For example, with your sample data, the correlations are not obvious...
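
One way to check by hand whether such a correspondence exists is to count, for each cluster, how many samples carry each label. The Python sketch below builds that contingency table; the data is invented and deliberately spread out to show what a "not obvious" correspondence looks like.

```python
from collections import Counter


def contingency_table(cluster_ids, labels):
    """Count samples per (cluster, label) pair as a row-per-cluster table."""
    counts = Counter(zip(cluster_ids, labels))
    rows = sorted(set(cluster_ids))
    cols = sorted(set(labels))
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table


# Hypothetical clustering output and true priorities
clusters = ["cluster_0", "cluster_0", "cluster_0",
            "cluster_1", "cluster_1", "cluster_1"]
priority = ["One", "Two", "Three", "One", "Two", "Two"]

rows, cols, table = contingency_table(clusters, priority)
print("cluster", *cols)
for name, counts_row in zip(rows, table):
    print(name, *counts_row)
```

When one cell clearly dominates each row, the cluster can reasonably be matched to that label; when the counts are spread evenly, as in the first row here, the clusters do not line up with the labels.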

1. Labelled data:

2. Clustering results:

I hope it helps,

Regards,

Lionel

m_gholami1991 · August 2018

Hi @lionelderkrikor,

Yes, you understood exactly what I mean.

I copied the code you provided and looked at the process. You selected the Priority column in the first Read operator.

However, I think it is better not to select this column in the first Read operator, because the clustering algorithm may take it into account when clustering. For that reason, in Step 4 I read the dataset in again and selected the column there.

On the other hand, if you run my XML file, select this column in the first Read operator, and disable the operators (Step 3: Set Role and Decision Tree | Step 4: TrainData_WithLabel (Read operator), Normalize, and Apply Model), an error appears at the Performance operator: "Input ExampleSet does not have a label".
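
The concern about the clustering algorithm using the Priority column can be pictured as a split by attribute role: only regular attributes are passed to the clustering, while the id and label travel alongside for later evaluation, which matches the earlier remark in this thread that an attribute with the "label" role is ignored by the clustering algorithm. The sketch below mimics that idea in plain Python. `split_by_role`, the role dictionary, and the rows are illustrative assumptions, not RapidMiner's API.

```python
def split_by_role(rows, roles):
    """Separate regular feature values from special-role values (id, label)."""
    regular = [name for name, role in roles.items() if role == "regular"]
    special_names = [name for name, role in roles.items() if role != "regular"]
    features = [[row[name] for name in regular] for row in rows]
    special = [{name: row[name] for name in special_names} for row in rows]
    return regular, features, special


# Hypothetical roles, loosely following the column names used in this thread
roles = {"NID": "id", "Age": "regular", "Marital": "regular", "Priority": "label"}
rows = [
    {"NID": 1, "Age": 25, "Marital": 1, "Priority": "One"},
    {"NID": 2, "Age": 60, "Marital": 3, "Priority": "Two"},
]

regular, features, special = split_by_role(rows, roles)
# `features` is what a clustering step would see;
# `special` keeps the id and label for the later performance check.
print(regular, features, special)
```

Keeping the label attached in this way means the dataset does not have to be read a second time just to evaluate the clusters.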

m_gholami1991 · August 2018

Hi,

Is there anyone who can help me? I really need your help. My thesis presentation is very close. Please...

lionelderkrikor · August 2018

Hi @m_gholami1991,

Here is a working process with the Decision Tree model:

<?xml version="1.0" encoding="UTF-8"?><process version="9.0.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.0.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="false" class="write_excel" compatibility="9.0.001" expanded="true" height="82" name="Write Excel (4)" width="90" x="1318" y="238">
        <parameter key="excel_file" value="C:\Users\MyCurisityLaptop\Documents\3.xlsx"/>
      </operator>
      <operator activated="true" class="read_excel" compatibility="9.0.001" expanded="true" height="68" name="Training_data_without_label" width="90" x="112" y="34">
        <parameter key="excel_file" value="C:\Users\Lionel\Downloads\Dataset.xlsx"/>
        <list key="annotations"/>
        <parameter key="date_format" value="MMM d, yyyy h:mm:ss a z"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="NID.true.integer.attribute"/>
          <parameter key="1" value="LiveInCity.true.integer.attribute"/>
          <parameter key="2" value="Age.true.integer.attribute"/>
          <parameter key="3" value="Marital.true.integer.attribute"/>
          <parameter key="4" value="Sex.true.integer.attribute"/>
          <parameter key="5" value="Edu.true.integer.attribute"/>
          <parameter key="6" value="Priority.true.polynominal.attribute"/>
        </list>
        <parameter key="read_not_matching_values_as_missings" value="false"/>
      </operator>
      <operator activated="true" class="normalize" compatibility="9.0.001" expanded="true" height="103" name="Normalize_Dataset" width="90" x="313" y="34"/>
      <operator activated="true" class="select_attributes" compatibility="9.0.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="715" y="34">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="Age|DisabilitySeverity|Edu|IsCityOrNot|Job|Marital|Money|Mostamari|PoshtNobat|Sex|TedadMalolDarKhanevade|TedadMaloliatHarFard"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="false" class="write_excel" compatibility="9.0.001" expanded="true" height="82" name="Write Excel (2)" width="90" x="849" y="34">
        <parameter key="excel_file" value="C:\Users\MyCurisityLaptop\Documents\1.xlsx"/>
      </operator>
      <operator activated="true" class="denormalize" compatibility="9.0.001" expanded="true" height="82" name="De-Normalize" width="90" x="514" y="85"/>
      <operator activated="true" class="read_excel" compatibility="9.0.001" expanded="true" height="68" name="Training_data_with_label_top-k" width="90" x="45" y="238">
        <parameter key="excel_file" value="C:\Users\Lionel\Downloads\Dataset.xlsx"/>
        <list key="annotations"/>
        <parameter key="date_format" value="MMM d, yyyy h:mm:ss a z"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="NID.true.integer.attribute"/>
          <parameter key="1" value="LiveInCity.true.integer.attribute"/>
          <parameter key="2" value="Age.true.integer.attribute"/>
          <parameter key="3" value="Marital.true.integer.attribute"/>
          <parameter key="4" value="Sex.true.integer.attribute"/>
          <parameter key="5" value="Edu.true.integer.attribute"/>
          <parameter key="6" value="Priority.true.polynominal.attribute"/>
        </list>
        <parameter key="read_not_matching_values_as_missings" value="false"/>
      </operator>
      <operator activated="true" class="read_excel" compatibility="9.0.001" expanded="true" height="68" name="Training_data_with_label" width="90" x="1586" y="289">
        <parameter key="excel_file" value="C:\Users\Lionel\Downloads\Dataset.xlsx"/>
        <list key="annotations"/>
        <parameter key="date_format" value="MMM d, yyyy h:mm:ss a z"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="NID.true.integer.attribute"/>
          <parameter key="1" value="LiveInCity.true.integer.attribute"/>
          <parameter key="2" value="Age.true.integer.attribute"/>
          <parameter key="3" value="Marital.true.integer.attribute"/>
          <parameter key="4" value="Sex.true.integer.attribute"/>
          <parameter key="5" value="Edu.true.integer.attribute"/>
          <parameter key="6" value="Priority.true.polynominal.attribute"/>
        </list>
        <parameter key="read_not_matching_values_as_missings" value="false"/>
      </operator>
      <operator activated="true" class="normalize" compatibility="9.0.001" expanded="true" height="103" name="Normalize_TestData" width="90" x="1720" y="289"/>
      <operator activated="true" class="numerical_to_polynominal" compatibility="9.0.001" expanded="true" height="82" name="Numerical to Polynominal" width="90" x="112" y="340">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="NID"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="9.0.001" expanded="true" height="82" name="Set Role (2)" width="90" x="179" y="238">
        <parameter key="attribute_name" value="NID"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="normalize" compatibility="9.0.001" expanded="true" height="103" name="Normalize_TrainingData" width="90" x="313" y="238"/>
      <operator activated="true" class="multiply" compatibility="9.0.001" expanded="true" height="145" name="Multiply" width="90" x="447" y="238"/>
      <operator activated="true" class="weight_by_gini_index" compatibility="9.0.001" expanded="true" height="82" name="Weight by Gini Index" width="90" x="581" y="442">
        <parameter key="normalize_weights" value="true"/>
      </operator>
      <operator activated="true" class="weights_to_data" compatibility="9.0.001" expanded="true" height="68" name="Weights to Data (3)" width="90" x="715" y="442"/>
      <operator activated="true" class="rename_by_replacing" compatibility="9.0.001" expanded="true" height="82" name="Rename by Replacing (3)" width="90" x="849" y="442">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Weight"/>
        <parameter key="replace_what" value="Weight"/>
        <parameter key="replace_by" value="Weight_Gini_Index"/>
      </operator>
      <operator activated="true" class="weight_by_information_gain_ratio" compatibility="9.0.001" expanded="true" height="82" name="Weight by Information Gain Ratio" width="90" x="581" y="340">
        <parameter key="normalize_weights" value="true"/>
      </operator>
      <operator activated="true" class="weights_to_data" compatibility="9.0.001" expanded="true" height="68" name="Weights to Data (2)" width="90" x="715" y="340"/>
      <operator activated="true" class="rename_by_replacing" compatibility="9.0.001" expanded="true" height="82" name="Rename by Replacing (2)" width="90" x="849" y="340">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Weight"/>
        <parameter key="replace_what" value="Weight"/>
        <parameter key="replace_by" value="Weight_Info_Gain_Ratio"/>
      </operator>
      <operator activated="true" class="weight_by_information_gain" compatibility="9.0.001" expanded="true" height="82" name="Weight by Information Gain" width="90" x="581" y="238">
        <parameter key="normalize_weights" value="true"/>
      </operator>
      <operator activated="true" class="weights_to_data" compatibility="9.0.001" expanded="true" height="68" name="Weights to Data" width="90" x="715" y="238"/>
      <operator activated="true" class="rename_by_replacing" compatibility="9.0.001" expanded="true" height="82" name="Rename by Replacing" width="90" x="849" y="238">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Weight"/>
        <parameter key="replace_what" value="Weight"/>
        <parameter key="replace_by" value="Weight_Info_Gain"/>
      </operator>
      <operator activated="true" class="concurrency:join" compatibility="9.0.001" expanded="true" height="82" name="Join" width="90" x="983" y="238">
        <parameter key="use_id_attribute_as_key" value="false"/>
        <list key="key_attributes">
          <parameter key="Attribute" value="Attribute"/>
        </list>
      </operator>
      <operator activated="true" class="concurrency:join" compatibility="9.0.001" expanded="true" height="82" name="Join (2)" width="90" x="983" y="340">
        <parameter key="use_id_attribute_as_key" value="false"/>
        <list key="key_attributes">
          <parameter key="Attribute" value="Attribute"/>
        </list>
      </operator>
      <operator activated="true" class="weight_by_chi_squared_statistic" compatibility="9.0.001" expanded="true" height="82" name="Weight by Chi Squared Statistic" width="90" x="581" y="544">
        <parameter key="normalize_weights" value="true"/>
      </operator>
      <operator activated="true" class="weights_to_data" compatibility="9.0.001" expanded="true" height="68" name="Weights to Data (4)" width="90" x="715" y="544"/>
      <operator activated="true" class="rename_by_replacing" compatibility="9.0.001" expanded="true" height="82" name="Rename by Replacing (4)" width="90" x="849" y="544">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Weight"/>
        <parameter key="replace_what" value="Weight"/>
        <parameter key="replace_by" value="Weight_Chi_Square"/>
      </operator>
      <operator activated="true" class="concurrency:join" compatibility="9.0.001" expanded="true" height="82" name="Join (3)" width="90" x="983" y="442">
        <parameter key="use_id_attribute_as_key" value="false"/>
        <list key="key_attributes">
          <parameter key="Attribute" value="Attribute"/>
        </list>
      </operator>
      <operator activated="true" class="generate_aggregation" compatibility="9.0.001" expanded="true" height="82" name="Generate Aggregation" width="90" x="983" y="544">
        <parameter key="attribute_name" value="Weight"/>
        <parameter key="attribute_filter_type" value="regular_expression"/>
        <parameter key="regular_expression" value="Weight.*"/>
        <parameter key="except_regular_expression" value="Attribute"/>
        <parameter key="aggregation_function" value="average"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="9.0.001" expanded="true" height="82" name="Select Attributes" width="90" x="1117" y="493">
        <parameter key="attribute_filter_type" value="regular_expression"/>
        <parameter key="regular_expression" value="Weight_.*"/>
        <parameter key="except_regular_expression" value="Attribute"/>
        <parameter key="invert_selection" value="true"/>
      </operator>
      <operator activated="true" class="normalize" compatibility="9.0.001" expanded="true" height="103" name="Normalize" width="90" x="1117" y="340">
        <parameter key="method" value="range transformation"/>
      </operator>
      <operator activated="true" class="subprocess" compatibility="9.0.001" expanded="true" height="82" name="Weight Data Converter" width="90" x="1117" y="238">
        <process expanded="true">
          <operator activated="true" class="set_macro" compatibility="9.0.001" expanded="true" height="82" name="Temp File Path" width="90" x="45" y="34">
            <parameter key="macro" value="weight_file_path"/>
            <parameter key="value" value="Desktop/average_weight.dat"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="9.0.001" expanded="true" height="82" name="Create Pseudo Attr" width="90" x="179" y="34">
            <list key="function_descriptions">
              <parameter key="_bas" value="&quot;&lt;weight name=&quot;"/>
              <parameter key="_orta" value="&quot; value=&quot;"/>
              <parameter key="_son" value="&quot;/&gt;&quot;"/>
            </list>
          </operator>
          <operator activated="true" class="order_attributes" compatibility="9.0.001" expanded="true" height="82" name="ReOrder" width="90" x="313" y="34">
            <parameter key="attribute_ordering" value="_bas|Attribute|_orta|Weight|_son"/>
          </operator>
          <operator activated="true" class="write_message" compatibility="9.0.001" expanded="true" height="82" name="Write Opening Tags" width="90" x="447" y="34">
            <parameter key="file" value="%{weight_file_path}"/>
            <parameter key="text" value="&lt;?xml version=&quot;1.0&quot; encoding=&quot;windows-1254&quot;?&gt;&#10;&#10;&lt;attributeweights version=&quot;8.2&quot;&gt;&#10;"/>
          </operator>
          <operator activated="true" class="write_special" compatibility="9.0.001" expanded="true" height="68" name="Write Weight Values" width="90" x="581" y="34">
            <parameter key="example_set_file" value="%{weight_file_path}"/>
            <parameter key="special_format" value="$t$a[&quot;]"/>
            <parameter key="quote_nominal_values" value="false"/>
            <parameter key="overwrite_mode" value="append"/>
          </operator>
          <operator activated="true" class="write_message" compatibility="9.0.001" expanded="true" height="82" name="Write Closing Tags" width="90" x="715" y="34">
            <parameter key="file" value="%{weight_file_path}"/>
            <parameter key="text" value="&lt;/attributeweights&gt;"/>
            <parameter key="mode" value="append"/>
          </operator>
          <operator activated="true" class="legacy:read_weights" compatibility="9.0.001" expanded="true" height="68" name="Read Weight File" width="90" x="849" y="34">
            <parameter key="attribute_weights_file" value="%{weight_file_path}"/>
          </operator>
          <connect from_port="in 1" to_op="Temp File Path" to_port="through 1"/>
          <connect from_op="Temp File Path" from_port="through 1" to_op="Create Pseudo Attr" to_port="example set input"/>
          <connect from_op="Create Pseudo Attr" from_port="example set output" to_op="ReOrder" to_port="example set input"/>
          <connect from_op="ReOrder" from_port="example set output" to_op="Write Opening Tags" to_port="through 1"/>
          <connect from_op="Write Opening Tags" from_port="through 1" to_op="Write Weight Values" to_port="input"/>
          <connect from_op="Write Weight Values" from_port="through" to_op="Write Closing Tags" to_port="through 1"/>
          <connect from_op="Read Weight File" from_port="output" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="source_in 2" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="select_by_weights" compatibility="9.0.001" expanded="true" height="103" name="Select by Weights" width="90" x="983" y="34">
        <parameter key="weight_relation" value="top k"/>
        <parameter key="k" value="3"/>
      </operator>
      <operator activated="false" class="write_excel" compatibility="9.0.001" expanded="true" height="82" name="Write Excel" width="90" x="1117" y="34">
        <parameter key="excel_file" value="C:\Users\MyCurisityLaptop\Documents\2.xlsx"/>
      </operator>
      <operator activated="true" class="k_medoids" compatibility="9.0.001" expanded="true" height="82" name="Clustering (2)" width="90" x="1318" y="34">
        <parameter key="k" value="3"/>
        <parameter key="use_local_random_seed" value="true"/>
      </operator>
      <operator activated="true" class="map" compatibility="9.0.001" expanded="true" height="82" name="Map" width="90" x="1318" y="136">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="cluster"/>
        <parameter key="include_special_attributes" value="true"/>
        <list key="value_mappings">
          <parameter key="cluster_0" value="سه"/>
          <parameter key="cluster_1" value="چهار"/>
          <parameter key="cluster_2" value="دو"/>
        </list>
      </operator>
      <operator activated="true" class="apply_model" compatibility="9.0.001" expanded="true" height="82" name="Apply Model (2)" width="90" x="1318" y="340">
        <list key="application_parameters">
          <parameter key="NID" value="NID"/>
        </list>
      </operator>
      <operator activated="true" class="set_role" compatibility="9.0.001" expanded="true" height="82" name="Set Role" width="90" x="1519" y="34">
        <parameter key="attribute_name" value="cluster"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="concurrency:cross_validation" compatibility="9.0.001" expanded="true" height="145" name="Cross Validation" width="90" x="1720" y="34">
        <process expanded="true">
          <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.0.001" expanded="true" height="103" name="Decision Tree" width="90" x="112" y="34"/>
          <connect from_port="training set" to_op="Decision Tree" to_port="training set"/>
          <connect from_op="Decision Tree" from_port="model" to_port="model"/>
          <portSpacing port="source_training set" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true">
          <operator activated="true" class="apply_model" compatibility="9.0.001" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_classification" compatibility="9.0.001" expanded="true" height="82" name="Performance" width="90" x="246" y="34">
            <parameter key="classification_error" value="true"/>
            <list key="class_weights"/>
          </operator>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
          <connect from_op="Performance" from_port="example set" to_port="test set results"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_test set results" spacing="0"/>
          <portSpacing port="sink_performance 1" spacing="0"/>
          <portSpacing port="sink_performance 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="apply_model" compatibility="9.0.001" expanded="true" height="82" name="Apply Model (3)" width="90" x="1921" y="238">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Training_data_without_label" from_port="output" to_op="Normalize_Dataset" to_port="example set input"/>
      <connect from_op="Normalize_Dataset" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
      <connect from_op="Normalize_Dataset" from_port="preprocessing model" to_op="De-Normalize" to_port="model input"/>
      <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Select by Weights" to_port="example set input"/>
      <connect from_op="De-Normalize" from_port="model output" to_op="Apply Model (2)" to_port="model"/>
      <connect from_op="Training_data_with_label_top-k" from_port="output" to_op="Numerical to Polynominal" to_port="example set input"/>
      <connect from_op="Training_data_with_label" from_port="output" to_op="Normalize_TestData" to_port="example set input"/>
      <connect from_op="Normalize_TestData" from_port="example set output" to_op="Apply Model (3)" to_port="unlabelled data"/>
      <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Set Role (2)" to_port="example set input"/>
      <connect from_op="Set Role (2)" from_port="example set output" to_op="Normalize_TrainingData" to_port="example set input"/>
      <connect from_op="Normalize_TrainingData" from_port="example set output" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_op="Weight by Information Gain" to_port="example set"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Weight by Information Gain Ratio" to_port="example set"/>
      <connect from_op="Multiply" from_port="output 3" to_op="Weight by Gini Index" to_port="example set"/>
      <connect from_op="Multiply" from_port="output 4" to_op="Weight by Chi Squared Statistic" to_port="example set"/>
      <connect from_op="Weight by Gini Index" from_port="weights" to_op="Weights to Data (3)" to_port="attribute weights"/>
      <connect from_op="Weights to Data (3)" from_port="example set" to_op="Rename by Replacing (3)" to_port="example set input"/>
      <connect from_op="Rename by Replacing (3)" from_port="example set output" to_op="Join (2)" to_port="right"/>
      <connect from_op="Weight by Information Gain Ratio" from_port="weights" to_op="Weights to Data (2)" to_port="attribute weights"/>
      <connect from_op="Weights to Data (2)" from_port="example set" to_op="Rename by Replacing (2)" to_port="example set input"/>
      <connect from_op="Rename by Replacing (2)" from_port="example set output" to_op="Join" to_port="right"/>
      <connect from_op="Weight by Information Gain" from_port="weights" to_op="Weights to Data" to_port="attribute weights"/>
      <connect from_op="Weights to Data" from_port="example set" to_op="Rename by Replacing" to_port="example set input"/>
      <connect from_op="Rename by Replacing" from_port="example set output" to_op="Join" to_port="left"/>
      <connect from_op="Join" from_port="join" to_op="Join (2)" to_port="left"/>
      <connect from_op="Join (2)" from_port="join" to_op="Join (3)" to_port="left"/>
      <connect from_op="Weight by Chi Squared Statistic" from_port="weights" to_op="Weights to Data (4)" to_port="attribute weights"/>
      <connect from_op="Weights to Data (4)" from_port="example set" to_op="Rename by Replacing (4)" to_port="example set input"/>
      <connect from_op="Rename by Replacing (4)" from_port="example set output" to_op="Join (3)" to_port="right"/>
      <connect from_op="Join (3)" from_port="join" to_op="Generate Aggregation" to_port="example set input"/>
      <connect from_op="Generate Aggregation" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Normalize" to_port="example set input"/>
      <connect from_op="Normalize" from_port="example set output" to_op="Weight Data Converter" to_port="in 1"/>
      <connect from_op="Weight Data Converter" from_port="out 1" to_op="Select by Weights" to_port="weights"/>
      <connect from_op="Select by Weights" from_port="example set output" to_op="Clustering (2)" to_port="example set"/>
      <connect from_op="Clustering (2)" from_port="clustered set" to_op="Map" to_port="example set input"/>
      <connect from_op="Map" from_port="example set output" to_op="Apply Model (2)" to_port="unlabelled data"/>
      <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Cross Validation" to_port="example set"/>
      <connect from_op="Cross Validation" from_port="model" to_op="Apply Model (3)" to_port="model"/>
      <connect from_op="Cross Validation" from_port="test result set" to_port="result 1"/>
      <connect from_op="Cross Validation" from_port="performance 1" to_port="result 2"/>
      <connect from_op="Apply Model (3)" from_port="labelled data" to_port="result 3"/>
      <connect from_op="Apply Model (3)" from_port="model" to_port="result 4"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
      <portSpacing port="sink_result 5" spacing="0"/>
    </process>
  </operator>
</process>

I hope it helps,

Regards,

Lionel

lionelderkrikor · August 2018

Hi @m_gholami1991,

And here is a simplified process without the Decision Tree:

As I said in a previous post, the data are simply clustered (after performing feature selection) and then compared to the labeled data.

The process:

<?xml version="1.0" encoding="UTF-8"?><process version="9.0.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.0.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="false" class="write_excel" compatibility="9.0.001" expanded="true" height="82" name="Write Excel (4)" width="90" x="1318" y="238">
        <parameter key="excel_file" value="C:\Users\MyCurisityLaptop\Documents\3.xlsx"/>
      </operator>
      <operator activated="true" class="read_excel" compatibility="9.0.001" expanded="true" height="68" name="Training_data_without_label" width="90" x="112" y="34">
        <parameter key="excel_file" value="C:\Users\Lionel\Downloads\Dataset.xlsx"/>
        <list key="annotations"/>
        <parameter key="date_format" value="MMM d, yyyy h:mm:ss a z"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="NID.true.integer.attribute"/>
          <parameter key="1" value="LiveInCity.true.integer.attribute"/>
          <parameter key="2" value="Age.true.integer.attribute"/>
          <parameter key="3" value="Marital.true.integer.attribute"/>
          <parameter key="4" value="Sex.true.integer.attribute"/>
          <parameter key="5" value="Edu.true.integer.attribute"/>
          <parameter key="6" value="Priority.true.polynominal.attribute"/>
        </list>
        <parameter key="read_not_matching_values_as_missings" value="false"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="9.0.001" expanded="true" height="82" name="Set Role (3)" width="90" x="246" y="34">
        <parameter key="attribute_name" value="Priority"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="normalize" compatibility="9.0.001" expanded="true" height="103" name="Normalize_Dataset" width="90" x="447" y="34"/>
      <operator activated="false" class="select_attributes" compatibility="9.0.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="715" y="34">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="Age|DisabilitySeverity|Edu|IsCityOrNot|Job|Marital|Money|Mostamari|PoshtNobat|Sex|TedadMalolDarKhanevade|TedadMaloliatHarFard"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="false" class="write_excel" compatibility="9.0.001" expanded="true" height="82" name="Write Excel (2)" width="90" x="849" y="34">
        <parameter key="excel_file" value="C:\Users\MyCurisityLaptop\Documents\1.xlsx"/>
      </operator>
      <operator activated="true" class="denormalize" compatibility="9.0.001" expanded="true" height="82" name="De-Normalize" width="90" x="581" y="136"/>
      <operator activated="true" class="read_excel" compatibility="9.0.001" expanded="true" height="68" name="Training_data_with_label_top-k" width="90" x="45" y="238">
        <parameter key="excel_file" value="C:\Users\Lionel\Downloads\Dataset.xlsx"/>
        <list key="annotations"/>
        <parameter key="date_format" value="MMM d, yyyy h:mm:ss a z"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="NID.true.integer.attribute"/>
          <parameter key="1" value="LiveInCity.true.integer.attribute"/>
          <parameter key="2" value="Age.true.integer.attribute"/>
          <parameter key="3" value="Marital.true.integer.attribute"/>
          <parameter key="4" value="Sex.true.integer.attribute"/>
          <parameter key="5" value="Edu.true.integer.attribute"/>
          <parameter key="6" value="Priority.true.polynominal.attribute"/>
        </list>
        <parameter key="read_not_matching_values_as_missings" value="false"/>
      </operator>
      <operator activated="true" class="numerical_to_polynominal" compatibility="9.0.001" expanded="true" height="82" name="Numerical to Polynominal" width="90" x="112" y="340">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="NID"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="9.0.001" expanded="true" height="82" name="Set Role (2)" width="90" x="179" y="238">
        <parameter key="attribute_name" value="NID"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="normalize" compatibility="9.0.001" expanded="true" height="103" name="Normalize_TrainingData" width="90" x="313" y="238"/>
      <operator activated="true" class="multiply" compatibility="9.0.001" expanded="true" height="145" name="Multiply" width="90" x="447" y="238"/>
      <operator activated="true" class="weight_by_gini_index" compatibility="9.0.001" expanded="true" height="82" name="Weight by Gini Index" width="90" x="581" y="442">
        <parameter key="normalize_weights" value="true"/>
      </operator>
      <operator activated="true" class="weights_to_data" compatibility="9.0.001" expanded="true" height="68" name="Weights to Data (3)" width="90" x="715" y="442"/>
      <operator activated="true" class="rename_by_replacing" compatibility="9.0.001" expanded="true" height="82" name="Rename by Replacing (3)" width="90" x="849" y="442">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Weight"/>
        <parameter key="replace_what" value="Weight"/>
        <parameter key="replace_by" value="Weight_Gini_Index"/>
      </operator>
      <operator activated="true" class="weight_by_information_gain_ratio" compatibility="9.0.001" expanded="true" height="82" name="Weight by Information Gain Ratio" width="90" x="581" y="340">
        <parameter key="normalize_weights" value="true"/>
      </operator>
      <operator activated="true" class="weights_to_data" compatibility="9.0.001" expanded="true" height="68" name="Weights to Data (2)" width="90" x="715" y="340"/>
      <operator activated="true" class="rename_by_replacing" compatibility="9.0.001" expanded="true" height="82" name="Rename by Replacing (2)" width="90" x="849" y="340">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Weight"/>
        <parameter key="replace_what" value="Weight"/>
        <parameter key="replace_by" value="Weight_Info_Gain_Ratio"/>
      </operator>
      <operator activated="true" class="weight_by_information_gain" compatibility="9.0.001" expanded="true" height="82" name="Weight by Information Gain" width="90" x="581" y="238">
        <parameter key="normalize_weights" value="true"/>
      </operator>
      <operator activated="true" class="weights_to_data" compatibility="9.0.001" expanded="true" height="68" name="Weights to Data" width="90" x="715" y="238"/>
      <operator activated="true" class="rename_by_replacing" compatibility="9.0.001" expanded="true" height="82" name="Rename by Replacing" width="90" x="849" y="238">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Weight"/>
        <parameter key="replace_what" value="Weight"/>
        <parameter key="replace_by" value="Weight_Info_Gain"/>
      </operator>
      <operator activated="true" class="concurrency:join" compatibility="9.0.001" expanded="true" height="82" name="Join" width="90" x="983" y="238">
        <parameter key="use_id_attribute_as_key" value="false"/>
        <list key="key_attributes">
          <parameter key="Attribute" value="Attribute"/>
        </list>
      </operator>
      <operator activated="true" class="concurrency:join" compatibility="9.0.001" expanded="true" height="82" name="Join (2)" width="90" x="983" y="340">
        <parameter key="use_id_attribute_as_key" value="false"/>
        <list key="key_attributes">
          <parameter key="Attribute" value="Attribute"/>
        </list>
      </operator>
      <operator activated="true" class="weight_by_chi_squared_statistic" compatibility="9.0.001" expanded="true" height="82" name="Weight by Chi Squared Statistic" width="90" x="581" y="544">
        <parameter key="normalize_weights" value="true"/>
      </operator>
      <operator activated="true" class="weights_to_data" compatibility="9.0.001" expanded="true" height="68" name="Weights to Data (4)" width="90" x="715" y="544"/>
      <operator activated="true" class="rename_by_replacing" compatibility="9.0.001" expanded="true" height="82" name="Rename by Replacing (4)" width="90" x="849" y="544">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Weight"/>
        <parameter key="replace_what" value="Weight"/>
        <parameter key="replace_by" value="Weight_Chi_Square"/>
      </operator>
      <operator activated="true" class="concurrency:join" compatibility="9.0.001" expanded="true" height="82" name="Join (3)" width="90" x="983" y="442">
        <parameter key="use_id_attribute_as_key" value="false"/>
        <list key="key_attributes">
          <parameter key="Attribute" value="Attribute"/>
        </list>
      </operator>
      <operator activated="true" class="generate_aggregation" compatibility="9.0.001" expanded="true" height="82" name="Generate Aggregation" width="90" x="983" y="544">
        <parameter key="attribute_name" value="Weight"/>
        <parameter key="attribute_filter_type" value="regular_expression"/>
        <parameter key="regular_expression" value="Weight.*"/>
        <parameter key="except_regular_expression" value="Attribute"/>
        <parameter key="aggregation_function" value="average"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="9.0.001" expanded="true" height="82" name="Select Attributes" width="90" x="1117" y="493">
        <parameter key="attribute_filter_type" value="regular_expression"/>
        <parameter key="regular_expression" value="Weight_.*"/>
        <parameter key="except_regular_expression" value="Attribute"/>
        <parameter key="invert_selection" value="true"/>
      </operator>
      <operator activated="true" class="normalize" compatibility="9.0.001" expanded="true" height="103" name="Normalize" width="90" x="1117" y="340">
        <parameter key="method" value="range transformation"/>
      </operator>
      <operator activated="true" class="subprocess" compatibility="9.0.001" expanded="true" height="82" name="Weight Data Converter" width="90" x="1117" y="238">
        <process expanded="true">
          <operator activated="true" class="set_macro" compatibility="9.0.001" expanded="true" height="82" name="Temp File Path" width="90" x="45" y="34">
            <parameter key="macro" value="weight_file_path"/>
            <parameter key="value" value="Desktop/average_weight.dat"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="9.0.001" expanded="true" height="82" name="Create Pseudo Attr" width="90" x="179" y="34">
            <list key="function_descriptions">
              <parameter key="_bas" value="&quot;&lt;weight name=&quot;"/>
              <parameter key="_orta" value="&quot; value=&quot;"/>
              <parameter key="_son" value="&quot;/&gt;&quot;"/>
            </list>
          </operator>
          <operator activated="true" class="order_attributes" compatibility="9.0.001" expanded="true" height="82" name="ReOrder" width="90" x="313" y="34">
            <parameter key="attribute_ordering" value="_bas|Attribute|_orta|Weight|_son"/>
          </operator>
          <operator activated="true" class="write_message" compatibility="9.0.001" expanded="true" height="82" name="Write Opening Tags" width="90" x="447" y="34">
            <parameter key="file" value="%{weight_file_path}"/>
            <parameter key="text" value="&lt;?xml version=&quot;1.0&quot; encoding=&quot;windows-1254&quot;?&gt;&#10;&#10;&lt;attributeweights version=&quot;8.2&quot;&gt;&#10;"/>
          </operator>
          <operator activated="true" class="write_special" compatibility="9.0.001" expanded="true" height="68" name="Write Weight Values" width="90" x="581" y="34">
            <parameter key="example_set_file" value="%{weight_file_path}"/>
            <parameter key="special_format" value="$t$a[&quot;]"/>
            <parameter key="quote_nominal_values" value="false"/>
            <parameter key="overwrite_mode" value="append"/>
          </operator>
          <operator activated="true" class="write_message" compatibility="9.0.001" expanded="true" height="82" name="Write Closing Tags" width="90" x="715" y="34">
            <parameter key="file" value="%{weight_file_path}"/>
            <parameter key="text" value="&lt;/attributeweights&gt;"/>
            <parameter key="mode" value="append"/>
          </operator>
          <operator activated="true" class="legacy:read_weights" compatibility="9.0.001" expanded="true" height="68" name="Read Weight File" width="90" x="849" y="34">
            <parameter key="attribute_weights_file" value="%{weight_file_path}"/>
          </operator>
          <connect from_port="in 1" to_op="Temp File Path" to_port="through 1"/>
          <connect from_op="Temp File Path" from_port="through 1" to_op="Create Pseudo Attr" to_port="example set input"/>
          <connect from_op="Create Pseudo Attr" from_port="example set output" to_op="ReOrder" to_port="example set input"/>
          <connect from_op="ReOrder" from_port="example set output" to_op="Write Opening Tags" to_port="through 1"/>
          <connect from_op="Write Opening Tags" from_port="through 1" to_op="Write Weight Values" to_port="input"/>
          <connect from_op="Write Weight Values" from_port="through" to_op="Write Closing Tags" to_port="through 1"/>
          <connect from_op="Read Weight File" from_port="output" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="source_in 2" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="select_by_weights" compatibility="9.0.001" expanded="true" height="103" name="Select by Weights" width="90" x="983" y="34">
        <parameter key="weight_relation" value="top k"/>
        <parameter key="k" value="3"/>
      </operator>
      <operator activated="false" class="write_excel" compatibility="9.0.001" expanded="true" height="82" name="Write Excel" width="90" x="1117" y="34">
        <parameter key="excel_file" value="C:\Users\MyCurisityLaptop\Documents\2.xlsx"/>
      </operator>
      <operator activated="true" class="k_medoids" compatibility="9.0.001" expanded="true" height="82" name="Clustering (2)" width="90" x="1318" y="34">
        <parameter key="k" value="3"/>
        <parameter key="use_local_random_seed" value="true"/>
      </operator>
      <operator activated="true" class="map" compatibility="9.0.001" expanded="true" height="82" name="Map" width="90" x="1318" y="136">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="cluster"/>
        <parameter key="include_special_attributes" value="true"/>
        <list key="value_mappings">
          <parameter key="cluster_0" value="سه"/>
          <parameter key="cluster_1" value="چهار"/>
          <parameter key="cluster_2" value="دو"/>
        </list>
      </operator>
      <operator activated="true" class="apply_model" compatibility="9.0.001" expanded="true" height="82" name="Apply Model (2)" width="90" x="1318" y="340">
        <list key="application_parameters">
          <parameter key="NID" value="NID"/>
        </list>
      </operator>
      <connect from_op="Training_data_without_label" from_port="output" to_op="Set Role (3)" to_port="example set input"/>
      <connect from_op="Set Role (3)" from_port="example set output" to_op="Normalize_Dataset" to_port="example set input"/>
      <connect from_op="Normalize_Dataset" from_port="example set output" to_op="Select by Weights" to_port="example set input"/>
      <connect from_op="Normalize_Dataset" from_port="preprocessing model" to_op="De-Normalize" to_port="model input"/>
      <connect from_op="De-Normalize" from_port="model output" to_op="Apply Model (2)" to_port="model"/>
      <connect from_op="Training_data_with_label_top-k" from_port="output" to_op="Numerical to Polynominal" to_port="example set input"/>
      <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Set Role (2)" to_port="example set input"/>
      <connect from_op="Set Role (2)" from_port="example set output" to_op="Normalize_TrainingData" to_port="example set input"/>
      <connect from_op="Normalize_TrainingData" from_port="example set output" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_op="Weight by Information Gain" to_port="example set"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Weight by Information Gain Ratio" to_port="example set"/>
      <connect from_op="Multiply" from_port="output 3" to_op="Weight by Gini Index" to_port="example set"/>
      <connect from_op="Multiply" from_port="output 4" to_op="Weight by Chi Squared Statistic" to_port="example set"/>
      <connect from_op="Weight by Gini Index" from_port="weights" to_op="Weights to Data (3)" to_port="attribute weights"/>
      <connect from_op="Weights to Data (3)" from_port="example set" to_op="Rename by Replacing (3)" to_port="example set input"/>
      <connect from_op="Rename by Replacing (3)" from_port="example set output" to_op="Join (2)" to_port="right"/>
      <connect from_op="Weight by Information Gain Ratio" from_port="weights" to_op="Weights to Data (2)" to_port="attribute weights"/>
      <connect from_op="Weights to Data (2)" from_port="example set" to_op="Rename by Replacing (2)" to_port="example set input"/>
      <connect from_op="Rename by Replacing (2)" from_port="example set output" to_op="Join" to_port="right"/>
      <connect from_op="Weight by Information Gain" from_port="weights" to_op="Weights to Data" to_port="attribute weights"/>
      <connect from_op="Weights to Data" from_port="example set" to_op="Rename by Replacing" to_port="example set input"/>
      <connect from_op="Rename by Replacing" from_port="example set output" to_op="Join" to_port="left"/>
      <connect from_op="Join" from_port="join" to_op="Join (2)" to_port="left"/>
      <connect from_op="Join (2)" from_port="join" to_op="Join (3)" to_port="left"/>
      <connect from_op="Weight by Chi Squared Statistic" from_port="weights" to_op="Weights to Data (4)" to_port="attribute weights"/>
      <connect from_op="Weights to Data (4)" from_port="example set" to_op="Rename by Replacing (4)" to_port="example set input"/>
      <connect from_op="Rename by Replacing (4)" from_port="example set output" to_op="Join (3)" to_port="right"/>
      <connect from_op="Join (3)" from_port="join" to_op="Generate Aggregation" to_port="example set input"/>
      <connect from_op="Generate Aggregation" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Normalize" to_port="example set input"/>
      <connect from_op="Normalize" from_port="example set output" to_op="Weight Data Converter" to_port="in 1"/>
      <connect from_op="Weight Data Converter" from_port="out 1" to_op="Select by Weights" to_port="weights"/>
      <connect from_op="Select by Weights" from_port="example set output" to_op="Clustering (2)" to_port="example set"/>
      <connect from_op="Clustering (2)" from_port="clustered set" to_op="Map" to_port="example set input"/>
      <connect from_op="Map" from_port="example set output" to_op="Apply Model (2)" to_port="unlabelled data"/>
      <connect from_op="Apply Model (2)" from_port="labelled data" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
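
To check the "cluster, then compare to the labels" step outside RapidMiner, here is a minimal sketch in Python. It is not part of the process above. The function names, the toy data, and the English label names ("three", "four", "two") are assumptions made for illustration. The first function scores a fixed cluster-to-label mapping, which is roughly what the Map operator does in the process. The second derives a mapping by majority vote, which is a different technique from the hand-written mapping used above.

```python
from collections import Counter, defaultdict


def accuracy_with_mapping(clusters, labels, mapping):
    """Share of examples whose mapped cluster name equals the true label."""
    hits = sum(1 for c, l in zip(clusters, labels) if mapping.get(c) == l)
    return hits / len(labels)


def majority_mapping(clusters, labels):
    """Map each cluster to the most frequent true label among its members."""
    members = defaultdict(Counter)
    for c, l in zip(clusters, labels):
        members[c][l] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in members.items()}


# Hypothetical toy data: cluster assignments and the known labels.
clusters = ["cluster_0", "cluster_0", "cluster_0",
            "cluster_1", "cluster_1", "cluster_1",
            "cluster_2", "cluster_2"]
labels = ["three", "three", "two",
          "four", "four", "three",
          "two", "two"]

# Fixed mapping, in the spirit of the Map operator in the process.
fixed = {"cluster_0": "three", "cluster_1": "four", "cluster_2": "two"}
fixed_accuracy = accuracy_with_mapping(clusters, labels, fixed)

# Mapping derived from the data by majority vote.
voted = majority_mapping(clusters, labels)
voted_accuracy = accuracy_with_mapping(clusters, labels, voted)

print(fixed_accuracy, voted, voted_accuracy)
```

With only a small labeled set, an accuracy obtained this way is a rough indication and may not carry over to the larger unlabeled dataset.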

I hope it helps, too.

Regards,

Lionel
