Imputing challenge with a best possible ensemble

msacs09 Member Posts: 55 Contributor II
Experts,

I have the attached sample data and I want to impute it with an ensemble, picking the best imputation model. The data covers 24 months, and I need to impute the missing months using the best available algorithm, e.g. average, nearest available value (k-NN), or linear regression. I tried this, but the data is being imputed column-wise: the average is taken over a column across different IDs and then applied. What we need is a row-wise average rather than a column-wise one.

I would greatly appreciate a sample process with an ensemble that imputes based on row values.

Many Thanks
S


Answers

  • lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Hi @msacs09 ,

    Does this process meet your need?

    <?xml version="1.0" encoding="UTF-8"?><process version="9.2.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.2.000" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" breakpoints="after" class="read_excel" compatibility="9.2.000" expanded="true" height="68" name="Read Excel" width="90" x="179" y="85">
            <parameter key="excel_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Impute_column_missing_values\Sample_Data_2_Impute.xlsx"/>
            <parameter key="sheet_selection" value="sheet number"/>
            <parameter key="sheet_number" value="1"/>
            <parameter key="imported_cell_range" value="A1"/>
            <parameter key="encoding" value="SYSTEM"/>
            <parameter key="first_row_as_names" value="true"/>
            <list key="annotations"/>
            <parameter key="date_format" value=""/>
            <parameter key="time_zone" value="SYSTEM"/>
            <parameter key="locale" value="English (United States)"/>
            <parameter key="read_all_values_as_polynominal" value="false"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="ID.true.integer.attribute"/>
              <parameter key="1" value="2017-01.true.integer.attribute"/>
              <parameter key="2" value="2017-02.true.integer.attribute"/>
              <parameter key="3" value="2017-03.true.integer.attribute"/>
              <parameter key="4" value="2017-04.true.integer.attribute"/>
              <parameter key="5" value="2017-05.true.integer.attribute"/>
              <parameter key="6" value="2017-06.true.integer.attribute"/>
              <parameter key="7" value="2017-07.true.integer.attribute"/>
              <parameter key="8" value="2017-08.true.integer.attribute"/>
              <parameter key="9" value="2017-09.true.integer.attribute"/>
              <parameter key="10" value="2017-10.true.integer.attribute"/>
              <parameter key="11" value="2017-11.true.integer.attribute"/>
              <parameter key="12" value="2017-12.true.integer.attribute"/>
              <parameter key="13" value="2018-01.true.integer.attribute"/>
              <parameter key="14" value="2018-02.true.integer.attribute"/>
              <parameter key="15" value="2018-03.true.integer.attribute"/>
              <parameter key="16" value="2018-04.true.integer.attribute"/>
              <parameter key="17" value="2018-05.true.integer.attribute"/>
              <parameter key="18" value="2018-06.true.integer.attribute"/>
              <parameter key="19" value="2018-07.true.integer.attribute"/>
              <parameter key="20" value="2018-08.true.integer.attribute"/>
              <parameter key="21" value="2018-09.true.integer.attribute"/>
              <parameter key="22" value="2018-10.true.integer.attribute"/>
              <parameter key="23" value="2018-11.true.integer.attribute"/>
              <parameter key="24" value="2018-12.true.integer.attribute"/>
            </list>
            <parameter key="read_not_matching_values_as_missings" value="false"/>
            <parameter key="datamanagement" value="double_array"/>
            <parameter key="data_management" value="auto"/>
          </operator>
          <operator activated="true" class="transpose" compatibility="9.2.000" expanded="true" height="82" name="Transpose" width="90" x="313" y="85"/>
          <operator activated="true" class="select_attributes" compatibility="9.2.000" expanded="true" height="82" name="Select Attributes" width="90" x="447" y="187">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="ID"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="generate_id" compatibility="9.2.000" expanded="true" height="82" name="Generate ID" width="90" x="648" y="238">
            <parameter key="create_nominal_ids" value="false"/>
            <parameter key="offset" value="0"/>
          </operator>
          <operator activated="true" class="filter_example_range" compatibility="9.2.000" expanded="true" height="82" name="Filter Example Range" width="90" x="447" y="85">
            <parameter key="first_example" value="1"/>
            <parameter key="last_example" value="1"/>
            <parameter key="invert_filter" value="true"/>
          </operator>
          <operator activated="true" class="replace_missing_values" compatibility="9.2.000" expanded="true" height="103" name="Replace Missing Values" width="90" x="581" y="85">
            <parameter key="return_preprocessing_model" value="false"/>
            <parameter key="create_view" value="false"/>
            <parameter key="attribute_filter_type" value="all"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="default" value="average"/>
            <list key="columns"/>
          </operator>
          <operator activated="true" class="transpose" compatibility="9.2.000" expanded="true" height="82" name="Transpose (2)" width="90" x="715" y="85"/>
          <operator activated="true" class="generate_id" compatibility="9.2.000" expanded="true" height="82" name="Generate ID (2)" width="90" x="849" y="85">
            <parameter key="create_nominal_ids" value="false"/>
            <parameter key="offset" value="0"/>
          </operator>
          <operator activated="true" class="concurrency:join" compatibility="9.2.000" expanded="true" height="82" name="Join" width="90" x="1050" y="136">
            <parameter key="remove_double_attributes" value="true"/>
            <parameter key="join_type" value="inner"/>
            <parameter key="use_id_attribute_as_key" value="true"/>
            <list key="key_attributes"/>
            <parameter key="keep_both_join_attributes" value="false"/>
          </operator>
          <connect from_op="Read Excel" from_port="output" to_op="Transpose" to_port="example set input"/>
          <connect from_op="Transpose" from_port="example set output" to_op="Filter Example Range" to_port="example set input"/>
          <connect from_op="Transpose" from_port="original" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Join" to_port="left"/>
          <connect from_op="Filter Example Range" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
          <connect from_op="Replace Missing Values" from_port="example set output" to_op="Transpose (2)" to_port="example set input"/>
          <connect from_op="Transpose (2)" from_port="example set output" to_op="Generate ID (2)" to_port="example set input"/>
          <connect from_op="Generate ID (2)" from_port="example set output" to_op="Join" to_port="right"/>
          <connect from_op="Join" from_port="join" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
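
For readers without RapidMiner at hand, the idea behind the process above (transpose, replace missing values with the column average, transpose back) can be sketched in plain Python. This is only an illustration under assumptions: the sample values, the `rows` dictionary, and the helper names `transpose` and `fill_columns_with_mean` are hypothetical and are not taken from the attached spreadsheet or from RapidMiner.

```python
from statistics import mean

# Hypothetical sample: one row per ID, one value per month; None marks a missing month.
rows = {
    101: [10, None, 14, 12],
    102: [None, 20, 22, None],
}

def transpose(table):
    """Swap rows and columns of a rectangular list of lists."""
    return [list(col) for col in zip(*table)]

def fill_columns_with_mean(table):
    """Replace None in each column with the mean of that column's known values."""
    filled = [row[:] for row in table]
    for c in range(len(table[0])):
        known = [row[c] for row in table if row[c] is not None]
        col_mean = mean(known) if known else None  # a fully empty column stays empty
        for row in filled:
            if row[c] is None:
                row[c] = col_mean
    return filled

ids = list(rows)
wide = [rows[i] for i in ids]        # IDs as rows, months as columns
by_month = transpose(wide)           # months as rows, IDs as columns
imputed = transpose(fill_columns_with_mean(by_month))
result = dict(zip(ids, imputed))
print(result)
```

Because the column-wise average is applied to the transposed table, each ID ends up being filled with the average of its own months, which is the row-wise behaviour asked for in the question.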
    

    Regards,

    Lionel


  • rfuentealba RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn
    Hi @msacs,

    I don't think you can impute directly on a row average. However, what you can do is:
    1. Read Excel.
    2. Set the ID role on the first column.
    3. Transpose your data, so that your columns become rows.
    Then you can use some filters to iterate and impute values starting with the rows that are missing the least information. I don't know anything about this model, so I can't help further, but here is the XML of what I have done so far. It essentially answers your question of how to impute values on rows instead of columns.

    Attached is the XML for transposing.
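
The suggestion of starting with the rows that have the least missing information can be illustrated with a small Python sketch. It only shows the ordering step, and everything in it (the `series` dictionary, its values, and the `missing_count` helper) is a hypothetical example rather than part of the attached process.

```python
# Hypothetical series keyed by ID; None marks a missing month.
series = {
    "A": [5, None, None, 8],
    "B": [3, 4, None, 6],
    "C": [7, 7, 7, 7],
}

def missing_count(values):
    """Number of missing entries in one series."""
    return sum(v is None for v in values)

# Process the most complete series first, the sparsest last.
order = sorted(series, key=lambda key: missing_count(series[key]))
print(order)
```

An imputation loop could then walk through `order`, so that values filled in for nearly complete series are available when the sparser ones are handled.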

    Hope this helps,

    Rod.



  • msacs09 Member Posts: 55 Contributor II
    @lionelderkrikor Thank you very much, sir. The attached process is my end goal, i.e. picking the right ensemble. I'm struggling to incorporate it into your proposed process; I get errors such as "only One Label".

    As always, many thanks for your expert help.
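
One way to make "pick the best imputation model" concrete, sketched in Python rather than RapidMiner: hide each known value in turn, let every candidate method predict it, and keep the method with the lowest mean absolute error for that series. This is a simple model-selection sketch, not an ensemble operator; the three candidate functions, the `METHODS` table, and the sample `row` are all assumptions made for illustration.

```python
from statistics import mean

def impute_mean(values, i):
    """Average of all known values in the series except position i."""
    return mean(v for j, v in enumerate(values) if v is not None and j != i)

def impute_nearest(values, i):
    """Known value closest in time to position i; ties go to the earlier month."""
    candidates = [(abs(j - i), j) for j, v in enumerate(values) if v is not None and j != i]
    return values[min(candidates)[1]]

def impute_linear(values, i):
    """Linear interpolation between the nearest known neighbours on each side."""
    left = next((j for j in range(i - 1, -1, -1) if values[j] is not None), None)
    right = next((j for j in range(i + 1, len(values)) if values[j] is not None), None)
    if left is None or right is None:
        return impute_nearest(values, i)  # no neighbour on one side: fall back
    frac = (i - left) / (right - left)
    return values[left] + frac * (values[right] - values[left])

METHODS = {"mean": impute_mean, "nearest": impute_nearest, "linear": impute_linear}

def score_methods(values):
    """Mean absolute error per method; needs at least two known values."""
    errors = {name: [] for name in METHODS}
    for i, truth in enumerate(values):
        if truth is None:
            continue
        hidden = values[:i] + [None] + values[i + 1:]
        for name, fn in METHODS.items():
            errors[name].append(abs(fn(hidden, i) - truth))
    return {name: mean(errs) for name, errs in errors.items()}

row = [10, 12, None, 16, 18, None, 22]  # hypothetical series for one ID
scores = score_methods(row)
best = min(scores, key=scores.get)
filled = [v if v is not None else METHODS[best](row, i) for i, v in enumerate(row)]
print(best, filled)
```

Scoring each ID separately lets a trending series pick interpolation while a flat, noisy one may prefer the average; whether that per-series choice is appropriate depends on the data.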
  • lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    @msacs09,

    I must admit that I'm a little lost with the Impute Missing Values operator in your process...

    Like in a Hollywood fantasy movie, the data disappears mysteriously when it enters the Impute Missing Values operator! :#

    More seriously, when I set a breakpoint before Impute Missing Values, I obtain the expected example set.
    When I set a breakpoint before the model inside the Impute Missing Values operator, I obtain an empty example set!

    So when the process is executed, RM logically raises an error ("example set is empty").
    I should add, however, that the data seems to pass through correctly: when I click on the output port of the Impute Missing Values operator, I obtain an example set with no missing values!

    Does anyone have an idea of what's going on?
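
One possible explanation, offered as an assumption rather than a confirmed diagnosis: the process below sets `learn_on_complete_cases` to `true`, so the inner learner would presumably be trained only on examples without any missing value. After transposing, each example is a month and each attribute an ID, so if every month has at least one ID with a gap, no complete case remains. The Python sketch below shows that check on hypothetical data; the `transposed` table is invented for illustration.

```python
# Hypothetical transposed data: one row per month, one column per ID.
# None marks a missing value.
transposed = [
    [10, None, 7],
    [11, 20, None],
    [None, 21, 8],
]

# A "complete case" is a row with no missing value at all.
complete_cases = [row for row in transposed if all(v is not None for v in row)]
print(len(complete_cases))
```

If the same count on the real transposed data is zero, trying the operator with `learn_on_complete_cases` set to `false` would be a reasonable experiment.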

    Regards,

    Lionel

    NB: the process:
    <?xml version="1.0" encoding="UTF-8"?><process version="9.2.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.2.000" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="read_excel" compatibility="9.2.000" expanded="true" height="68" name="Read Excel" width="90" x="179" y="85">
            <parameter key="excel_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Impute_column_missing_values\Sample_Data_2_Impute.xlsx"/>
            <parameter key="sheet_selection" value="sheet number"/>
            <parameter key="sheet_number" value="1"/>
            <parameter key="imported_cell_range" value="A1"/>
            <parameter key="encoding" value="SYSTEM"/>
            <parameter key="first_row_as_names" value="true"/>
            <list key="annotations"/>
            <parameter key="date_format" value=""/>
            <parameter key="time_zone" value="SYSTEM"/>
            <parameter key="locale" value="English (United States)"/>
            <parameter key="read_all_values_as_polynominal" value="false"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="ID.true.integer.attribute"/>
              <parameter key="1" value="2017-01.true.integer.attribute"/>
              <parameter key="2" value="2017-02.true.integer.attribute"/>
              <parameter key="3" value="2017-03.true.integer.attribute"/>
              <parameter key="4" value="2017-04.true.integer.attribute"/>
              <parameter key="5" value="2017-05.true.integer.attribute"/>
              <parameter key="6" value="2017-06.true.integer.attribute"/>
              <parameter key="7" value="2017-07.true.integer.attribute"/>
              <parameter key="8" value="2017-08.true.integer.attribute"/>
              <parameter key="9" value="2017-09.true.integer.attribute"/>
              <parameter key="10" value="2017-10.true.integer.attribute"/>
              <parameter key="11" value="2017-11.true.integer.attribute"/>
              <parameter key="12" value="2017-12.true.integer.attribute"/>
              <parameter key="13" value="2018-01.true.integer.attribute"/>
              <parameter key="14" value="2018-02.true.integer.attribute"/>
              <parameter key="15" value="2018-03.true.integer.attribute"/>
              <parameter key="16" value="2018-04.true.integer.attribute"/>
              <parameter key="17" value="2018-05.true.integer.attribute"/>
              <parameter key="18" value="2018-06.true.integer.attribute"/>
              <parameter key="19" value="2018-07.true.integer.attribute"/>
              <parameter key="20" value="2018-08.true.integer.attribute"/>
              <parameter key="21" value="2018-09.true.integer.attribute"/>
              <parameter key="22" value="2018-10.true.integer.attribute"/>
              <parameter key="23" value="2018-11.true.integer.attribute"/>
              <parameter key="24" value="2018-12.true.integer.attribute"/>
            </list>
            <parameter key="read_not_matching_values_as_missings" value="false"/>
            <parameter key="datamanagement" value="double_array"/>
            <parameter key="data_management" value="auto"/>
          </operator>
          <operator activated="true" class="transpose" compatibility="9.2.000" expanded="true" height="82" name="Transpose" width="90" x="313" y="85"/>
          <operator activated="true" class="filter_example_range" compatibility="9.2.000" expanded="true" height="82" name="Filter Example Range" width="90" x="447" y="85">
            <parameter key="first_example" value="1"/>
            <parameter key="last_example" value="1"/>
            <parameter key="invert_filter" value="true"/>
          </operator>
          <operator activated="true" breakpoints="before" class="impute_missing_values" compatibility="9.2.000" expanded="true" height="68" name="Impute Missing Values (4)" width="90" x="581" y="85">
            <parameter key="attribute_filter_type" value="value_type"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="numeric"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="iterate" value="true"/>
            <parameter key="learn_on_complete_cases" value="true"/>
            <parameter key="order" value="chronological"/>
            <parameter key="sort" value="ascending"/>
            <parameter key="use_local_random_seed" value="false"/>
            <parameter key="local_random_seed" value="1992"/>
            <process expanded="true">
              <operator activated="true" breakpoints="before" class="neural_net" compatibility="9.2.000" expanded="true" height="82" name="Neural Net (2)" width="90" x="447" y="34">
                <list key="hidden_layers"/>
                <parameter key="training_cycles" value="200"/>
                <parameter key="learning_rate" value="0.01"/>
                <parameter key="momentum" value="0.9"/>
                <parameter key="decay" value="false"/>
                <parameter key="shuffle" value="true"/>
                <parameter key="normalize" value="true"/>
                <parameter key="error_epsilon" value="1.0E-4"/>
                <parameter key="use_local_random_seed" value="false"/>
                <parameter key="local_random_seed" value="1992"/>
              </operator>
              <connect from_port="example set source" to_op="Neural Net (2)" to_port="training set"/>
              <connect from_op="Neural Net (2)" from_port="model" to_port="model sink"/>
              <portSpacing port="source_example set source" spacing="0"/>
              <portSpacing port="sink_model sink" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="9.2.000" expanded="true" height="82" name="Select Attributes" width="90" x="447" y="187">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="ID"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="generate_id" compatibility="9.2.000" expanded="true" height="82" name="Generate ID" width="90" x="648" y="238">
            <parameter key="create_nominal_ids" value="false"/>
            <parameter key="offset" value="0"/>
          </operator>
          <operator activated="false" class="subprocess" compatibility="9.2.000" expanded="true" height="82" name="Subprocess" width="90" x="648" y="442">
            <process expanded="true">
              <operator activated="false" class="multiply" compatibility="9.2.000" expanded="true" height="68" name="Multiply" width="90" x="45" y="136"/>
              <operator activated="false" class="materialize_data" compatibility="9.2.000" expanded="true" height="82" name="DT then NN" width="90" x="179" y="34">
                <parameter key="datamanagement" value="double_array"/>
                <parameter key="data_management" value="auto"/>
              </operator>
              <operator activated="false" class="materialize_data" compatibility="9.2.000" expanded="true" height="82" name="kNN" width="90" x="246" y="238">
                <parameter key="datamanagement" value="double_array"/>
                <parameter key="data_management" value="auto"/>
              </operator>
              <operator activated="false" breakpoints="before" class="impute_missing_values" compatibility="7.3.001" expanded="true" height="68" name="Impute Missing Values (3)" width="90" x="380" y="187">
                <parameter key="attribute_filter_type" value="all"/>
                <parameter key="attribute" value=""/>
                <parameter key="attributes" value=""/>
                <parameter key="use_except_expression" value="false"/>
                <parameter key="value_type" value="numeric"/>
                <parameter key="use_value_type_exception" value="false"/>
                <parameter key="except_value_type" value="time"/>
                <parameter key="block_type" value="attribute_block"/>
                <parameter key="use_block_type_exception" value="false"/>
                <parameter key="except_block_type" value="value_matrix_row_start"/>
                <parameter key="invert_selection" value="false"/>
                <parameter key="include_special_attributes" value="false"/>
                <parameter key="iterate" value="true"/>
                <parameter key="learn_on_complete_cases" value="true"/>
                <parameter key="order" value="chronological"/>
                <parameter key="sort" value="ascending"/>
                <parameter key="use_local_random_seed" value="false"/>
                <parameter key="local_random_seed" value="1992"/>
                <process expanded="true">
                  <operator activated="true" class="k_nn" compatibility="9.2.000" expanded="true" height="82" name="k-NN" width="90" x="112" y="34">
                    <parameter key="k" value="1"/>
                    <parameter key="weighted_vote" value="false"/>
                    <parameter key="measure_types" value="MixedMeasures"/>
                    <parameter key="mixed_measure" value="MixedEuclideanDistance"/>
                    <parameter key="nominal_measure" value="NominalDistance"/>
                    <parameter key="numerical_measure" value="EuclideanDistance"/>
                    <parameter key="divergence" value="GeneralizedIDivergence"/>
                    <parameter key="kernel_type" value="radial"/>
                    <parameter key="kernel_gamma" value="1.0"/>
                    <parameter key="kernel_sigma1" value="1.0"/>
                    <parameter key="kernel_sigma2" value="0.0"/>
                    <parameter key="kernel_sigma3" value="2.0"/>
                    <parameter key="kernel_degree" value="3.0"/>
                    <parameter key="kernel_shift" value="1.0"/>
                    <parameter key="kernel_a" value="1.0"/>
                    <parameter key="kernel_b" value="0.0"/>
                  </operator>
                  <connect from_port="example set source" to_op="k-NN" to_port="training set"/>
                  <connect from_op="k-NN" from_port="model" to_port="model sink"/>
                  <portSpacing port="source_example set source" spacing="0"/>
                  <portSpacing port="sink_model sink" spacing="0"/>
                </process>
              </operator>
              <operator activated="false" class="impute_missing_values" compatibility="7.3.001" expanded="true" height="68" name="Impute Missing Values" width="90" x="313" y="34">
                <parameter key="attribute_filter_type" value="value_type"/>
                <parameter key="attribute" value=""/>
                <parameter key="attributes" value=""/>
                <parameter key="use_except_expression" value="false"/>
                <parameter key="value_type" value="nominal"/>
                <parameter key="use_value_type_exception" value="false"/>
                <parameter key="except_value_type" value="time"/>
                <parameter key="block_type" value="attribute_block"/>
                <parameter key="use_block_type_exception" value="false"/>
                <parameter key="except_block_type" value="value_matrix_row_start"/>
                <parameter key="invert_selection" value="false"/>
                <parameter key="include_special_attributes" value="false"/>
                <parameter key="iterate" value="true"/>
                <parameter key="learn_on_complete_cases" value="true"/>
                <parameter key="order" value="chronological"/>
                <parameter key="sort" value="ascending"/>
                <parameter key="use_local_random_seed" value="false"/>
                <parameter key="local_random_seed" value="1992"/>
                <process expanded="true">
                  <operator activated="true" breakpoints="before,after" class="concurrency:parallel_decision_tree" compatibility="9.2.000" expanded="true" height="103" name="Decision Tree" width="90" x="380" y="34">
                    <parameter key="criterion" value="gain_ratio"/>
                    <parameter key="maximal_depth" value="20"/>
                    <parameter key="apply_pruning" value="true"/>
                    <parameter key="confidence" value="0.25"/>
                    <parameter key="apply_prepruning" value="true"/>
                    <parameter key="minimal_gain" value="0.1"/>
                    <parameter key="minimal_leaf_size" value="2"/>
                    <parameter key="minimal_size_for_split" value="4"/>
                    <parameter key="number_of_prepruning_alternatives" value="3"/>
                  </operator>
                  <connect from_port="example set source" to_op="Decision Tree" to_port="training set"/>
                  <connect from_op="Decision Tree" from_port="model" to_port="model sink"/>
                  <portSpacing port="source_example set source" spacing="0"/>
                  <portSpacing port="sink_model sink" spacing="0"/>
                </process>
              </operator>
              <operator activated="true" class="impute_missing_values" compatibility="7.3.001" expanded="true" height="68" name="Impute Missing Values (2)" width="90" x="447" y="34">
                <parameter key="attribute_filter_type" value="value_type"/>
                <parameter key="attribute" value=""/>
                <parameter key="attributes" value=""/>
                <parameter key="use_except_expression" value="false"/>
                <parameter key="value_type" value="numeric"/>
                <parameter key="use_value_type_exception" value="false"/>
                <parameter key="except_value_type" value="time"/>
                <parameter key="block_type" value="attribute_block"/>
                <parameter key="use_block_type_exception" value="false"/>
                <parameter key="except_block_type" value="value_matrix_row_start"/>
                <parameter key="invert_selection" value="false"/>
                <parameter key="include_special_attributes" value="false"/>
                <parameter key="iterate" value="true"/>
                <parameter key="learn_on_complete_cases" value="true"/>
                <parameter key="order" value="chronological"/>
                <parameter key="sort" value="ascending"/>
                <parameter key="use_local_random_seed" value="false"/>
                <parameter key="local_random_seed" value="1992"/>
                <process expanded="true">
                  <operator activated="true" breakpoints="before" class="neural_net" compatibility="9.2.000" expanded="true" height="82" name="Neural Net" width="90" x="179" y="34">
                    <list key="hidden_layers"/>
                    <parameter key="training_cycles" value="500"/>
                    <parameter key="learning_rate" value="0.3"/>
                    <parameter key="momentum" value="0.2"/>
                    <parameter key="decay" value="false"/>
                    <parameter key="shuffle" value="true"/>
                    <parameter key="normalize" value="true"/>
                    <parameter key="error_epsilon" value="1.0E-5"/>
                    <parameter key="use_local_random_seed" value="false"/>
                    <parameter key="local_random_seed" value="1992"/>
                  </operator>
                  <connect from_port="example set source" to_op="Neural Net" to_port="training set"/>
                  <connect from_op="Neural Net" from_port="model" to_port="model sink"/>
                  <portSpacing port="source_example set source" spacing="0"/>
                  <portSpacing port="sink_model sink" spacing="0"/>
                </process>
              </operator>
              <connect from_port="in 1" to_op="Impute Missing Values (2)" to_port="example set in"/>
              <connect from_op="Impute Missing Values (2)" from_port="example set out" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="source_in 2" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
              <description align="center" color="yellow" colored="false" height="105" resized="false" width="180" x="601" y="506">Type your comment</description>
            </process>
          </operator>
          <operator activated="false" class="replace_missing_values" compatibility="9.2.000" expanded="true" height="103" name="Replace Missing Values" width="90" x="313" y="442">
            <parameter key="return_preprocessing_model" value="false"/>
            <parameter key="create_view" value="false"/>
            <parameter key="attribute_filter_type" value="all"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="default" value="average"/>
            <list key="columns"/>
          </operator>
          <operator activated="true" class="transpose" compatibility="9.2.000" expanded="true" height="82" name="Transpose (2)" width="90" x="715" y="85"/>
          <operator activated="true" class="generate_id" compatibility="9.2.000" expanded="true" height="82" name="Generate ID (2)" width="90" x="849" y="85">
            <parameter key="create_nominal_ids" value="false"/>
            <parameter key="offset" value="0"/>
          </operator>
          <operator activated="true" class="concurrency:join" compatibility="9.2.000" expanded="true" height="82" name="Join" width="90" x="1050" y="136">
            <parameter key="remove_double_attributes" value="true"/>
            <parameter key="join_type" value="inner"/>
            <parameter key="use_id_attribute_as_key" value="true"/>
            <list key="key_attributes"/>
            <parameter key="keep_both_join_attributes" value="false"/>
          </operator>
          <connect from_op="Read Excel" from_port="output" to_op="Transpose" to_port="example set input"/>
          <connect from_op="Transpose" from_port="example set output" to_op="Filter Example Range" to_port="example set input"/>
          <connect from_op="Transpose" from_port="original" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Filter Example Range" from_port="example set output" to_op="Impute Missing Values (4)" to_port="example set in"/>
          <connect from_op="Impute Missing Values (4)" from_port="example set out" to_op="Transpose (2)" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Join" to_port="left"/>
          <connect from_op="Transpose (2)" from_port="example set output" to_op="Generate ID (2)" to_port="example set input"/>
          <connect from_op="Generate ID (2)" from_port="example set output" to_op="Join" to_port="right"/>
          <connect from_op="Join" from_port="join" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    


  • msacs09msacs09 Member Posts: 55 Contributor II
    Yes sir. This is one of the problems I faced, where the learners were complaining about having no example set. Is this a bug?
  • rfuentealbarfuentealba RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn
    Yes, it seems so. I also faced this. @sgenzer @mschmitz can you take a look at it?
  • msacs09msacs09 Member Posts: 55 Contributor II
    @IngoRM Can you please chime in on this bug? Thank you very much for your time.
  • sgenzersgenzer Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    edited February 2019
    hi @msacs09 and all - there is no bug here. You have no examples because the "learn on complete cases" box is checked and the ExampleSet has no complete cases. Every example in this set has at least one missing attribute.
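
To make the "no complete cases" point concrete, here is a minimal sketch in plain Python (not RapidMiner code). The toy rows and the helper `complete_cases` are hypothetical; they only mimic what a complete-case filter does when every example has at least one gap.

```python
# Hypothetical toy data: each dict is one example, None marks a missing value.
rows = [
    {"2017-01": 10, "2017-02": None, "2017-03": 12},
    {"2017-01": None, "2017-02": 7, "2017-03": 9},
    {"2017-01": 4, "2017-02": 5, "2017-03": None},
]

def complete_cases(examples):
    """Keep only the examples that have no missing attribute."""
    return [row for row in examples if all(v is not None for v in row.values())]

training_set = complete_cases(rows)
print(len(training_set))  # 0: nothing is left for the inner learner to train on
```

With zero complete cases the training set is empty, which is consistent with the learners complaining about a missing example set.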

    FWIW I would strongly reconsider using Neural Nets to impute missing values on such a small data set. More traditional methods such as interpolation or k-NN would likely give you better results.

    <?xml version="1.0" encoding="UTF-8"?><process version="9.2.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.2.000" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="read_excel" compatibility="9.2.000" expanded="true" height="68" name="Read Excel" width="90" x="179" y="85">
            <parameter key="excel_file" value="/Users/genzerconsulting/Desktop/Sample_Data_2_Impute.xlsx"/>
            <parameter key="sheet_selection" value="sheet number"/>
            <parameter key="sheet_number" value="1"/>
            <parameter key="imported_cell_range" value="A1"/>
            <parameter key="encoding" value="SYSTEM"/>
            <parameter key="first_row_as_names" value="true"/>
            <list key="annotations"/>
            <parameter key="date_format" value=""/>
            <parameter key="time_zone" value="SYSTEM"/>
            <parameter key="locale" value="English (United States)"/>
            <parameter key="read_all_values_as_polynominal" value="false"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="ID.true.integer.attribute"/>
              <parameter key="1" value="2017-01.true.integer.attribute"/>
              <parameter key="2" value="2017-02.true.integer.attribute"/>
              <parameter key="3" value="2017-03.true.integer.attribute"/>
              <parameter key="4" value="2017-04.true.integer.attribute"/>
              <parameter key="5" value="2017-05.true.integer.attribute"/>
              <parameter key="6" value="2017-06.true.integer.attribute"/>
              <parameter key="7" value="2017-07.true.integer.attribute"/>
              <parameter key="8" value="2017-08.true.integer.attribute"/>
              <parameter key="9" value="2017-09.true.integer.attribute"/>
              <parameter key="10" value="2017-10.true.integer.attribute"/>
              <parameter key="11" value="2017-11.true.integer.attribute"/>
              <parameter key="12" value="2017-12.true.integer.attribute"/>
              <parameter key="13" value="2018-01.true.integer.attribute"/>
              <parameter key="14" value="2018-02.true.integer.attribute"/>
              <parameter key="15" value="2018-03.true.integer.attribute"/>
              <parameter key="16" value="2018-04.true.integer.attribute"/>
              <parameter key="17" value="2018-05.true.integer.attribute"/>
              <parameter key="18" value="2018-06.true.integer.attribute"/>
              <parameter key="19" value="2018-07.true.integer.attribute"/>
              <parameter key="20" value="2018-08.true.integer.attribute"/>
              <parameter key="21" value="2018-09.true.integer.attribute"/>
              <parameter key="22" value="2018-10.true.integer.attribute"/>
              <parameter key="23" value="2018-11.true.integer.attribute"/>
              <parameter key="24" value="2018-12.true.integer.attribute"/>
            </list>
            <parameter key="read_not_matching_values_as_missings" value="false"/>
            <parameter key="datamanagement" value="double_array"/>
            <parameter key="data_management" value="auto"/>
          </operator>
          <operator activated="true" class="transpose" compatibility="9.2.000" expanded="true" height="82" name="Transpose" width="90" x="313" y="85"/>
          <operator activated="true" breakpoints="after" class="filter_example_range" compatibility="9.2.000" expanded="true" height="82" name="Filter Example Range" width="90" x="447" y="85">
            <parameter key="first_example" value="1"/>
            <parameter key="last_example" value="1"/>
            <parameter key="invert_filter" value="true"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="9.2.000" expanded="true" height="82" name="Select Attributes" width="90" x="447" y="187">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="ID"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="generate_id" compatibility="9.2.000" expanded="true" height="82" name="Generate ID" width="90" x="648" y="238">
            <parameter key="create_nominal_ids" value="false"/>
            <parameter key="offset" value="0"/>
          </operator>
          <operator activated="false" class="subprocess" compatibility="9.2.000" expanded="true" height="82" name="Subprocess" width="90" x="648" y="442">
            <process expanded="true">
              <operator activated="false" class="multiply" compatibility="9.2.000" expanded="true" height="68" name="Multiply" width="90" x="45" y="136"/>
              <operator activated="false" class="materialize_data" compatibility="9.2.000" expanded="true" height="82" name="DT then NN" width="90" x="179" y="34">
                <parameter key="datamanagement" value="double_array"/>
                <parameter key="data_management" value="auto"/>
              </operator>
              <operator activated="false" class="materialize_data" compatibility="9.2.000" expanded="true" height="82" name="kNN" width="90" x="246" y="238">
                <parameter key="datamanagement" value="double_array"/>
                <parameter key="data_management" value="auto"/>
              </operator>
              <operator activated="false" breakpoints="before" class="impute_missing_values" compatibility="7.3.001" expanded="true" height="68" name="Impute Missing Values (3)" width="90" x="380" y="187">
                <parameter key="attribute_filter_type" value="all"/>
                <parameter key="attribute" value=""/>
                <parameter key="attributes" value=""/>
                <parameter key="use_except_expression" value="false"/>
                <parameter key="value_type" value="numeric"/>
                <parameter key="use_value_type_exception" value="false"/>
                <parameter key="except_value_type" value="time"/>
                <parameter key="block_type" value="attribute_block"/>
                <parameter key="use_block_type_exception" value="false"/>
                <parameter key="except_block_type" value="value_matrix_row_start"/>
                <parameter key="invert_selection" value="false"/>
                <parameter key="include_special_attributes" value="false"/>
                <parameter key="iterate" value="true"/>
                <parameter key="learn_on_complete_cases" value="true"/>
                <parameter key="order" value="chronological"/>
                <parameter key="sort" value="ascending"/>
                <parameter key="use_local_random_seed" value="false"/>
                <parameter key="local_random_seed" value="1992"/>
                <process expanded="true">
                  <operator activated="true" class="k_nn" compatibility="9.2.000" expanded="true" height="82" name="k-NN" width="90" x="112" y="34">
                    <parameter key="k" value="1"/>
                    <parameter key="weighted_vote" value="false"/>
                    <parameter key="measure_types" value="MixedMeasures"/>
                    <parameter key="mixed_measure" value="MixedEuclideanDistance"/>
                    <parameter key="nominal_measure" value="NominalDistance"/>
                    <parameter key="numerical_measure" value="EuclideanDistance"/>
                    <parameter key="divergence" value="GeneralizedIDivergence"/>
                    <parameter key="kernel_type" value="radial"/>
                    <parameter key="kernel_gamma" value="1.0"/>
                    <parameter key="kernel_sigma1" value="1.0"/>
                    <parameter key="kernel_sigma2" value="0.0"/>
                    <parameter key="kernel_sigma3" value="2.0"/>
                    <parameter key="kernel_degree" value="3.0"/>
                    <parameter key="kernel_shift" value="1.0"/>
                    <parameter key="kernel_a" value="1.0"/>
                    <parameter key="kernel_b" value="0.0"/>
                  </operator>
                  <connect from_port="example set source" to_op="k-NN" to_port="training set"/>
                  <connect from_op="k-NN" from_port="model" to_port="model sink"/>
                  <portSpacing port="source_example set source" spacing="0"/>
                  <portSpacing port="sink_model sink" spacing="0"/>
                </process>
              </operator>
              <operator activated="false" class="impute_missing_values" compatibility="7.3.001" expanded="true" height="68" name="Impute Missing Values" width="90" x="313" y="34">
                <parameter key="attribute_filter_type" value="value_type"/>
                <parameter key="attribute" value=""/>
                <parameter key="attributes" value=""/>
                <parameter key="use_except_expression" value="false"/>
                <parameter key="value_type" value="nominal"/>
                <parameter key="use_value_type_exception" value="false"/>
                <parameter key="except_value_type" value="time"/>
                <parameter key="block_type" value="attribute_block"/>
                <parameter key="use_block_type_exception" value="false"/>
                <parameter key="except_block_type" value="value_matrix_row_start"/>
                <parameter key="invert_selection" value="false"/>
                <parameter key="include_special_attributes" value="false"/>
                <parameter key="iterate" value="true"/>
                <parameter key="learn_on_complete_cases" value="true"/>
                <parameter key="order" value="chronological"/>
                <parameter key="sort" value="ascending"/>
                <parameter key="use_local_random_seed" value="false"/>
                <parameter key="local_random_seed" value="1992"/>
                <process expanded="true">
                  <operator activated="true" breakpoints="before,after" class="concurrency:parallel_decision_tree" compatibility="9.2.000" expanded="true" height="103" name="Decision Tree" width="90" x="380" y="34">
                    <parameter key="criterion" value="gain_ratio"/>
                    <parameter key="maximal_depth" value="20"/>
                    <parameter key="apply_pruning" value="true"/>
                    <parameter key="confidence" value="0.25"/>
                    <parameter key="apply_prepruning" value="true"/>
                    <parameter key="minimal_gain" value="0.1"/>
                    <parameter key="minimal_leaf_size" value="2"/>
                    <parameter key="minimal_size_for_split" value="4"/>
                    <parameter key="number_of_prepruning_alternatives" value="3"/>
                  </operator>
                  <connect from_port="example set source" to_op="Decision Tree" to_port="training set"/>
                  <connect from_op="Decision Tree" from_port="model" to_port="model sink"/>
                  <portSpacing port="source_example set source" spacing="0"/>
                  <portSpacing port="sink_model sink" spacing="0"/>
                </process>
              </operator>
              <operator activated="true" class="impute_missing_values" compatibility="7.3.001" expanded="true" height="68" name="Impute Missing Values (2)" width="90" x="447" y="34">
                <parameter key="attribute_filter_type" value="value_type"/>
                <parameter key="attribute" value=""/>
                <parameter key="attributes" value=""/>
                <parameter key="use_except_expression" value="false"/>
                <parameter key="value_type" value="numeric"/>
                <parameter key="use_value_type_exception" value="false"/>
                <parameter key="except_value_type" value="time"/>
                <parameter key="block_type" value="attribute_block"/>
                <parameter key="use_block_type_exception" value="false"/>
                <parameter key="except_block_type" value="value_matrix_row_start"/>
                <parameter key="invert_selection" value="false"/>
                <parameter key="include_special_attributes" value="false"/>
                <parameter key="iterate" value="true"/>
                <parameter key="learn_on_complete_cases" value="true"/>
                <parameter key="order" value="chronological"/>
                <parameter key="sort" value="ascending"/>
                <parameter key="use_local_random_seed" value="false"/>
                <parameter key="local_random_seed" value="1992"/>
                <process expanded="true">
                  <operator activated="true" breakpoints="before" class="neural_net" compatibility="9.2.000" expanded="true" height="82" name="Neural Net" width="90" x="179" y="34">
                    <list key="hidden_layers"/>
                    <parameter key="training_cycles" value="500"/>
                    <parameter key="learning_rate" value="0.3"/>
                    <parameter key="momentum" value="0.2"/>
                    <parameter key="decay" value="false"/>
                    <parameter key="shuffle" value="true"/>
                    <parameter key="normalize" value="true"/>
                    <parameter key="error_epsilon" value="1.0E-5"/>
                    <parameter key="use_local_random_seed" value="false"/>
                    <parameter key="local_random_seed" value="1992"/>
                  </operator>
                  <connect from_port="example set source" to_op="Neural Net" to_port="training set"/>
                  <connect from_op="Neural Net" from_port="model" to_port="model sink"/>
                  <portSpacing port="source_example set source" spacing="0"/>
                  <portSpacing port="sink_model sink" spacing="0"/>
                </process>
              </operator>
              <connect from_port="in 1" to_op="Impute Missing Values (2)" to_port="example set in"/>
              <connect from_op="Impute Missing Values (2)" from_port="example set out" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="source_in 2" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
              <description align="center" color="yellow" colored="false" height="105" resized="false" width="180" x="601" y="506">Type your comment</description>
            </process>
          </operator>
          <operator activated="false" class="replace_missing_values" compatibility="9.2.000" expanded="true" height="103" name="Replace Missing Values" width="90" x="313" y="442">
            <parameter key="return_preprocessing_model" value="false"/>
            <parameter key="create_view" value="false"/>
            <parameter key="attribute_filter_type" value="all"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="default" value="average"/>
            <list key="columns"/>
          </operator>
          <operator activated="true" class="impute_missing_values" compatibility="9.2.000" expanded="true" height="68" name="Impute Missing Values (4)" width="90" x="581" y="85">
            <parameter key="attribute_filter_type" value="all"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="real"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="iterate" value="true"/>
            <parameter key="learn_on_complete_cases" value="false"/>
            <parameter key="order" value="chronological"/>
            <parameter key="sort" value="ascending"/>
            <parameter key="use_local_random_seed" value="false"/>
            <parameter key="local_random_seed" value="1992"/>
            <process expanded="true">
              <operator activated="true" class="k_nn" compatibility="9.2.000" expanded="true" height="82" name="k-NN (2)" width="90" x="380" y="34">
                <parameter key="k" value="5"/>
                <parameter key="weighted_vote" value="true"/>
                <parameter key="measure_types" value="MixedMeasures"/>
                <parameter key="mixed_measure" value="MixedEuclideanDistance"/>
                <parameter key="nominal_measure" value="NominalDistance"/>
                <parameter key="numerical_measure" value="EuclideanDistance"/>
                <parameter key="divergence" value="GeneralizedIDivergence"/>
                <parameter key="kernel_type" value="radial"/>
                <parameter key="kernel_gamma" value="1.0"/>
                <parameter key="kernel_sigma1" value="1.0"/>
                <parameter key="kernel_sigma2" value="0.0"/>
                <parameter key="kernel_sigma3" value="2.0"/>
                <parameter key="kernel_degree" value="3.0"/>
                <parameter key="kernel_shift" value="1.0"/>
                <parameter key="kernel_a" value="1.0"/>
                <parameter key="kernel_b" value="0.0"/>
              </operator>
              <connect from_port="example set source" to_op="k-NN (2)" to_port="training set"/>
              <connect from_op="k-NN (2)" from_port="model" to_port="model sink"/>
              <portSpacing port="source_example set source" spacing="0"/>
              <portSpacing port="sink_model sink" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="transpose" compatibility="9.2.000" expanded="true" height="82" name="Transpose (2)" width="90" x="715" y="85"/>
          <operator activated="true" class="generate_id" compatibility="9.2.000" expanded="true" height="82" name="Generate ID (2)" width="90" x="849" y="85">
            <parameter key="create_nominal_ids" value="false"/>
            <parameter key="offset" value="0"/>
          </operator>
          <operator activated="true" class="concurrency:join" compatibility="9.2.000" expanded="true" height="82" name="Join" width="90" x="1050" y="136">
            <parameter key="remove_double_attributes" value="true"/>
            <parameter key="join_type" value="inner"/>
            <parameter key="use_id_attribute_as_key" value="true"/>
            <list key="key_attributes"/>
            <parameter key="keep_both_join_attributes" value="false"/>
          </operator>
          <connect from_op="Read Excel" from_port="output" to_op="Transpose" to_port="example set input"/>
          <connect from_op="Transpose" from_port="example set output" to_op="Filter Example Range" to_port="example set input"/>
          <connect from_op="Transpose" from_port="original" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Filter Example Range" from_port="example set output" to_op="Impute Missing Values (4)" to_port="example set in"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Join" to_port="left"/>
          <connect from_op="Impute Missing Values (4)" from_port="example set out" to_op="Transpose (2)" to_port="example set input"/>
          <connect from_op="Transpose (2)" from_port="example set output" to_op="Generate ID (2)" to_port="example set input"/>
          <connect from_op="Generate ID (2)" from_port="example set output" to_op="Join" to_port="right"/>
          <connect from_op="Join" from_port="join" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    



    Scott
  • msacs09msacs09 Member Posts: 55 Contributor II
    edited February 2019
    Bummer, sorry sir, for being blind on this. However, you raised an important point here, on which I need all of your expert guidance.

    My actual example data set has 970 rows, and only 16 of them are complete records (i.e., without any missing values over the 24-month period). With so few complete cases, what would be the best approach here? Does it still make sense to learn from complete records only?

    When I enable "learn on complete cases", I get very poor, or rather completely wrong, results: it appears to be taking the client ID and using it to fill in the missing values.

    Further, when I disable the "learn on complete cases" option and run the k-NN regression, I just get the same value replicated across the missing months rather than true imputed values. What approach can I take to avoid having the same value repeated for the missing months?

    Lastly, if I try to build an ensemble here, is there any way I can evaluate the imputation performance of the various learners?
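
One common way to compare imputers, sketched here in plain Python rather than as a RapidMiner process, is to hide some values that are actually known, impute them, and measure the error against the hidden truth. Everything below is an assumption for illustration: the toy rows, the two simple imputers (`row_mean`, `carry_forward`), and the helper `masked_mae` are hypothetical and are not operators from this thread.

```python
import random

def row_mean(values):
    """Replace each gap with the mean of the known values in the same row."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]

def carry_forward(values):
    """Replace each gap with the most recent known value (first known value for leading gaps)."""
    last = next(v for v in values if v is not None)
    filled = []
    for v in values:
        last = last if v is None else v
        filled.append(last)
    return filled

def masked_mae(rows, imputer, hide_fraction=0.2, seed=0):
    """Hide a fraction of the known cells per row, impute, and return the mean absolute error."""
    rng = random.Random(seed)  # fixed seed: every imputer is scored on the same hidden cells
    errors = []
    for row in rows:
        known = [i for i, v in enumerate(row) if v is not None]
        n_hide = max(1, int(len(known) * hide_fraction))
        if len(known) <= n_hide:
            continue  # skip rows that would have no known value left
        hidden = rng.sample(known, n_hide)
        masked = [None if i in hidden else v for i, v in enumerate(row)]
        imputed = imputer(masked)
        errors.extend(abs(imputed[i] - row[i]) for i in hidden)
    return sum(errors) / len(errors)

# Hypothetical monthly rows, one per ID; None marks a genuinely missing month.
rows = [
    [10, 11, None, 13, 14, 15],
    [5, None, 5, 5, 5, 5],
    [100, 90, 80, None, 60, 50],
]
for name, imputer in [("row mean", row_mean), ("carry forward", carry_forward)]:
    print(name, round(masked_mae(rows, imputer), 2))
```

The imputer with the lowest error on the hidden cells would be the candidate to apply to the genuinely missing months; repeating the masking with several seeds gives a more stable comparison.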

    As always, thank you for your valuable advice and time.


  • sgenzersgenzer Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    edited February 2019
    hi @msacs09 so my advice is the same as what was taught to me in engineering school during the dinosaur age: simplify the problem. Try GLM or even just linear regression instead of k-NN. Try using a subset of attributes. Simplify the problem.

    One other thing to know is that you are using a pretty small data set. You're not going to get great results no matter what you throw at it. As we say here, "you cannot make a silk purse out of a sow's ear." :wink:

    Scott
  • msacs09msacs09 Member Posts: 55 Contributor II
    Thank you. Sorry for not being clear: I was actually running the process on my whole data set of 2000+ samples.

    As you said, k-NN doesn't seem to work here. I will try linear regression, or probably use Replace Missing Values (Series) to interpolate, since my data set is time series data. I hope that makes sense.
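
For reference, the row-wise interpolation idea can be sketched in plain Python as below. This is only an illustration of linear interpolation along one ID's monthly values, not the implementation of the Replace Missing Values (Series) operator; the function `interpolate_row` and the sample series are hypothetical.

```python
def interpolate_row(values):
    """Fill None gaps in one row by linear interpolation between known neighbours.

    Leading and trailing gaps are filled with the nearest known value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return list(values)  # nothing to interpolate from
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]  # leading gap
        elif right is None:
            filled[i] = values[left]  # trailing gap
        else:
            step = (values[right] - values[left]) / (right - left)
            filled[i] = values[left] + step * (i - left)
    return filled

# Hypothetical monthly values for a single ID.
series = [10, None, None, 16, None, 20, None]
print(interpolate_row(series))  # [10, 12.0, 14.0, 16, 18.0, 20, 20]
```

Because each row is filled only from its own neighbouring months, the imputed values follow that ID's trend instead of a column-wide average.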