Merging feature sets

vme64 Member Posts: 10 Contributor II
edited November 2019 in Help
Hello!

  I developed a process that consists of the following steps:

1) Read the dataset
2) Split it into a test set and a training set
3) Do a feature selection on the training set using SVM, Forward Selection, X-Val and OptimizeParameters
4) Build a model using the selected parameters
5) Apply the resulting model (that is, the one generated with the best features) to the test set

  The problem is that SVM classifiers expect the test set to have exactly the same features that were used to build the model; otherwise the results are screwed up. However, I did not manage to filter out the features of the test set that were not among the selected ones.

  Stated more concisely: given two different datasets A and B, where the features of B are a subset of the features of A, I need a dataset C that consists of the data contained in A but comprises only the features shared with B:

Dataset A
ID  Feature 1  Feature 2  Feature 3
1   3          9          2
2   5          3          1
Dataset B
ID  Feature 1  Feature 3
3510
4129
2929
Dataset C
ID  Feature 1  Feature 3
1   3          2
2   5          1
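
The relationship between A, B and C described above can be sketched in a few lines of Python. This is only an illustration of the requested transformation, not part of the RapidMiner process; the function name `project_onto_shared_features` and the dict-per-row layout are assumptions made for this example, and only the column names of B are used.

```python
def project_onto_shared_features(rows_a, features_b, key="ID"):
    """Return the rows of A restricted to the key column plus the features also present in B."""
    if not rows_a:
        return []
    # Preserve A's column order; keep the ID column and every feature that B also has.
    keep = [name for name in rows_a[0] if name == key or name in features_b]
    return [{name: row[name] for name in keep} for row in rows_a]


# Dataset A from the post above.
dataset_a = [
    {"ID": 1, "Feature 1": 3, "Feature 2": 9, "Feature 3": 2},
    {"ID": 2, "Feature 1": 5, "Feature 2": 3, "Feature 3": 1},
]
# Only the feature names of Dataset B matter for the projection.
features_b = {"Feature 1", "Feature 3"}

dataset_c = project_onto_shared_features(dataset_a, features_b)
print(dataset_c)
```

The result holds the same two rows as A with "Feature 2" dropped, which matches Dataset C.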
  I am doing things this way (instead of using only X-Val) so as to guarantee that my test set is not used at all during the modelling process.

  If anybody has a clue how to do this (or whether I should do it another way), I would be very grateful!

Best regards,

  Vinicius
Answers

  • awchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    Hello

    You could try the "Data to Weights" and "Select by Weights" operators. See the enclosed process.

    regards

    Andrew
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.006">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.006" expanded="true" name="Process">
        <process expanded="true" height="431" width="614">
          <operator activated="true" class="generate_data" compatibility="5.1.006" expanded="true" height="60" name="Generate Data" width="90" x="112" y="30">
            <parameter key="number_of_attributes" value="50"/>
          </operator>
          <operator activated="true" class="generate_data" compatibility="5.1.006" expanded="true" height="60" name="Generate Data (2)" width="90" x="112" y="120">
            <parameter key="number_of_attributes" value="3"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="5.1.006" expanded="true" height="76" name="Select Attributes" width="90" x="112" y="210">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="|att3|att1"/>
          </operator>
          <operator activated="true" class="data_to_weights" compatibility="5.1.006" expanded="true" height="76" name="Data to Weights" width="90" x="246" y="210"/>
          <operator activated="true" class="select_by_weights" compatibility="5.1.006" expanded="true" height="94" name="Select by Weights" width="90" x="447" y="30"/>
          <connect from_op="Generate Data" from_port="output" to_op="Select by Weights" to_port="example set input"/>
          <connect from_op="Generate Data (2)" from_port="output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Data to Weights" to_port="example set"/>
          <connect from_op="Data to Weights" from_port="weights" to_op="Select by Weights" to_port="weights"/>
          <connect from_op="Select by Weights" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
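
To make the idea behind the process above easier to follow, here is a minimal Python sketch of the same pattern. It assumes that "Data to Weights" assigns a weight of 1 to every attribute present in its input and that "Select by Weights" keeps the attributes whose weight meets a threshold; both are assumptions about the operators' default behaviour that this thread does not confirm. The functions `data_to_weights` and `select_by_weights` are hypothetical stand-ins, not RapidMiner APIs.

```python
def data_to_weights(attribute_names):
    """Hypothetical stand-in: give every attribute seen in the reduced dataset a weight of 1.0."""
    return {name: 1.0 for name in attribute_names}


def select_by_weights(rows, weights, threshold=1.0):
    """Hypothetical stand-in: keep only attributes whose weight is at least the threshold.

    Attributes missing from the weights are treated as weight 0.0 and are dropped.
    """
    return [
        {name: value for name, value in row.items() if weights.get(name, 0.0) >= threshold}
        for row in rows
    ]


# The full dataset (three attributes here instead of 50, to keep the example short).
full_rows = [
    {"att1": 0.1, "att2": 0.2, "att3": 0.3},
    {"att1": 1.1, "att2": 1.2, "att3": 1.3},
]
# The reduced dataset only has att3 and att1, as in the Select Attributes step.
weights = data_to_weights(["att3", "att1"])

filtered_rows = select_by_weights(full_rows, weights)
print(filtered_rows)
```

In this sketch the weights act as a reusable description of the selected feature set, so the same weights can be applied to any other dataset with the same attribute names, such as a held-out test set.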
  • vme64 Member Posts: 10 Contributor II
    Hello,

      Thanks a lot for the answer and for the example! It worked, and now I can complete my process.

    Best regards,

      Vinicius