
Naive Bayes Classification of multiple rows

iinnaanncc Member Posts: 3 Contributor I
edited November 2018 in Help
Hello everyone,

I am building a Naive Bayes classification process for some data in RapidMiner. I have training data for constructing the model; it has several thousand rows in the following format:

label  attribute  attribute  attribute  attribute  attribute  attribute 

I then want to classify another data set, which has 3 rows in the following format:

attribute  attribute  attribute  attribute  attribute  attribute 

In this case everything runs normally, and I get one prediction per row from the Naive Bayes model (3 predictions in total).

My question is the following: what if I assume that these 3 rows belong to the same category and therefore want only 1 prediction in total for the three rows? How can I do that? Please help me.

I hope I have explained myself clearly.

Thanks in advance,
iinnaanncc

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi iinnaanncc,

    You can't tell Naive Bayes to give the same label to three rows, but you can classify all three rows separately (as you are doing now) and then return the label that appears most frequently. You can use the Aggregate operator with the "mode" aggregation function on the prediction(label) attribute, as in the example process below.

    Best,
    Marius
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.017">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.017" expanded="true" name="Process">
        <process expanded="true" height="633" width="743">
          <operator activated="true" class="generate_data" compatibility="5.1.017" expanded="true" height="60" name="Generate Data" width="90" x="112" y="30">
            <parameter key="target_function" value="polynomial classification"/>
            <parameter key="number_examples" value="1000"/>
          </operator>
          <operator activated="true" class="naive_bayes" compatibility="5.1.017" expanded="true" height="76" name="Naive Bayes" width="90" x="246" y="30"/>
          <operator activated="true" class="generate_data" compatibility="5.1.017" expanded="true" height="60" name="Generate Data (2)" width="90" x="112" y="120">
            <parameter key="target_function" value="polynomial classification"/>
            <parameter key="number_examples" value="3"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="5.1.017" expanded="true" height="76" name="Apply Model" width="90" x="380" y="75">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="aggregate" compatibility="5.1.017" expanded="true" height="76" name="Aggregate" width="90" x="514" y="30">
            <list key="aggregation_attributes">
              <parameter key="prediction(label)" value="mode"/>
            </list>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Naive Bayes" to_port="training set"/>
          <connect from_op="Naive Bayes" from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_op="Generate Data (2)" from_port="output" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
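
For readers who want to see the majority-vote step outside RapidMiner, here is a minimal Python sketch of the same idea: take the per-row predictions and return the label that occurs most often. The function name `majority_label` and the sample labels are hypothetical and only for illustration; this is not the code RapidMiner runs internally.

```python
from collections import Counter


def majority_label(predictions):
    """Return the most frequent label among per-row predictions (the mode)."""
    if not predictions:
        raise ValueError("need at least one prediction")
    # most_common(1) yields [(label, count)]; on a tie, the label that
    # appeared first in the input is returned.
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical per-row predictions for three rows assumed to share one category.
row_predictions = ["positive", "negative", "positive"]
group_label = majority_label(row_predictions)
print(group_label)
```

With only three rows and more than two classes, all three predictions can differ; in that case this sketch simply returns the first one, so you may want an explicit tie-breaking rule.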