Strange result from Naive Bayes classifier
Hello,
First of all, thank you so much for contributing this great data mining tool. You guys are great!
I'm new to data mining (DM) and am trying out RapidMiner (RM). I want to use Naive Bayes to predict whether a new customer with a particular profile will or will not buy the product. I have set up the process like this:
<operator name="Root" class="Process" expanded="yes">
<!-- Load the training data from MySQL; CARAVAN is used as the label -->
<operator name="TrainingSet" class="DatabaseExampleSource">
<parameter key="database_url" value="jdbc:mysql://localhost:3306/insurance"/>
<parameter key="username" value="xxx"/>
<parameter key="password" value="xxx"/>
<parameter key="query" value="select * from customer;"/>
<parameter key="label_attribute" value="CARAVAN"/>
<parameter key="classes" value="buy not_buy"/>
</operator>
<!-- Learn a Naive Bayes model from the training set -->
<operator name="NaiveBayes" class="NaiveBayes">
</operator>
<!-- Load the evaluation data with the same label settings -->
<operator name="TestSet" class="DatabaseExampleSource">
<parameter key="database_url" value="jdbc:mysql://localhost:3306/insurance"/>
<parameter key="username" value="xxx"/>
<parameter key="password" value="xxx"/>
<parameter key="query" value="select * from customer_eval;"/>
<parameter key="label_attribute" value="CARAVAN"/>
<parameter key="classes" value="buy not_buy"/>
</operator>
<!-- Apply the learned model to the evaluation data -->
<operator name="ModelApplier" class="ModelApplier">
<list key="application_parameters">
</list>
</operator>
</operator>
It runs without errors, but in the data view the confidence(buy) and confidence(not_buy) fields show '?' for every record.
Can anybody give me a clue as to what I'm doing wrong?
Thank you so much
Pupu.
haddock replied:
Hi there,
First, welcome to the dataminers' asylum! On your problem: what happens if you apply the model to the training set? Do you still get a row of ?'s in the prediction columns? Just disable your second database operator to check. Make sure to tick "keep example set" in the learner.
A ? usually represents a missing value, so I'm wondering what got learnt. The setup looks fine, so something murky is going on. I take it you've checked the training set and so on.
I applied the model to the training set: every value in the prediction field is 'not_buy', and confidence(buy) and confidence(not_buy) are '?' for all records.
I have checked the training set: there are no missing values, but it is unbalanced, roughly 6% buy and 94% not_buy. Could the imbalance be relevant to my problem?
Thank you very much
Pupu.
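Predicting 'not_buy' for every record is what a classifier that has learned nothing useful tends to do on such a skewed set, because that trivial rule already looks accurate. The small Python sketch below (the function name is hypothetical) computes the accuracy of always predicting the majority class for a 6%/94% split and for a balanced 50/50 split.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy of a trivial classifier that always predicts the most frequent class."""
    counts = Counter(labels)
    majority_count = counts.most_common(1)[0][1]
    return majority_count / len(labels)

# Hypothetical label lists mirroring the class ratios mentioned in the thread.
imbalanced = ["buy"] * 6 + ["not_buy"] * 94
balanced = ["buy"] * 50 + ["not_buy"] * 50

print(majority_baseline_accuracy(imbalanced))  # 0.94
print(majority_baseline_accuracy(balanced))    # 0.5
```

On the skewed data the trivial rule scores about 94%, while on a balanced set it drops to 50%, so accuracy alone says little until the classes are balanced or a per-class measure is used.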
I think that is probably the cause of your problem; try balancing the data so the classes are more even. Why not take 50 buy records and 50 not_buy records and merge them? Hope you get better results; get back to us if that doesn't do the trick.
Onward, full ahead through the fog...
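Drawing the same number of records from each class and merging them, as suggested here, is random undersampling. A minimal Python sketch, assuming the records are already loaded as dictionaries keyed by attribute name; the function name `undersample` and the generated records are hypothetical and not part of RapidMiner:

```python
import random
from collections import defaultdict

def undersample(records, label_key, per_class, seed=0):
    """Randomly draw `per_class` records from every class, merge and shuffle them."""
    by_class = defaultdict(list)
    for rec in records:
        by_class[rec[label_key]].append(rec)
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    balanced = []
    for cls, recs in by_class.items():
        if len(recs) < per_class:
            raise ValueError(f"class {cls!r} has only {len(recs)} records")
        balanced.extend(rng.sample(recs, per_class))
    rng.shuffle(balanced)
    return balanced

# Hypothetical data: 1000 records, 6% labelled "buy", like the split in the thread.
records = [{"id": i, "CARAVAN": "buy" if i % 100 < 6 else "not_buy"} for i in range(1000)]
balanced = undersample(records, "CARAVAN", per_class=50)
print(len(balanced))  # 100
```

Undersampling throws away most of the majority class, so with only 6% buy records the balanced set can never be larger than twice the number of buy records.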
NaiveBayes does indeed have problems with unbalanced example sets, but this should not result in unknown confidence values. A more critical question on that issue: how many attributes does your example set contain?
Greetings,
Sebastian
Well, Sebastian's question is indeed essential here. Unfortunately, Naive Bayes did produce unknown confidence values for data sets with a large number of attributes. We have made Naive Bayes more robust in that respect, but only after the release of version 4.4 of the Community Edition. The recent automatically delivered RapidMiner Enterprise Edition update already contains the bugfix, and it will of course also be part of the next Community Edition release, which should come out in a couple of weeks.
If you like (and there are no privacy issues), you can send us a data sample and we can check whether it works with the most recent developer version. If there are privacy issues and you need a solution very urgently, we could also build you a one-off custom version. Just drop us a note.
Kind regards,
Tobias
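The thread does not say exactly what the bug inside Naive Bayes was. One plausible mechanism, offered here as an assumption rather than a description of RapidMiner's code, is numerical underflow: Naive Bayes multiplies one likelihood per attribute, and with many attributes that product can fall below the smallest representable double and become 0.0 for every class. Normalising 0 by 0 leaves the confidences undefined, presumably what shows up as '?'. This Python sketch uses made-up likelihood values to show the effect and the usual remedy of summing log-probabilities and normalising with the log-sum-exp trick; both function names are hypothetical.

```python
import math

def naive_confidences(priors, likelihoods):
    """Multiply prior and per-attribute likelihoods directly (prone to underflow)."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for p in likelihoods[cls]:
            score *= p
        scores[cls] = score
    total = sum(scores.values())
    if total == 0.0:
        # Every class score underflowed to 0.0, so the confidences would be 0/0.
        return {cls: math.nan for cls in scores}
    return {cls: s / total for cls, s in scores.items()}

def log_space_confidences(priors, likelihoods):
    """Sum log-probabilities and normalise them with the log-sum-exp trick."""
    log_scores = {
        cls: math.log(prior) + sum(math.log(p) for p in likelihoods[cls])
        for cls, prior in priors.items()
    }
    m = max(log_scores.values())  # subtract the maximum so exp() cannot underflow to all zeros
    total = sum(math.exp(s - m) for s in log_scores.values())
    return {cls: math.exp(s - m) / total for cls, s in log_scores.items()}

# Hypothetical numbers: 85 attributes, each contributing a small likelihood.
priors = {"buy": 0.06, "not_buy": 0.94}
likelihoods = {"buy": [1e-5] * 85, "not_buy": [2e-5] * 85}

print(naive_confidences(priors, likelihoods))      # both confidences are nan
print(log_space_confidences(priors, likelihoods))  # finite values that sum to 1
```

The log-space version ranks the classes the same way the direct product would if it had infinite precision; it only avoids forming the tiny products explicitly.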
Thank you so much for your replies.
To haddock,
I tried what you suggested: I reduced the data set to 50 'buy' and 50 'not_buy' records. Naive Bayes still produces '?' for the confidence values, and the predictions are only 50% correct.
To Land,
The data set has 85 attributes. Should I try feature (attribute) subset selection before applying Naive Bayes?
To Tobias Malbrecht,
There are no privacy issues here; the data set is actually from the KDD Cup '98. How can I send you the data set?
Best regards,
Pupu.
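One common way to do the attribute subset selection asked about above is a filter method: score each attribute by how much information it carries about the label and keep only the top k. This Python sketch uses mutual information and assumes discrete (for example integer-coded) attributes; the attribute names A1 to A3, the toy rows and the function names are hypothetical, and this is not RapidMiner's own feature selection operator.

```python
import math
from collections import Counter

def mutual_information(values, labels):
    """Mutual information (in nats) between one discrete attribute and the label."""
    n = len(values)
    joint = Counter(zip(values, labels))
    value_counts = Counter(values)
    label_counts = Counter(labels)
    mi = 0.0
    for (v, l), c in joint.items():
        p_vl = c / n
        p_v = value_counts[v] / n
        p_l = label_counts[l] / n
        mi += p_vl * math.log(p_vl / (p_v * p_l))
    return mi

def top_k_attributes(rows, attributes, label_key, k):
    """Rank attributes by mutual information with the label and keep the best k."""
    labels = [row[label_key] for row in rows]
    scores = {a: mutual_information([row[a] for row in rows], labels) for a in attributes}
    return sorted(attributes, key=lambda a: scores[a], reverse=True)[:k]

# Hypothetical rows: A1 determines the label, A2 is constant, A3 is unrelated to it.
rows = (
    [{"A1": 1, "A2": 0, "A3": i % 3, "CARAVAN": "buy"} for i in range(10)]
    + [{"A1": 0, "A2": 0, "A3": i % 3, "CARAVAN": "not_buy"} for i in range(10)]
)
print(top_k_attributes(rows, ["A2", "A3", "A1"], "CARAVAN", k=1))  # ['A1']
```

Filter scores like this look at one attribute at a time, so they are cheap for 85 attributes but can miss attributes that are only useful in combination; wrapper methods that re-train the learner on candidate subsets avoid that at a much higher cost.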
I forgot to mention that your examples are very useful to me.
Best regards,
Pupu.
Following your point about the number of attributes, I now use "select <some fields> from table" instead of selecting all columns.
The confidence values are shown now.
Now I'm looking for a way to deal with the unbalanced data. >:( (Cheers myself)
Thank you so much everyone.
Pupu.