Negative weights
Hello,
I built a decision tree classification model, and I would like to know how much each attribute contributes to the resulting model. I connected the weights port to the res port and obtained both positive and negative weight values, all in the range [-1, +1]. What do negative weights mean?
Does it mean that only the absolute value matters?
Answers
Hi tayebasta,
In general, the weights are the accumulated value that an Attribute contributed throughout the model with regard to the chosen criterion (e.g. information gain).
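As a rough illustration of that idea (not RapidMiner's actual implementation), the sketch below sums the information gain of every split that uses an attribute. The tree, the attribute names `Outlook` and `Wind`, and the label lists are all made up for the example. Under this reading, each contribution is an information gain, which cannot be negative, so the summed weights should not be negative either.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent node minus the size-weighted entropy of its children."""
    n = len(parent)
    remainder = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - remainder


# Hypothetical splits of a small tree:
# (attribute used, labels at the node, labels in each child after the split)
splits = [
    ("Outlook", list("YYYYNNNN"), [list("YYYN"), list("YNNN")]),
    ("Wind", list("YYYN"), [list("YYY"), list("N")]),
    ("Wind", list("YNNN"), [list("Y"), list("NNN")]),
]

# Accumulate the gain of every split per attribute (assumed weighting scheme).
weights = defaultdict(float)
for attribute, parent, children in splits:
    weights[attribute] += information_gain(parent, children)

for attribute, weight in weights.items():
    print(f"{attribute}: {weight:.4f}")
```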
I'd like to have a look into that, could you please share your process if possible, or describe it a bit?
You can just provide the .rmp file or copy the XML code here. To access it, go to "View -> Show Panel -> XML". This opens a new view containing the XML code of your process. BTW: Did you know that you can drag & drop (or Ctrl+V) XML code into the standard process view to copy a process into Studio?
Regards,
Philipp
Hi @pschlunder
I am seeing the same thing (negative weights).
You can find my process here (the training set is in the attached file):
Regards,
Lionel
Hello,
I read about Weight by Correlation in the RapidMiner documentation:
"A correlation is a number between -1 and +1 that measures the degree of association between two attributes (call them X and Y). A positive value for the correlation implies a positive association. In this case, large values of X tend to be associated with large values of Y and small values of X tend to be associated with small values of Y. A negative value for the correlation implies a negative or inverse association. In this case, large values of X tend to be associated with small values of Y and vice versa." Does this apply to attribute weights in modeling (classification)?
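To make the quoted definition concrete, here is a minimal sketch that computes a Pearson correlation by hand on made-up numbers. The helper name `pearson` and the data are assumptions for illustration only. It shows the sign flipping when large X values go with small Y values.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5]
y_rising = [2, 4, 6, 8, 10]    # large X goes with large Y
y_falling = [10, 8, 6, 4, 2]   # large X goes with small Y

r_rising = pearson(x, y_rising)
r_falling = pearson(x, y_falling)
print(r_rising, r_falling)
```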
Hi @lionelderkrikor,
could you please provide the data set? I can't reproduce the problem without the data you're using.
Thanks
@tayebasta Regarding the association with correlations: if we considered the negative weights to be reasonable, it would imply that a split using this Attribute worsens the decision. But I'd really like to see your process to investigate further. Are you providing a column with the role weight that contains negative weights?
Regards,
Philipp
Dear Mr. Philip,
Please find attached a copy of the process .rmp file.
Regards,
Dear Mr. Philip,
I exported the model from my home laptop to my office computer and ran it on the same data. On both computers, I am using RapidMiner version 8. The only difference is that the PC at home is 64-bit and the one in the office is 32-bit.
All weights I've got this time are positive. They are attached to this message.
I'll go home and compare and see what's up.
Regards,
Basta
Hi @pschlunder
Thank you for your feedback.
Indeed, I forgot to attach my input data set.
You can find it in the attached file this time.
Best regards,
Lionel
Hi @lionelderkrikor,
Sorry, I still can't reproduce your problem, since your process doesn't define a label but uses Operators that require one, such as 'Weight by Information Gain'.
@tayebasta looking forward to your findings.
Regards,
Philipp
Hi @pschlunder
The process works fine on my computer, but I had not mentioned that my label is set in the "data set meta data information" parameter of the Read Excel operator.
I hope the process will run with this information.
Thank you,
regards,
Lionel
Sorry, my bad.
I just loaded the data straight into Studio and replaced your Read Excel Operator with Retrieve >.<
@lionelderkrikor your process uses an SVM applied to a binominal classification problem. Internally, one label value is treated as true and the other as false. Positive weights therefore indicate relevance for the true label value (in your case Gender = M), which is why the highest positive weight belongs to the word 'husband'. The most negative weight occurs for the token 'wife'. That value is a strong sign that the example does not have the true label value, i.e. that Gender = F. In that case, the weight can be seen as something similar to a correlation, as @tayebasta suggested.
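The sign behaviour described above can be reproduced with a tiny linear SVM trained by subgradient descent on the hinge loss. This is only a sketch on invented data, not the Operator used in the process: the four token-count rows, the learning rate, and the regularisation strength are all assumptions. M is encoded as +1 (the "true" value) and F as -1.

```python
# Toy bag-of-words data: counts for the tokens below, label +1 = M, -1 = F.
tokens = ["husband", "wife", "the"]
data = [
    ([1, 0, 1], +1),
    ([1, 0, 0], +1),
    ([0, 1, 1], -1),
    ([0, 1, 0], -1),
]

w = [0.0] * len(tokens)   # one weight per token
b = 0.0                   # bias term
lr, lam = 0.1, 0.01       # assumed learning rate and L2 regularisation

for _ in range(200):
    for x, y in data:
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        if y * score < 1:
            # Inside the margin: hinge-loss subgradient step plus shrinkage.
            w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
            b += lr * y
        else:
            # Outside the margin: only the regularisation shrinkage applies.
            w = [wi - lr * lam * wi for wi in w]

weights = dict(zip(tokens, w))
for token, weight in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{token}: {weight:+.3f}")
```

With this encoding, the token that only appears in M examples ends up with a positive weight and the token that only appears in F examples with a negative one. Swapping which label value is encoded as +1 would flip both signs, so the magnitude tells you how strongly a token matters and the sign tells you for which class.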
Regards,
Philipp
Hi,
Thank you @tayebasta and @pschlunder for your explanations. It's much clearer to me now.
Best regards,
Lionel