"Help - Clustering?"
JEdward
RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
I'm very new to this datamining lark so apologies in advance.
I have an example set containing only "yes" data, and I have been asked to score records in a new example set based on their similarity to records in the "yes" set. I don't really know what I'm doing, but I have a feeling clustering might be involved somehow. So far, all I have done is create clusters from the "yes" set and then label the new records with a prediction of which cluster they would fall into.
That's not quite what I'm after; the desired result is to give each record a label from 1 to 10 indicating how closely that record matches the "yes" set.
Any pointers would be appreciated.
Thanks,
JEdward
Answers
Well, this sounds (if I got it right) like a scenario where 1-class modeling might be most appropriate. You could try the 1-class SVM offered by RapidMiner. First you train the model on the "yes" data set, and then you apply the trained model to your prediction data set. Afterwards you can rescale the predictions from [0, 1] to [1, 10] and round them to integers. That's it.
Cheers,
Ingo
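The rescaling step at the end of that recipe can be sketched in a few lines of Python. This is only an illustration, not RapidMiner code: the function name `confidence_to_score` and the sample confidences are made up, and it assumes the applied model yields a confidence between 0 and 1 for each record. Rounding half up is also an assumption; the thread does not say which rounding rule to use.

```python
import math

def confidence_to_score(confidence, low=1, high=10):
    """Map a confidence in [0, 1] to an integer score in [low, high]."""
    # Clamp first so slightly out-of-range values cannot escape the scale.
    clamped = min(1.0, max(0.0, confidence))
    # Linear rescale: 0 maps to low, 1 maps to high.
    scaled = low + clamped * (high - low)
    # Round half up explicitly; Python's built-in round() rounds halves to even.
    return int(math.floor(scaled + 0.5))

# Hypothetical confidences, standing in for the model's output column.
confidences = [0.0, 0.12, 0.5, 0.87, 1.0]
scores = [confidence_to_score(c) for c in confidences]
print(scores)  # [1, 2, 6, 9, 10]
```

With a linear mapping like this, the two end scores (1 and 10) each cover only half as wide a confidence range as the scores in between. If equal-width bins matter, a ceiling-based binning would be the alternative.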
That sounds like exactly what I'm looking for. I'll give it a try.
JEdward.
When I try to store the labelled data in the repository, I get a 'ConcurrentModificationException' error.
I think this is caused by the Apply Model operator creating two special attributes, 'confidence(inside)' and 'prediction(LabelT)', as these are the only things that differ from the original dataset.
Can anyone point me in the right direction to resolve this?
Thanks,
JEdward.
Please post the process as well as the stack trace for this exception. We will see if we can help you.
Greetings,
Sebastian
Here's the process attached. Not sure what you mean by stack trace. Is this it? (copied from the log window). Thanks,
JEdward
I have solved the problem by changing the process to rename the attribute confidence(inside). Could it be that the brackets in the name caused the Store operator's problems? I had to type the attribute names into the Rename and Select Attributes operators by hand, because attributes created by Apply Model are not available in the menus and drop-down lists.