Question about SSE for DBSCAN in WEKA and RapidMiner
IngoRM
Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
Original message from SourceForge forum at http://sourceforge.net/forum/forum.php?thread_id=2043261&forum_id=390413
I have a question about the DBSCAN clustering algorithm in RapidMiner. What does the whole procedure look like? My attempt is: ExcelExampleSource => DBScanClustering => ItemDistributionEvaluator (Sum of Squares). I cannot get the right answer from it; however, when I use WEKA, the answer is perfect (with the same parameters).
Answer by Ingo Mierswa:
Hello,
First, I am not sure whether something like the ItemDistributionEvaluator is available in Weka at all. Wouldn't a density-based evaluation be more appropriate?
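To illustrate why two tools can report different "sum of squares" numbers for the same clustering, here is a minimal Python sketch of two different quantities that could carry that name. Both function names are hypothetical, and the second one rests on an assumption not confirmed in this thread: that an item-distribution measure is computed from the share of items per cluster rather than from distances. Treating label -1 as DBSCAN noise is also just a convention chosen for this sketch.

```python
from collections import Counter

def within_cluster_sse(points, labels, noise_label=-1):
    """Sum of squared Euclidean distances from each point to its cluster centroid.

    Points labeled as noise are skipped (an assumption of this sketch).
    """
    clusters = {}
    for point, label in zip(points, labels):
        if label == noise_label:
            continue
        clusters.setdefault(label, []).append(point)

    sse = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        for m in members:
            sse += sum((m[d] - centroid[d]) ** 2 for d in range(dim))
    return sse

def item_distribution_sum_of_squares(labels):
    """Sum over all label groups of (group size / total items) squared.

    Hypothetical reading of an item-distribution measure; it ignores
    the coordinates of the points entirely.
    """
    counts = Counter(labels)
    total = len(labels)
    return sum((count / total) ** 2 for count in counts.values())

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0), (50.0, 50.0)]
labels = [0, 0, 1, 1, -1]
sse = within_cluster_sse(points, labels)
distribution = item_distribution_sum_of_squares(labels)
```

If the two tools really compute different quantities like these, the numbers are not comparable even when the cluster assignments are identical.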
Second, Weka performs some preprocessing before DBScan (at least normalization and missing value handling), which must be added before the RapidMiner DBScan operator to get similar results. Although this seems to be more work for the analyst, he or she can at least control exactly which preprocessing is performed and how.
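The two preprocessing steps mentioned above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: missing values are replaced by the column mean and each attribute is scaled to [0, 1] by min-max normalization. The thread does not say which exact variants Weka applies, and the function names are hypothetical.

```python
def replace_missing_with_mean(rows):
    """Replace None entries with the mean of the observed values in that column.

    Assumes every column has at least one observed value.
    """
    columns = list(zip(*rows))
    means = []
    for column in columns:
        present = [v for v in column if v is not None]
        means.append(sum(present) / len(present))
    return [
        [means[j] if v is None else v for j, v in enumerate(row)]
        for row in rows
    ]

def min_max_normalize(rows):
    """Scale each column to [0, 1]; constant columns are mapped to 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

raw = [[1.0, 10.0], [3.0, None], [5.0, 30.0]]
filled = replace_missing_with_mean(raw)
normalized = min_max_normalize(filled)
```

Because DBSCAN's epsilon is a distance threshold, the same epsilon selects very different neighborhoods on raw and on normalized data, so identical parameters only give comparable results once the preprocessing matches.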
Third, there seems to be a bug in the Weka DBScan algorithm (please refer to https://list.scms.waikato.ac.nz/pipermail/wekalist/2007-July/010784.html ), which may also explain the differences you see.
I hope this gives you some insight into the possible reasons for the difference.
Cheers,
Ingo