"Clustering Performance"
Hello :)
I'm trying to perform image segmentation using RapidMiner's clustering algorithms. Except for K-means, which completes execution in approximately 3-4 minutes, the other methods (EM, k-medoids, Kernel k-means) never seem to converge (although on a Q6600 with 2 GB of RAM, RapidMiner never uses more than 30% of my CPU).
My data are simple features derived from the pixels, such as texture, magnitude, gradient, etc., all normalized to 0-1 (for each 300x400 image, a 300x400x3 feature matrix is extracted).
Do I need a more powerful CPU / more memory, or some kind of different normalization/preprocessing specifically for these algorithms?
Thank you & sorry for the long msg (O>o)
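For reference, the kind of example set described above (one row per pixel, three features scaled to 0-1) can be sketched like this in Python/NumPy. The intensity/gradient features here are just stand-ins for the texture, magnitude, and gradient features mentioned in the post:

```python
import numpy as np

def extract_features(image):
    """Build a (H*W, 3) feature matrix from a grayscale image:
    intensity plus horizontal/vertical gradient magnitudes.
    These stand in for the texture/magnitude/gradient features
    described in the post."""
    img = image.astype(float)
    gy, gx = np.gradient(img)
    feats = np.stack([img, np.abs(gx), np.abs(gy)], axis=-1)
    feats = feats.reshape(-1, 3)            # one row per pixel
    # min-max normalize each feature column to [0, 1]
    mins, maxs = feats.min(axis=0), feats.max(axis=0)
    return (feats - mins) / np.maximum(maxs - mins, 1e-12)

# a 300x400 image yields a 120000x3 example set
img = np.random.rand(300, 400)
X = extract_features(img)
print(X.shape)
```

So each image already contributes 120,000 examples to the clustering step, which matters for the runtime discussion below.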
Answers
I don't think the problem is your computer. Compared to K-means, the other flat clustering methods take roughly a factor equal to your number of examples longer to converge, because K-means exploits some neat properties of the Euclidean distance measure to be faster.
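To illustrate the cost difference: a Lloyd-style K-means iteration only needs the n×k distances between points and centroids, while medoid-based methods work with pairwise distances between examples, which is on the order of n². A minimal sketch of the K-means side (not RapidMiner's actual implementation):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Minimal Lloyd's K-means. Each iteration costs O(n*k*d):
    only distances from every point to the k centroids are needed,
    which is what makes K-means cheap compared to O(n^2)
    medoid-based methods."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # squared Euclidean distances, shape (n, k)
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# two well-separated blobs cluster in a handful of iterations
X = np.vstack([np.random.rand(100, 3), np.random.rand(100, 3) + 5])
labels, centroids = kmeans(X, 2)
print(labels.shape)
```

With 120,000 pixel examples per image, the n² factor of the other methods is what makes them appear to "never converge".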
So you could buy a faster computer, and that would speed up the calculation a bit, but as you can see from the workload, most of your cores are doing nothing. Instead of buying a faster computer, it would be more efficient to give us the money and let us implement a multithreaded version of the algorithms so that they run in parallel. That would give you a speedup of roughly a factor of 3 on your machine.
Another possibility would be to reduce the dimensionality of the examples, for example using a PCA.
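In RapidMiner this would be the PCA operator; just to show the idea, a PCA projection can be sketched in NumPy via the SVD of the centered data (the sizes below are illustrative, not taken from the post):

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X onto its top principal components via SVD."""
    Xc = X - X.mean(axis=0)               # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T       # scores in reduced space

X = np.random.rand(1000, 10)              # 1000 examples, 10 features
Z = pca_reduce(X, 3)
print(Z.shape)  # (1000, 3)
```

Fewer dimensions per example makes every distance computation cheaper, though it does not change the n² scaling of the medoid-based methods.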
Greetings,
Sebastian