newbie requires advice to select a clustering algorithm
Hello,
I discovered RapidMiner yesterday after several hours of research into data clustering (it looks very nice and friendly). I need a little bit of help in selecting an algorithm for what is most likely a simple case.
I have a series of events that happened at irregular intervals in time. I would like to determine which of those events are in clusters, where in my case a cluster is composed of adjacent events that are closer in time than a given threshold. The time span of the cluster does not matter (so 3 events 10 seconds apart or 20 events at 1-minute intervals are both valid clusters; I only care about the distance between two successive events).
From what I've read so far, k-means and its variants are not appropriate since they require the user to specify how many clusters are desired. I don't know how many there are and, in this case, their number is in fact an output of the analysis, not an input.
Any guidance is appreciated.
Thanks,
-jl
Answers
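If each of your examples is marked with the point in time at which the event occurs, you could use Agglomerative Clustering with single link. If you cluster only on the time attribute (mark every other attribute as special, or remove it), you will get a dendrogram showing which events are combined into one cluster and the distance between them. With single link, two groups are merged at the distance between their closest members, which on a time axis is the gap between adjacent events, so cutting the dendrogram at your threshold should give the clusters you describe.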
Greetings,
Sebastian
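For anyone who wants to check the idea outside RapidMiner, here is a minimal Python sketch. It is not the Agglomerative Clustering operator itself; it relies on the fact that, for a single time attribute, cutting a single-link dendrogram at a threshold gives the same groups as sorting the events and splitting wherever the gap between successive events reaches the threshold. The function name `split_by_gap`, the sample timestamps, and the choice to treat a gap exactly equal to the threshold as a split are assumptions made for illustration.

```python
def split_by_gap(timestamps, threshold):
    """Group timestamps into runs whose successive gaps are below threshold.

    Hypothetical helper: timestamps are plain numbers (e.g. seconds).
    A gap equal to the threshold starts a new group (an assumption).
    """
    ordered = sorted(timestamps)
    groups = []
    for t in ordered:
        # Extend the current group only if this event is close enough
        # to the previous one; otherwise start a new group.
        if groups and t - groups[-1][-1] < threshold:
            groups[-1].append(t)
        else:
            groups.append([t])
    return groups


# Made-up event times in seconds, for illustration only.
events = [0, 10, 20, 100, 400, 460, 520, 580, 1000]
groups = split_by_gap(events, threshold=61)

# Isolated events come back as groups of size 1; drop them if only
# groups of two or more events should count as clusters.
clusters = [g for g in groups if len(g) >= 2]
print(clusters)
```

The number of clusters is simply `len(clusters)`, so it comes out of the analysis rather than being specified up front.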