DBSCAN taking very long time

moritz_moeller · January 2019

Hello there,

I am currently trying to run a cluster analysis with DBSCAN. Since this is my first time doing a cluster analysis or using DBSCAN, my knowledge comes only from papers and online documents. Maybe one of you can help me out:

I am analyzing a fairly large amount of data (I know that's relative): 10 columns and around 6 million rows. I select attributes, filter them, normalize them, and then feed them into the DBSCAN clustering. My parameters are epsilon=0.5 and minpts=4. I want to look at 2 attributes at a time, since I'll compare the results to k-means.

The problem is that it already takes over an hour to preprocess the data (the loading circle is shown on the clustering part) before the progress even starts to go from 1 to 100. Is there anything I can change in my process to make it faster? Perhaps some beginner mistakes are involved, which is quite likely.

Thanks for your answers and have a nice day.

EDIT: I have 64GB of RAM, and the process uses around 32GB at the moment. I set the maximum to 50GB. In addition, I only have numeric attributes.

MartinLiebig · January 2019

Hi Moritz,

I guess 6M rows is just a lot for this. If I remember correctly, the runtime is O(n²).

BR,

martin
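
To put the quadratic estimate in numbers, here is a rough back-of-the-envelope sketch. It assumes a naive neighbourhood search that examines every distinct pair of points once; the helper name `pairwise_comparisons` and the 100,000-row sample size are made up for illustration and do not describe how RapidMiner implements DBSCAN internally.

```python
def pairwise_comparisons(n: int) -> int:
    """Distinct point pairs a naive O(n^2) neighbourhood search would examine."""
    return n * (n - 1) // 2


# 6 million rows as in the question; 100,000 is an arbitrary smaller sample.
full = pairwise_comparisons(6_000_000)
sample = pairwise_comparisons(100_000)

print(f"{full:,} pairs for 6,000,000 rows")
print(f"{sample:,} pairs for 100,000 rows")
print(f"roughly {full / sample:,.0f} times more work on the full data")
```

Under that assumption, cutting the row count by a factor of 60 cuts the number of comparisons by a factor of roughly 3,600, which is why a smaller range of rows finishes so much sooner.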

moritz_moeller · January 2019

Well, it seems you're correct. I am now working with only a range of my rows, and the runtime is considerably lower.

Thanks for the answer; I assume it is the correct one.
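
If a contiguous range of rows might not be representative (for example, when the data is sorted), a random sample is one possible alternative. The following is a minimal sketch outside RapidMiner, using only the Python standard library; `sample_rows`, the seed, and the toy `rows` list are hypothetical stand-ins, not part of the original process.

```python
import random


def sample_rows(rows, k, seed=42):
    """Return k rows drawn without replacement; a fixed seed keeps runs repeatable."""
    rng = random.Random(seed)
    return rng.sample(rows, k)


# Toy stand-in for the real data: 1,000 rows with 2 numeric attributes each.
rows = [(float(i), i * 0.5) for i in range(1_000)]

subset = sample_rows(rows, 100)
print(len(subset), "rows sampled from", len(rows))
```

The sampled subset can then be normalized and clustered in the same way as the full data, keeping in mind that epsilon and minpts may need adjusting because a sample is less dense than the full data set.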
