The Altair Community is migrating to a new platform to provide a better experience for you. In preparation for the migration, the Altair Community is on read-only mode from October 28 - November 6, 2024. Technical support via cases will continue to work as is. For any urgent requests from Students/Faculty members, please submit the form linked here

Clustering high-dimensional data

mskhmskh Member Posts: 13 Learner I
edited March 2019 in Help
Hi,
I try to use DBSACN to detect the outlier in my data set but it is difficult to set the parameters (epsilon,min points). Does anyone have an idea to solve the problem? it is possible to consider two clustering algorithms and each algorithm only consider sub-attributes of data set and i detect the outlier based on the results of two clustering algorithm?
Thanks 

Answers

  • Telcontar120Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    DBSCAN is definitely one of those algorithms where you need to have domain expertise to set the parameters properly to get good results.  You might want to try a simpler clustering algorithm first like k-means or hierarchical.
    If you want to try to use two clustering algorithms based on different attributes, you'll need to multiply/split your dataset and feed one set of attributes to the first algorithm and a different set of attributes to the second algorithm, get the assigned clusters, and then join the two datasets back together again to compare.
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • mskhmskh Member Posts: 13 Learner I
    Thank you so much @Telcontar120.
    I am a beginner in rapidminer. Will the results be different, if i use two clustering algorithms based on different attributes instead of one clustering algorithm on those attributes?
    Thank you
  • Telcontar120Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    It is not possible to give a definitive answer without seeing the data, but in general you would not necessarily expect to get the same results if you are using different subsets of attributes or different clustering algorithms.  You mentioned that you wanted to do this in your earlier comments, that is all.
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
Sign In or Register to comment.