First steps: need help with clustering
Hi,
I created a fictitious dataset using Excel's RANDBETWEEN function. The dataset has 18000 rows and two columns. Column A contains IDs ranging from 1 to 100. Column B contains a hypothetical expense amount between 0 and 50000 for every ID except ID 100, whose expense range is narrower: between 48000 and 50000.
Suppose I didn't know how the dataset was built and wanted to see whether one or more IDs show an anomalous concentration (that is, I would like the analysis to spot ID 100, with its expenses concentrated between 48000 and 50000). What kind of analysis should I perform? I tried clustering (k-means), but without success; I probably don't know the right steps to follow. Could somebody help me?
Best Answer
Telcontar120: Try some of the operators in the anomaly detection methods available in the free extension of that name. LOF (Local Outlier Factor) might be particularly useful in this type of context.
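To give a rough idea of what LOF does on data like this, here is a minimal sketch in plain Python (standard library only). It is not the RapidMiner operator, just a simplified hand-written LOF that ignores distance ties. It rests on a few assumptions that are not in the posts above: the data is regenerated with `random` rather than read from the Excel file, each ID is first summarised by the mean and spread of its expenses, and `k=10` neighbours is an arbitrary choice. The per-ID summary step matters because the individual rows of ID 100 fall inside the normal 0 to 50000 range, so it is the ID's profile that looks unusual, not any single row.

```python
import random
import statistics
from collections import defaultdict
from math import dist

random.seed(0)  # fixed seed so the sketch is repeatable

# Regenerate a dataset shaped like the one described in the question:
# 18000 rows, IDs 1..100, expenses 0..50000, except ID 100 (48000..50000).
by_id = defaultdict(list)
for _ in range(18000):
    id_ = random.randint(1, 100)
    low = 48000 if id_ == 100 else 0
    by_id[id_].append(random.randint(low, 50000))


def zscores(values):
    """Standardise a list so both features end up on a comparable scale."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return [(v - m) / s for v in values]


def lof_scores(points, k):
    """Simplified Local Outlier Factor (ties in distance are ignored)."""
    n = len(points)
    # For each point: its k nearest neighbours as (distance, index) pairs.
    neigh = []
    for i in range(n):
        d = sorted((dist(points[i], points[j]), j) for j in range(n) if j != i)
        neigh.append(d[:k])
    # k-distance: distance to the k-th nearest neighbour.
    k_dist = [nb[-1][0] for nb in neigh]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neigh[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean density of the neighbours divided by the point's own density.
    return [sum(lrd[j] for _, j in neigh[i]) / (k * lrd[i]) for i in range(n)]


# One point per ID: (standardised mean expense, standardised spread).
ids = sorted(by_id)
means = zscores([statistics.mean(by_id[i]) for i in ids])
spreads = zscores([statistics.pstdev(by_id[i]) for i in ids])
points = list(zip(means, spreads))

scores = lof_scores(points, k=10)
ranked = sorted(zip(scores, ids), reverse=True)

# IDs with the highest LOF are the most isolated from the other IDs.
for score, id_ in ranked[:3]:
    print(id_, round(score, 2))
```

Scores close to 1 mean an ID sits in a region about as dense as its neighbours; scores well above 1 mark an ID whose profile is far from the rest. In RapidMiner the same idea would presumably be an Aggregate step per ID followed by the LOF operator, though the exact process layout is a guess on my part.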
Answers