Clustering in RapidMiner
Hello! I am working on a project in RapidMiner and I have a question. After clustering a group of consumers by their product ratings, how can I find the representative consumer of each cluster based on demographic data? I would appreciate any help.
Answers
The clustering model contains a centroid table. In this centroid table you can see what the center points of your clusters were. You might want to use them as representatives (in the end, the centroid is the best representative of a cluster).
If you want to answer something like "What is the most important attribute for Cluster X?", you might use the cluster ID as the label for a supervised learning algorithm and then do a standard feature selection.
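To make the centroid idea concrete outside of RapidMiner, here is a minimal Python sketch. It assumes the cluster assignments already exist, computes each cluster's centroid from the ratings, picks the real consumer closest to it, and looks up that consumer's demographics. All ids, ratings and demographic values are made up for illustration, and the helper names are hypothetical.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length numeric vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]

def representative(members):
    """Return the id of the member whose ratings lie closest to the centroid.

    `members` maps a consumer id to that consumer's rating vector.
    """
    center = centroid(list(members.values()))
    return min(members, key=lambda cid: math.dist(members[cid], center))

# Hypothetical data: ratings were used for clustering, demographics were not.
ratings = {
    "c1": [5, 1, 4], "c2": [4, 2, 4], "c5": [3, 1, 5],
    "c3": [1, 5, 2], "c4": [2, 5, 1], "c6": [1, 4, 2],
}
cluster_of = {"c1": 0, "c2": 0, "c5": 0, "c3": 1, "c4": 1, "c6": 1}
demographics = {
    "c1": ("F", 34), "c2": ("M", 41), "c5": ("F", 29),
    "c3": ("M", 63), "c4": ("F", 58), "c6": ("M", 55),
}

representatives = {}
for cluster_id in sorted(set(cluster_of.values())):
    members = {cid: r for cid, r in ratings.items() if cluster_of[cid] == cluster_id}
    rep = representative(members)
    representatives[cluster_id] = rep
    print(cluster_id, rep, demographics[rep])
```

Choosing the nearest real consumer instead of the raw centroid means the demographic lookup always returns an actual person rather than an average of ratings.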
Best,
Martin
Dortmund, Germany
Do you have a data set where you have the "truth"? Then you can simply use a classifier.
Otherwise you might want to find items which are usually bought together. Have a look at the FP-Growth operator and its tutorial in this case.
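To illustrate what "usually bought together" means in numbers, here is a small Python sketch. Note that it is not FP-Growth: it simply counts every item pair by brute force, which is fine for a toy example but does not scale the way FP-Growth does. The baskets and the function name are made up.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Return {pair: support} for item pairs found in at least
    `min_support` (a fraction) of all transactions."""
    counts = Counter()
    for basket in transactions:
        # Sort and deduplicate so each pair is counted once per basket.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

# Hypothetical shopping baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "eggs"],
]
pairs = frequent_pairs(baskets, min_support=0.5)
print(pairs)
```

Support here is the share of baskets that contain both items; raising `min_support` keeps only the strongest co-occurrences.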
One more question: we ran a classification and the accuracy is very low, e.g. 30% +/- 15% or 50% +/- 15%. We have tried Naive Bayes, Decision Tree and k-NN, but the accuracy stays low. What can we do to improve our model's accuracy?
Of course you can analyse the cluster memberships. The question is how to find the "important" attributes. If you use the cluster_id as a label, you can use Weight by SVM to find the key attributes.
As for the classification problem, there are several typical things you can do to optimize the performance:
0. Feature generation and preprocessing, e.g. converting dates to useful numbers, calculating differences, etc.
1. Feature selection.
2. Choosing a different algorithm. I would try: SVM (with different kernels), Random Forest, Neural Net, Linear Regression, Boosted Decision Tree, LDA.
3. Optimizing the parameters of the algorithm (C for SVM is very, very important).
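Step 3 can be sketched in a few lines of Python. This example swaps in k-NN and its parameter k instead of SVM and C, purely to stay within the standard library; the idea (try several values, score each with cross-validation, keep the best) is the same. The data points, including one deliberately mislabeled point, are invented.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training points nearest to `query` (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def loo_accuracy(data, k):
    """Leave-one-out accuracy: predict each example from all the others."""
    hits = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, x, k) == y
    return hits / len(data)

# Hypothetical two-class data with one noisy label near the "a" group.
data = [
    ((1, 1), "a"), ((1, 3), "a"), ((3, 1), "a"), ((3, 3), "a"),
    ((8, 8), "b"), ((8, 10), "b"), ((10, 8), "b"), ((10, 10), "b"),
    ((2.2, 2.1), "b"),
]

# Try each candidate value and keep the one with the best validated accuracy.
scores = {k: loo_accuracy(data, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

With k = 1 the single noisy point misleads all of its neighbours, while larger k values outvote it, which is the kind of effect a parameter search is meant to reveal.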
As described by CRISP-DM (http://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining ), this is a cycle, so you might have to go back to the data again.
Data science is nothing like "do that and be happy". Good data science is kind of an art.
Can you share the data and/or the processes? Then someone might have a look at them and give more detailed tips.
Best,
Martin
1. We should apply text-processing functions that lead to the largest possible reduction in the number of features (words) in the review vectors.
2. We should develop a classification model that can classify new reviews into three categories (positive, negative, neutral) and evaluate the classification accuracy by trying different algorithms.
Which options would you recommend in order to optimize the performance of the model?
Then I would try three different algorithms: SVM with a radial kernel, k-NN with cosine similarity, and Naive Bayes.
Did you use stemming and pruning?
I mean cosine similarity as the distance measure. When using k-NN you need to define one. Cosine similarity works quite well on text data.
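For anyone who wants to see what k-NN with cosine similarity does on text, here is a minimal Python sketch on plain word counts. The reviews, labels and function names are invented, and there is no stemming or weighting, so treat it as an illustration of the measure rather than a tuned classifier.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words vector: word -> count, lowercased, split on whitespace."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def knn_classify(train, text, k=3):
    """Majority label among the k training texts most similar to `text`."""
    query = vectorize(text)
    ranked = sorted(
        train,
        key=lambda item: cosine_similarity(vectorize(item[0]), query),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled reviews.
reviews = [
    ("great product works great", "positive"),
    ("really great value", "positive"),
    ("love it great quality", "positive"),
    ("terrible product broke fast", "negative"),
    ("terrible quality waste of money", "negative"),
    ("broke after a week terrible", "negative"),
]
prediction = knn_classify(reviews, "great quality product", k=3)
print(prediction)
```

Cosine similarity compares the direction of the word-count vectors and ignores their length, so a long and a short review using the same vocabulary still count as similar.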
Pruning is an option of the Process Documents operator.
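As a rough idea of what pruning does to the word list, here is a small Python sketch that drops words occurring in too few or too many documents. The thresholds, documents and function name are assumptions for illustration and are not the operator's actual parameters.

```python
from collections import Counter

def prune_vocabulary(documents, min_docs=2, max_fraction=0.8):
    """Keep words that occur in at least `min_docs` documents and in at most
    `max_fraction` of all documents."""
    doc_freq = Counter()
    for doc in documents:
        # Use a set so a word counts once per document.
        doc_freq.update(set(doc.lower().split()))
    limit = max_fraction * len(documents)
    return {word for word, df in doc_freq.items() if min_docs <= df <= limit}

# Hypothetical review snippets.
docs = [
    "the battery is great",
    "the screen is great",
    "the battery died",
    "the box was damaged",
    "the delivery was slow",
]
kept = prune_vocabulary(docs)
print(sorted(kept))
```

Very rare words carry too little evidence to learn from and very common words do not separate the classes, so removing both can shrink the feature set considerably.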