Clustering in RapidMiner

nicka Member Posts: 8 Contributor II
edited April 2020 in Help
Hello! I am working on a project in RapidMiner and I have a question. After clustering a group of consumers by their product ratings, how can I find the representative consumer of each cluster based on demographic data? I would appreciate it if someone could help me. :)

Answers

  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Hi,

    the clustering model contains a centroid table. In this table you can see where the center points of your clusters are. You might want to use them as representatives (in the end, the centroid is the best representative of a cluster).

    If you want to answer something like "What is the most important attribute for Cluster X?", you can use the cluster ID as the label for a supervised learning algorithm and then do a standard feature selection.


    Best,

    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
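
To make the centroid idea concrete outside RapidMiner, here is a minimal Python sketch. It assumes the cluster IDs were already obtained from the ratings and that the demographic attributes are numeric; the function name `cluster_representatives` and the sample data are hypothetical. For each cluster it averages the demographic attributes (the centroid) and picks the real consumer closest to that average.

```python
from collections import defaultdict
from math import dist


def cluster_representatives(rows, cluster_ids):
    """Return {cluster_id: (centroid, index of the member nearest to the centroid)}."""
    members = defaultdict(list)
    for index, cluster_id in enumerate(cluster_ids):
        members[cluster_id].append(index)

    result = {}
    for cluster_id, indices in members.items():
        n_features = len(rows[indices[0]])
        # Centroid = per-attribute mean over the members of the cluster.
        centroid = [
            sum(rows[i][f] for i in indices) / len(indices)
            for f in range(n_features)
        ]
        # Representative = the actual member with the smallest Euclidean distance.
        nearest = min(indices, key=lambda i: dist(rows[i], centroid))
        result[cluster_id] = (centroid, nearest)
    return result


# Hypothetical demographics: (age, household size). Cluster IDs come from the ratings.
demographics = [(25, 1), (31, 2), (28, 1), (52, 4), (48, 3), (61, 2)]
cluster_ids = [0, 0, 0, 1, 1, 1]
representatives = cluster_representatives(demographics, cluster_ids)
```

In practice the attributes would usually be normalized first, otherwise an attribute with a large range (such as age) dominates the distance.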
  • nicka Member Posts: 8 Contributor II
    Thank you very much...
  • nicka Member Posts: 8 Contributor II
    Thank you very much! Your help is really important. I would like to ask something else: which products should we propose to a "new" customer for whom we only know the rating of one given product? The only data we have are the users' ratings of the products. I think we need some kind of recommendation system. How can I use recommender systems in RapidMiner, if that is the right approach? :)
  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    The basic question is whether you have a supervised learning problem.

    Do you have a data set where you know the "truth"? Then you can simply use a classifier.

    Otherwise you might want to find items which are usually bought together. Have a look at the FP-Growth operator and its tutorial in this case.
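
As a rough illustration of "items usually bought together", the sketch below counts co-occurring item pairs in plain Python. This is brute-force pair counting, not the FP-Growth algorithm itself (FP-Growth finds frequent itemsets of any size far more efficiently), and the function names, the baskets, and the support threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Map each item pair to its support (share of transactions containing both)."""
    pair_counts = Counter()
    for items in transactions:
        for pair in combinations(sorted(set(items)), 2):
            pair_counts[pair] += 1
    n = len(transactions)
    return {
        pair: count / n
        for pair, count in pair_counts.items()
        if count / n >= min_support
    }


def recommend(item, pairs):
    """Items that frequently co-occur with `item`, highest support first."""
    partners = [
        (support, b if a == item else a)
        for (a, b), support in pairs.items()
        if item in (a, b)
    ]
    return [other for _, other in sorted(partners, reverse=True)]


# Hypothetical baskets: the products each user rated positively.
baskets = [
    {"tea", "cookies"},
    {"tea", "cookies", "milk"},
    {"coffee", "milk"},
    {"tea", "milk"},
]
pairs = frequent_pairs(baskets, min_support=0.5)
suggestions = recommend("tea", pairs)
```

A new customer who has only rated "tea" would be offered the items in `suggestions`; pairs below the support threshold never become recommendations.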
  • nicka Member Posts: 8 Contributor II
    Another solution I have thought of: we could look up which cluster contains the product the customer has rated and recommend the other products in that cluster. Would that work?
    One more question: we did a classification and the accuracy is very low, e.g. 30% +/- 15% or 50% +/- 15%. We have used Naive Bayes, Decision Tree and k-NN, and the accuracy is low for all of them. What can we do to improve our model accuracy?
  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Hello nicka,

    of course you can analyse the cluster memberships. The question is how to find the "important" attributes. If you use the cluster_id as a label, you can use Weight by SVM to find the key attributes.


    For the classification problem, there are several typical things you can do to optimize the performance:

    0. Feature generation and preprocessing, e.g. converting dates to useful numbers, calculating differences etc.
    1. Feature selection
    2. Choosing a different algorithm. I would try: SVM (with different kernels), Random Forest, Neural Net, Linear Regression, Boosted Decision Tree, LDA...
    3. Optimizing the parameters of the algorithm (C for SVM is very, very important).
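
Step 3 can be sketched as a small grid search: try each candidate parameter value, estimate the accuracy on held-out data, and keep the best one. The example below is a minimal, assumption-laden illustration in plain Python that tunes k for a toy k-NN classifier with leave-one-out validation; the helper names and the data are hypothetical, and the same loop would apply to C for an SVM.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k):
    """Majority vote among the k training points nearest to `query`."""
    neighbours = sorted(train, key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(data, k):
    """Hold out each example once, predict it from the rest, return the hit rate."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, features, k) == label
    return hits / len(data)


def grid_search(data, candidate_ks):
    """Return (best_k, scores), where scores maps each k to its validated accuracy."""
    scores = {k: leave_one_out_accuracy(data, k) for k in candidate_ks}
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Hypothetical two-class data set with well separated groups.
data = [
    ((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"), ((2, 2), "a"),
    ((6, 6), "b"), ((6, 7), "b"), ((7, 6), "b"), ((7, 7), "b"),
]
best_k, scores = grid_search(data, candidate_ks=[1, 3, 7])
```

With this toy data a too-large k (7) lets the other class outvote the true neighbours, which is the kind of effect a parameter search is meant to expose. Validating on held-out examples rather than on the training data keeps the comparison honest.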

    As described by CRISP-DM (http://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining), the process is a cycle, so you might have to go back to the data again.
    Data science is nothing like "do that and be happy". Good data science is kind of an art.

    Can you share the data and/or the processes? Then someone might have a look at it and give more detailed tips.


    Best,

    Martin
  • nicka Member Posts: 8 Contributor II
    Thank you! We have downloaded (manually) a number of recent reviews of a particular hotel from TripAdvisor. We entered the data into an Excel file and labelled each review as positive, negative or neutral based on the rating given by the user (ratings of 1-2 count as negative, 3 as neutral, and 4-5 as positive).
    1. We should apply text processing functions that lead to the largest possible reduction in the number of features (words) in the vectors describing the reviews.
    2. We should develop a classification model that can assign new reviews to the three categories (positive, negative, neutral) and evaluate the classification accuracy by trying different algorithms.
    Which of your recommendations should we choose in order to optimize the performance of the model?
  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Ahhh, it's text classification!

    Then I would try 3 different algorithms: Radial SVM, k-NN with cosine similarity and Naive Bayes.

    Did you use stemming and pruning?
  • nicka Member Posts: 8 Contributor II
    We can find SVM but we can't find Radial SVM. How can we select a radial SVM? Is there an option somewhere? The same applies to k-NN with cosine similarity: how can we choose cosine similarity? We have already used the stemming operator. Is pruning an operator?
  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Radial means that you use a radial kernel, so simply change the kernel option of the SVM to radial.
    Cosine similarity is a distance measure. When using k-NN you need to define one, and cosine similarity works quite well on text data.

    Pruning is an option of Process Documents.
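
To show what pruning and cosine-similarity k-NN do to review text, here is a minimal plain-Python sketch. It is not how RapidMiner implements these steps; the function names, the whitespace tokenizer, the `min_docs` threshold, and the sample reviews are all assumptions made for illustration. Pruning drops words that occur in too few documents, and k-NN then labels a new review by its most similar training reviews.

```python
from collections import Counter
from math import sqrt


def tokenize(text):
    return text.lower().split()


def prune_vocabulary(documents, min_docs):
    """Keep only words that appear in at least `min_docs` documents."""
    doc_freq = Counter(word for doc in documents for word in set(tokenize(doc)))
    return {word for word, count in doc_freq.items() if count >= min_docs}


def vectorize(text, vocabulary):
    """Term-count vector restricted to the pruned vocabulary."""
    return Counter(word for word in tokenize(text) if word in vocabulary)


def cosine_similarity(a, b):
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_classify(query, labelled_vectors, k):
    """Majority label among the k training vectors most similar to `query`."""
    ranked = sorted(
        labelled_vectors,
        key=lambda pair: cosine_similarity(query, pair[0]),
        reverse=True,
    )
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


# Hypothetical hotel reviews, labelled from the star rating.
reviews = [
    ("great room and great staff", "positive"),
    ("friendly staff and clean room", "positive"),
    ("dirty room and rude staff", "negative"),
    ("rude reception and dirty bathroom", "negative"),
]
vocabulary = prune_vocabulary([text for text, _ in reviews], min_docs=2)
training = [(vectorize(text, vocabulary), label) for text, label in reviews]
prediction = knn_classify(vectorize("dirty and rude", vocabulary), training, k=1)
```

Note that a word like "and" survives this kind of pruning because it occurs in every review; removing such words needs an upper frequency threshold or a stopword filter. Cosine similarity compares the direction of the word vectors rather than their length, so long and short reviews with the same wording still count as similar.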