
PSO showing error "incompatible number of attributes (821 != 4260)!"

neha Member Posts: 7 Contributor II
edited November 2018 in Help

Hello,

I am using RM for sentiment analysis of a movie review dataset. I have tokenized the reviews and calculated the term frequency and TF-IDF for the words. For classification I want to use a 10-fold cross-validated SVM-PSO, but after the 11th execution the tool returns the error "incompatible number of attributes (821 != 4260)!".

Please help.


Best Answer

  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Solution Accepted

    Hi Neha,

     

    the problem is that the dataset you apply the model to does not have the same (or enough) attributes. Are you sure that you passed the wordlist from one Process Documents operator to the other?

     

    You might have a look at this Knowledge Base post I just created for this: http://community.rapidminer.com/t5/RapidMiner-Studio-Knowledge-Base/Text-Mining-and-the-Word-List/ta-p/31723
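    For readers working outside RapidMiner, here is a minimal sketch of the same idea in Python with scikit-learn (an assumed stand-in purely for illustration; RapidMiner handles this through the wordlist port of the Process Documents operators): fit the vocabulary once on the training documents and reuse it when scoring, instead of rebuilding it.

        # Minimal sketch, assuming scikit-learn: the fitted vocabulary plays
        # the role of RapidMiner's wordlist.
        from sklearn.feature_extraction.text import TfidfVectorizer

        train_docs = ["great movie, loved it", "terrible plot and acting"]
        new_docs = ["loved the acting"]

        vectorizer = TfidfVectorizer()

        # Fitting fixes the vocabulary (the "wordlist") from the training data.
        X_train = vectorizer.fit_transform(train_docs)

        # Reuse the SAME fitted vocabulary on new data. Calling fit_transform
        # again would build a different vocabulary and yield a different
        # attribute count -- exactly what the "incompatible number of
        # attributes" error reports.
        X_new = vectorizer.transform(new_docs)

        print(X_train.shape[1] == X_new.shape[1])  # True: same attribute count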

     

    ~Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany

Answers

  • neha Member Posts: 7 Contributor II

    Thanks for your reply, and sorry for replying late.

    Does the execution of PSO take a long time? Also, n-fold cross-validation is defined so that (n-1) folds are used for training and the nth fold for testing, over n iterations; yet in RM the validation runs (n+1) times. Why is that?

     

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Hi Neha,

     

    Let me refer you to this discussion on X-Validation: http://community.rapidminer.com/t5/RapidMiner-Studio/What-about-n-models-generated-in-cross-validation-Should-we-not/m-p/31649

     

    There is another link in the accepted solution of that thread that leads to an even more detailed discussion of X-Validation.
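    To make the fold arithmetic concrete, here is a minimal sketch (Python with scikit-learn, assumed purely for illustration): n-fold cross-validation trains n models, each on (n-1) folds, and the (n+1)th run is typically a final model trained on the full dataset, which is the one delivered for later use.

        # Minimal sketch of n-fold cross-validation, assuming scikit-learn.
        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.model_selection import StratifiedKFold
        from sklearn.svm import SVC

        X, y = make_classification(n_samples=200, random_state=0)
        n = 10
        scores = []

        # n iterations: train on (n-1) folds, test on the held-out fold.
        for train_idx, test_idx in StratifiedKFold(n_splits=n).split(X, y):
            model = SVC(kernel="linear").fit(X[train_idx], y[train_idx])
            scores.append(model.score(X[test_idx], y[test_idx]))

        print(f"estimated accuracy: {np.mean(scores):.3f}")

        # The (n+1)th run: one final model trained on ALL the data; the n
        # fold models only exist to estimate its performance.
        final_model = SVC(kernel="linear").fit(X, y)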

  • neha Member Posts: 7 Contributor II

    Hi,

    I have come to the end of my work. After preprocessing, transforming the tokens into TF and TF-IDF, and classifying the generated unigram, bigram, and trigram terms using SVM and SVM-PSO with five kernel types (dot, radial, polynomial, Epanechnikov, and ANOVA), I have found that for bigrams and trigrams all kernel types give the same accuracy.

    Why is that?

     


     

  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist

    Hi,

    I would argue that the statistical significance of the bigrams and trigrams is so small that they are neglected by the SVM; there are simply not enough occurrences of the bi-/trigrams.
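    This is easy to check on your own corpus; a rough sketch (Python with scikit-learn, assumed purely for illustration) counts how quickly the n-gram statistics thin out:

        # Rough sketch, assuming scikit-learn: most bigrams and trigrams
        # occur only once, so they carry almost no statistical weight.
        import numpy as np
        from sklearn.feature_extraction.text import CountVectorizer

        docs = [
            "the movie was great",
            "the movie was boring",
            "great acting but boring plot",
        ]

        for n in (1, 2, 3):
            counts = CountVectorizer(ngram_range=(n, n)).fit_transform(docs)
            freq = np.asarray(counts.sum(axis=0)).ravel()  # corpus frequency
            once = int((freq == 1).sum())          # n-grams seen exactly once
            print(f"{n}-grams: {counts.shape[1]} distinct, {once} occur once")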

     

    ~Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • neha Member Posts: 7 Contributor II

    Why do the different kernel types not show any change?

    In my work I have used both SVM and SVM-PSO, and both show similar accuracy across the different kernels. Why?

     

  • neha Member Posts: 7 Contributor II

    Hi,

    Please reply!

    Thank you.

  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist

    Hi Neha,

     

    This is, I think, a difficult question, and it belongs firmly in the area of computer science. There is an overview paper by Kaestner et al. in which a linear kernel also works quite well. I read some time ago that the RBF kernel is worse because it introduces smearing, which is bad in text mining, but I forgot to save the bookmark.

    If I remember correctly, @land told me once that kernels are not good here because they add additional degrees of freedom, which usually results in overfitting because d >> n.

     

    Anyway, that would only explain why kernels are no better than the non-kernel case. Maybe @IngoRM or @RalfKlinkenberg can help; they have more experience with SVM theory.
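    A quick way to see the d >> n effect for yourself is to compare kernels on sparse TF-IDF features; a small sketch (Python with scikit-learn, assumed purely for illustration; the 20 newsgroups corpus is downloaded on first use):

        # Sketch, assuming scikit-learn: on sparse, high-dimensional text
        # features (d >> n), the RBF kernel's extra degrees of freedom
        # rarely beat a plain linear kernel.
        from sklearn.datasets import fetch_20newsgroups
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.model_selection import cross_val_score
        from sklearn.svm import SVC

        data = fetch_20newsgroups(subset="train",
                                  categories=["rec.autos", "sci.med"])
        X = TfidfVectorizer().fit_transform(data.data)  # d: tens of thousands
        y = data.target

        for kernel in ("linear", "rbf"):
            acc = cross_val_score(SVC(kernel=kernel), X, y, cv=5).mean()
            print(f"{kernel} kernel: {acc:.3f}")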

     

    ~Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • neha Member Posts: 7 Contributor II

    Hi Sir,

    Thank you for your reply.

    It is said and observed that SVM-PSO performs better than SVM, the reason being that PSO optimizes and overcomes the drawbacks of SVM. What are these drawbacks of SVM that PSO overcomes? Which parameters does PSO optimize in the SVM? What actually makes SVM-PSO better than SVM? I have implemented both SVM and SVM-PSO for analysing the sentiments.

    Eagerly waiting for your early reply.

    Thank you.

  • neha Member Posts: 7 Contributor II

    Hi Sir,

     

    Thank you for your reply.

    How can my question reach @IngoRM or @RalfKlinkenberg?

    Please help.

    Thank you.
