
"context/feature based opinion mining/sentiment analysis"

alexjohnp Member Posts: 1 Learner III
edited June 2019 in Help
Hello everybody,
I'm pretty new to RapidMiner, and I'm stuck on the following problem.
I managed to build a simple sentiment classifier following Pang's approach and the examples on the Internet (especially those on vancouverdata). Now I'd like to extend the concept by extracting the specific features (n-grams) and showing their sentiment scores.
For example, take the following phrase: "the camera has a pretty good focus, but its flash lacks of speed". It has two features: focus (positive) and flash (negative).
Could you help me get through the pain?
Thank you in advance,

Answers

  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    I'm sure there are many ways to look at this.
    If the text you're mining is English with clauses separated by commas, then it's straightforward: you just split on the comma.
    Let's assume that you don't have that luxury. I am, however, going to assume that all the posts are on the one subject.
    So for example:
    "the camera has a pretty good focus but its flash lacks of speed"
    "The Canon Sureshot has a pretty good focus and flash, but tastes awful without ketchup."
    "I've always liked the focus on my Canon, but really think the lightmeter is poor."

    I'd suggest the following approach (others may disagree):
    First I'd add an ID so you can split up the documents in many ways, but still combine them again later.
    • 1: build a list of N-Grams (a maximum of 4-5 terms long seems about right)
    • 2.1: build a list of features of the subject (flash, focus, shutter, lens, etc.).
    • 2.2: build a list of positive & negative terms for labelling, e.g. positive: good, pretty, etc.
    • 3.1: eliminate any N-Grams that contain more than one feature.
        (this is where I think my approach is wrong)
        do you remove "pretty good focus and flash" and just keep "pretty good focus"?
    • 3.2: eliminate any N-Grams that contain conflicting sentiment (e.g. keep "good focus but bad flash", do not keep "good focus but bad")
    • 4: build a sentiment mining model from the N-Grams
    • 5: have a look at the most positive / least positive words in the N-Grams (that aren't features) and see if they should be added to the labelling in step 2.2
    After repeating this process a few times on the sample data it should be possible to join your N-Grams up with your list of features to show what the overall sentiment balance is for each individual feature,
    e.g. focus 30 / 45 / 25  (positive, negative, neutral).

    I won't put together a sample process though, as I think there are probably better ideas than mine on here.
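
For readers who want something concrete to experiment with, the steps above can be roughly sketched outside RapidMiner in plain Python. This is only an illustration under several assumptions: the word lists are made up for the three example sentences, the function names (tokenize, ngrams, label, feature_sentiment) are hypothetical, the document ID step is skipped, and step 4 (training a sentiment model) is replaced by a simple lexicon lookup. It is not a RapidMiner process and not necessarily what JEdward had in mind.

```python
from collections import Counter, defaultdict

# Hypothetical word lists (steps 2.1 and 2.2); a real project would curate these.
FEATURES = {"focus", "flash", "lightmeter"}
POSITIVE = {"good", "pretty", "liked"}
NEGATIVE = {"lacks", "poor", "awful"}


def tokenize(text):
    """Lowercase the text and strip surrounding punctuation from each word."""
    words = (w.strip(".,!?\"'").lower() for w in text.split())
    return [w for w in words if w]


def ngrams(tokens, max_n=4):
    """Step 1: yield every n-gram of length 2..max_n as a tuple of words."""
    for n in range(2, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def label(gram):
    """Return (feature, polarity) for a usable n-gram, or None if it is dropped."""
    feats = [w for w in gram if w in FEATURES]
    if len(feats) != 1:
        return None  # step 3.1: keep only n-grams with exactly one feature
    pos = any(w in POSITIVE for w in gram)
    neg = any(w in NEGATIVE for w in gram)
    if pos and neg:
        return None  # step 3.2: drop n-grams with conflicting sentiment
    polarity = "positive" if pos else "negative" if neg else "neutral"
    return feats[0], polarity


def feature_sentiment(documents, max_n=4):
    """Tally positive / negative / neutral n-gram counts per feature."""
    tallies = defaultdict(Counter)
    for doc in documents:
        for gram in ngrams(tokenize(doc), max_n):
            result = label(gram)
            if result is not None:
                feature, polarity = result
                tallies[feature][polarity] += 1
    return tallies


docs = [
    "the camera has a pretty good focus but its flash lacks of speed",
    "I've always liked the focus on my Canon, but really think the lightmeter is poor.",
]
tallies = feature_sentiment(docs)
for feature, counts in sorted(tallies.items()):
    print(feature, counts["positive"], counts["negative"], counts["neutral"])
```

Counting n-grams rather than sentences means a feature mentioned once still contributes several overlapping n-grams, so the printed numbers are best read as a relative balance per feature, not as a count of opinions. Step 5 would correspond to inspecting the words in the neutral n-grams and moving any that carry sentiment into the positive or negative list.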
  • puteri_prameswa Member Posts: 3 Contributor I
    Dear Alexjohnp,

    I am using RapidMiner for my final thesis on feature-based sentiment analysis, and I face the same problem as you. I would like to know whether you have already found a way to solve it.

    Could you explain it to me?

    Also thanks JEdward for sharing.

    Thank you so much.