
Text Similarity Detection

statspro Member Posts: 2 Contributor I
edited August 2019 in Help
Hi, I am a beginner with RapidMiner. I want to use the Data to Similarity operator to check text similarity, but my problem is a little different. I have an Excel file with two columns (UserID and Review), and I want to check the text similarity of reviews that share the same UserID.
For example, I have UserIDs A1, B1, C1, A1, B1, A1, B1, A1, etc., and I want to check the text similarity of the reviews given by A1 only.

UserID      Review
A1            I love McDonald
B1            McDonald is bad
C1            I love McDonald in Newyork
A1            I love McDonald
B1            abc love McDonald
A1            I love McDonald when I was in Paris.
B1            My Experience of McDonald is Pathetic
A1            I love it

I would appreciate it if anyone could help me with this.

Thanks,
Arun

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi Arun,

    Use the Loop Values operator to loop over the different UserIDs. Inside the loop, use Filter Examples to keep only the examples of the current user, then apply Data to Similarity.

    Best regards,
    Marius
  • veve Member Posts: 63 Contributor II
    Hello,

    Which similarity measure should be used in the "Data to Similarity" operator in the case mentioned above?

    Thank you in advance.
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    For text data, CosineSimilarity (under Numerical Measures) is often a good choice.

    Please remember to convert the texts to TF/IDF values, or another suitable measure, using the Process Documents operators from the text processing extension. Otherwise RapidMiner cannot "understand" the raw, unprepared texts.

    Best regards,
    Marius
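
For readers who want to see the logic of the answers above outside RapidMiner, here is a minimal Python sketch of the same idea: group the reviews by UserID, turn each user's reviews into TF-IDF vectors, and compute pairwise cosine similarity within each user. It is not RapidMiner code and does not reproduce the operators' exact behaviour. The function names (tokenize, tfidf_vectors, cosine, similarity_per_user), the simple whitespace tokenizer, the smoothed IDF formula, and the choice to compute TF-IDF separately per user are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase, split on whitespace, strip basic punctuation.
    words = (w.strip(".,!?").lower() for w in text.split())
    return [w for w in words if w]


def tfidf_vectors(docs):
    # Build one sparse TF-IDF vector (a dict of term -> weight) per document.
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # document frequency of each term
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Assumption: smoothed IDF, log((1 + n) / (1 + df)) + 1, always positive.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    # Cosine similarity between two sparse vectors.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_per_user(rows):
    # Group reviews by user (the "loop over UserIDs, then filter" step).
    by_user = defaultdict(list)
    for user_id, review in rows:
        by_user[user_id].append(review)
    # For each user, compute the pairwise similarity matrix of their reviews.
    result = {}
    for user_id, reviews in by_user.items():
        vecs = tfidf_vectors(reviews)
        result[user_id] = [[cosine(u, v) for v in vecs] for u in vecs]
    return result


# Sample data taken from the question.
rows = [
    ("A1", "I love McDonald"),
    ("B1", "McDonald is bad"),
    ("C1", "I love McDonald in Newyork"),
    ("A1", "I love McDonald"),
    ("B1", "abc love McDonald"),
    ("A1", "I love McDonald when I was in Paris."),
    ("B1", "My Experience of McDonald is Pathetic"),
    ("A1", "I love it"),
]

sims = similarity_per_user(rows)
for i, row in enumerate(sims["A1"]):
    print(i, [round(s, 3) for s in row])
```

Each entry sims[user][i][j] compares the i-th and j-th review of that user, in the order the reviews appear in the data. Computing the IDF weights over all reviews first, instead of per user, is an equally reasonable design choice and would change the numbers.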