
Using a custom dictionary for slang synonyms

na020 Member Posts: 1 Learner III
edited April 2020 in Help

Hey guys,

I'm currently trying to create my own dictionary for some slang words. The overall task is to implement an automatic Music Lyrics Analyzer (MLA). The MLA uses text analytics methods on an established platform to analyze the vocabulary in song lyrics across different artists and genres, and to build clusters of songs based on their lyrics. Some genres, such as reggae, use a lot of slang, which I want to replace with standard English via my own dictionary. Which operators and which file type should I use for this?

Thanks


Answers

  • Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    Based on what you described, "Replace Tokens" is probably your best choice. It lets you map all synonyms onto a single consistent token for your subsequent analysis. You can also remove custom words entirely with a token filter, using either "Filter Tokens" or "Filter Stopwords (Dictionary)"; both let you supply your own word list telling RapidMiner which tokens to omit.


    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist

    Hi,


    Another option would be to use the Replace (Dictionary) operator from RapidMiner core, though that would require writing some regular expressions.


    ~Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
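To illustrate the token-level approach from the first answer (map each slang token to a standard token, then drop unwanted words), here is a minimal Python sketch. It is not RapidMiner's implementation. It assumes, as one plausible answer to the file-type question, a two-column CSV (slang,standard) for the synonym dictionary and a plain-text file with one word per line for the stopword list; both are inlined as strings here, and the entries are hypothetical examples.

```python
import csv
import io
import re

# Hypothetical slang dictionary: one "slang,standard" pair per row.
# In practice this would be a CSV file on disk.
SLANG_CSV = """slang,standard
mi,my
dem,them
nuff,enough
yuh,you
"""

# Hypothetical custom stopword list, one word per line.
STOPWORDS_TXT = """the
a
and
"""

def load_mapping(text):
    """Read slang -> standard pairs from CSV text into a dict."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["slang"].strip().lower(): row["standard"].strip().lower()
            for row in reader}

def load_stopwords(text):
    """Read one stopword per non-empty line into a set."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}

def tokenize(lyrics):
    """Lowercase and split into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", lyrics.lower())

def normalize(tokens, mapping, stopwords):
    """Replace tokens that have a dictionary entry, then drop stopwords."""
    replaced = [mapping.get(tok, tok) for tok in tokens]
    return [tok for tok in replaced if tok not in stopwords]

mapping = load_mapping(SLANG_CSV)
stopwords = load_stopwords(STOPWORDS_TXT)
print(normalize(tokenize("Mi tell dem and yuh nuff times"), mapping, stopwords))
# prints ['my', 'tell', 'them', 'you', 'enough', 'times']
```

Replacing before filtering means a slang word that maps onto a stopword is also removed; swap the two steps if you want the opposite behavior.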
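The second answer mentions that a dictionary-based replace on raw text needs some regular expressions. The Python sketch below shows why, under the assumption that each dictionary key should match as a whole word or phrase: keys are escaped so they match literally, longer keys are tried first so multi-word phrases win, and `\b` word boundaries stop a short entry like "dem" from matching inside "academic". It is a sketch of the general technique, not the Replace (Dictionary) operator's actual behavior; the dictionary entries are hypothetical.

```python
import re

# Hypothetical slang dictionary; multi-word phrases are allowed as keys.
SLANG = {
    "wah gwaan": "what is going on",
    "dem": "them",
    "yuh": "you",
}

def build_pattern(mapping):
    """Compile one case-insensitive regex matching any key as a whole word."""
    # Longest keys first so a phrase is preferred over a shorter overlapping key.
    keys = sorted(mapping, key=len, reverse=True)
    alternation = "|".join(re.escape(k) for k in keys)
    # \b anchors prevent matches inside longer words (e.g. "dem" in "academic").
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)

def replace_slang(text, mapping):
    """Substitute every dictionary match with its standard form."""
    pattern = build_pattern(mapping)
    lookup = {k.lower(): v for k, v in mapping.items()}
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)

print(replace_slang("Wah gwaan, yuh see dem at the academic?", SLANG))
# prints "what is going on, you see them at the academic?"
```

Note that replacements are emitted in the dictionary's casing, so "Wah gwaan" becomes lowercase; that is usually fine if the text is lowercased before tokenizing anyway.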