
Problem with stopwords (dictionary)

kersor Member Posts: 26 Maven
edited August 2019 in Help
Hi everyone,

I want to filter some text files and remove some useless words. I use the Filter Stopwords (Dictionary) operator with a list of Greek words, but the words I want to remove are still there after filtering. I use UTF-8 as the encoding, and all the text files are in UTF-8. At first my text files were ANSI-encoded: the stopwords were removed, but the word list contained unreadable words. Now, with UTF-8, the word list is correct but the stopwords are still there. Sorry for my English.

Thanks!
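
The symptoms described above (stopwords removed when the texts were ANSI, but not once the texts are UTF-8) would be consistent with the stopword dictionary file itself still being saved in a legacy code page. That is only a guess, not something confirmed in this thread. A minimal Python sketch of re-encoding such a list to UTF-8 follows; the function name `reencode_stopwords` and the choice of `cp1253` (the Windows Greek code page) as the source encoding are assumptions for illustration.

```python
import unicodedata


def reencode_stopwords(raw: bytes, source_encoding: str = "cp1253") -> bytes:
    """Decode a stopword list from a legacy code page and return UTF-8 bytes.

    `source_encoding` is an assumption: cp1253 is the Windows "ANSI" code
    page for Greek; use cp1252 for Western European text such as Portuguese.
    """
    text = raw.decode(source_encoding)
    # Drop a leading byte-order mark, which would otherwise stick to the first word.
    text = text.lstrip("\ufeff")
    # Compose accented letters so they compare equal to the tokens in the texts.
    text = unicodedata.normalize("NFC", text)
    return text.encode("utf-8")


# Hypothetical two-word stopword list as it would look in an ANSI (cp1253) file.
legacy = "και\nστο\n".encode("cp1253")
utf8 = reencode_stopwords(legacy)
```

For a real file, the bytes would be read with `open(path, "rb")` and the result written back the same way, one stopword per line.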

Answers

  • nery Member Posts: 1 Learner III
    I'm having exactly the same problem (with Portuguese text). Have you found a solution yet? Thanks, n.
  • kersor Member Posts: 26 Maven
    No, I didn't find a solution.

    A partial workaround is to transform the Portuguese letters into English ones (with Replace Tokens).

    For example, the Greek word συμφωνώ is transformed into simfono.

    With this, the problem is solved.

    But you have to do this again in the classification problems. If you want any further information, just tell me.

    Regards
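
The transliteration workaround from the last answer can be sketched outside RapidMiner as follows. This is a rough Python illustration, not the poster's actual Replace Tokens setup: the `GREEK_TO_LATIN` table and the helper names are hypothetical, and the letter mapping is a simple phonetic guess chosen so that συμφωνώ becomes simfono as in the example. It assumes the stopword list is transliterated with the same rules as the tokens, otherwise the two would no longer match.

```python
import unicodedata

# Hypothetical phonetic mapping; adjust it to whatever scheme you prefer.
GREEK_TO_LATIN = {
    "α": "a", "β": "v", "γ": "g", "δ": "d", "ε": "e", "ζ": "z",
    "η": "i", "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
    "ν": "n", "ξ": "x", "ο": "o", "π": "p", "ρ": "r", "σ": "s",
    "ς": "s", "τ": "t", "υ": "i", "φ": "f", "χ": "ch", "ψ": "ps",
    "ω": "o",
}


def transliterate(word: str) -> str:
    """Lowercase a word, strip accents, and map Greek letters to Latin ones."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    # Combining marks (for example the Greek tonos) are dropped here.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Characters without an entry in the table are passed through unchanged.
    return "".join(GREEK_TO_LATIN.get(ch, ch) for ch in stripped)


def remove_stopwords(tokens, stopwords):
    """Transliterate tokens and stopwords alike, then drop the stopwords."""
    stop = {transliterate(w) for w in stopwords}
    return [t for t in map(transliterate, tokens) if t not in stop]


print(transliterate("συμφωνώ"))  # simfono
print(remove_stopwords(["Συμφωνώ", "και", "διαφωνώ"], ["και"]))
```

As the answer notes, the same transformation has to be applied again to any new text before classification, since the model only ever sees the transliterated tokens.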
