words containing UMLAUTE in Text Mining

Thesis_12 Member Posts: 1 Learner III
edited November 2018 in Help
Dear all,

apparently RapidMiner is not able to search for words containing German umlauts such as ä, ö, ü, or the ß. When I search for the word "Änderung" using a regular expression (in "Filter Tokens by Region", condition: "contains match"), it doesn't return any results.
I use version 5.3.005 on a Mac and am working with HTML documents. I know that the problem described above does not occur with an older version on Windows.

However, I need to solve this problem with version 5.3.005 on a Mac.

I tried ".{1,2}nderung", which worked but also returned results like "Minderung", which was not intended.

I would be very grateful if somebody knows a solution to this problem.

Thanks a lot

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    How do you retrieve your data?
    For some data retrieval operators you have to configure the correct encoding. If your input data is encoded in UTF-8, for example, you have to set that encoding in the respective operator.

    Best regards,
    Marius
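
The encoding hint above would also explain why ".{1,2}nderung" matched while "Änderung" did not. The following is a minimal sketch in plain Python, not RapidMiner. It assumes the HTML files are stored as UTF-8 and were read with a single-byte encoding; MacRoman is used here only as a plausible example of such an encoding on a Mac, and the sample sentence is made up for illustration.

```python
import re

# Hypothetical sample text; "\u00c4" is the single code point for "Ä".
text = "Die \u00c4nderung und die Minderung"

# Assumption: the file on disk is UTF-8, where "Ä" takes two bytes.
raw = text.encode("utf-8")

# Reading those bytes with a single-byte encoding turns "Ä" into two
# unrelated characters. MacRoman is only an assumed example here.
wrong = raw.decode("mac_roman")
# Reading them with the matching encoding restores the original text.
right = raw.decode("utf-8")

exact = re.compile("\u00c4nderung")
loose = re.compile(".{1,2}nderung")

print(exact.search(wrong) is None)      # exact pattern fails on mis-decoded text
print(exact.search(right) is not None)  # exact pattern works with correct decoding
print(len(wrong) - len(right))          # mis-decoded text is one character longer
print(loose.findall(wrong))             # loose pattern also picks up "Minderung"
```

If this is what happens in the process, the fix belongs in the reading step (setting the encoding parameter of the operator that loads the documents) rather than in the regular expression.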