Words containing umlauts in Text Mining
Dear all,
Apparently RapidMiner cannot find certain words containing German umlauts such as ä, ö, ü, or the ß. When I search for the word "Änderung" as a regular expression (in "Filter Tokens by Region", condition "contains match"), it returns no results.
I use version 5.3.005 on a Mac and am working with HTML documents. I know that the problem described above does not occur with an older version on Windows.
However, I need to solve this problem with version 5.3.005 on a Mac.
I tried " .{1,2}nderung", which worked but also matched words like "Minderung", which was not intended.
I would be very glad if somebody knew a solution to this problem.
Thanks a lot
Answers
For some data retrieval operators you have to configure the correct encoding. If your input data is encoded in UTF-8, for example, you have to set that encoding in the respective operator.
Best regards,
Marius
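To illustrate the encoding point above, here is a minimal sketch in Python (not RapidMiner itself) of what can happen when UTF-8 input is read with a different charset. The sample text is made up, and Mac Roman is only an assumption about what a Mac might use as its default; the actual default on the machine in question may differ. It may also explain why " .{1,2}nderung" matched: in UTF-8, "Ä" is stored as two bytes, which turn into two separate characters when decoded with a single-byte charset.

```python
import re

# Hypothetical sample: the bytes of a UTF-8 encoded HTML snippet.
raw = "<p>Die Änderung ist wichtig.</p>".encode("utf-8")

# Correct decoding versus decoding with the wrong charset.
# "mac_roman" is an assumed platform default, chosen for illustration only.
right = raw.decode("utf-8")
wrong = raw.decode("mac_roman")

exact = re.compile("Änderung")        # the pattern from the question
loose = re.compile(".{1,2}nderung")   # the workaround from the question


def found(pattern, text):
    """Return True if the pattern occurs anywhere in the text."""
    return pattern.search(text) is not None


print(found(exact, right))        # correctly decoded text
print(found(exact, wrong))        # mis-decoded text: "Ä" became two other characters
print(found(loose, wrong))        # the workaround still matches the mis-decoded text
print(found(loose, "Minderung"))  # and it also matches unrelated words
```

If this is what is happening, setting the encoding on the operator that reads the HTML documents, as suggested above, should let the plain pattern "Änderung" match without the workaround.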