language filter issue
huaiyanggongzi
I have a document that includes both Chinese and English. Can I filter out all the English text and keep only the Chinese text? Or, in the other direction, can I filter out all the Chinese text and keep only the English text?
Answers
Just saw this one. Yes, you can: use a regular expression in your filter and search for \p{Han}, which matches only Chinese (Han script) characters.
To get the reverse, just invert the match. In most regex flavors that support \p{...}, the negated form is \P{Han}.
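For anyone who wants to try the idea outside RapidMiner, here is a minimal sketch in Python. It rests on some assumptions: Python's built-in `re` module does not support `\p{Han}`, so the sketch approximates it with two code point ranges (CJK Unified Ideographs and Extension A), which cover common Chinese text but not every Han character. The function names `keep_chinese` and `drop_chinese` are made up for this illustration. The third-party `regex` package does accept `\p{Han}` directly if you need the full script class.

```python
import re

# Approximation of \p{Han}: CJK Unified Ideographs Extension A (U+3400-U+4DBF)
# and CJK Unified Ideographs (U+4E00-U+9FFF). This is an assumption made
# because the standard `re` module has no \p{...} script classes.
HAN_RANGES = r"\u3400-\u4dbf\u4e00-\u9fff"
HAN_RUN = re.compile(f"[{HAN_RANGES}]+")
NON_HAN_RUN = re.compile(f"[^{HAN_RANGES}]+")


def keep_chinese(text: str) -> str:
    """Delete every run of non-Han characters, leaving only Han characters."""
    return NON_HAN_RUN.sub("", text)


def drop_chinese(text: str) -> str:
    """Replace runs of Han characters with a space, then tidy the whitespace."""
    without_han = HAN_RUN.sub(" ", text)
    return " ".join(without_han.split())


sample = "RapidMiner 是一个 data science 平台"
print(keep_chinese(sample))
print(drop_chinese(sample))
```

Note that Chinese punctuation such as "，" or "。" is not part of the Han script, so `keep_chinese` removes it and `drop_chinese` leaves it in place. Extend the ranges if you want different behavior.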
Btw, my team has released a RapidMiner extension to perform multilingual text analysis - the Rosette Text Toolkit. We have an "Identify Language" operator that returns the language of every cell in the input attribute (identifies 56 languages, including Chinese). The extension may help in analyzing multiple-language input - and most of our operators support Chinese.
-Lauren
Hope Rosette could fix this soon.
Rumi