[SOLVED] The approach for filtering non-letter tokens
huaiyanggongzi
In RapidMiner, I use the Tokenize operator to process a large number of documents. Some of these documents contain many non-letter characters, such as digits, %, $, and other symbols. Is there an operator that lets me filter out these tokens? Thanks.
Answers
First of all, you have to configure the Tokenize operator to use a splitting pattern that suits your problem. By default, it splits at "non-letters"; you could change it to, for example, split only at whitespace characters.
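To see why the splitting pattern matters, here is a minimal Python sketch that imitates the two modes with regular expressions. It is not RapidMiner code. It assumes "non-letters" means any character that is not a Unicode letter (approximated by the regex class `[^\W\d_]`) and "whitespace" means Python's `str.split()` behaviour; the sample sentence is made up for illustration.

```python
import re

# Hypothetical sample text containing digits, %, $ and punctuation.
text = "Revenue rose 12% to $3.5M in Q3."

def split_non_letters(s: str) -> list[str]:
    """Approximate the default mode: every run of non-letters acts as a separator.

    [^\W\d_] means "a word character that is neither a digit nor an underscore",
    which is (approximately) a Unicode letter.
    """
    return re.findall(r"[^\W\d_]+", s)

def split_whitespace(s: str) -> list[str]:
    """Alternative mode: split only at whitespace, so symbols stay inside tokens."""
    return s.split()

print(split_non_letters(text))  # digits vanish, but fragments like "M" and "Q" remain
print(split_whitespace(text))   # tokens such as "12%" and "$3.5M" survive intact
```

With whitespace splitting, mixed tokens such as `$3.5M` stay whole, so a later filter step can decide to drop them completely instead of keeping the letter fragments left behind by non-letter splitting.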
Then, to remove the unwanted tokens, you can use the Filter Tokens operator with a custom pattern.
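As a rough analogy (not the operator's implementation), the filtering step amounts to keeping only the tokens that fully match a regular expression. This Python sketch assumes whitespace tokenization and uses a hypothetical "letters only" pattern `[^\W\d_]+`; in RapidMiner you would enter your own regular expression in the filter operator instead.

```python
import re

# Hypothetical "keep" pattern: the whole token must consist of letters only.
LETTERS_ONLY = re.compile(r"[^\W\d_]+")

def filter_tokens(tokens: list[str], pattern: re.Pattern = LETTERS_ONLY) -> list[str]:
    """Keep only tokens that match the pattern in full; drop everything else."""
    return [t for t in tokens if pattern.fullmatch(t)]

# Tokens produced by splitting at whitespace only.
tokens = "Revenue rose 12% to $3.5M in Q3.".split()
print(filter_tokens(tokens))  # tokens with digits or symbols are removed
```

Using `fullmatch` rather than `search` is the design choice that matters here: a token is rejected as soon as it contains a single non-letter character, so `Q3.` is dropped instead of being kept because it contains a `Q`.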
If you have problems with the regular expressions, please post again.
Happy Mining!
~Marius