Text Mining: Select Tokens Based on a Dictionary File
Hi everyone,
I'm working on a text mining workflow that filters content in a specific language using a dictionary for that language (most probably a TXT file).
I was able to filter stopwords against a dictionary with the "Filter Stopwords (Dictionary)" operator. I also want to select tokens based on a dictionary, but the only operator I can find is "Filter Tokens (by Content)", which selects tokens by regular expression and has no option for a dictionary file.
Does an operator exist for this, or am I missing something?
Thank you in advance
Answers
See my post here: http://rapid-i.com/rapidforum/index.php/topic,8008.msg27328.html#msg27328, which suggests that a group of us band together to pay for RM to improve language support in the text extension, particularly for more difficult languages such as Arabic, Chinese, Indonesian, or even TxtSpk.
txt file with one stopword per line.
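For anyone who wants to see the idea outside RapidMiner, here is a rough Python sketch of dictionary-based token selection using the same file layout (one word per line). It is not a RapidMiner operator or API; the function names `load_dictionary`, `select_tokens`, and `dictionary_to_regex` are made up for illustration. The last helper builds a single alternation pattern from the dictionary, which might serve as a workaround in a regex-based filter such as "Filter Tokens (by Content)". That part assumes the pattern syntax carries over from Python and that the operator accepts a pattern of this size, neither of which I have verified.

```python
import re
from pathlib import Path


def load_dictionary(path):
    """Read one word per line, skipping blank lines and trimming whitespace."""
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def select_tokens(tokens, dictionary):
    """Keep only the tokens that appear in the dictionary (case-insensitive)."""
    return [t for t in tokens if t.lower() in dictionary]


def dictionary_to_regex(dictionary):
    """Build one anchored alternation that matches any whole dictionary word."""
    # Sort for a stable result; longer words first so prefixes never win.
    words = sorted(dictionary, key=lambda w: (-len(w), w))
    return "^(?:" + "|".join(re.escape(w) for w in words) + ")$"
```

Usage would look like `select_tokens(my_tokens, load_dictionary("dictionary.txt"))`, where `dictionary.txt` is a placeholder file name. Matching is done on lowercased text here, which is a design choice for this sketch; drop the `.lower()` calls if case matters for your language.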