"Stem (Dictionary) for the Indonesian language with regex"
Hello,
I have a problem using regex with the Stem (Dictionary) operator for Indonesian text.
Here is an example sentence in Indonesian:
saya sangat senang dengan kalian-kalian, tampilannya dan suaranya sangat bagus
and I want to turn it into:
saya sangat senang dengan kalian, tampil dan suara sangat bagus
This works when I use stem rules like these:
kalian:kalian.*
tampil:tampil.*
suara:suara.*
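As a rough illustration outside RapidMiner, the behaviour of these rules can be mimicked in Python. This is only a sketch: it assumes each line has the form `stem:pattern` and that a pattern must match the whole token, and the helpers `parse_rules` and `apply_rules` are hypothetical names, not part of any RapidMiner API.

```python
import re

# Rules copied from the dictionary file above, one "stem:pattern" per line.
RULES_TEXT = """\
kalian:kalian.*
tampil:tampil.*
suara:suara.*
"""

def parse_rules(text):
    """Parse 'stem:pattern' lines into (stem, compiled regex) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        stem, pattern = line.split(":", 1)
        rules.append((stem, re.compile(pattern)))
    return rules

def apply_rules(token, rules):
    """Return the stem of the first rule whose pattern matches the whole token.

    Assumption: whole-token matching; tokens with no matching rule pass through.
    """
    for stem, regex in rules:
        if regex.fullmatch(token):
            return stem
    return token

rules = parse_rules(RULES_TEXT)
tokens = ["saya", "tampilannya", "suaranya", "kalian"]
print([apply_rules(t, rules) for t in tokens])
# → ['saya', 'tampil', 'suara', 'kalian']
```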
But it fails when I try to use other regex rules:
:-(.*)$
:(ku|mu|nya|lah|kah|tah|pun)$
How can I use the stemmer with rules other than the "text:text.*" form?
Please help me with this case.
Thanks
Best Regards,
Bay
Answers
Hello @baybay - hmm, I don't speak Indonesian and am very puzzled about what you're trying to do with your first regex.
The second one seems OK. If you could post your XML and your sample data set, it would be a lot easier to help. Also tagging my go-to regex guru @Telcontar120
Scott
Hi @baybay,
You can definitely use a rule-based stemmer. A preferred way is the "stem tokens using example set" operator from the toolbox extension.
Here is a comprehensive study of stemming for Indonesian:
https://pdfs.semanticscholar.org/8ed9/c7d54fd3f0b1ce3815b2eca82147b771ca8f.pdf
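To give an idea of what a rule-based approach looks like, here is a minimal Python sketch that applies the two patterns from the question directly to the text. It is not the toolbox operator and not the algorithm from the study; the names `stem_token`, `stem_text`, and the `MIN_STEM_LEN` threshold are assumptions made for illustration.

```python
import re

# Suffix pattern taken from the question: particles and possessive endings.
SUFFIX = re.compile(r"(ku|mu|nya|lah|kah|tah|pun)$")
# Reduplication such as "kalian-kalian": drop the hyphen and everything after it.
REDUPLICATION = re.compile(r"-.*$")
# A word, optionally hyphenated (assumption: plain ASCII letters only).
WORD = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*")
# Assumption: never strip a word down to fewer than 3 letters.
MIN_STEM_LEN = 3

def stem_token(token):
    """Remove reduplication, then at most one suffix from the list."""
    base = REDUPLICATION.sub("", token)
    stripped = SUFFIX.sub("", base)
    return stripped if len(stripped) >= MIN_STEM_LEN else base

def stem_text(text):
    """Stem every word in the text, leaving punctuation and spacing untouched."""
    return WORD.sub(lambda m: stem_token(m.group(0)), text)

sentence = ("saya sangat senang dengan kalian-kalian, "
            "tampilannya dan suaranya sangat bagus")
print(stem_text(sentence))
# → saya sangat senang dengan kalian, tampilan dan suara sangat bagus
```

Note that this gives "tampilan" rather than "tampil": removing "-an" with a plain suffix rule would also damage words like "dengan" and "kalian", so derivational suffixes need a root-word list or explicit dictionary rules in addition to the pattern.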
HTH,
YY
Hi @sgenzer,
I have attached the dataset, the XML, and the stemming file.
Thanks
Bay
Hi @yyhuang,
So we must enter the stem rules one by one, like "suara:suara.*"?
I just want the words to be reduced to their stems automatically, like on this link
Thanks
Bay