"[SOLVED] Stemming: Keep Information {original word, stem}"
Hi there,
I'm currently doing some text processing using the different stemming operators. Right now I'm wondering whether there is a way to keep or show which words are conflated to which stem. Without any adjustment, the results of stemming (word list, example set) only contain the stems and associated information such as occurrences.
What I primarily need is something like {original word, stem}.
I'm sure this is quite an easy task, but as I'm not that familiar with RM yet, I don't see how. Any idea how to do this?
Many thanks in advance,
Regards,
Urs
Answers
Actually, the stemming operators discard the original tokens, so it is not possible to see which stem results from which token. The only solution may be to compare the stemmed document with the original document token by token in a rather complex process and write the mapping manually into an example set.
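Outside of RapidMiner, the token-wise comparison could look roughly like the Python sketch below. It is only an illustration of the idea, not a RapidMiner process: `toy_stem`, `tokenize`, `stem_mapping` and `group_by_stem` are hypothetical helper names, and `toy_stem` is a deliberately crude stand-in for a real stemmer (it only lowercases and strips a trailing "es" or "s"). The point is that each original token is stemmed individually, so the pair {original word, stem} can be recorded before the original is lost.

```python
import re
from collections import defaultdict


def toy_stem(token):
    """Hypothetical stand-in for a real stemmer (assumption, not RapidMiner's):
    lowercases the token and strips a trailing 'es' or 's'."""
    word = token.lower()
    for suffix in ("es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Split the text into runs of letters (Unicode-aware)."""
    return re.findall(r"[^\W\d_]+", text)


def stem_mapping(text, stem=toy_stem):
    """Return the distinct (original word, stem) pairs in order of first appearance."""
    pairs = []
    seen = set()
    for token in tokenize(text):
        pair = (token, stem(token))
        if pair not in seen:
            seen.add(pair)
            pairs.append(pair)
    return pairs


def group_by_stem(pairs):
    """Invert the mapping: for each stem, list the original words conflated to it."""
    groups = defaultdict(list)
    for original, stemmed in pairs:
        groups[stemmed].append(original)
    return dict(groups)


pairs = stem_mapping("Autos und Auto")
print(pairs)                 # [('Autos', 'auto'), ('und', 'und'), ('Auto', 'auto')]
print(group_by_stem(pairs))  # {'auto': ['Autos', 'Auto'], 'und': ['und']}
```

A real stemmer can be passed in through the `stem` parameter in place of `toy_stem`. The resulting pairs could then be written to a file and imported as an example set.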
Best, Marius
That's quite unpleasant. But OK, I do see the workaround. Thanks for your help.
Best,
Urs
It's me once again. I really have to ask; otherwise it will take me a long time to find the right operators/functions.
How can I use the stemming operator so that words are "replaced" within a given document rather than "conflated"? Right now, if I have a document containing the words "Autos" and "Auto", for example, the word list will only contain the stem "auto".
Thanks in advance,
Urs