"[SOLVED] Stemming: Keep Information {original word, stem}"

Urselinho · November 2012

Hi there,
I'm currently doing some text processing using the different stemming operators. Right now I'm wondering whether there is a way to keep/show the information about which words are conflated to which stem. Without any adjustment, the results of stemming (wordlist, example set) only contain the stems and the associated information such as occurrences.

What I primarily need is something like {original word, stem}.

I'm sure this is quite an easy task, but as I'm not that familiar with RM yet I don't see it. Any idea how to do this?

Many thanks in advance,
Regards,
Urs

MariusHelf · November 2012

Hi Urs,

Actually, the stemming operators discard the original tokens, so it is not possible to see which stem results from which token. The only solution may be to compare the stemmed document with the original document token by token in a rather complex process and write the mapping manually into an example set.
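Outside RapidMiner, the token-wise comparison could be sketched roughly like this in Python. This is only an illustration, not a RapidMiner process: `toy_stem`, `tokenize` and `stem_mapping` are hypothetical helpers, the stemmer is a trivial stand-in for a real one (it just lowercases and strips a trailing "s"), and the sketch assumes stemming keeps the number and order of tokens so the two sequences can be paired position by position.

```python
import re
from collections import defaultdict


def toy_stem(token):
    """Hypothetical stand-in stemmer: lowercase and strip one trailing 's'."""
    token = token.lower()
    if len(token) > 3 and token.endswith("s"):
        return token[:-1]
    return token


def tokenize(text):
    """Split text into runs of letters (no digits, no underscores)."""
    return re.findall(r"[^\W\d_]+", text)


def stem_mapping(text, stem=toy_stem):
    """Pair original and stemmed tokens by position and group them by stem."""
    original_tokens = tokenize(text)
    stemmed_tokens = [stem(t) for t in original_tokens]

    # Assumption: both sequences have the same length and order,
    # so position i in one corresponds to position i in the other.
    mapping = defaultdict(set)
    for original, stemmed in zip(original_tokens, stemmed_tokens):
        mapping[stemmed].add(original)

    # Sort the originals so the result is deterministic.
    return {s: sorted(originals) for s, originals in mapping.items()}


pairs = stem_mapping("Autos und Auto")
for stemmed, originals in pairs.items():
    print(stemmed, "<-", originals)
```

Each entry of `pairs` is one {stem, original words} record, which could then be written row by row into a table or example set.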

Best, Marius

Urselinho · November 2012

Hi Marius,
that's quite unpleasant. But OK, I do see the workaround. Thanks for your help.

Best,
Urs

Urselinho · November 2012

Hi Marius,
it's me once again. I really have to ask, otherwise it will take me a long time to find the right operators/functions.

How can I use the stemming operator so that words are "replaced" within a given document rather than "conflated"? Right now, if I have a document with the words "Autos" and "Auto", for example, the wordlist will only contain the stem "auto".

Thanks in advance,
Urs
