Stem Completion
Is there a "stem completion" operator that does something similar to stemCompletion in R? For example, stemming converts service, servicing, services, serviced etc. to servic, but I can't see an operator which then returns the stem to a meaningful form, e.g. service, based on some parameters, e.g. shortest form, longest form etc.
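To make the request concrete, here is a minimal Python sketch of what such a completion step could look like. It is only loosely modeled on the idea behind R's stemCompletion: each stem is matched against a dictionary of full words that start with it, and one candidate is chosen by a rule. The function name complete_stems, the kind parameter, and its three values are assumptions made for this illustration, not an existing RapidMiner or R API.

```python
from collections import Counter


def complete_stems(stems, dictionary, kind="shortest"):
    """Map each stem to a dictionary word that starts with it.

    stems      -- iterable of stemmed tokens, e.g. ["servic"]
    dictionary -- iterable of full words (duplicates count as frequency)
    kind       -- "shortest", "longest" or "prevalent" (hypothetical options)
    """
    counts = Counter(dictionary)
    completed = {}
    for stem in stems:
        # Sort the candidates so that ties are broken deterministically.
        candidates = sorted(w for w in counts if w.startswith(stem))
        if not candidates:
            # No completion available: keep the stem unchanged.
            completed[stem] = stem
        elif kind == "shortest":
            completed[stem] = min(candidates, key=len)
        elif kind == "longest":
            completed[stem] = max(candidates, key=len)
        elif kind == "prevalent":
            completed[stem] = max(candidates, key=lambda w: counts[w])
        else:
            raise ValueError(f"unknown completion kind: {kind!r}")
    return completed


words = ["service", "servicing", "services", "serviced", "services"]
print(complete_stems(["servic"], words, "shortest"))  # {'servic': 'service'}
```

With the same word list, "longest" would pick servicing and "prevalent" would pick services, since it occurs twice.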
Answers
Hi Carl,
Nope, there is no such operator. I must also admit that this might be a bit "dangerous", since you would never know whether the completion is actually close to the original word or not. I guess it might still be nice for visualization purposes, though.
Of course you could call the R function from the R Scripting operator (https://marketplace.rapidminer.com/UpdateServer/faces/product_details.xhtml?productId=rmx_r_scripting) which should be relatively easy.
Cheers,
Ingo
That would be lemmatization as opposed to stemming. And I would love to see that supported in RM as well :-)
I've done this myself using the Execute Python operator together with the NLTK toolkit, which has a good lemmatizer.
One of the main complexities is that you need to know the part of speech in order to get the best lemma, so for good results you need to run quite a lot of the text processing logic in Python. Not a real dealbreaker, but it makes the RM workflow less clear.
If you are familiar with Python and have the NLTK toolkit installed, the quick-and-dirty operator below does work, but you will have to modify the script a bit so that it accepts actual data from an example set instead of the inline test string. It's not the fastest or most elegant approach, but at least it's an option.
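As a rough illustration of the approach (a sketch under stated assumptions, not the script from the operator itself), the part-of-speech-aware lemmatization could look something like this. It assumes NLTK is installed along with its tokenizer, tagger and WordNet data, that the Execute Python operator passes the example set to rm_main as a pandas DataFrame, and that the text sits in a column named "text". The column names and the helper names penn_to_wordnet and lemmatize_text are hypothetical.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank POS tag to a WordNet POS letter (default: noun)."""
    # 'a' = adjective, 'v' = verb, 'n' = noun, 'r' = adverb
    return {"J": "a", "V": "v", "N": "n", "R": "r"}.get(tag[:1], "n")


def lemmatize_text(text):
    """Tokenize, POS-tag and lemmatize one string; returns a list of lemmas."""
    # Imported here so the helper above stays usable without NLTK.
    # Assumes the required NLTK resources were fetched via nltk.download().
    import nltk
    from nltk.stem import WordNetLemmatizer

    lemmatizer = WordNetLemmatizer()
    tokens = nltk.word_tokenize(text)
    tagged = nltk.pos_tag(tokens)
    return [
        lemmatizer.lemmatize(token.lower(), penn_to_wordnet(tag))
        for token, tag in tagged
    ]


def rm_main(data):
    """Entry point for the Execute Python operator (assumed convention).

    'data' is assumed to be a pandas DataFrame with a hypothetical
    'text' column; a new 'lemmas' column is added and the frame returned.
    """
    data["lemmas"] = data["text"].apply(
        lambda t: " ".join(lemmatize_text(str(t)))
    )
    return data
```

Passing the tag through penn_to_wordnet is what addresses the part-of-speech issue: without it the lemmatizer treats every token as a noun, so a verb form like servicing would be left unchanged.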
Of course the same can be achieved with R as well, but I am less familiar with it. Just look at it as an alternative way to get external logic working with RM.