WordNet in RM 5

simon_knoll Member Posts: 40 Contributor II
edited November 2018 in Help
Hello all,
Short question: in RM 4.x there was the WordNetSynonymStemmer operator. Is this operator gone in version 5, so that one has to use Groovy scripting instead?

Thanks,
Simon Knoll

Answers

  • Wanttoknow Member Posts: 6 Contributor II
    Hi,

    I was asking myself the same thing: where is the WordNet stemmer in RM 5?
  • TobiasMalbrecht Moderator, Employee-RapidMiner, Member Posts: 295 RM Product Management
    Hi,

    I think the WordNet stemmer was removed because it did not work that well. We may revive it at some point, but that is only speculation.

    Kind regards,
    Tobias
  • simon_knoll Member Posts: 40 Contributor II
    Hi,
    I coded a WordNet operator myself. If anyone is interested, I can share code snippets.
    What I can say is that on my test dataset I got some good results for k-means clustering by adding hyponyms.

    all the best,
    simon
  • B_B_ Member Posts: 70 Maven
    Simon

    would appreciate seeing how you set this up. 
    thanks

    b.
  • simon_knoll Member Posts: 40 Contributor II
    Hi,
    1st, you have to install WordNet.
    2nd, you need a Java WordNet API. I used this one: http://projects.csail.mit.edu/jwi/ (not for commercial purposes, but the fastest I know).
    3rd, you have to implement an operator (I added a new class in the "com.rapidminer.operator.text.io.wordfilter" package).
    For this I just copied an operator from the text plugin, deleted everything I did not need, and added the code for WordNet (here I add hypernyms).

    I hope this was more helpful than confusing ;)
    package com.rapidminer.operator.text.io.wordfilter;

    import java.io.File;
    import java.net.MalformedURLException;
    import java.net.URL;
    import java.util.ArrayList;
    import java.util.List;

    import com.rapidminer.operator.OperatorDescription;
    import com.rapidminer.operator.OperatorException;
    import com.rapidminer.operator.text.Document;
    import com.rapidminer.operator.text.Token;
    import com.rapidminer.operator.text.io.AbstractTokenProcessor;
    import com.rapidminer.parameter.UndefinedParameterError;

    import edu.mit.jwi.Dictionary;
    import edu.mit.jwi.IDictionary;
    import edu.mit.jwi.item.IIndexWord;
    import edu.mit.jwi.item.ISynset;
    import edu.mit.jwi.item.ISynsetID;
    import edu.mit.jwi.item.IWord;
    import edu.mit.jwi.item.IWordID;
    import edu.mit.jwi.item.POS;
    import edu.mit.jwi.item.Pointer;
    import edu.mit.jwi.morph.WordnetStemmer;

    // Note: despite the class name, this operator adds hypernyms (Pointer.HYPERNYM).
    public class WordnetHyponymOperator extends AbstractTokenProcessor {

        private final WordnetStemmer stemmer;
        private final IDictionary dict;

        public WordnetHyponymOperator(OperatorDescription description) {
            super(description);
            // location of the local WordNet installation; adjust as needed
            String wnhome = "/usr/local/WordNet-3.0";
            String path = wnhome + File.separator + "dict";
            URL url;
            try {
                url = new URL("file", null, path);
            } catch (MalformedURLException e) {
                // fail early instead of handing a null URL to the dictionary
                throw new IllegalStateException("Invalid WordNet path: " + path, e);
            }

            // construct the dictionary object and open it
            dict = new Dictionary(url);
            dict.open();
            stemmer = new WordnetStemmer(dict);
        }

        @Override
        protected Document doWork(Document textObject) throws OperatorException {
            List<Token> newSequence = new ArrayList<Token>(
                    textObject.getTokenSequence().size());
            for (Token token : textObject.getTokenSequence()) {
                // add the hypernyms of the token (if any), then the token itself
                for (String lemma : findHypernymLemmas(token.getToken())) {
                    newSequence.add(new Token(lemma, token.getWeight()));
                }
                newSequence.add(token);
            }
            textObject.setTokenSequence(newSequence);
            return textObject;
        }

        // Returns the lemmas of all hypernym synsets of the first noun sense
        // of the given word, or an empty list if nothing is found.
        private List<String> findHypernymLemmas(String text) {
            List<String> lemmas = new ArrayList<String>();
            List<String> stems = stemmer.findStems(text, POS.NOUN);
            if (stems == null || stems.isEmpty()) {
                return lemmas;
            }
            IIndexWord idxWord = dict.getIndexWord(stems.get(0), POS.NOUN);
            if (idxWord == null || idxWord.getWordIDs().isEmpty()) {
                return lemmas;
            }
            IWordID wordID = idxWord.getWordIDs().get(0);
            IWord word = dict.getWord(wordID);
            ISynset synset = word.getSynset();
            List<ISynsetID> hypernymIDs = synset.getRelatedMap().get(
                    Pointer.HYPERNYM);
            if (hypernymIDs == null) {
                // the synset has no hypernyms; avoids a NullPointerException
                return lemmas;
            }
            for (ISynsetID hypernymID : hypernymIDs) {
                for (IWord hypernymWord : dict.getSynset(hypernymID).getWords()) {
                    lemmas.add(hypernymWord.getLemma());
                }
            }
            return lemmas;
        }
    }
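
For readers who want to see the effect on a token sequence without installing WordNet, JWI, or RapidMiner, here is a minimal stand-alone sketch of the same expansion step. It is not part of the operator above: the class name HypernymExpansionSketch and the small HYPERNYMS table are hypothetical stand-ins for the WordNet lookup, and plain strings replace the Token class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Stand-alone sketch: the table below is a hypothetical stand-in for a
// WordNet lookup through JWI, so no dictionary files are required.
class HypernymExpansionSketch {

    // Hypothetical toy data; a real operator would query WordNet instead.
    static final Map<String, List<String>> HYPERNYMS = Map.of(
            "dog", List.of("canine", "domestic_animal"),
            "car", List.of("motor_vehicle"));

    // For each token, emit its hypernyms first and then the token itself,
    // which is the same order doWork uses in the operator.
    static List<String> expand(List<String> tokens) {
        List<String> result = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            result.addAll(HYPERNYMS.getOrDefault(token, List.of()));
            result.add(token);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [canine, domestic_animal, dog, chases, motor_vehicle, car]
        System.out.println(expand(List.of("dog", "chases", "car")));
    }
}
```

Tokens without an entry pass through unchanged, so the sequence only grows where a lookup succeeds.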
  • TobiasMalbrecht Moderator, Employee-RapidMiner, Member Posts: 295 RM Product Management
    Hi Simon,

    thank you very much for sharing your work. At the moment, our work on the text processing extension is almost idle because of other projects. But maybe we will have a look at it sometime.

    Best regards,
    Tobias
  • simon_knoll Member Posts: 40 Contributor II
    Yes, it would be cool if this kind of feature were added to the text plugin again.
  • B_B_ Member Posts: 70 Maven
    thanks for the example Simon