Text Processing extension and Java/Groovy script

ccricha Member Posts: 9 Contributor II
edited November 2018 in Help

I am interested in using a Groovy script to interact with the Text Processing extension. I cannot seem to find any source code, Javadoc, etc. for this extension. Is there any documentation or a repository available somewhere for it?

Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    The Text Processing extension is not open source. I would reach out to your sales rep to see if they can provide the information you need.

  • ccricha Member Posts: 9 Contributor II

    So apparently the Text Processing extension is proprietary and there is no documentation available at all. OK, so can someone at RM please explain how to use an "Execute Script" operator with a Text Processing operator? I am interested in being able to work with a Document or a word list in Groovy / Java. Thanks.

  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn

    Can you describe a bit about what you are trying to do?  

    Do you want to pass the document into the script operator while it is inside the Process Documents operator, do something to it, and then pass it back?

    I've done this many times, if that's what you mean.

  • abol3z Member Posts: 5 Learner III

    Hello @JEdward  

    I am trying to do lemmatization and POS tagging for Arabic inside my text processing operator, but there are only stemming operators in RapidMiner, so I want to do it in a script.

    I have the following text processing pipeline (tokenize, filter, stopwords, <here>). After removing stop words, I want to do the lemmatization in my script. After some looking around, I found that the script can receive an input of type Document, but I don't know its properties or how to lemmatize it.

    Appreciate your help.
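
The lemmatization step described above (the `<here>` slot after stop-word removal) could be sketched in plain Java as a dictionary lookup over the token list. This is only a rough illustration under stated assumptions: the class name `LemmaLookupSketch`, the `lemmatize` helper, and the two-entry dictionary are all hypothetical, and nothing here touches the Text Processing extension. How the tokens are read from and written back to the Document object inside Execute Script is not shown, since the thread does not document that API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class LemmaLookupSketch {

    // Hypothetical helper: replace each token with its lemma when the
    // dictionary has an entry for it; otherwise keep the token unchanged.
    public static List<String> lemmatize(List<String> tokens, Map<String, String> lemmaDictionary) {
        List<String> lemmas = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            lemmas.add(lemmaDictionary.getOrDefault(token, token));
        }
        return lemmas;
    }

    public static void main(String[] args) {
        // Toy dictionary for illustration only; a real one would be loaded
        // from a lemmatizer resource for the target language.
        Map<String, String> dictionary = Map.of(
                "كتب", "كتاب",   // "books" -> "book"
                "أولاد", "ولد"   // "boys"  -> "boy"
        );

        // Tokens as they might look after tokenizing and stop-word removal.
        List<String> tokens = List.of("أولاد", "كتب", "مدرسة");

        // Tokens without a dictionary entry pass through unchanged.
        System.out.println(lemmatize(tokens, dictionary));
    }
}
```

A plain lookup like this ignores context, so ambiguous surface forms would need a proper morphological analyzer; it is meant only to show where the step sits in the pipeline.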
