"Does RapidMiner have Normalize Whitespace in text processing?"
nawafpower
Hi everybody,
I wonder whether RapidMiner has "Normalize White Space" among its built-in functions. I am trying to preprocess text documents by normalizing the case ("To lower case") and normalizing the white space in the text files. If anybody can help with this, it would be great.
Thanks
Answers
Sorry, I did not understand what you are after. Could you give an example of a text before and after the desired transformation, together with a description of what happens in between?
Cheers,
Ingo
By Normalize White Space I mean "removing any leading or trailing space and reducing any internal white space to one space character per occurrence". It is available in the JGAAP application by Patrick Juola, and I found that this preprocessing step is very important in the text classification process. I need to implement it in RM if possible.
Regards
But if for some reason you really needed to do it, it could be accomplished with one line of Groovy script.
Why do you need to do this?
I have been experimenting with JGAAP and found that the best results for authorship attribution came from combining Normalize Whitespace with Unify Case. You mentioned doing this with one line of code: how can I do my own programming from the RapidMiner GUI? I also asked on your YouTube channel whether you could make a short video on authorship attribution. Maybe you don't have time, but if you can, it would be great.
I value your notes, Neil; they have always been helpful.
Well, you could combine the operators "Trim" (which removes leading and trailing white space) and "Replace" (which replaces any "surviving" run of white space with a single space) for this task. Please note that these two operators work on attributes (not on documents or tokens), so you would have to perform the transformation before you use the text processing operators.
Below you will find a sample process that demonstrates the two operators. There is also a white paper in our shop that explains this:
http://rapid-i.com/component/page,shop.product_details/flypage,flypage.tpl/product_id,52/category_id,5/option,com_virtuemart/Itemid,180/
Cheers,
Ingo