"filter by upper case letter?"
Hey there,
I recently installed RapidMiner for a university project. I have only worked with R so far, so this is quite new and challenging for me.
I want to extract text from newspaper front pages as part of analyzing agenda setting in German politics.
My question is whether it is possible to filter by upper-case letter. German nouns start with an upper-case letter, and I would like to filter on that. Unfortunately, I have no idea how to do it. Any help is appreciated.
Answers
Don't be scared of regular expressions; this one (^[A-Z].*) is especially straightforward.
- ^ means start at the beginning of the text; because you are filtering within the tokens, that is the start of each token
- [A-Z] means any single uppercase letter between A and Z
- . (dot) means any character at all
- * (asterisk) means any number of the preceding element (in this case the dot)
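If you want to see what those four pieces do before wiring them into a process, here is a minimal sketch in Python, assuming the pieces are put together as ^[A-Z].* and applied to one token at a time. The function name filter_tokens and the sample tokens are made up for illustration, and Python's re module is only standing in for RapidMiner's regex engine, so small details may differ. One thing to watch with German text: [A-Z] does not include Ä, Ö or Ü, so the second call widens the character class.

```python
import re


def filter_tokens(tokens, pattern):
    """Keep only the tokens that match the given regular expression."""
    compiled = re.compile(pattern)
    # match() anchors at the start of the token, so the ^ is redundant here,
    # but it is kept to mirror the pattern described in the list above.
    return [tok for tok in tokens if compiled.match(tok)]


# Hypothetical tokens from a German headline (illustration only).
tokens = ["Die", "Regierung", "plant", "neue", "Gesetze", "für", "Österreich"]

# Plain ASCII capitals: misses "Österreich" because Ö is outside A-Z.
print(filter_tokens(tokens, r"^[A-Z].*"))
# -> ['Die', 'Regierung', 'Gesetze']

# Widened class that also accepts capital umlauts.
print(filter_tokens(tokens, r"^[A-ZÄÖÜ].*"))
# -> ['Die', 'Regierung', 'Gesetze', 'Österreich']
```

Note that "Die" gets through as well: it is capitalized only because it starts the sentence, so a capital-letter filter will always pick up some non-nouns.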
Have a play with the example below: simply copy and paste the XML into the XML view of RapidMiner and press the green tick to load it.
If you are interested in German nouns, you can use Filter POS as well. There you can search specifically for nouns, adjectives, etc.; German and English are supported. The process below uses it to get the nouns out of the document. Of course, you can also use this inside Process Documents. Further details on the tag syntax are available at: http://www.ims.uni-stuttgart.de/forschung/ressourcen/lexika/TagSets/stts-table.html
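To give a feel for what filtering on those tags amounts to, here is a rough Python sketch. It does not run a tagger and it is not the RapidMiner operator: the (token, tag) pairs are tagged by hand for illustration, and filter_by_pos is a hypothetical helper. The tags themselves come from the STTS table linked above, where NN is a common noun and NE a proper noun.

```python
import re


def filter_by_pos(tagged_tokens, tag_pattern):
    """Keep the tokens whose POS tag fully matches the given regex."""
    compiled = re.compile(tag_pattern)
    return [tok for tok, tag in tagged_tokens if compiled.fullmatch(tag)]


# Hand-tagged example sentence (tags assigned manually, illustration only).
tagged = [
    ("Die", "ART"),        # article
    ("Regierung", "NN"),   # common noun
    ("plant", "VVFIN"),    # finite full verb
    ("neue", "ADJA"),      # attributive adjective
    ("Gesetze", "NN"),     # common noun
    ("in", "APPR"),        # preposition
    ("Berlin", "NE"),      # proper noun
]

# Keep common nouns and proper nouns only.
print(filter_by_pos(tagged, r"NN|NE"))
# -> ['Regierung', 'Gesetze', 'Berlin']
```

Unlike the capital-letter regex, this drops the sentence-initial "Die", which is the main reason to prefer POS tags when you really want nouns.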
~Martin
Dortmund, Germany