How to extract/filter text elements using regex?
Dear community,
I've been trying to use regular expressions in my RapidMiner model to filter extracts from a text. The text is read from Excel into a single content attribute, with each text coming from one Excel cell. I would like to extract specific sentences from these texts with a regular expression such as ([^.?!]*(?<=[.?\s!])flung(?=[\s.?!])[^.?!]*[.?!]), based on one word that must be included. I've tried 'Select Attributes' and 'Filter Examples', but I don't get the right output (i.e., only those sentences that contain one of the words). I would also need to account for multiple keywords occurring within one sentence (perhaps using the use exception?). Do you have an idea how to integrate such a process into the model? Any help is greatly appreciated!
Answers
The "Filter Examples" operator could also work like this.
If you need to detect the keywords and find the location of the keywords, NLP Tagger from this extension would be useful.
https://marketplace.rapidminer.com/UpdateServer/faces/product_details.xhtml?productId=rmx_corenlp
HTH!
Another option is to use NLP Tagger to find the positions of all keywords and then apply a filter to the NLP Tagger results.
HTH!
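To make the "find positions, then filter" idea concrete outside RapidMiner, here is a minimal Python sketch. It does not use or imitate the NLP Tagger API; the function name keyword_positions and the sample rows are hypothetical, and Python's re module stands in for whatever matching the operator does internally.

```python
import re

def keyword_positions(text, keywords):
    """Return (keyword, start, end) for each whole-word keyword hit in text."""
    # \b keeps "flung" from matching inside longer words
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return [(m.group(0).lower(), m.start(), m.end()) for m in pattern.finditer(text)]

# Hypothetical rows standing in for the cells of the content attribute
rows = [
    "He flung the door open.",
    "Nobody was there.",
]

# Step 1: locate the keywords in every row
hits = {row: keyword_positions(row, ["flung"]) for row in rows}

# Step 2: keep only the rows with at least one hit
kept = [row for row in rows if hits[row]]
print(kept)
```

Keeping the positions separate from the filter step means the same hits can later be reused, for example to highlight the keyword or to count occurrences per row.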
You could tokenize your text into linguistic sentences with the Tokenize operator and then apply the Filter Tokens (by Content) operator to keep only the sentences that contain your keyword.
Hope this example helps you.
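The same two steps (split into sentences, then filter by content) can be sketched in plain Python to check the logic before building the process. This is only an illustration under stated assumptions: the splitter is a naive "text up to the next ., ? or !" rule rather than RapidMiner's linguistic sentence mode, and sentences_containing is a hypothetical helper name. It uses \b word boundaries instead of the lookarounds in the question's pattern, because (?<=[.?\s!]) needs a character before the keyword and so cannot match when the keyword is the very first word of the text, and (?=[\s.?!]) rejects a keyword followed by a comma.

```python
import re

# Naive sentence splitter: a run of text ending in ., ? or !
# (an assumption; it mis-splits abbreviations such as "e.g.")
SENTENCE_RE = re.compile(r"[^.?!]+[.?!]")

def sentences_containing(text, keywords):
    """Return the sentences of text that contain any keyword as a whole word."""
    # One alternation covers several keywords; a sentence with two
    # keywords is still returned only once.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    sentences = (s.strip() for s in SENTENCE_RE.findall(text))
    return [s for s in sentences if pattern.search(s)]

text = "He flung the door open. Nobody was there! Was the key flung away, or was it hidden?"
print(sentences_containing(text, ["flung", "hidden"]))
# ['He flung the door open.', 'Was the key flung away, or was it hidden?']
```

Because filtering happens per sentence rather than per match, multiple keywords within one sentence need no special handling. RapidMiner uses Java regular expressions, where \b and the (?:...) group behave the same way for this pattern.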