Looping through regex matches and groups
markus_dressel
Hi community,
I might have an easy question about handling regex matches.
I have a document (loaded with the document operator), and I want to use a regex to retrieve certain parts of it. My regex gets, e.g., three matches, but when I run RapidMiner, all three matches are shown together (appended/joined into one). So my question is: is there a way to loop through all regex matches, as I can in Java or Python?
For example:
```python
import re

s = "ABC12DEF3G56HIJ7"
pattern = re.compile(r'([A-Z]+)([0-9]+)')
for (letters, numbers) in re.findall(pattern, s):
    pass  # do anything
```
This is just sample code, not my actual task. I just want to know how to loop through regex matches.
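For comparison in Python: besides `re.findall` there is `re.finditer`, which yields one match object per match instead of plain group strings, so inside the loop you can read each group by number and also the position of the match. A minimal sketch with the same sample string and pattern (the `results` list is only there to collect the output):

```python
import re

s = "ABC12DEF3G56HIJ7"
pattern = re.compile(r"([A-Z]+)([0-9]+)")

results = []
# finditer yields a Match object for each match, in order of appearance
for match in pattern.finditer(s):
    letters, numbers = match.group(1), match.group(2)
    # start()/end() give the span of the whole match within s
    results.append((match.start(), match.end(), letters, numbers))

print(results)
```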
I hope my question is quite clear :-)
Best regards,
Markus
Answers
Hi Markus,
have a look at the attached process. It builds something like this with operators, using the new loop operator from 7.4. There is surely a way to build this with 7.3 as well.
~Martin
Dortmund, Germany
You can use the Replace Dictionary operator for this purpose.
The easiest way to proceed is to create a CSV containing the regex you want to use (the "from" attribute) and the replacement (the "to" attribute), tell the operator to use regular expressions, and off you go. It will loop through the whole file and replace content accordingly.
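To illustrate what a regex dictionary does, here is a plain-Python sketch of the same idea: every from/to row is applied, in file order, as a regex substitution over the whole text. This is an assumption-based illustration, not the operator's actual implementation; the CSV content and the helper name `apply_replacements` are hypothetical.

```python
import csv
import io
import re

# Hypothetical dictionary: one regex ("from") and its replacement ("to") per row
DICTIONARY_CSV = """from,to
[0-9]+,<NUM>
(?i)colou?r,color
"""

def apply_replacements(text, rows):
    """Apply each (from, to) regex replacement to the whole text, in file order."""
    for row in rows:
        text = re.sub(row["from"], row["to"], text)
    return text

rows = list(csv.DictReader(io.StringIO(DICTIONARY_CSV)))
result = apply_replacements("Colour 12 and colour 7", rows)
print(result)
```

Note that this approach rewrites the text; it does not hand you the individual matches one by one.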
Hi,
thank you for the quick response and the solution. I have loaded your process, but maybe I did not describe my problem correctly.
Let's say we have a document with the following text:
Item here is some important text Item
Here is no important text
Item here is some additional important text Item
If I use the regex "(?s)(?i)Item.*?Item", I get two matches:
1: Item here is some important text Item
2: Item here is some additional important text Item
See https://regex101.com/r/WYn2nm/1
So the question is: how can I loop through each match and do something with it, keeping in mind that the number of matches differs from document to document?
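For reference, in plain Python this loop is `re.finditer`, which yields as many matches as the text contains, so a varying match count needs no special handling. A minimal sketch using the sample text above: the inline `(?s)(?i)` flags are passed as `re.DOTALL | re.IGNORECASE`, a capture group is added around the middle part, and the "do some stuff" step (collecting the stripped inner text) is only a placeholder.

```python
import re

document = (
    "Item here is some important text Item\n"
    "Here is no important text\n"
    "Item here is some additional important text Item"
)

# Equivalent to "(?s)(?i)Item.*?Item" plus a capture group:
# DOTALL lets . cross newlines, the lazy .*? stops at the next "Item"
pattern = re.compile(r"Item(.*?)Item", re.DOTALL | re.IGNORECASE)

extracted = []
for number, match in enumerate(pattern.finditer(document), start=1):
    # Placeholder "do some stuff": keep the text between the two markers
    inner = match.group(1).strip()
    extracted.append(inner)
    print(number, inner)

print(len(extracted), "matches")
```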
Something like that
Best regards and thank you for your great support
Markus
I see. Since you already know how to do it in Python, how about using the Execute Python operator? You just write your regex script, pump your data through it, and you are covered.
It should be pretty simple this way. You can probably achieve it with plain vanilla RapidMiner as well, but without a clear idea of your data and what you want to achieve, it's hard to give more specific help.
Something like this:
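A minimal sketch of the logic such a script could contain, written in plain Python so it runs on its own. It assumes the documents arrive as rows with a hypothetical `id` and `text` field, and it returns one row per match, so documents with different numbers of matches simply produce different numbers of rows. Inside the Execute Python operator the entry point is an `rm_main` function working on pandas DataFrames; wiring `matches_to_rows` (a hypothetical helper) into it, e.g. via `DataFrame.to_dict("records")` and back, is not shown here.

```python
import re

# Hypothetical input: one dict per document (e.g. an example set converted to records)
documents = [
    {"id": 1, "text": "Item first Item\nnoise\nItem second Item"},
    {"id": 2, "text": "no markers here"},
    {"id": 3, "text": "ITEM only one ITEM"},
]

PATTERN = re.compile(r"Item(.*?)Item", re.DOTALL | re.IGNORECASE)

def matches_to_rows(docs, pattern=PATTERN):
    """Return one output row per regex match, across all documents."""
    rows = []
    for doc in docs:
        for index, match in enumerate(pattern.finditer(doc["text"]), start=1):
            rows.append({
                "id": doc["id"],        # which document the match came from
                "match_no": index,      # 1-based position of the match in that document
                "content": match.group(1).strip(),
            })
    return rows

for row in matches_to_rows(documents):
    print(row)
```

Here document 1 yields two rows, document 2 none and document 3 one, so the output size follows the data rather than a fixed match count.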