Conditional Tag Search with RegEx Output
Limegreenman900
Member Posts: 6 Contributor II
Hi everyone,
I am currently working on a large data set (~4 million HTML files stored on my computer), and I am wondering whether there is a search/parse function in RM that lets me search all documents for a unique tag and, IF it is found, THEN search the same string for a regular expression that matches a certain number.
For example, I have a string like:
<ix:nonFraction name="AuditFeesExpenses" contextRef="FY1.segment.bus-ThirdPartyAgentTypeDimension.bus-EntityAccountantsOrAuditorsGroupCompanyDimension.-Consolidated" unitRef="USD" xmlns:aurep="http://www.xbrl.org/reports/aurep/2009-09-01" decimals="0" format="ixt:numcommadot">14,825</ix:nonFraction>
I want to search for the tag "AuditFeesExpenses" and, IF it is found, RM should apply a regular expression that matches the number "14,825" (the RegEx is not my problem!).
Does anyone have an idea whether this is possible in RM?
Thanks!
Flo
Answers
Otherwise, you may want to check Extract Information or (funnily enough) Replace.
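To illustrate the idea behind a regex-based extraction, here is a minimal Python sketch (outside RM) in which the "IF the tag is found" condition and the number capture are folded into one pattern. The pattern and the helper name `extract_audit_fees` are assumptions based only on the sample tag in the question. Whether an RM operator accepts this exact pattern is not verified here.

```python
import re

# Assumed pattern, modelled on the sample tag from the question.
TAG_VALUE = re.compile(
    r'<ix:nonFraction\b[^>]*\bname="AuditFeesExpenses"[^>]*>'  # condition: the tag must be present
    r'\s*([\d.,]+)\s*'                                          # capture: the number inside the element
    r'</ix:nonFraction>'
)

def extract_audit_fees(html):
    """Return every AuditFeesExpenses value found in the given markup (hypothetical helper)."""
    return TAG_VALUE.findall(html)

sample = (
    '<ix:nonFraction name="AuditFeesExpenses" contextRef="FY1" unitRef="USD" '
    'decimals="0" format="ixt:numcommadot">14,825</ix:nonFraction>'
)

print(extract_audit_fees(sample))  # ['14,825']
print(extract_audit_fees('<ix:nonFraction name="Other">99</ix:nonFraction>'))  # []
```

Because the tag name is part of the pattern, documents without the tag simply produce an empty result, so no separate IF step is needed.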
Dortmund, Germany
But, assuming you don't want to use Solr (no... I really recommend you do for 4 million files), here is a way to do it.
I would also suggest that, given the file structure, XPath might work better than a regular expression. Here's a quick example using your sample below. You can use XPath with the ReadXML operator, but for that many documents (if not using Solr) I would recommend using a Groovy script within your workflow to process them.
In this example I convert from HTML to XML, but you might not need this if your documents are already well-formed XML. Give it a try on a couple of files.
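For readers who want to see the XPath idea in isolation, here is a rough Python sketch using only the standard library. It is not the RM process or Groovy script mentioned above. It assumes the input is already well-formed XML and that the `ix` prefix is bound to the Inline XBRL 1.0 namespace URI. Check the root element of your own files, since that URI is an assumption and is not shown in the sample tag. The helper name `audit_fees_from_xml` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the ix prefix maps to the Inline XBRL 1.0 namespace.
# Adjust this URI to whatever your documents declare on their root element.
NS = {"ix": "http://www.xbrl.org/2008/inlineXBRL"}

def audit_fees_from_xml(xml_text):
    """Return the text of every ix:nonFraction element named AuditFeesExpenses."""
    root = ET.fromstring(xml_text)  # raises ParseError if the input is not well-formed
    nodes = root.findall(".//ix:nonFraction[@name='AuditFeesExpenses']", NS)
    return [(node.text or "").strip() for node in nodes]

# Small made-up document wrapping a shortened version of the tag from the question.
sample = (
    '<html xmlns:ix="http://www.xbrl.org/2008/inlineXBRL"><body>'
    '<ix:nonFraction name="AuditFeesExpenses" contextRef="FY1" unitRef="USD" '
    'decimals="0" format="ixt:numcommadot">14,825</ix:nonFraction>'
    '<ix:nonFraction name="OtherItem" contextRef="FY1" unitRef="USD">99</ix:nonFraction>'
    '</body></html>'
)

print(audit_fees_from_xml(sample))  # ['14,825']
```

The XPath predicate `[@name='AuditFeesExpenses']` plays the role of the IF condition: only matching elements are returned, and their text is the number.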
Thanks for your code proposal, but I think it will take too long to convert every document into an XML file before processing it.
~Martin
Dortmund, Germany