Reg Exp Not Working as I Expected in Generate Attributes Operator
I am trying to use a regular expression in the Generate Attributes operator to find a portion of text that contains a date. I want to use the index() function to find the start of the text block containing the date. The text block always looks something like this:
ad - 2013-03-21
and it always starts on a new line and the line ends right after the date.
index(text, "\nad ") works to find the start of the text block.
However, I'd like to have something more robust that specifies the date format, to make sure I don't pick up any old line of text that starts with "ad ". So I tried:
index(text, "ad.{3}20[0-1][0-9]-[0-9]{2}-[0-9]{2}") and it finds no match in RapidMiner. But if I use the same expression in Expresso, it does find a match in a text sample like:
Blah blah
Innovation Export
ad - 2013-03-21
pd - 2011-20-32
blah, blah
done
We also tried the same sort of reg exp with the Generate Extract operator and that did not find the matching text either.
What am I doing wrong?
Answers
Here's an example that uses Generate Extract. I used the '^' anchor within the regular expression to specify the start of the string.
regards
Andrew
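For anyone reading along, here is a minimal Java sketch of how the '^' anchor behaves on multi-line text. It assumes the operator uses Java-style regular expressions (the Java documentation is referenced in this thread). The class name CaretAnchorDemo and the helper firstMatch are made up for illustration. By default '^' and '$' match only at the start and end of the whole input, so a pattern anchored with '^' will not find a line in the middle of a longer text unless multi-line mode is switched on.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of RapidMiner.
class CaretAnchorDemo {
    // Sample text from the question, with explicit line breaks.
    static final String TEXT =
        "Blah blah\nInnovation Export\nad - 2013-03-21\npd - 2011-20-32\nblah, blah\ndone";

    // The date pattern from the question, anchored to a full line.
    static final String REGEX = "^ad.{3}20[0-1][0-9]-[0-9]{2}-[0-9]{2}$";

    // Returns the start index of the first match, or -1 if there is none.
    static int firstMatch(String regex, int flags, String text) {
        Matcher m = Pattern.compile(regex, flags).matcher(text);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        // Default mode: '^' matches only at the very start of the input.
        System.out.println(firstMatch(REGEX, 0, TEXT));
        // MULTILINE mode: '^' and '$' also match at line boundaries.
        System.out.println(firstMatch(REGEX, Pattern.MULTILINE, TEXT));
    }
}
```

The same flag can be embedded in the expression itself by prefixing it with (?m), which may be the easier route when the expression is typed into an operator parameter.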
Here is the documentation from Java 1.7:
Best,
Nils
Andrew, unfortunately your example process does not seem to work for me if there is additional text in the text attribute besides the date strings you have in your first two examples. So I'm still having some trouble, but will play around with it some more.
Nils, thanks for letting me know about the index() limitation. It would be nice to add that limitation to the Generate Attributes help documentation.
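A rough Java sketch of the difference between a literal search and a regex search may help here. It rests on an assumption: that the index() limitation is that the function looks for its second argument as a plain substring, much like Java's String.indexOf, rather than interpreting it as a regular expression. The class and method names below are hypothetical and only illustrate the idea.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of RapidMiner.
class LiteralVsRegexIndex {
    // Sample text from the question, with explicit line breaks.
    static final String TEXT =
        "Blah blah\nInnovation Export\nad - 2013-03-21\npd - 2011-20-32\nblah, blah\ndone";

    // The expression from the question.
    static final String DATE_LINE = "ad.{3}20[0-1][0-9]-[0-9]{2}-[0-9]{2}";

    // Literal search: characters such as '.', '{' and '[' have no special meaning.
    static int literalIndex(String text, String needle) {
        return text.indexOf(needle);
    }

    // Regex search: returns the start of the first match, or -1 if none.
    static int regexIndex(String text, String regex) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        // The plain string "\nad " occurs in the text, so a literal search finds it.
        System.out.println(literalIndex(TEXT, "\nad "));
        // The regex source never occurs verbatim in the text, so a literal search fails.
        System.out.println(literalIndex(TEXT, DATE_LINE));
        // Interpreted as a regular expression, the same string does match.
        System.out.println(regexIndex(TEXT, DATE_LINE));
    }
}
```

If that assumption holds, it would explain why the "\nad " lookup works while the date expression finds nothing: the second call is searching for the characters ".{3}" themselves.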
Could you post some more example data so we could fit some regular expressions to it? I'm not a Regular Expression Ninja but I'm working at it.
regards
Andrew