"Filter text on regex."
I want to find all text snippets containing one or several words via regex. If I select Filter Examples, set it to "Expression", and provide it with finds(Text, "(?i)\blootbox|micro\b"), it doesn't work, although it is syntactically correct.
If I remove |micro, it only returns the snippets that contain lootbox. Why does it not return an example containing either one of them? If I use RapidMiner's regex checker on some dummy data, it matches both of them, just not with "Filter Examples".
Kindly help!
Well, this expression actually IS an exact match ;-)
So I assume you would like to match only if there is a non-word character before and after the word? Is that what you mean? In this case, the correct expression is finds([Text],".*\\W+lootbox\\W+.*|.*\\W+micro\\W+.*") (process below). Note the doubled backslashes inside the expression string.
Please note, however, that in this case you would no longer find plurals easily: for example, "lootboxes" would not trigger a match.
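The trade-off described above can be checked outside RapidMiner. The following is a minimal sketch in Python, assuming that Python's `re` module treats `\W`, `\b`, and `(?i)` the same way as the Java regex engine RapidMiner relies on for these simple patterns, and that `finds` succeeds when the pattern matches anywhere in the text. The `relaxed` pattern and the `keep` helper are illustrative assumptions, not part of the original answer.

```python
import re

# Sample rows copied from the "Create ExampleSet" operator in the posted process.
texts = [
    "This is a text about lootboxes",
    "This is a text about micro transactions",
    "This is a text about lootboxes and micro transactions",
    "And this is a text talking about other things",
]

# Pattern from the answer, with the expression-level "\\" written as a single "\".
strict = re.compile(r".*\W+lootbox\W+.*|.*\W+micro\W+.*")

# Hypothetical alternative (an assumption, not from the thread):
# word boundaries, case-insensitive, with an optional plural suffix.
relaxed = re.compile(r"(?i)\b(lootbox(es)?|micro)\b")


def keep(pattern, rows):
    """Return the rows in which the pattern is found anywhere in the text."""
    return [row for row in rows if pattern.search(row)]


strict_hits = keep(strict, texts)
relaxed_hits = keep(relaxed, texts)

print("strict :", strict_hits)
print("relaxed:", relaxed_hits)
```

With the sample rows, `strict_hits` contains only the two rows that mention "micro", while `relaxed_hits` also includes the row that only mentions "lootboxes". The strict pattern also misses a keyword at the very start or end of a text, because `\W+` needs at least one character on each side. In a RapidMiner expression, each backslash would have to be doubled, as in the answer above.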
<?xml version="1.0" encoding="UTF-8"?><process version="9.2.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.2.000" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="UTF-8"/>
    <process expanded="true">
      <operator activated="true" class="utility:create_exampleset" compatibility="9.2.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="45" y="34">
        <parameter key="generator_type" value="comma separated text"/>
        <parameter key="number_of_examples" value="100"/>
        <parameter key="use_stepsize" value="false"/>
        <list key="function_descriptions"/>
        <parameter key="add_id_attribute" value="false"/>
        <list key="numeric_series_configuration"/>
        <list key="date_series_configuration"/>
        <list key="date_series_configuration (interval)"/>
        <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
        <parameter key="time_zone" value="SYSTEM"/>
        <parameter key="input_csv_text" value="Text&#10;This is a text about lootboxes&#10;This is a text about micro transactions&#10;This is a text about lootboxes and micro transactions&#10;And this is a text talking about other things"/>
        <parameter key="column_separator" value=","/>
        <parameter key="parse_all_as_nominal" value="false"/>
        <parameter key="decimal_point_character" value="."/>
        <parameter key="trim_attribute_names" value="true"/>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="9.2.000" expanded="true" height="103" name="Filter Examples" width="90" x="179" y="34">
        <parameter key="parameter_expression" value="finds([Text],&quot;.*\\W+lootbox\\W+.*|.*\\W+micro\\W+.*&quot;)"/>
        <parameter key="condition_class" value="expression"/>
        <parameter key="invert_filter" value="false"/>
        <list key="filters_list"/>
        <parameter key="filters_logic_and" value="true"/>
        <parameter key="filters_check_metadata" value="true"/>
      </operator>
      <connect from_op="Create ExampleSet" from_port="output" to_op="Filter Examples" to_port="example set input"/>
      <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>