Filtering collection with criteria
I am on a quest to retrieve useful data from PDF.
I already conquered the first battle with the Table Extraction extension. I am now faced with another challenge:
How do I filter a collection? Let's say I want to ignore example sets with fewer than 10 examples in a collection and output a collection of all the other example sets. How can I go about doing that?
Best Answer
David_A (Administrator, Moderator, Employee-RapidMiner, RMResearcher, Member; Posts: 297; RM Research)
Hi @pblack476,
You can use the Loop Collection operator to evaluate each example set individually.
Inside the loop, you can use a Branch operator to discard the example sets that don't meet the requirement.
See the example process below.
Best,
David

<process version="9.6.000-BETA">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.4.000" expanded="true" name="Process" origin="GENERATED_TUTORIAL">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="concurrency:loop" compatibility="9.6.000-BETA" expanded="true" height="82" name="Loop" width="90" x="179" y="34">
        <parameter key="number_of_iterations" value="20"/>
        <parameter key="iteration_macro" value="iteration"/>
        <parameter key="reuse_results" value="false"/>
        <parameter key="enable_parallel_execution" value="true"/>
        <process expanded="true">
          <operator activated="true" class="generate_macro" compatibility="9.6.000-BETA" expanded="true" height="68" name="Generate Macro" width="90" x="179" y="34">
            <list key="function_descriptions">
              <parameter key="random" value="round(rand()*100)"/>
            </list>
            <description align="center" color="transparent" colored="false" width="126">Generate a random number between 1 and 100</description>
          </operator>
          <operator activated="true" class="generate_data" compatibility="9.6.000-BETA" expanded="true" height="68" name="Generate Data" width="90" x="380" y="34">
            <parameter key="target_function" value="random"/>
            <parameter key="number_examples" value="%{random}"/>
            <parameter key="number_of_attributes" value="5"/>
            <parameter key="attributes_lower_bound" value="-10.0"/>
            <parameter key="attributes_upper_bound" value="10.0"/>
            <parameter key="gaussian_standard_deviation" value="10.0"/>
            <parameter key="largest_radius" value="10.0"/>
            <parameter key="use_local_random_seed" value="false"/>
            <parameter key="local_random_seed" value="1992"/>
            <parameter key="datamanagement" value="double_array"/>
            <parameter key="data_management" value="auto"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_port="output 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_output 1" spacing="0"/>
          <portSpacing port="sink_output 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="loop_collection" compatibility="9.6.000-BETA" expanded="true" height="82" name="Loop Collection" width="90" x="715" y="34">
        <parameter key="set_iteration_macro" value="false"/>
        <parameter key="macro_name" value="iteration"/>
        <parameter key="macro_start_value" value="1"/>
        <parameter key="unfold" value="false"/>
        <process expanded="true">
          <operator activated="true" class="branch" compatibility="9.6.000-BETA" expanded="true" height="82" name="Branch" width="90" x="447" y="34">
            <parameter key="condition_type" value="min_examples"/>
            <parameter key="condition_value" value="50"/>
            <parameter key="expression" value=""/>
            <parameter key="io_object" value="ANOVAMatrix"/>
            <parameter key="return_inner_output" value="true"/>
            <process expanded="true">
              <connect from_port="condition" to_port="input 1"/>
              <portSpacing port="source_condition" spacing="0"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="sink_input 1" spacing="0"/>
              <portSpacing port="sink_input 2" spacing="0"/>
            </process>
            <process expanded="true">
              <portSpacing port="source_condition" spacing="0"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="sink_input 1" spacing="0"/>
              <portSpacing port="sink_input 2" spacing="0"/>
            </process>
          </operator>
          <connect from_port="single" to_op="Branch" to_port="condition"/>
          <connect from_op="Branch" from_port="input 1" to_port="output 1"/>
          <portSpacing port="source_single" spacing="0"/>
          <portSpacing port="sink_output 1" spacing="0"/>
          <portSpacing port="sink_output 2" spacing="0"/>
          <description align="center" color="yellow" colored="false" height="245" resized="false" width="180" x="408" y="130">Here the branch condition is minimum number of examples.&lt;br/&gt;&lt;br/&gt;If it's over 50, the example set is passed through, if not it's discarded.&lt;br/&gt;&lt;br/&gt;The same logic could be applied on number of attributes, or number of missings.</description>
        </process>
      </operator>
      <connect from_op="Loop" from_port="output 1" to_op="Loop Collection" to_port="collection"/>
      <connect from_op="Loop Collection" from_port="output 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <description align="center" color="yellow" colored="false" height="105" resized="false" width="180" x="142" y="142">Generate a collection with 20 example sets,&lt;br/&gt;with a random number of examples (between 1 and 100)</description>
      <description align="center" color="yellow" colored="false" height="105" resized="false" width="180" x="664" y="159">Loop through all example sets in the collection and evaluate the number of examples</description>
    </process>
  </operator>
</process>
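The example process uses a threshold of 50, while the question asks to drop example sets with fewer than 10 examples. As a minimal sketch, only the two Branch parameters below should need to change. This assumes that `min_examples` is an inclusive "at least N examples" check, which is worth verifying in the operator help before relying on it.

```xml
<!-- Fragment of the Branch operator from the process above, -->
<!-- with the threshold adapted to the question. -->
<!-- Assumption: min_examples passes sets with 10 or more examples. -->
<parameter key="condition_type" value="min_examples"/>
<parameter key="condition_value" value="10"/>
```

Everything else in the process (the pass-through "then" subprocess and the empty "else" subprocess) can stay as it is.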
Answers
Try the Loop Collection operator.
In each loop iteration you get one example set. You can then use, for example, Extract Macro to determine the number of examples and, based on that value, conditionally return the example set or not.
Regards,
Balázs
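The Extract Macro variant described above could look roughly like the following inner process of Loop Collection. This is a hedged sketch, not an exported process: the macro name `n_examples` is hypothetical, layout attributes are omitted, and it assumes that Extract Macro offers the `number_of_examples` macro type, that Branch accepts an `expression` condition in which the macro is substituted before evaluation, and that the port names match those in the process posted earlier in this thread.

```xml
<!-- Sketch of the subprocess inside Loop Collection (assumptions noted above). -->
<process expanded="true">
  <!-- Store the number of examples of the current example set in a macro. -->
  <operator activated="true" class="extract_macro" name="Extract Macro">
    <parameter key="macro" value="n_examples"/> <!-- hypothetical macro name -->
    <parameter key="macro_type" value="number_of_examples"/>
  </operator>
  <!-- Return the example set only if the macro value is at least 10. -->
  <operator activated="true" class="branch" name="Branch">
    <parameter key="condition_type" value="expression"/>
    <parameter key="expression" value="%{n_examples} &gt;= 10"/>
    <process expanded="true">
      <!-- "then": pass the example set through -->
      <connect from_port="condition" to_port="input 1"/>
    </process>
    <process expanded="true">
      <!-- "else": nothing connected, so nothing is returned -->
    </process>
  </operator>
  <connect from_port="single" to_op="Extract Macro" to_port="example set"/>
  <connect from_op="Extract Macro" from_port="example set" to_op="Branch" to_port="condition"/>
  <connect from_op="Branch" from_port="input 1" to_port="output 1"/>
</process>
```

Compared with the built-in `min_examples` condition, going through a macro is more verbose, but the same pattern extends to any criterion that can be written as an expression.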