How can I influence the order in which the LOOP COLLECTION operator works?
Hi there,
I have a process where I get data from a database for the last 7 days for up to 20 different machines.
In a second step, I want to do some aggregations and reports for each of the included machines. This is pretty straightforward with the GROUP INTO COLLECTION and LOOP COLLECTION operators. The problem is that the LOOP COLLECTION operator processes the entries in a (random) order, e.g. it starts with machine1, then machine3, equipment 5, machine10, machine6, ...
I want to influence the order, e.g. by increasing machine name, so that the order is defined for each run.
Any suggestions? I have of course tried things like sorting before grouping, but with no success.
Thanks in advance!
Best Answer
tftemme Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member Posts: 164 RM Research
Hi @uenge-san,
Unfortunately, Martin is right. At the moment, the Group Into Collection operator generates a collection whose entries are in an undefined (and therefore effectively random) order.
For the same dataset (for example, when you rerun a process) it should be the same (random) order, but there is currently no way to request a specific order from the Group Into Collection operator.
The Loop Collection operator simply iterates over the collection in that order.
I hope we can add an option to the Group Into Collection operator that outputs an ordered collection in the next release of the Operator Toolbox extension.
Until then, the only workaround I can think of is to reorder the collection yourself after Group Into Collection.
For example, with this process, which basically uses two loops to reorder the collection. If you have a really large number of machines, this can of course take a while.
<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="8.0.001" expanded="true" height="68" name="Retrieve Golf" width="90" x="112" y="34">
<parameter key="repository_entry" value="//Samples/data/Golf"/>
</operator>
<operator activated="true" class="operator_toolbox:group_into_collection" compatibility="0.8.000" expanded="true" height="82" name="Group Into Collection" width="90" x="380" y="34">
<parameter key="group_by_attribute" value="Outlook"/>
<description align="center" color="yellow" colored="true" width="126">Group the golf data according to the attribute Outlook</description>
</operator>
<operator activated="true" class="operator_toolbox:create_exampleset_from_doc" compatibility="0.8.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="380" y="289">
<parameter key="Input Csv" value="Outlook&#10;rain&#10;sunny&#10;overcast"/>
<description align="center" color="transparent" colored="false" width="126">Create an ExampleSet with the three values of the Outlook attribute</description>
</operator>
<operator activated="true" class="multiply" compatibility="8.0.001" expanded="true" height="103" name="Multiply" width="90" x="514" y="34"/>
<operator activated="true" class="concurrency:loop" compatibility="8.0.001" expanded="true" height="103" name="Loop" width="90" x="648" y="136">
<parameter key="number_of_iterations" value="3"/>
<process expanded="true">
<operator activated="true" class="extract_macro" compatibility="8.0.001" expanded="true" height="68" name="Extract Macro" width="90" x="112" y="85">
<parameter key="macro" value="outlook_value"/>
<parameter key="macro_type" value="data_value"/>
<parameter key="attribute_name" value="Outlook"/>
<parameter key="example_index" value="%{iteration}"/>
<list key="additional_macros"/>
<description align="center" color="transparent" colored="false" width="126">Extract the name of the current value of the Outlook Attribute</description>
</operator>
<operator activated="true" class="loop_collection" compatibility="8.0.001" expanded="true" height="82" name="Loop Collection" width="90" x="313" y="34">
<process expanded="true">
<operator activated="true" class="extract_macro" compatibility="8.0.001" expanded="true" height="68" name="Extract Macro (2)" width="90" x="45" y="34">
<parameter key="macro" value="current_value_from_collection"/>
<parameter key="macro_type" value="data_value"/>
<parameter key="attribute_name" value="Outlook"/>
<parameter key="example_index" value="1"/>
<list key="additional_macros"/>
<description align="center" color="transparent" colored="false" width="126">Extract the value for the Outlook attribute from the current ExampleSet of the collection</description>
</operator>
<operator activated="true" class="branch" compatibility="8.0.001" expanded="true" height="103" name="Branch" width="90" x="179" y="34">
<parameter key="condition_type" value="expression"/>
<parameter key="expression" value="%{outlook_value} == %{current_value_from_collection}"/>
<process expanded="true">
<connect from_port="input 1" to_port="input 1"/>
<portSpacing port="source_condition" spacing="0"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_input 1" spacing="0"/>
<portSpacing port="sink_input 2" spacing="0"/>
</process>
<process expanded="true">
<portSpacing port="source_condition" spacing="0"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_input 1" spacing="0"/>
<portSpacing port="sink_input 2" spacing="0"/>
</process>
<description align="center" color="transparent" colored="false" width="126">Only if both macros are the same, deliver the ExampleSet, otherwise do nothing</description>
</operator>
<connect from_port="single" to_op="Extract Macro (2)" to_port="example set"/>
<connect from_op="Extract Macro (2)" from_port="example set" to_op="Branch" to_port="input 1"/>
<connect from_op="Branch" from_port="input 1" to_port="output 1"/>
<portSpacing port="source_single" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
<description align="center" color="transparent" colored="false" width="126">Loop over the collection and search for the corresponding ExampleSet</description>
</operator>
<connect from_port="input 1" to_op="Loop Collection" to_port="collection"/>
<connect from_port="input 2" to_op="Extract Macro" to_port="example set"/>
<connect from_op="Loop Collection" from_port="output 1" to_port="output 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="source_input 3" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
<description align="center" color="transparent" colored="false" width="126">Loop over the three different values from Create ExampleSet and search for the corresponding ExampleSet in the Collection</description>
</operator>
<operator activated="true" class="collect" compatibility="8.0.001" expanded="true" height="82" name="Collect" width="90" x="782" y="136">
<parameter key="unfold" value="true"/>
<description align="center" color="transparent" colored="false" width="126">Flatten resulting collection</description>
</operator>
<connect from_op="Retrieve Golf" from_port="output" to_op="Group Into Collection" to_port="exa"/>
<connect from_op="Group Into Collection" from_port="col" to_op="Multiply" to_port="input"/>
<connect from_op="Create ExampleSet" from_port="output" to_op="Loop" to_port="input 2"/>
<connect from_op="Multiply" from_port="output 1" to_port="result 1"/>
<connect from_op="Multiply" from_port="output 2" to_op="Loop" to_port="input 1"/>
<connect from_op="Loop" from_port="output 1" to_op="Collect" to_port="input 1"/>
<connect from_op="Collect" from_port="collection" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
Hopefully this helps a bit,
Best regards,
Fabian
Answers
OK, this is a question for the collection maestro, @mschmitz.
Hi,
I think this is currently not possible. Maybe @tftemme knows a way? He wrote the operator.
Best,
Martin
Dortmund, Germany
Here's a quick example of how to do it with the Loop operator.
If the problem is that you can't guarantee that your machines appear in the collection in a certain order, you could also use a Loop and, inside it, a Loop Collection that keeps only the entry for your desired machine number.
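The logic of this double loop can also be sketched outside RapidMiner. The Python snippet below is only an illustration under assumptions: the collection is modelled as a plain list of groups (each a list of row dicts), and the names `reorder_collection`, `collection`, `desired_order` and the `machine` key are hypothetical, not part of RapidMiner or the Operator Toolbox.

```python
def reorder_collection(collection, desired_order, key="machine"):
    """Return the groups of `collection` in the order given by `desired_order`.

    Mirrors the nested-loop workaround: the outer loop walks the desired
    names, the inner loop scans the collection for the matching group.
    """
    reordered = []
    for wanted in desired_order:          # outer Loop over the desired names
        for group in collection:          # inner Loop Collection
            # All rows of a group share the same key value,
            # so checking the first row is enough (the Branch condition).
            if group and group[0][key] == wanted:
                reordered.append(group)
    return reordered


# Hypothetical data: three groups arriving in an arbitrary order.
collection = [
    [{"machine": "machine3", "value": 7}],
    [{"machine": "machine1", "value": 4}, {"machine": "machine1", "value": 5}],
    [{"machine": "machine10", "value": 1}],
]

# Sorting the names gives the same, defined order on every run.
names = sorted(group[0]["machine"] for group in collection)
ordered = reorder_collection(collection, names)
print([group[0]["machine"] for group in ordered])
# prints ['machine1', 'machine10', 'machine3']
```

Note that plain string sorting places machine10 before machine3. If a numeric order is wanted, the list of desired names has to be built with a numeric sort key instead. The nested scan is quadratic in the number of groups, which is why this kind of workaround can become slow for a large number of machines.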
Hi Fabian, @tftemme
Thanks for your answer and the example process.
I also used a double-loop workaround, which of course lacks the simplicity of the LOOP COLLECTION operator...
So I'm looking forward to a new release of the Operator Toolbox extension.
BR
Martin