How to "loop" over a filter?
Hey everybody,
I have a dataset containing hourly records for every day of a whole year. What I want to do is filter it day by day. Doing that manually would be very tedious, as I would have to do it 365 times. Is there a way to loop over the filter?
Thanks
Best Answer
MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
I think you need to bring the second data set into your Loop Collection using Remember and Recall. See the attached process.
With one or two more operators we could also use a regular Loop together with a Select operator. The standard Loop has an additional input and can run in parallel. There are quite a few options here.
Best,
Martin
<?xml version="1.0" encoding="UTF-8"?><process version="7.3.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.3.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_csv" compatibility="7.3.001" expanded="true" height="68" name="Read CSV" width="90" x="45" y="85">
<parameter key="csv_file" value="/Users/Philipp/Desktop/Tank_Muenster.csv"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<parameter key="encoding" value="UTF-8"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="brand.true.polynominal.attribute"/>
<parameter key="1" value="name.true.polynominal.attribute"/>
<parameter key="2" value="Day.true.polynominal.attribute"/>
<parameter key="3" value="Time.true.polynominal.attribute"/>
<parameter key="4" value="street.true.polynominal.attribute"/>
<parameter key="5" value="lat.true.real.attribute"/>
<parameter key="6" value="lng.true.real.attribute"/>
<parameter key="7" value="place.true.polynominal.attribute"/>
<parameter key="8" value="post_code.true.integer.attribute"/>
<parameter key="9" value="Benzin e5 in ¨.true.polynominal.attribute"/>
<parameter key="10" value="Diesel in ¨.true.polynominal.attribute"/>
<parameter key="11" value="stid.true.polynominal.attribute"/>
<parameter key="12" value="TagdW.true.polynominal.attribute"/>
<parameter key="13" value="Feiertag.true.polynominal.attribute"/>
</list>
</operator>
<operator activated="true" class="nominal_to_date" compatibility="7.3.001" expanded="true" height="82" name="Nominal to Date" width="90" x="179" y="85">
<parameter key="attribute_name" value="Time"/>
<parameter key="date_type" value="time"/>
<parameter key="date_format" value="h:mm a"/>
<parameter key="locale" value="German (Germany)"/>
</operator>
<operator activated="true" class="date_to_numerical" compatibility="7.3.001" expanded="true" height="82" name="Date to Numerical (2)" width="90" x="313" y="85">
<parameter key="attribute_name" value="Time"/>
<parameter key="time_unit" value="minute"/>
<parameter key="minute_relative_to" value="day"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="7.3.001" expanded="true" height="82" name="Generate Attributes" width="90" x="447" y="85">
<list key="function_descriptions">
<parameter key="Grid" value="if(Time&gt;0&amp;&amp;Time&lt;=15,15, if(Time&gt;15&amp;&amp;Time&lt;=30,30, if(Time&gt;30&amp;&amp;Time&lt;=45,45, if(Time&gt;45&amp;&amp;Time&lt;=60,60, if(Time&gt;60&amp;&amp;Time&lt;=75,75, if(Time&gt;75&amp;&amp;Time&lt;=90,90, if(Time&gt;90&amp;&amp;Time&lt;=105,105, if(Time&gt;105&amp;&amp;Time&lt;=120,120, if(Time&gt;120&amp;&amp;Time&lt;=135,135, if(Time&gt;135&amp;&amp;Time&lt;=150,150, if(Time&gt;150&amp;&amp;Time&lt;=165,165, if(Time&gt;165&amp;&amp;Time&lt;=180,180, if(Time&gt;180&amp;&amp;Time&lt;=195,195, if(Time&gt;195&amp;&amp;Time&lt;=210,210, if(Time&gt;210&amp;&amp;Time&lt;=225,225, if(Time&gt;225&amp;&amp;Time&lt;=240,240, if(Time&gt;240&amp;&amp;Time&lt;=255,255, if(Time&gt;255&amp;&amp;Time&lt;=270,270, if(Time&gt;270&amp;&amp;Time&lt;=285,285, if(Time&gt;285&amp;&amp;Time&lt;=300,300, if(Time&gt;300&amp;&amp;Time&lt;=315,315, if(Time&gt;315&amp;&amp;Time&lt;=330,330, if(Time&gt;330&amp;&amp;Time&lt;=345,345, if(Time&gt;345&amp;&amp;Time&lt;=360,360, if(Time&gt;360&amp;&amp;Time&lt;=375,375, if(Time&gt;375&amp;&amp;Time&lt;=390,390, if(Time&gt;390&amp;&amp;Time&lt;=405,405, if(Time&gt;405&amp;&amp;Time&lt;=420,420, if(Time&gt;420&amp;&amp;Time&lt;=435,435, if(Time&gt;435&amp;&amp;Time&lt;=450,450, if(Time&gt;450&amp;&amp;Time&lt;=465,465, if(Time&gt;465&amp;&amp;Time&lt;=480,480, if(Time&gt;480&amp;&amp;Time&lt;=495,495, if(Time&gt;495&amp;&amp;Time&lt;=510,510, if(Time&gt;510&amp;&amp;Time&lt;=525,525, if(Time&gt;525&amp;&amp;Time&lt;=540,540, if(Time&gt;540&amp;&amp;Time&lt;=555,555, if(Time&gt;555&amp;&amp;Time&lt;=570,570, if(Time&gt;570&amp;&amp;Time&lt;=585,585, if(Time&gt;585&amp;&amp;Time&lt;=600,600, if(Time&gt;600&amp;&amp;Time&lt;=615,615, if(Time&gt;615&amp;&amp;Time&lt;=630,630, if(Time&gt;630&amp;&amp;Time&lt;=645,645, if(Time&gt;645&amp;&amp;Time&lt;=660,660, if(Time&gt;660&amp;&amp;Time&lt;=675,675, if(Time&gt;675&amp;&amp;Time&lt;=690,690, if(Time&gt;690&amp;&amp;Time&lt;=705,705, if(Time&gt;705&amp;&amp;Time&lt;=720,720, if(Time&gt;720&amp;&amp;Time&lt;=735,735, if(Time&gt;735&amp;&amp;Time&lt;=750,750, if(Time&gt;750&amp;&amp;Time&lt;=765,765, if(Time&gt;765&amp;&amp;Time&lt;=780,780, if(Time&gt;780&amp;&amp;Time&lt;=795,795, if(Time&gt;795&amp;&amp;Time&lt;=810,810, if(Time&gt;810&amp;&amp;Time&lt;=825,825, if(Time&gt;825&amp;&amp;Time&lt;=840,840, if(Time&gt;840&amp;&amp;Time&lt;=855,855, if(Time&gt;855&amp;&amp;Time&lt;=870,870, if(Time&gt;870&amp;&amp;Time&lt;=885,885, if(Time&gt;885&amp;&amp;Time&lt;=900,900, if(Time&gt;900&amp;&amp;Time&lt;=915,915, if(Time&gt;915&amp;&amp;Time&lt;=930,930, if(Time&gt;930&amp;&amp;Time&lt;=945,945, if(Time&gt;945&amp;&amp;Time&lt;=960,960, if(Time&gt;960&amp;&amp;Time&lt;=975,975, if(Time&gt;975&amp;&amp;Time&lt;=990,990, if(Time&gt;990&amp;&amp;Time&lt;=1005,1005, if(Time&gt;1005&amp;&amp;Time&lt;=1020,1020, if(Time&gt;1020&amp;&amp;Time&lt;=1035,1035, if(Time&gt;1035&amp;&amp;Time&lt;=1050,1050, 
if(Time&gt;1050&amp;&amp;Time&lt;=1065,1065, if(Time&gt;1065&amp;&amp;Time&lt;=1080,1080, if(Time&gt;1080&amp;&amp;Time&lt;=1095,1095, if(Time&gt;1095&amp;&amp;Time&lt;=1110,1110, if(Time&gt;1110&amp;&amp;Time&lt;=1125,1125, if(Time&gt;1125&amp;&amp;Time&lt;=1140,1140, if(Time&gt;1140&amp;&amp;Time&lt;=1155,1155, if(Time&gt;1155&amp;&amp;Time&lt;=1170,1170, if(Time&gt;1170&amp;&amp;Time&lt;=1185,1185, if(Time&gt;1185&amp;&amp;Time&lt;=1200,1200, if(Time&gt;1200&amp;&amp;Time&lt;=1215,1215, if(Time&gt;1215&amp;&amp;Time&lt;=1230,1230, if(Time&gt;1230&amp;&amp;Time&lt;=1245,1245, if(Time&gt;1245&amp;&amp;Time&lt;=1260,1260, if(Time&gt;1260&amp;&amp;Time&lt;=1275,1275, if(Time&gt;1275&amp;&amp;Time&lt;=1290,1290, if(Time&gt;1290&amp;&amp;Time&lt;=1305,1305, if(Time&gt;1305&amp;&amp;Time&lt;=1320,1320, if(Time&gt;1320&amp;&amp;Time&lt;=1335,1335, if(Time&gt;1335&amp;&amp;Time&lt;=1350,1350, if(Time&gt;1350&amp;&amp;Time&lt;=1365,1365, if(Time&gt;1365&amp;&amp;Time&lt;=1380,1380, if(Time&gt;1380&amp;&amp;Time&lt;=1395,1395, if(Time&gt;1395&amp;&amp;Time&lt;=1410,1410, if(Time&gt;1410&amp;&amp;Time&lt;=1425,1425, if(Time&gt;1425&amp;&amp;Time&lt;=1440,1440,666))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))"/>
</list>
</operator>
<operator activated="true" class="numerical_to_real" compatibility="7.3.001" expanded="true" height="82" name="Numerical to Real" width="90" x="313" y="238">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="Grid"/>
</operator>
<operator activated="true" class="read_excel" compatibility="7.3.001" expanded="true" height="68" name="Read Excel" width="90" x="45" y="391">
<parameter key="excel_file" value="/Users/Philipp/Desktop/Zeit_.xlsx"/>
<parameter key="imported_cell_range" value="A1:B97"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<list key="data_set_meta_data_information">
<parameter key="0" value="Time.true.time.attribute"/>
<parameter key="1" value="Timegrid.true.time.attribute"/>
</list>
</operator>
<operator activated="true" class="date_to_numerical" compatibility="7.3.001" expanded="true" height="82" name="Date to Numerical" width="90" x="179" y="391">
<parameter key="attribute_name" value="Time"/>
<parameter key="time_unit" value="minute"/>
<parameter key="minute_relative_to" value="day"/>
</operator>
<operator activated="true" class="date_to_numerical" compatibility="7.3.001" expanded="true" height="82" name="Date to Numerical (3)" width="90" x="313" y="391">
<parameter key="attribute_name" value="Timegrid"/>
<parameter key="time_unit" value="minute"/>
<parameter key="minute_relative_to" value="day"/>
</operator>
<operator activated="true" class="numerical_to_real" compatibility="7.3.001" expanded="true" height="82" name="Numerical to Real (2)" width="90" x="447" y="391"/>
<operator activated="true" class="remember" compatibility="7.3.001" expanded="true" height="68" name="Remember" width="90" x="581" y="391">
<parameter key="name" value="data"/>
</operator>
<operator activated="true" class="operator_toolbox:group_into_collection" compatibility="0.1.000" expanded="true" height="82" name="Group Into Collection (2)" width="90" x="447" y="238">
<parameter key="group_by_attribute" value="Day"/>
</operator>
<operator activated="true" class="delay" compatibility="7.3.001" expanded="true" height="103" name="Delay" width="90" x="648" y="238">
<parameter key="delay" value="none"/>
<description align="center" color="transparent" colored="false" width="126">Just to ensure execution order</description>
</operator>
<operator activated="true" class="loop_collection" compatibility="7.3.001" expanded="true" height="82" name="Loop Collection" width="90" x="782" y="238">
<process expanded="true">
<operator activated="false" class="operator_toolbox:group_into_collection" compatibility="0.1.000" expanded="true" height="82" name="Group Into Collection" width="90" x="112" y="238">
<parameter key="group_by_attribute" value="stid"/>
</operator>
<operator activated="true" class="recall" compatibility="7.3.001" expanded="true" height="68" name="Recall" width="90" x="112" y="85">
<parameter key="name" value="data"/>
</operator>
<operator activated="true" class="join" compatibility="7.3.001" expanded="true" height="82" name="Join" width="90" x="246" y="34">
<parameter key="remove_double_attributes" value="false"/>
<parameter key="join_type" value="right"/>
<parameter key="use_id_attribute_as_key" value="false"/>
<list key="key_attributes">
<parameter key="Grid" value="Time"/>
</list>
</operator>
<connect from_port="single" to_op="Join" to_port="left"/>
<connect from_op="Recall" from_port="result" to_op="Join" to_port="right"/>
<connect from_op="Join" from_port="join" to_port="output 1"/>
<portSpacing port="source_single" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<connect from_op="Read CSV" from_port="output" to_op="Nominal to Date" to_port="example set input"/>
<connect from_op="Nominal to Date" from_port="example set output" to_op="Date to Numerical (2)" to_port="example set input"/>
<connect from_op="Date to Numerical (2)" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Numerical to Real" to_port="example set input"/>
<connect from_op="Numerical to Real" from_port="example set output" to_op="Group Into Collection (2)" to_port="exa"/>
<connect from_op="Read Excel" from_port="output" to_op="Date to Numerical" to_port="example set input"/>
<connect from_op="Date to Numerical" from_port="example set output" to_op="Date to Numerical (3)" to_port="example set input"/>
<connect from_op="Date to Numerical (3)" from_port="example set output" to_op="Numerical to Real (2)" to_port="example set input"/>
<connect from_op="Numerical to Real (2)" from_port="example set output" to_op="Remember" to_port="store"/>
<connect from_op="Remember" from_port="stored" to_op="Delay" to_port="through 2"/>
<connect from_op="Group Into Collection (2)" from_port="col" to_op="Delay" to_port="through 1"/>
<connect from_op="Delay" from_port="through 1" to_op="Loop Collection" to_port="collection"/>
<connect from_op="Loop Collection" from_port="output 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
- Sr. Director Data Solutions, Altair RapidMiner -
Dortmund, Germany
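For readers who want to see what the long Grid expression in Generate Attributes computes: it rounds the minute of the day up to the next 15-minute mark and falls back to 666 for anything outside 1..1440. Below is a minimal Python sketch of that logic, not part of the attached process. The function names are made up for illustration, and the compact ceiling form is only an equivalent way to express the same rule in Python; whether the same shortcut can be written in the RapidMiner expression language is not checked here.

```python
import math


def grid_nested(minutes):
    """Mirror the nested if() chain: return the upper edge of the 15-minute bin, else 666."""
    for upper in range(15, 1441, 15):
        if upper - 15 < minutes <= upper:
            return upper
    return 666  # same fallback value as in the process


def grid_compact(minutes):
    """Same result using ceiling arithmetic instead of 96 separate conditions."""
    if 0 < minutes <= 1440:
        return math.ceil(minutes / 15) * 15
    return 666


# Both variants agree for every minute of the day, including the edge cases 0 and 1441.
all_equal = all(grid_nested(m) == grid_compact(m) for m in range(0, 1442))
```

For example, minute 16 lands on grid value 30, while minute 0 falls through to 666 in both versions.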
Answers
Hey,
Loop Values would do the job. Our new Group Into Collection operator from the Operator Toolbox may be even better: it gives you a collection with one example set per day. You can then work through the days with Loop Collection.
~Martin
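To make the idea concrete outside of RapidMiner, here is a rough Python sketch of what "group into a collection, then loop over it" means. It is only an analogy: the sample rows and the helper name `group_into_collection` are invented for illustration and do not reflect how the operator is implemented.

```python
from collections import defaultdict

# Hypothetical hourly records: (day, hour, value)
rows = [
    ("2016-01-01", 0, 1.31),
    ("2016-01-01", 1, 1.29),
    ("2016-01-02", 0, 1.35),
    ("2016-01-02", 1, 1.33),
    ("2016-01-03", 0, 1.30),
]


def group_into_collection(rows, key_index):
    """Split rows into one list per distinct key value, keeping first-seen order."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row)
    return list(groups.values())


# One "example set" per day, built in a single pass over the data.
collection = group_into_collection(rows, key_index=0)

# The loop body runs once per day instead of once per manual filter.
rows_per_day = {}
for day_rows in collection:
    day = day_rows[0][0]
    rows_per_day[day] = len(day_rows)
```

The point is that the data is split once, and the per-day work happens inside the loop body.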
Thanks for your reply.
That sounds pretty good, but could you be more specific? I downloaded the Operator Toolbox, but as soon as I put the Group Into Collection operator inside the Loop Collection operator, the error message "Expected IOObjectCollection but received Examples set" appears. Since I would call myself a newbie, I would be grateful if you could show me how to do this :-).
Regards
Philipp
Besides that, is it possible to group by two attributes?
_______________________________________________
Okay, I solved this by putting another Group Into Collection operator inside the Loop Collection. Now the problem is that I can't join a collection with another data set.
Hi,
I currently cannot run your process, but I think you need to use an Append before the Join to get an example set again.
Edit: Regarding the two attributes, that's on our list to add. The Toolbox extension is a community-style extension, even though it is a RapidMiner-internal community. For now you need to use Generate Attributes and concat to group by two attributes.
By the way, Loop Values with Filter Examples is also a viable option, but slightly slower in execution time.
~Martin
Oh, okay. That's maybe because I have so many data sets etc.
But if I append now, I get the same result as before. What I want to do is join every element of the collection (in this case 365, for example) with another example set (which contains, for example, the names of the days of the week). So Append wouldn't be an option, would it?
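A small Python sketch of the goal described here, joining each per-day set with one shared reference set, might look like the following. It assumes a right join on a time-grid key, as in the process Martin attached; the column names, sample values, and the `right_join` helper are hypothetical and only illustrate the pattern.

```python
def right_join(left_rows, right_rows, left_key, right_key):
    """Keep every right row; attach matching left rows, or None values if nothing matches."""
    index = {}
    for row in left_rows:
        index.setdefault(row[left_key], []).append(row)
    left_columns = sorted({col for row in left_rows for col in row})

    joined = []
    for right in right_rows:
        matches = index.get(right[right_key])
        if matches:
            for left in matches:
                joined.append({**left, **right})
        else:
            # No left row for this grid slot: fill the left columns with missing values.
            joined.append({**{col: None for col in left_columns}, **right})
    return joined


# Shared reference set (hypothetical): one row per grid slot.
time_grid = [{"Time": 15}, {"Time": 30}, {"Time": 45}]

# Collection with one list of rows per day (hypothetical values).
collection = [
    [{"Day": "Mon", "Grid": 15, "price": 1.31}, {"Day": "Mon", "Grid": 45, "price": 1.29}],
    [{"Day": "Tue", "Grid": 30, "price": 1.35}],
]

# The same reference set is reused in every iteration of the loop.
joined_per_day = [right_join(day_rows, time_grid, "Grid", "Time") for day_rows in collection]
```

Each day ends up with one row per grid slot, and slots without data show up as missing values instead of disappearing.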
I think we are near the finish line. Thank you for your process; it looks like it can work. But there is one error message in the Recall operator, "no object with name data was found", even though we set the name to "data" in the Remember operator.
Could you check whether the Remember operator is executed before the Recall?
See: http://community.rapidminer.com/t5/RapidMiner-Studio-Knowledge-Base/Change-the-Execution-Order-of-Processes/ta-p/31780
Best,
Martin
Thanks for your fast response.
According to RapidMiner it is definitely executed before the recall.
Got it. Could you disable the "remove from store" option in Recall? Otherwise the object is no longer available in iteration 2. Sorry for this.
~Martin
If I set "remove from store" to false, it works :-). Is that plausible?
Okay, I did it in parallel. Thank you very much for this long discussion and the helpful answers! The process works fine now :-)
Best regards
Philipp
Yes,
usually the objects are deleted once you recall them. This is to save memory. In your special case you do not want the object to be deleted. If you deactivate this option, it is deleted once your process finishes.
~Martin
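The behaviour described above can be pictured with a toy object store in Python. This is only a mental model built from the description in this thread, not RapidMiner's implementation; the class and function names are invented.

```python
class ObjectStore:
    """Toy model of Remember/Recall: a named store whose recall can delete the entry."""

    def __init__(self):
        self._objects = {}

    def remember(self, name, obj):
        self._objects[name] = obj

    def recall(self, name, remove_from_store=True):
        if name not in self._objects:
            raise KeyError(f"no object with name {name!r} was found")
        if remove_from_store:
            # The entry is deleted on first recall, which saves memory.
            return self._objects.pop(name)
        return self._objects[name]


def run_loop(iterations, remove_from_store):
    """Recall the same object in every iteration and count how many iterations succeed."""
    store = ObjectStore()
    store.remember("data", ["time grid"])
    completed = 0
    try:
        for _ in range(iterations):
            store.recall("data", remove_from_store=remove_from_store)
            completed += 1
    except KeyError:
        pass  # the object was already removed by an earlier recall
    return completed


with_removal = run_loop(3, remove_from_store=True)      # fails from iteration 2 on
without_removal = run_loop(3, remove_from_store=False)  # every iteration succeeds
```

With removal enabled only the first iteration finds the object, which matches the "no object with name data was found" error seen inside the loop.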
Hello again,
I have a question concerning the meta data: if I want to apply Replace Missing Values (Series) to each IOObject, I can't pick the attributes in the operator's dropdown. :-(
Hi,
you can simply type in the attribute names by hand. It works anyway.
I think we need to investigate our meta data propagation there. But maybe it's fine to just take the meta data from the last execution (under Process).
~Martin
Thank you that also worked! :-)
Because I wanted two levels of grouping (two attributes), I created another collection from the collection. The input of the Join operator now says that it's the wrong input type.
Hi,
do you want to group by two attributes? If so, first build an indicator attribute like concat(att1,att2) and then do a single grouping.
~Martin
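As a language-neutral illustration of this trick, the Python sketch below builds one combined key from two columns and then groups once. The column names `Day` and `stid` come from the process in this thread; the sample values, the helper names, and the `_` separator are assumptions added for the example (the separator just keeps different value pairs from producing the same key).

```python
from collections import defaultdict

# Hypothetical rows with the two attributes to group by.
rows = [
    {"Day": "2016-01-01", "stid": "A", "price": 1.31},
    {"Day": "2016-01-01", "stid": "B", "price": 1.33},
    {"Day": "2016-01-01", "stid": "A", "price": 1.29},
    {"Day": "2016-01-02", "stid": "A", "price": 1.35},
]


def add_concat_key(rows, first, second, name="group_key", sep="_"):
    """Add one indicator column that combines two attributes into a single key."""
    return [{**row, name: f"{row[first]}{sep}{row[second]}"} for row in rows]


def group_by(rows, key):
    """Group rows by a single column, one list per distinct value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


keyed = add_concat_key(rows, "Day", "stid")
groups = group_by(keyed, "group_key")
```

This yields a flat collection with one group per (Day, stid) pair, so a downstream join still receives plain example sets rather than a collection of collections.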
Okay, that worked. Thank you! I think it takes routine, which will hopefully let me find this kind of solution myself, too.
The whole process is finished now. It works fine, but would there be a way to create, or rather get back, the meta data? It took me some typing to manually write all the attribute names into a couple of different operators.
Hi,
usually Process -> Synchronize Meta Data with Real Data should do the job.
Propagating meta data from Recall operators in complex loops is kind of difficult.
Best,
Martin