Reducing example set using average
Hello,
Attached is an image of my example set; it shows that I have 683 examples and 343 regular attributes. I would like to reduce the example set to 100 examples while keeping the number of attributes, so that each new example is the average of a group of old examples. For example, 683/100 = 6.83, so each new example would be the average of 6 old examples. I know this is similar to a moving average, but the Moving Average operator can only select one column, and its result is a continuous (rolling) average rather than one value per group. I also plan to use this on other example sets, whose sizes vary from 500 to 2000 examples.
Regards,
Best Answer
tftemme Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member Posts: 164 RM Research
Hi @hung9022,
You can use the Process Windows operator (from the Time Series Extension, which has been bundled with RapidMiner Studio since 9.0.0) with a window size of 6, no overlapping windows selected, and create horizon deselected. Inside Process Windows, you can use Extract Aggregates (also from the Time Series Extension) to extract the average (and other aggregated values) of each window. Use Append to collect all the results.
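To make the grouping concrete, here is a minimal sketch in plain Python of averaging over consecutive, non-overlapping windows. It is not RapidMiner code: the function name `average_in_windows` and the toy data are made up for illustration, and dropping a trailing partial window is an assumption of this sketch, not a statement about how Process Windows behaves.

```python
def average_in_windows(rows, window_size):
    """Average each column over consecutive, non-overlapping windows of rows."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    averaged = []
    # Step through the rows in strides of window_size.
    # Assumption of this sketch: a trailing partial window is dropped.
    for start in range(0, len(rows) - window_size + 1, window_size):
        window = rows[start:start + window_size]
        # zip(*window) regroups the window by column, so every
        # attribute is averaged separately.
        averaged.append([sum(col) / window_size for col in zip(*window)])
    return averaged


# Toy data (made-up values): 7 examples with 2 attributes each.
rows = [[float(i), float(10 * i)] for i in range(7)]
reduced = average_in_windows(rows, window_size=3)
print(reduced)  # [[1.0, 10.0], [4.0, 40.0]]
```

Note that with 683 examples and a window size of 6 there are 113 full windows (113 × 6 = 678), so this kind of grouping gives somewhat more than 100 averaged examples.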
Hope this helps,
Best regards,
Fabian
PS: Since 9.0.0, Moving Average, Process Windows, and Extract Aggregates all work on several attributes at once.
Answers
Hi @tftemme,
Thanks, that is what I was looking for. Incidentally, do you know how to set up the macros so that I can loop over example sets of different sizes and reduce each to 100 examples using the above method? The sizes of these example sets range from 500 to 2000 examples.
Regards,
You can use Extract Macro to extract the number of examples, and then Generate Macro to calculate the window size: floor(eval(%{number_of_examples})/100)
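The arithmetic behind that macro can be checked with a small Python sketch. The function name `window_size_for` is hypothetical, and the `max(1, ...)` guard for sets with fewer than 100 examples is an assumption added here, not part of the macro expression above.

```python
def window_size_for(number_of_examples, target=100):
    # Integer division mirrors floor(number_of_examples / target).
    # Assumption: never return 0 for sets smaller than the target.
    return max(1, number_of_examples // target)


for n in (500, 683, 2000):
    size = window_size_for(n)
    full_windows = n // size  # number of complete, non-overlapping windows
    print(n, size, full_windows)
# 500 5 100
# 683 6 113
# 2000 20 100
```

Because the window size is rounded down, the number of full windows is at least 100 and equals exactly 100 only in some cases, such as when the number of examples is a multiple of 100.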