The Altair Community is migrating to a new platform to provide a better experience for you. In preparation for the migration, the Altair Community is on read-only mode from October 28 - November 6, 2024. Technical support via cases will continue to work as is. For any urgent requests from Students/Faculty members, please submit the form linked here
time series: create weakly, monthly, yearly averages on time series
I have a number to financial timeries
Some are daily, some are weekly, some are monthly etc
I want to convert them into the same frequency by averaging ie all values inside a month
ie if X1: has daily values for the previous 10 years, Date:the corresponding index
I want to group all the values of the Date index by month and replace them with a new variable that has one value ie the average of all dates corresponding to that month
Any suggestions how one could do that?
Some are daily, some are weekly, some are monthly etc
I want to convert them into the same frequency by averaging ie all values inside a month
ie if X1: has daily values for the previous 10 years, Date:the corresponding index
I want to group all the values of the Date index by month and replace them with a new variable that has one value ie the average of all dates corresponding to that month
Any suggestions how one could do that?
Tagged:
0
Learner II
Answers
However, if you have different series data frequency, you can do this the conventional way inside RapidMiner by simply using Date to Numerical, for example, extracting the month and year (separately) from your date/time stamps, and then use the regular Aggregate operator to get your results grouped by month and year.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
Thanks for the feedback about the time series extension. It is always good to hear feedback. You are right the current time series operators are not yet directly designed to handle time series of different frequencies or non-equidistant time series. This is planned for upcoming releases.
I want to suggest also another solution. Performing an outer join on time series with different frequencies, results in a joined data set with missing values for the timestamps for which the corresponding attributes don't have values in the original data sets. You can then use Replace Missing Values (Series) to interpolate these and have a resulting dataset with a single frequency.
Here is a demo process, how to do this:
<process version="9.0.003"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="9.0.003" expanded="true" name="Process"> <process expanded="true"> <operator activated="true" class="subprocess" compatibility="9.0.003" expanded="true" height="103" name="Subprocess" width="90" x="45" y="85"> <process expanded="true"> <operator activated="true" class="operator_toolbox:create_exampleset" compatibility="1.7.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="45" y="34"> <parameter key="number_of_examples" value="104"/> <list key="function_descriptions"> <parameter key="time" value="date_add(date_parse("01/01/2000"),id-1,DATE_UNIT_WEEK)"/> <parameter key="weekly data" value="10*rand()"/> </list> <parameter key="add_id_attribute" value="true"/> <list key="numeric_series_configuration"/> <list key="date_series_configuration"/> <list key="date_series_configuration (interval)"/> </operator> <operator activated="true" class="set_role" compatibility="9.0.003" expanded="true" height="82" name="Set Role" width="90" x="179" y="34"> <parameter key="attribute_name" value="time"/> <parameter key="target_role" value="id"/> <list key="set_additional_roles"/> </operator> <operator activated="true" class="select_attributes" compatibility="9.0.003" expanded="true" height="82" name="Select Attributes" width="90" x="313" y="34"> <parameter key="attribute_filter_type" value="single"/> <parameter key="attribute" value="id"/> <parameter key="invert_selection" value="true"/> </operator> <operator activated="true" class="operator_toolbox:create_exampleset" compatibility="1.7.000" expanded="true" height="68" name="Create ExampleSet (2)" width="90" x="45" y="136"> <parameter key="number_of_examples" value="24"/> <list key="function_descriptions"> <parameter key="time" value="date_add(date_parse("01/01/2000"),id-1,DATE_UNIT_MONTH)"/> <parameter key="monthly data" value="100*rand()"/> </list> <parameter key="add_id_attribute" value="true"/> <list key="numeric_series_configuration"/> <list key="date_series_configuration"/> <list key="date_series_configuration (interval)"/> </operator> <operator activated="true" class="set_role" compatibility="9.0.003" expanded="true" height="82" name="Set Role (2)" width="90" x="179" y="136"> <parameter key="attribute_name" value="time"/> <parameter key="target_role" value="id"/> <list key="set_additional_roles"/> </operator> <operator activated="true" class="select_attributes" compatibility="9.0.003" expanded="true" height="82" name="Select Attributes (2)" width="90" x="313" y="136"> <parameter key="attribute_filter_type" value="single"/> <parameter key="attribute" value="id"/> <parameter key="invert_selection" value="true"/> </operator> <connect from_op="Create ExampleSet" from_port="output" to_op="Set Role" to_port="example set input"/> <connect from_op="Set Role" from_port="example set output" to_op="Select Attributes" to_port="example set input"/> <connect from_op="Select Attributes" from_port="example set output" to_port="out 1"/> <connect from_op="Create ExampleSet (2)" from_port="output" to_op="Set Role (2)" to_port="example set input"/> <connect from_op="Set Role (2)" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/> <connect from_op="Select Attributes (2)" from_port="example set output" to_port="out 2"/> <portSpacing port="source_in 1" spacing="0"/> <portSpacing port="sink_out 1" spacing="0"/> <portSpacing port="sink_out 2" spacing="0"/> <portSpacing port="sink_out 3" spacing="0"/> </process> <description align="center" color="transparent" colored="false" width="126">Creates two ExampleSets, one with weekly data, one with monthly data</description> </operator> <operator activated="true" class="generate_attributes" compatibility="9.0.003" expanded="true" height="82" name="Generate Attributes" width="90" x="179" y="34"> <list key="function_descriptions"> <parameter key="flag weekly data" value=""true""/> </list> </operator> <operator activated="true" class="concurrency:join" compatibility="9.0.003" expanded="true" height="82" name="Join" width="90" x="313" y="85"> <parameter key="join_type" value="outer"/> <parameter key="use_id_attribute_as_key" value="true"/> <list key="key_attributes"/> </operator> <operator activated="true" class="sort" compatibility="9.0.003" expanded="true" height="82" name="Sort" width="90" x="447" y="85"> <parameter key="attribute_name" value="time"/> </operator> <operator activated="true" class="time_series:replace_missing_values" compatibility="9.1.000-BETA" expanded="true" height="68" name="Replace Missing Values (Series)" width="90" x="648" y="85"> <parameter key="attribute_filter_type" value="subset"/> <parameter key="attributes" value="weekly data|monthly data"/> <parameter key="has_indices" value="true"/> <parameter key="indices_attribute" value="time"/> <parameter key="replace_type_numerical" value="linear interpolation"/> <parameter key="ensure_finite_values" value="true"/> </operator> <operator activated="true" class="filter_examples" compatibility="9.0.003" expanded="true" height="103" name="Filter Examples" width="90" x="782" y="85"> <list key="filters_list"> <parameter key="filters_entry_key" value="flag weekly data.is_not_missing."/> </list> </operator> <operator activated="true" class="select_attributes" compatibility="9.0.003" expanded="true" height="82" name="Select Attributes (3)" width="90" x="916" y="85"> <parameter key="attribute_filter_type" value="single"/> <parameter key="attribute" value="original weekly data"/> <parameter key="invert_selection" value="true"/> </operator> <connect from_op="Subprocess" from_port="out 1" to_op="Generate Attributes" to_port="example set input"/> <connect from_op="Subprocess" from_port="out 2" to_op="Join" to_port="right"/> <connect from_op="Generate Attributes" from_port="example set output" to_op="Join" to_port="left"/> <connect from_op="Join" from_port="join" to_op="Sort" to_port="example set input"/> <connect from_op="Sort" from_port="example set output" to_op="Replace Missing Values (Series)" to_port="example set"/> <connect from_op="Replace Missing Values (Series)" from_port="example set" to_op="Filter Examples" to_port="example set input"/> <connect from_op="Filter Examples" from_port="example set output" to_op="Select Attributes (3)" to_port="example set input"/> <connect from_op="Select Attributes (3)" from_port="example set output" to_port="result 1"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> <portSpacing port="sink_result 2" spacing="0"/> <description align="center" color="transparent" colored="true" height="67" resized="false" width="126" x="158" y="139">Generate a flag attribute to used later in the filter</description> <description align="center" color="transparent" colored="true" height="229" resized="false" width="180" x="332" y="186">Perform an outer join using the id attribute (which is the time attribute).<br><br>The resulting ExampleSet will have missing values for the timestamps which don't occur in the original datasets<br/><br/>Use sort to sort again for the joint timestemps</description> <description align="center" color="transparent" colored="true" height="216" resized="true" width="286" x="674" y="199">Use Replace Missing Values with linear interpolation to fill the missing values. <br/><br/>Use Filter Examples with the flag attribute is not missing to reduce again to weekly timestamps.<br/><br/>Deselect the flag attribute. Now you have one ExampleSet with weekly data for both time series attributes.</description> </process> </operator> </process>Best regards,
Fabian