Problems with the Process Windows operator
Hi all,
EDIT: I have found the problem.
I ran into a problem with the Process Windows operator from the Time Series extension. I am trying to smooth data from a set of files of different sizes and combine them using the average. This process used to work fine, but now every time I try to run it, it gives me the error shown in the attached picture. I have verified that the example set going into the operator has the correct attributes. I attached 2 files that I want to apply the process to. All I changed was filtering out some unnecessary readings and further reducing the size of each file. The process is as follows:
<?xml version="1.0" encoding="UTF-8"?><process version="9.0.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.0.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="concurrency:loop_files" compatibility="9.0.001" expanded="true" height="82" name="Loop Files" width="90" x="246" y="34">
<parameter key="directory" value="D:\Intern\WHL_East\tsne test"/>
<process expanded="true">
<operator activated="true" class="read_csv" compatibility="9.0.001" expanded="true" height="68" name="Read CSV" width="90" x="45" y="34">
<parameter key="column_separators" value=","/>
<parameter key="date_format" value="MM/dd/yyyy"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations"/>
<list key="data_set_meta_data_information"/>
</operator>
<operator activated="true" class="set_role" compatibility="9.0.001" expanded="true" height="82" name="Set Role" width="90" x="179" y="34">
<parameter key="attribute_name" value="att1"/>
<parameter key="target_role" value="id"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="transpose" compatibility="9.0.001" expanded="true" height="82" name="Transpose" width="90" x="313" y="34"/>
<operator activated="true" class="select_attributes" compatibility="9.0.001" expanded="true" height="82" name="Select Attributes (6)" width="90" x="447" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="Time"/>
<parameter key="invert_selection" value="true"/>
</operator>
<operator activated="true" class="time_series:process_windows" compatibility="9.0.000" expanded="true" height="82" name="Process Windows" width="90" x="648" y="34">
<parameter key="no_overlapping_windows" value="true"/>
<parameter key="create_horizon_(labels)" value="false"/>
<process expanded="true">
<operator activated="true" class="time_series:extract_std_descriptive_features" compatibility="9.0.000" expanded="true" height="82" name="Extract Aggregates" width="90" x="246" y="34">
<parameter key="sum" value="false"/>
<parameter key="geometric_mean" value="false"/>
<parameter key="first_quartile" value="false"/>
<parameter key="mode" value="false"/>
<parameter key="third_quartile" value="false"/>
<parameter key="min" value="false"/>
<parameter key="max" value="false"/>
<parameter key="std_deviation" value="false"/>
<parameter key="kurtosis" value="false"/>
<parameter key="skewness" value="false"/>
</operator>
<operator activated="true" class="transpose" compatibility="9.0.001" expanded="true" height="82" name="Transpose (2)" width="90" x="447" y="34"/>
<connect from_port="windowed example set" to_op="Extract Aggregates" to_port="example set"/>
<connect from_op="Extract Aggregates" from_port="features" to_op="Transpose (2)" to_port="example set input"/>
<connect from_op="Transpose (2)" from_port="example set output" to_port="output 1"/>
<portSpacing port="source_windowed example set" spacing="0"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<connect from_port="file object" to_op="Read CSV" to_port="file"/>
<connect from_op="Read CSV" from_port="output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Transpose" to_port="example set input"/>
<connect from_op="Transpose" from_port="example set output" to_op="Select Attributes (6)" to_port="example set input"/>
<connect from_op="Select Attributes (6)" from_port="example set output" to_op="Process Windows" to_port="example set"/>
<connect from_op="Process Windows" from_port="output 1" to_port="output 1"/>
<portSpacing port="source_file object" spacing="0"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="append" compatibility="9.0.001" expanded="true" height="82" name="Append" width="90" x="380" y="34"/>
<connect from_op="Loop Files" from_port="output 1" to_op="Append" to_port="example set 1"/>
<connect from_op="Append" from_port="merged set" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
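For readers unfamiliar with the Time Series extension, the inner process above (Process Windows with non-overlapping windows, feeding Extract Aggregates) cuts each attribute's series into consecutive windows and computes aggregates per window. The average is the one this post relies on. The Python sketch below illustrates only that averaging step. It makes two assumptions: `window_size` is a hypothetical parameter, and a trailing partial window is simply dropped. It is not the extension's actual implementation.

```python
from statistics import mean


def window_means(values: list[float], window_size: int) -> list[float]:
    """Average each non-overlapping window of `window_size` consecutive values.

    Assumption for this sketch: a trailing partial window is dropped.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    n_full = len(values) // window_size  # number of complete windows
    return [
        mean(values[i * window_size:(i + 1) * window_size])
        for i in range(n_full)
    ]


# Hypothetical readings for one attribute after the Transpose step.
readings = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(window_means(readings, 3))  # [2.0, 5.0]
```

The last reading (7.0) does not fill a complete window of 3, so this sketch ignores it.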
Answers
hmm @hung9022, I played with this a bit and also could not figure out why that error message is occurring. I pinged some folks at RapidMiner, so hopefully we'll get an answer soon!
Scott
Hi @sgenzer, I found the problem. It was the Time row. Adding it was one of the changes I made to the data and then forgot about. It forced the Read CSV operator to import all my columns as polynominal (nominal), and apparently the Process Windows operator cannot handle polynominal data. I removed the Time row from my data file before importing it with the Read CSV operator, and the process works fine now.
Regards,
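The failure described above is easy to reproduce in miniature. A CSV importer that guesses column types typically treats a column as numeric only if every cell in it parses as a number. A single row of time stamps is therefore enough to turn every value column nominal. The Python sketch below mimics that kind of type guessing in simplified form. The sample data, the `infer_column_types` and `drop_row` helpers, and the exact guessing rule are all hypothetical and for illustration only. This is not Read CSV's actual logic.

```python
import csv
import io


def infer_column_types(rows: list[list[str]]) -> list[str]:
    """Guess one type per column: 'real' if every cell parses as a float,
    otherwise 'polynominal' (simplified, hypothetical rule)."""
    types = []
    for col in range(len(rows[0])):
        try:
            for row in rows:
                float(row[col])  # raises ValueError on e.g. "08:00"
            types.append("real")
        except ValueError:
            types.append("polynominal")
    return types


def drop_row(rows: list[list[str]], row_label: str) -> list[list[str]]:
    """Remove rows whose first cell equals row_label (e.g. the 'Time' row)."""
    return [row for row in rows if row[0] != row_label]


# Hypothetical file layout: one row per reading, first cell is the row name.
sample = "\n".join([
    "Time,08:00,08:01,08:02",
    "Temp,20.1,20.3,20.2",
    "Pressure,101.2,101.1,101.3",
])
rows = list(csv.reader(io.StringIO(sample)))

# The first column holds row names, so it is nominal either way.
print(infer_column_types(rows))
# ['polynominal', 'polynominal', 'polynominal', 'polynominal']
print(infer_column_types(drop_row(rows, "Time")))
# ['polynominal', 'real', 'real', 'real']
```

Dropping the Time row before the types are guessed, as done in the post, leaves the value columns numeric. That is what the Process Windows operator needs.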
ah interesting! Ok thanks for the update. I am going to tag @tftemme on this thread so he can take a look when he gets a chance (he's the author of these operators).
Happy RapidMining!
Scott
Hi @hung9022, hi @sgenzer.
You figured it out correctly. Currently, none of the time_series operators can handle nominal data yet. This will come in a future update.
Best regards,
Fabian
Hi Fabian-
I think I've come across this issue. The data set I'm using definitely has a time series aspect (macroeconomic indicators, with periods of recession as the label), but when that aspect is ignored, logistic regression seems to be the best learner. (I take this to mean that, ignoring any forecasting, log reg is good at identifying which mix of indicators is present during recessions.)
Now I want to take time into account and try to predict which mix of indicators gives rise to a recession. When I introduce windowing, I run into a conflict: log reg needs a nominal label, but windowing only works with numeric data.
Any thoughts appreciated.
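One way around the conflict described above is to window only the numeric indicators and attach the nominal label afterwards, so the label never passes through the windowing step. This approach is not suggested elsewhere in the thread and is not a RapidMiner feature. The Python sketch below shows the idea under several hypothetical choices: the indicator names, the data, the window size, the use of the window mean as the feature, and taking the label from the time step right after each window.

```python
from statistics import mean


def windowed_examples(indicators: dict[str, list[float]],
                      labels: list[str],
                      window_size: int,
                      step: int = 1) -> list[tuple[dict[str, float], str]]:
    """Build one (features, nominal label) example per window.

    Features are per-indicator means over the window (numeric only); the label
    is taken from the step right after the window, so it stays nominal.
    """
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    examples = []
    start = 0
    # Need one time step after the window to supply the label.
    while start + window_size < len(labels):
        end = start + window_size
        features = {name: mean(series[start:end])
                    for name, series in indicators.items()}
        examples.append((features, labels[end]))
        start += step
    return examples


# Hypothetical quarterly data.
indicators = {
    "unemployment": [4.0, 4.2, 4.5, 5.0, 5.8],
    "yield_spread": [1.0, 0.6, 0.1, -0.2, -0.4],
}
labels = ["no_recession", "no_recession", "no_recession", "recession", "no_recession"]

for features, label in windowed_examples(indicators, labels, window_size=3):
    print(features, label)
```

The resulting examples have numeric features and a nominal label, which is the combination a logistic regression learner expects.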
@ndoromal
You can download the older time series operators from the Marketplace. You can use those until the new operators have been updated to handle more than numeric data.
Thank you, Hugh!