Forecasting based on Two dependent variables

sunnyal Member Posts: 44 Contributor II
edited November 2018 in Help

Hi,

 

I want to create a forecast based on two variables, i.e., a weather forecast and historical actual reads. How is that implemented in RapidMiner?

Also, I was able to generate a 12-month forecast using vector linear regression, but the forecast is way out of range. What techniques can we use to improve it?

Further, how do we use two variables with the windowing technique? I don't see the operator "MultivariateSeries2WindowExamples" in RapidMiner 7.5, as discussed in some examples here. Am I missing something?

Also, is an R forecast model more accurate than the RapidMiner windowing technique? If yes, how do I use an R model in RapidMiner? (Do you have a sample R forecast model I can leverage?)

Thanks in advance

S

Best Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn
    Solution Accepted

    So, a couple of things. That video of mine is available on my YouTube channel here: https://www.youtube.com/watch?v=UmGIGEJMmN8&t=2s

    With respect to power consumption, you might want to check out this paper on using RM and SVMs to forecast electricity consumption. It starts on page 46 or thereabouts. I would add your weather data as an attribute and then use your power consumption as the label.

    From there, build a process that loads and ETLs your data and uses the Windowing and Sliding Window Validation operators. Insert an SVM set to the RBF kernel and then optimize around the gamma and C parameters.

    For a sample process, you can try the one in this thread: http://community.rapidminer.com/t5/RapidMiner-Studio-Forum/Financial-Time-Series-Prediction/m-p/33456

     

     

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn
    Solution Accepted

    You should see this if you are using version 6+

    SVM.png

     

     

     

Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    The old Multivariate2examples operator is now called the Windowing operator, so it's still there.

    With respect to algorithms, you might want to look at using an SVM with an RBF kernel, optimized around the gamma and C parameters. There has been some research suggesting they work well for time series modeling. I have some links on my site here: http://www.neuralmarkettrends.com/using-svm-kernels-for-time-series-analysis/

    With respect to multiple labels, with the Windowing operator you have to select one attribute column as your label, but your other variable can be loaded into the window model as well. Windowing in RapidMiner is considered a "cross sectional based" approach to building a forecast, which is different from, say, an ARIMA forecast. Here's another great read by Simafore: http://www.simafore.com/blog/bid/109175/Time-Series-Forecasting-using-RapidMiner-for-cost-modeling-2-of-2

    You can use an R package to make predictions and feed them into your RapidMiner window as well.
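
To make the windowing idea above more concrete, here is a minimal Python sketch of how a time-ordered series with two variables could be turned into cross-sectional examples with a single label. It is only an illustration, not RapidMiner's implementation: the function name `window_examples`, the column names `load` and `temp`, the attribute naming scheme (`temp-0`, `temp-1`), and the sample values are all assumptions made up for this example.

```python
def window_examples(rows, label_key, window_size=2, horizon=1):
    """Turn a time-ordered list of dicts into cross-sectional examples.

    Each example holds the last `window_size` values of every column
    (lag 0 is the most recent row in the window) plus a label taken
    `horizon` steps after the end of the window.
    """
    examples = []
    last_start = len(rows) - window_size - horizon
    for start in range(last_start + 1):
        window = rows[start:start + window_size]
        example = {}
        # Lag 0 = newest row of the window, lag 1 = the row before it, ...
        for lag, row in enumerate(reversed(window)):
            for key, value in row.items():
                example[f"{key}-{lag}"] = value
        # The label is the chosen column, `horizon` steps past the window.
        example["label"] = rows[start + window_size - 1 + horizon][label_key]
        examples.append(example)
    return examples


# Made-up readings: `load` is the label, `temp` is the second variable.
history = [
    {"load": 100, "temp": 20},
    {"load": 110, "temp": 22},
    {"load": 130, "temp": 25},
    {"load": 125, "temp": 24},
]
examples = window_examples(history, label_key="load", window_size=2, horizon=1)
for example in examples:
    print(example)
```

Both variables end up as input columns of every example, while only one of them supplies the label, which is the arrangement described in the post above.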

     

  • sunnyal Member Posts: 44 Contributor II

    Thank you for the quick reply, sir.

    I see a reference to a video on forecasting using SVM, http://www.neuralmarkettrends.com/wp-content/uploads/Rapidminer5-vid10.mp4, which apparently isn't working. Is there a sample process you can share that illustrates how an SVM forecast is done on time series data?

    Also, with respect to multiple dependent variables: I have one variable of historical readings and another of future (forecasted) values. I'm not sure how to design a process that has a historical attribute column as the label and feeds the forecasted values into the window model as the other variable. Basically, I want to forecast the next 12 months of power consumption based on forecasted weather; as you can see, power and weather are directly proportional.

    Would you be kind enough to share a sample process?

    As always, many thanks for your time.

  • sunnyal Member Posts: 44 Contributor II
    Sir,

    I would appreciate your valuable input on the question below.

    How do I design a process that has a historical attribute column as the label and feeds forecasted values into the window model as the other variable? E.g., historical power readings as the label and forecasted weather readings as the other variable.

    Thx
    Raj
  • sunnyal Member Posts: 44 Contributor II

    Thank you, sir. I will try to follow along and see how it goes.

  • sunnyal Member Posts: 44 Contributor II

    Sir,

    Is there a way to train the model on 2 years' worth of history, so that weather seasonality is accounted for in the power consumption forecast? Further, is there an easy way to create dummy variables for a testing data set that contains future dates, in order to predict the power consumption?

    Also, LibSVM, which has the RBF kernel, is giving me quite a challenge. I couldn't run it successfully; I'm having problems with the LibSVM data type and the Performance operator, which don't seem to work together. Do you happen to have any examples of LibSVM with the RBF kernel that I can use?

     

    Thx

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    I use the RapidMiner SVM instead of the LibSVM. I find it to be easier.

    You can train the model on 2 years of data; just toggle on cumulative training on the Sliding Window Validation operator.

    There is a Generate Date operator that might let you create dummy testing data.
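
For anyone unsure what cumulative training changes, the following Python sketch generates train/test index windows for a sliding validation, with and without a growing training window. It is a generic illustration under assumptions, not the logic of the Sliding Window Validation operator: the function name `sliding_window_splits` is made up, and placing the test window `horizon - 1` steps after the training window is an assumption of this sketch.

```python
def sliding_window_splits(n_examples, training_width, test_width,
                          step=1, horizon=1, cumulative=False):
    """Return a list of (train_indices, test_indices) over an ordered series.

    With cumulative=False the training window has a fixed width and slides
    forward. With cumulative=True it always starts at index 0, so every
    iteration trains on all history seen so far.
    """
    splits = []
    train_end = training_width  # exclusive end of the training window
    while train_end + horizon - 1 + test_width <= n_examples:
        train_start = 0 if cumulative else train_end - training_width
        # Assumption: the test window begins horizon - 1 steps after training.
        test_start = train_end + horizon - 1
        splits.append((list(range(train_start, train_end)),
                       list(range(test_start, test_start + test_width))))
        train_end += step
    return splits


fixed = sliding_window_splits(10, training_width=4, test_width=2, step=2)
growing = sliding_window_splits(10, training_width=4, test_width=2, step=2,
                                cumulative=True)
for (fixed_train, test), (growing_train, _) in zip(fixed, growing):
    print("fixed:", fixed_train, "cumulative:", growing_train, "test:", test)
```

The test windows are the same in both modes; only the amount of history available for training differs, which is why the cumulative setting helps when seasonality spans the full history.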

  • sunnyal Member Posts: 44 Contributor II

    I guess I'm missing something here. I do not see an RBF kernel in SVM; I only see it in LibSVM. Is that correct?

  • sunnyal Member Posts: 44 Contributor II

    Thank you. I just figured out that the radial kernel is nothing but RBF.

     

    Thx

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    It's my mistake; I should have told you that RBF stands for Radial Basis Function.

  • sunnyal Member Posts: 44 Contributor II

    No worries, sir.

    But I really need your advice here. I'm completely lost with the numbers RapidMiner forecasts. I have also tried optimizing the SVM parameters, but no luck: my performance is the same with or without parameter optimization.

    I'm attaching the source files and my process export. I have also included some of my questions, along with screenshots, in AnalysisQustions.doc.

     

     

    As Always Many Thanks for your time

    Raj

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    From your process, I don't see any optimization. What parameters besides C and gamma did you try to optimize?

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    I'm running some optimization, and it appears that your Window Size and Training Window Width give a big boost to performance. It's a pretty big optimization, so it's taking about 3 hours to run, but it's leading me to believe that you need to build a bigger cross section of data to train on.

     

    Time Series Pred.png

     

     

     

     

  • sunnyal Member Posts: 44 Contributor II

    Thanks, sir. My results appear to be better using k-NN than SVM. I currently get the accuracy below with k-NN.

    prediction_trend_accuracy: 0.695 +/- 0.021 (mikro: 0.695). Can you recommend some parameter optimizations for finding an optimal K value?

    I'm a bit confused, since most research papers recommend SVM, yet SVM apparently doesn't work well on my dataset. I guess I'm lost :-)

     

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Well, with k-NN you might want to use Numerical Measures and optimize around K and the distance measures (e.g., cosine, Manhattan, etc.). You could also try the GLM modeler.
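
As a rough sketch of what optimizing around K and the distance measure means, the Python below runs a tiny grid search over both for a k-NN regressor and keeps the combination with the lowest mean absolute error. Everything in it is an illustrative assumption: the helper names, the two-feature rows (temperature, humidity), the load values, and the choice of only Euclidean and Manhattan distance. It is not how the Optimize Parameters (Grid) operator is implemented.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_x, train_y, query, k, distance):
    """Average the labels of the k training rows closest to `query`."""
    ranked = sorted(range(len(train_x)),
                    key=lambda i: distance(train_x[i], query))
    nearest = ranked[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def mean_abs_error(train_x, train_y, test_x, test_y, k, distance):
    errors = [abs(knn_predict(train_x, train_y, q, k, distance) - y)
              for q, y in zip(test_x, test_y)]
    return sum(errors) / len(errors)


def grid_search(train_x, train_y, test_x, test_y, ks, distances):
    """Try every (k, distance) pair and return (error, k, distance_name)."""
    results = []
    for name, distance in distances.items():
        for k in ks:
            error = mean_abs_error(train_x, train_y, test_x, test_y,
                                   k, distance)
            results.append((error, k, name))
    return min(results)


# Made-up rows: [temperature, relative humidity] -> load.
train_x = [[20, 30], [22, 35], [25, 40], [30, 20], [35, 15]]
train_y = [100, 110, 130, 160, 190]
test_x = [[24, 38], [33, 18]]
test_y = [125, 180]

distances = {"euclidean": euclidean, "manhattan": manhattan}
best = grid_search(train_x, train_y, test_x, test_y, [1, 2, 3], distances)
print("best (error, k, distance):", best)
```

In a real time series setting the error would come from a sliding window validation over ordered data, not from a single held-out set as in this toy example.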

  • sunnyal Member Posts: 44 Contributor II

    Thank you sir. I will keep you posted on my optimization.

  • sunnyal Member Posts: 44 Contributor II

    Sir, I have made quite an improvement on this model. I'm seeking your assistance in operationalizing it.

    This is based on the research paper that you cited earlier: http://community.rapidminer.com/ejhxb44622/attachments/ejhxb44622/GettingStartForum/629/1/proceedings_rcomm_2010.pdf

    On page 46, under the section "The application is used in the following sequence", I need clarity on step (2), which states: "Testing data table is refreshed with new dummy variables, values for label is omitted."

    How do we generate this testing data set? I believe my forecast dataset would look something like the one below. When I pass this data, the model fails to apply, since the training data set and testing data set are different (we do not have values for the label); we only have the weather forecast.

    Forecast dates.JPG

    Do you have an example process like this that I can refer to? I would appreciate your input on this.

     

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    So when you train the model you will need a label, but for testing you need to omit that label. Here's a sample to try out.

    <?xml version="1.0" encoding="UTF-8"?><process version="7.5.001">
    <context>
    <input/>
    <output/>
    <macros>
    <macro>
    <key>horizon</key>
    <value>10</value>
    </macro>
    <macro>
    <key>symbol</key>
    <value>GPRO</value>
    </macro>
    <macro>
    <key>start_date</key>
    <value>2017-01-01</value>
    </macro>
    <macro>
    <key>end_date</key>
    <value>2017-05-01</value>
    </macro>
    <macro>
    <key>window_size</key>
    <value>1</value>
    </macro>
    </macros>
    </context>
    <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
    <operator activated="true" class="open_file" compatibility="7.5.001" expanded="true" height="68" name="Open File" width="90" x="45" y="34">
    <parameter key="resource_type" value="URL"/>
    <parameter key="url" value="http://www.google.com/finance/historical?q=NASDAQ:%{symbol}&amp;ei=Ylb_WOBdzbmZAcqZn9AG&amp;output=csv"/>
    </operator>
    <operator activated="true" class="read_csv" compatibility="7.5.001" expanded="true" height="68" name="Read CSV" width="90" x="45" y="136">
    <parameter key="csv_file" value="C:\Users\THOMAS~1\AppData\Local\Temp\rm_file_2048734474644171292.dump"/>
    <parameter key="column_separators" value=","/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="Date.true.polynominal.attribute"/>
    <parameter key="1" value="Open.false.real.attribute"/>
    <parameter key="2" value="High.false.real.attribute"/>
    <parameter key="3" value="Low.false.real.attribute"/>
    <parameter key="4" value="Close.true.real.attribute"/>
    <parameter key="5" value="Volume.false.integer.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="nominal_to_date" compatibility="7.5.001" expanded="true" height="82" name="Nominal to Date" width="90" x="179" y="34">
    <parameter key="attribute_name" value="Date"/>
    <parameter key="date_format" value="dd-MMM-yy"/>
    </operator>
    <operator activated="true" class="set_role" compatibility="7.5.001" expanded="true" height="82" name="Set Role" width="90" x="179" y="136">
    <parameter key="attribute_name" value="Date"/>
    <parameter key="target_role" value="id"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="false" class="rename" compatibility="7.5.001" expanded="true" height="82" name="Rename" width="90" x="45" y="289">
    <parameter key="old_name" value="%{symbol}_CLOSE"/>
    <parameter key="new_name" value="Close"/>
    <list key="rename_additional_attributes"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="7.5.001" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="238">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attributes" value="Close|Date"/>
    </operator>
    <operator activated="true" class="filter_examples" compatibility="6.4.000" expanded="true" height="103" name="Filter Examples" width="90" x="179" y="340">
    <parameter key="condition_class" value="no_missing_attributes"/>
    <list key="filters_list"/>
    </operator>
    <operator activated="true" class="multiply" compatibility="7.5.001" expanded="true" height="82" name="Multiply" width="90" x="179" y="442"/>
    <operator activated="true" class="series:windowing" compatibility="7.4.000" expanded="true" height="82" name="Windowing" width="90" x="380" y="34">
    <parameter key="window_size" value="%{window_size}"/>
    <parameter key="create_label" value="true"/>
    <parameter key="label_attribute" value="Close"/>
    <parameter key="horizon" value="%{horizon}"/>
    </operator>
    <operator activated="false" class="optimize_parameters_grid" compatibility="7.5.001" expanded="true" height="124" name="Optimize Parameters (Grid)" width="90" x="514" y="187">
    <list key="parameters">
    <parameter key="SVM.kernel_gamma" value="[0.0001;1000;5;logarithmic]"/>
    <parameter key="SVM.C" value="[1;10000;5;logarithmic]"/>
    <parameter key="Validation.training_window_width" value="[4;10;3;linear]"/>
    <parameter key="Validation.test_window_width" value="[4;10;3;linear]"/>
    </list>
    <process expanded="true">
    <operator activated="true" class="series:sliding_window_validation" compatibility="7.4.000" expanded="true" height="124" name="Validation" width="90" x="112" y="34">
    <parameter key="training_window_width" value="4"/>
    <parameter key="training_window_step_size" value="1"/>
    <parameter key="test_window_width" value="4"/>
    <parameter key="horizon" value="%{horizon}"/>
    <process expanded="true">
    <portSpacing port="source_training" spacing="0"/>
    <portSpacing port="sink_model" spacing="0"/>
    <portSpacing port="sink_through 1" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="apply_model" compatibility="7.5.001" expanded="true" height="82" name="Apply Model (2)" width="90" x="45" y="34">
    <list key="application_parameters"/>
    </operator>
    <operator activated="true" class="series:forecasting_performance" compatibility="7.4.000" expanded="true" height="82" name="Performance" width="90" x="246" y="34">
    <parameter key="horizon" value="%{horizon}"/>
    <parameter key="main_criterion" value="prediction_trend_accuracy"/>
    <parameter key="prediction_trend_accuracy" value="false"/>
    </operator>
    <connect from_port="model" to_op="Apply Model (2)" to_port="model"/>
    <connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/>
    <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
    <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
    <portSpacing port="source_model" spacing="0"/>
    <portSpacing port="source_test set" spacing="0"/>
    <portSpacing port="source_through 1" spacing="0"/>
    <portSpacing port="sink_averagable 1" spacing="0"/>
    <portSpacing port="sink_averagable 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="log" compatibility="7.5.001" expanded="true" height="82" name="Log" width="90" x="313" y="136">
    <parameter key="filename" value="tmp"/>
    <list key="log">
    <parameter key="Gamma" value="operator.SVM.parameter.kernel_gamma"/>
    <parameter key="C" value="operator.SVM.parameter.C"/>
    <parameter key="Training Width" value="operator.Validation.parameter.training_window_width"/>
    <parameter key="Testing Width" value="operator.Validation.parameter.test_window_width"/>
    <parameter key="Forecast Perf" value="operator.Validation.value.performance"/>
    </list>
    <parameter key="sorting_type" value="top-k"/>
    <parameter key="sorting_dimension" value="Forecast Perf"/>
    <parameter key="sorting_k" value="10"/>
    <parameter key="persistent" value="true"/>
    </operator>
    <connect from_port="input 1" to_op="Validation" to_port="training"/>
    <connect from_op="Validation" from_port="model" to_port="result 1"/>
    <connect from_op="Validation" from_port="averagable 1" to_op="Log" to_port="through 1"/>
    <connect from_op="Log" from_port="through 1" to_port="performance"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_performance" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="series:windowing" compatibility="7.4.000" expanded="true" height="82" name="Windowing (2)" width="90" x="380" y="136">
    <parameter key="window_size" value="%{window_size}"/>
    <parameter key="label_attribute" value="Close"/>
    </operator>
    <operator activated="true" class="extract_macro" compatibility="7.5.001" expanded="true" height="68" name="Extract Macro" width="90" x="380" y="238">
    <parameter key="macro" value="n_examples"/>
    <list key="additional_macros"/>
    </operator>
    <operator activated="true" class="generate_macro" compatibility="7.5.001" expanded="true" height="82" name="Generate Macro" width="90" x="380" y="340">
    <list key="function_descriptions">
    <parameter key="filter_range" value="eval(%{n_examples})-1"/>
    </list>
    </operator>
    <operator activated="true" class="sort" compatibility="7.5.001" expanded="true" height="82" name="Sort (2)" width="90" x="380" y="442">
    <parameter key="attribute_name" value="Date"/>
    </operator>
    <operator activated="true" class="filter_example_range" compatibility="7.5.001" expanded="true" height="82" name="Filter Example Range" width="90" x="514" y="442">
    <parameter key="first_example" value="1"/>
    <parameter key="last_example" value="%{filter_range}"/>
    <parameter key="invert_filter" value="true"/>
    </operator>
    <operator activated="true" class="remember" compatibility="7.5.001" expanded="true" height="68" name="Remember" width="90" x="648" y="442">
    <parameter key="name" value="LastRow"/>
    </operator>
    <operator activated="true" class="series:sliding_window_validation" compatibility="7.4.000" expanded="true" height="124" name="Deep Learning Window" width="90" x="514" y="34">
    <parameter key="training_window_width" value="8"/>
    <parameter key="training_window_step_size" value="1"/>
    <parameter key="test_window_width" value="8"/>
    <parameter key="horizon" value="%{horizon}"/>
    <process expanded="true">
    <operator activated="true" class="h2o:deep_learning" compatibility="7.5.000" expanded="true" height="82" name="Deep Learning" width="90" x="238" y="34">
    <enumeration key="hidden_layer_sizes">
    <parameter key="hidden_layer_sizes" value="50"/>
    <parameter key="hidden_layer_sizes" value="50"/>
    </enumeration>
    <enumeration key="hidden_dropout_ratios"/>
    <list key="expert_parameters"/>
    <list key="expert_parameters_"/>
    </operator>
    <operator activated="false" class="support_vector_machine" compatibility="7.5.001" expanded="true" height="124" name="SVM" width="90" x="246" y="136">
    <parameter key="kernel_type" value="radial"/>
    <parameter key="kernel_gamma" value="0.1"/>
    <parameter key="C" value="100.0"/>
    </operator>
    <connect from_port="training" to_op="Deep Learning" to_port="training set"/>
    <connect from_op="Deep Learning" from_port="model" to_port="model"/>
    <portSpacing port="source_training" spacing="0"/>
    <portSpacing port="sink_model" spacing="0"/>
    <portSpacing port="sink_through 1" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="apply_model" compatibility="7.5.001" expanded="true" height="82" name="Apply Model (3)" width="90" x="45" y="34">
    <list key="application_parameters"/>
    </operator>
    <operator activated="true" class="series:forecasting_performance" compatibility="7.4.000" expanded="true" height="82" name="Performance (2)" width="90" x="246" y="34">
    <parameter key="horizon" value="%{horizon}"/>
    <parameter key="main_criterion" value="prediction_trend_accuracy"/>
    </operator>
    <connect from_port="model" to_op="Apply Model (3)" to_port="model"/>
    <connect from_port="test set" to_op="Apply Model (3)" to_port="unlabelled data"/>
    <connect from_op="Apply Model (3)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
    <connect from_op="Performance (2)" from_port="performance" to_port="averagable 1"/>
    <portSpacing port="source_model" spacing="0"/>
    <portSpacing port="source_test set" spacing="0"/>
    <portSpacing port="source_through 1" spacing="0"/>
    <portSpacing port="sink_averagable 1" spacing="0"/>
    <portSpacing port="sink_averagable 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="loop" compatibility="7.5.001" expanded="true" height="82" name="Loop" width="90" x="648" y="34">
    <parameter key="set_iteration_macro" value="true"/>
    <parameter key="macro_name" value="loop_forecasts"/>
    <parameter key="iterations" value="%{horizon}"/>
    <process expanded="true">
    <operator activated="true" class="recall" compatibility="7.5.001" expanded="true" height="68" name="Recall" width="90" x="45" y="85">
    <parameter key="name" value="LastRow"/>
    <parameter key="remove_from_store" value="false"/>
    </operator>
    <operator activated="true" class="apply_model" compatibility="7.1.001" expanded="true" height="82" name="Apply Model" width="90" x="246" y="34">
    <list key="application_parameters"/>
    </operator>
    <operator activated="true" class="generate_attributes" compatibility="7.5.001" expanded="true" height="82" name="Generate Attributes" width="90" x="380" y="34">
    <list key="function_descriptions">
    <parameter key="Date" value="date_add(Date,eval(%{loop_forecasts}),DATE_UNIT_DAY)"/>
    </list>
    </operator>
    <operator activated="true" class="set_role" compatibility="5.3.013" expanded="true" height="82" name="Set Role (2)" width="90" x="514" y="34">
    <parameter key="attribute_name" value="prediction(label)"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="7.5.001" expanded="true" height="82" name="Select Attributes (3)" width="90" x="648" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="prediction(label)"/>
    </operator>
    <operator activated="true" class="replace" compatibility="7.5.001" expanded="true" height="82" name="Replace" width="90" x="782" y="34">
    <parameter key="replace_what" value="Close"/>
    <parameter key="replace_by" value="$1-"/>
    </operator>
    <operator activated="true" class="rename" compatibility="7.5.001" expanded="true" height="82" name="Rename (2)" width="90" x="916" y="34">
    <parameter key="old_name" value="prediction(label)"/>
    <parameter key="new_name" value="Close"/>
    <list key="rename_additional_attributes"/>
    </operator>
    <operator activated="true" class="materialize_data" compatibility="7.5.001" expanded="true" height="82" name="Materialize Data (2)" width="90" x="1050" y="34"/>
    <connect from_port="input 1" to_op="Apply Model" to_port="model"/>
    <connect from_op="Recall" from_port="result" to_op="Apply Model" to_port="unlabelled data"/>
    <connect from_op="Apply Model" from_port="labelled data" to_op="Generate Attributes" to_port="example set input"/>
    <connect from_op="Generate Attributes" from_port="example set output" to_op="Set Role (2)" to_port="example set input"/>
    <connect from_op="Set Role (2)" from_port="example set output" to_op="Select Attributes (3)" to_port="example set input"/>
    <connect from_op="Select Attributes (3)" from_port="example set output" to_op="Replace" to_port="example set input"/>
    <connect from_op="Replace" from_port="example set output" to_op="Rename (2)" to_port="example set input"/>
    <connect from_op="Rename (2)" from_port="example set output" to_op="Materialize Data (2)" to_port="example set input"/>
    <connect from_op="Materialize Data (2)" from_port="example set output" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="append" compatibility="7.5.001" expanded="true" height="82" name="Append" width="90" x="782" y="34"/>
    <operator activated="true" class="sort" compatibility="7.5.001" expanded="true" height="82" name="Sort" width="90" x="916" y="34">
    <parameter key="attribute_name" value="Date"/>
    </operator>
    <connect from_op="Open File" from_port="file" to_op="Read CSV" to_port="file"/>
    <connect from_op="Read CSV" from_port="output" to_op="Nominal to Date" to_port="example set input"/>
    <connect from_op="Nominal to Date" from_port="example set output" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
    <connect from_op="Filter Examples" from_port="example set output" to_op="Multiply" to_port="input"/>
    <connect from_op="Multiply" from_port="output 1" to_op="Windowing" to_port="example set input"/>
    <connect from_op="Windowing" from_port="example set output" to_op="Deep Learning Window" to_port="training"/>
    <connect from_op="Windowing" from_port="original" to_op="Windowing (2)" to_port="example set input"/>
    <connect from_op="Windowing (2)" from_port="example set output" to_op="Extract Macro" to_port="example set"/>
    <connect from_op="Extract Macro" from_port="example set" to_op="Generate Macro" to_port="through 1"/>
    <connect from_op="Generate Macro" from_port="through 1" to_op="Sort (2)" to_port="example set input"/>
    <connect from_op="Sort (2)" from_port="example set output" to_op="Filter Example Range" to_port="example set input"/>
    <connect from_op="Filter Example Range" from_port="example set output" to_op="Remember" to_port="store"/>
    <connect from_op="Deep Learning Window" from_port="model" to_op="Loop" to_port="input 1"/>
    <connect from_op="Deep Learning Window" from_port="averagable 1" to_port="result 1"/>
    <connect from_op="Loop" from_port="output 1" to_op="Append" to_port="example set 1"/>
    <connect from_op="Append" from_port="merged set" to_op="Sort" to_port="example set input"/>
    <connect from_op="Sort" from_port="example set output" to_port="result 2"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    </process>
    </operator>
    </process>

     

  • sunnyal Member Posts: 44 Contributor II

    Thank you, sir.

    In your process, I don't see an option for reading a testing data set containing just forecasted weather and dates. I want to forecast based on my forecasted weather data.

  • sunnyal Member Posts: 44 Contributor II

    Also, when I exclude the load values column, it states that the input example set does not match the training example set. For now, I'm thinking of generating dummy load values and feeding them to the model, just to get the format the model requires to forecast. Am I heading in the right direction here?

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Just disconnect the 2nd Windowing operator from the main process branch and load your test data there. Make sure all the preprocessing is the same, then check the order of execution and run it.

    That lower branch generates the 'dummy' dates that you use to forecast out into the future, BUT your test data will need to contain your temperature and other input variables.
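
The point about test data can be sketched in Python: a model is fitted on history that carries both the input and the label, and is then applied to future rows that carry only the input. This is a deliberately simplified stand-in (a one-input least-squares line, not the windowed model from the process above), and the column names `date`, `temp`, `load`, the values, and the helper names are assumptions for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def score(model, rows, input_column):
    """Apply the model to rows that carry the input but no label."""
    slope, intercept = model
    scored = []
    for row in rows:
        if input_column not in row:
            # The inputs must match what the model was trained on.
            raise ValueError(f"test row is missing input column: {input_column}")
        prediction = slope * row[input_column] + intercept
        scored.append({**row, "prediction(load)": prediction})
    return scored


# Training data: made-up history with both the input and the label.
history = [
    {"date": "2017-06-01", "temp": 20, "load": 100},
    {"date": "2017-06-02", "temp": 25, "load": 125},
    {"date": "2017-06-03", "temp": 30, "load": 150},
    {"date": "2017-06-04", "temp": 35, "load": 175},
]
model = fit_line([r["temp"] for r in history], [r["load"] for r in history])

# Test data: future dates with forecasted weather only, no `load` column.
future = [
    {"date": "2017-07-01", "temp": 28},
    {"date": "2017-07-02", "temp": 32},
]
forecast = score(model, future, "temp")
for row in forecast:
    print(row)
```

If the trained model also uses lagged values of the label as inputs, as a windowed model does, those lagged columns count as inputs too and have to be present in the test rows, which is one likely reason an example set without the load column is rejected.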

  • sunnyal Member Posts: 44 Contributor II

    Dear Sir,

    Please excuse me. I'm not able to follow this logic.

    Each day we get weather forecast results, which are stored in a database. I'm reading the testing data directly from the database and trying to generate the load forecast. However, since I only have the weather forecast, the model fails at Apply Model (the system load that was present in the training data set is missing).

    Below is my code. Would you kindly let me know what needs to be done to make it forecast?

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.5.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.5.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_excel" compatibility="7.5.001" expanded="true" height="68" name="Read Excel" width="90" x="45" y="34">
    <parameter key="excel_file" value="W:\Raj Stuff\Raj\DMAAP\Predictive Maintenance\Rapidminer\APS Sample Data\Analysis_Hourly_Load_Weather.xlsx"/>
    <parameter key="imported_cell_range" value="A1:K20425"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="date_time.true.date_time.attribute"/>
    <parameter key="1" value="system_load.true.integer.attribute"/>
    <parameter key="2" value="Flag_temp.true.integer.attribute"/>
    <parameter key="3" value="Yuma_temp.true.integer.attribute"/>
    <parameter key="4" value="PHX_temp.true.integer.attribute"/>
    <parameter key="5" value="Flag_RH.true.integer.attribute"/>
    <parameter key="6" value="Yuma_RH.true.integer.attribute"/>
    <parameter key="7" value="PHX_RH.true.integer.attribute"/>
    <parameter key="8" value="Flag_CloudPct.true.integer.attribute"/>
    <parameter key="9" value="Yuma_CloudPct.true.integer.attribute"/>
    <parameter key="10" value="PHX_CloudPct.true.integer.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="set_role" compatibility="7.5.001" expanded="true" height="82" name="Set Role" width="90" x="179" y="34">
    <parameter key="attribute_name" value="date_time"/>
    <parameter key="target_role" value="id"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="series:windowing" compatibility="7.4.000" expanded="true" height="82" name="Windowing" width="90" x="313" y="34">
    <parameter key="window_size" value="1"/>
    <parameter key="create_single_attributes" value="false"/>
    <parameter key="create_label" value="true"/>
    <parameter key="label_attribute" value="system_load"/>
    </operator>
    <operator activated="false" class="optimize_parameters_grid" compatibility="7.5.001" expanded="true" height="124" name="Optimize Parameters (Grid)" width="90" x="447" y="442">
    <list key="parameters">
    <parameter key="k-NN (3).k" value="[1.0;50;10;linear]"/>
    <parameter key="k-NN (3).numerical_measure" value="EuclideanDistance,CamberraDistance,ChebychevDistance,CorrelationSimilarity,DiceSimilarity,DynamicTimeWarpingDistance,InnerProductSimilarity,JaccardSimilarity,KernelEuclideanDistance,ManhattanDistance,MaxProductSimilarity,OverlapSimilarity,CosineSimilarity"/>
    </list>
    <process expanded="true">
    <operator activated="true" class="series:sliding_window_validation" compatibility="7.4.000" expanded="true" height="124" name="Validation" width="90" x="246" y="34">
    <parameter key="training_window_width" value="24"/>
    <parameter key="training_window_step_size" value="1"/>
    <parameter key="test_window_width" value="20"/>
    <process expanded="true">
    <operator activated="true" class="k_nn" compatibility="7.5.001" expanded="true" height="82" name="k-NN (3)" width="90" x="179" y="34">
    <parameter key="k" value="50"/>
    <parameter key="measure_types" value="NumericalMeasures"/>
    <parameter key="numerical_measure" value="CosineSimilarity"/>
    </operator>
    <connect from_port="training" to_op="k-NN (3)" to_port="training set"/>
    <connect from_op="k-NN (3)" from_port="model" to_port="model"/>
    <portSpacing port="source_training" spacing="0"/>
    <portSpacing port="sink_model" spacing="0"/>
    <portSpacing port="sink_through 1" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="apply_model" compatibility="7.5.001" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
    <list key="application_parameters"/>
    </operator>
    <operator activated="true" class="series:forecasting_performance" compatibility="7.4.000" expanded="true" height="82" name="Performance (2)" width="90" x="179" y="34">
    <parameter key="horizon" value="1"/>
    </operator>
    <connect from_port="model" to_op="Apply Model" to_port="model"/>
    <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
    <connect from_op="Apply Model" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
    <connect from_op="Performance (2)" from_port="performance" to_port="averagable 1"/>
    <portSpacing port="source_model" spacing="0"/>
    <portSpacing port="source_test set" spacing="0"/>
    <portSpacing port="source_through 1" spacing="0"/>
    <portSpacing port="sink_averagable 1" spacing="0"/>
    <portSpacing port="sink_averagable 2" spacing="0"/>
    </process>
    </operator>
    <connect from_port="input 1" to_op="Validation" to_port="training"/>
    <connect from_op="Validation" from_port="model" to_port="result 1"/>
    <connect from_op="Validation" from_port="averagable 1" to_port="performance"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_performance" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="vector_linear_regression" compatibility="7.5.001" expanded="true" height="82" name="Vector Linear Regression (2)" width="90" x="447" y="85"/>
    <operator activated="true" breakpoints="after" class="jdbc_connectors:read_database" compatibility="7.5.001" expanded="true" height="68" name="Read Database" width="90" x="45" y="289">
    <parameter key="connection" value="PT_OPS"/>
    <parameter key="query" value="SELECT &quot;DateTime&quot; date_time, &quot;Flag_temp&quot;, &quot;Yuma_temp&quot;, &quot;PHX_temp&quot;, &quot;Flag_RH&quot;, &quot;Yuma_RH&quot;, &quot;PHX_RH&quot;, &quot;Flag_CloudPct&quot;, &quot;Yuma_CloudPct&quot;, &quot;PHX_CloudPct&quot;&#10;FROM &quot;dbo&quot;.&quot;WSIForecast&quot; &#10;where&#10;DateTime &gt;= DATEADD(day, 0, convert(date, GETDATE())) and datetime&lt; DATEADD(day, +1, convert(date, GETDATE()))"/>
    <enumeration key="parameters"/>
    </operator>
    <operator activated="true" class="set_role" compatibility="7.5.001" expanded="true" height="82" name="Set Role (2)" width="90" x="179" y="289">
    <parameter key="attribute_name" value="date_time"/>
    <parameter key="target_role" value="id"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="series:windowing" compatibility="7.4.000" expanded="true" height="82" name="Windowing (2)" width="90" x="380" y="289">
    <!-- Note: create_label is not set on this operator, so label_attribute below has no effect; the Read Database query also returns no system_load column. -->
    <parameter key="window_size" value="1"/>
    <parameter key="create_single_attributes" value="false"/>
    <parameter key="label_attribute" value="system_load"/>
    </operator>
    <operator activated="true" class="apply_model" compatibility="7.5.001" expanded="true" height="82" name="Apply Model: make predictions" width="90" x="648" y="238">
    <list key="application_parameters"/>
    </operator>
    <connect from_op="Read Excel" from_port="output" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Windowing" to_port="example set input"/>
    <connect from_op="Windowing" from_port="example set output" to_op="Vector Linear Regression (2)" to_port="training set"/>
    <connect from_op="Vector Linear Regression (2)" from_port="model" to_op="Apply Model: make predictions" to_port="model"/>
    <connect from_op="Vector Linear Regression (2)" from_port="exampleSet" to_port="result 1"/>
    <connect from_op="Read Database" from_port="output" to_op="Set Role (2)" to_port="example set input"/>
    <connect from_op="Set Role (2)" from_port="example set output" to_op="Windowing (2)" to_port="example set input"/>
    <connect from_op="Windowing (2)" from_port="example set output" to_op="Apply Model: make predictions" to_port="unlabelled data"/>
    <connect from_op="Apply Model: make predictions" from_port="labelled data" to_port="result 2"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    </process>
    </operator>
    </process>
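
To make it easier to see what the two Windowing steps in the process above are meant to produce, here is a minimal Python sketch of windowing a multivariate series into training examples. It is not RapidMiner's implementation: the function name `make_windows`, the `name-lag` attribute naming, and the sample values are assumptions made up for illustration. Only the column names `system_load` and `PHX_temp` come from the process.

```python
def make_windows(rows, feature_names, label_name, window_size=1, horizon=1):
    """Turn a time-ordered list of dict rows into (features, label) pairs.

    Each example holds the last `window_size` rows of every feature and,
    as label, the value of `label_name` that lies `horizon` steps after
    the end of the window. Hypothetical helper, for illustration only.
    """
    examples = []
    last_start = len(rows) - window_size - horizon
    for start in range(last_start + 1):
        window = rows[start:start + window_size]
        features = {}
        for offset, row in enumerate(window):
            lag = window_size - 1 - offset  # 0 = most recent row in the window
            for name in feature_names:
                features[f"{name}-{lag}"] = row[name]
        label = rows[start + window_size - 1 + horizon][label_name]
        examples.append((features, label))
    return examples


# Made-up sample values; only the column names come from the process above.
rows = [
    {"system_load": 100, "PHX_temp": 70},
    {"system_load": 110, "PHX_temp": 75},
    {"system_load": 125, "PHX_temp": 82},
    {"system_load": 120, "PHX_temp": 80},
]

examples = make_windows(
    rows,
    feature_names=["system_load", "PHX_temp"],
    label_name="system_load",
    window_size=2,
    horizon=1,
)

for features, label in examples:
    print(features, "->", label)
```

In this sketch the lagged load (`system_load-0`, `system_load-1`) ends up among the training attributes. Data scored later would need the same attributes, so a forecast table that contains only weather columns would not line up with a model trained this way.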