I want to predict a value from other values
davidraul36
Member Posts: 6 Learner III
Hello, I'm very new to RapidMiner and to data science as well, so please bear with me.
I want to predict values from totally different values; it's like trying to find a model for the relationship between them.
For example:
I have an Excel spreadsheet with columns (A, B, C, D, F).
I want to use (A, B, C, D) to build a model that predicts the values in (F), then apply it to test data...
Thanks in advance,
Best Answer
Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn
@davidraul36 Here's what I would do: clean up the date and time attributes and use a different algorithm. This process gets 74% trend accuracy, and you can most likely improve that with Optimize Parameters.
<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="read_excel" compatibility="8.0.001" expanded="true" height="68" name="Read Excel" width="90" x="45" y="34">
<parameter key="excel_file" value="C:\Users\Thomas Ott\Desktop\Feed.xls"/>
<parameter key="imported_cell_range" value="A1:G3000"/>
<parameter key="encoding" value="SYSTEM"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<parameter key="date_format" value="yyyy.MM.dd"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="Date.true.date.attribute"/>
<parameter key="1" value="Time.true.time.attribute"/>
<parameter key="2" value="Open.true.real.attribute"/>
<parameter key="3" value="High.true.real.attribute"/>
<parameter key="4" value="Low.true.real.attribute"/>
<parameter key="5" value="Close.true.real.attribute"/>
<parameter key="6" value="Avg.true.real.attribute"/>
</list>
</operator>
<operator activated="true" class="generate_concatenation" compatibility="8.0.001" expanded="true" height="82" name="Generate Concatenation" width="90" x="179" y="34">
<parameter key="first_attribute" value="Date"/>
<parameter key="second_attribute" value="Time"/>
<parameter key="separator" value=" "/>
</operator>
<operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes" width="90" x="313" y="34">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="Date||Time"/>
<parameter key="invert_selection" value="true"/>
</operator>
<operator activated="true" class="nominal_to_date" compatibility="8.0.001" expanded="true" height="82" name="Nominal to Date" width="90" x="447" y="34">
<parameter key="attribute_name" value="Date Time"/>
<parameter key="date_type" value="date_time"/>
<parameter key="date_format" value="MMM dd, yyyy H:mm:ss"/>
</operator>
<operator activated="true" class="read_excel" compatibility="8.0.001" expanded="true" height="68" name="Read Excel (2)" width="90" x="45" y="289">
<parameter key="excel_file" value="C:\Users\Thomas Ott\Desktop\test.xls"/>
<parameter key="imported_cell_range" value="A1:G703"/>
<parameter key="encoding" value="SYSTEM"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<parameter key="date_format" value="yyyy.MM.dd"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="Date.true.date.attribute"/>
<parameter key="1" value="Time.true.time.attribute"/>
<parameter key="2" value="Open.true.real.attribute"/>
<parameter key="3" value="High.true.real.attribute"/>
<parameter key="4" value="Low.true.real.attribute"/>
<parameter key="5" value="Close.true.real.attribute"/>
<parameter key="6" value="Avg.true.real.attribute"/>
</list>
</operator>
<operator activated="true" class="set_role" compatibility="8.0.001" expanded="true" height="82" name="Set Role" width="90" x="581" y="34">
<parameter key="attribute_name" value="Date Time"/>
<parameter key="target_role" value="id"/>
<list key="set_additional_roles">
<parameter key="Avg" value="label"/>
</list>
</operator>
<operator activated="true" class="sort" compatibility="8.0.001" expanded="true" height="82" name="Sort" width="90" x="715" y="34">
<parameter key="attribute_name" value="Date Time"/>
</operator>
<operator activated="true" class="series:windowing" compatibility="7.4.000" expanded="true" height="82" name="Windowing" width="90" x="849" y="34">
<parameter key="window_size" value="1"/>
<parameter key="create_label" value="true"/>
<parameter key="label_attribute" value="Avg"/>
</operator>
<operator activated="true" class="series:sliding_window_validation" compatibility="7.4.000" expanded="true" height="124" name="Validation" width="90" x="983" y="34">
<parameter key="training_window_width" value="10"/>
<parameter key="training_window_step_size" value="5"/>
<parameter key="test_window_width" value="20"/>
<parameter key="horizon" value="5"/>
<parameter key="average_performances_only" value="false"/>
<process expanded="true">
<operator activated="true" class="h2o:deep_learning" compatibility="7.6.001" expanded="true" height="82" name="Deep Learning" width="90" x="240" y="34">
<enumeration key="hidden_layer_sizes">
<parameter key="hidden_layer_sizes" value="50"/>
<parameter key="hidden_layer_sizes" value="50"/>
</enumeration>
<enumeration key="hidden_dropout_ratios"/>
<list key="expert_parameters"/>
<list key="expert_parameters_"/>
</operator>
<connect from_port="training" to_op="Deep Learning" to_port="training set"/>
<connect from_op="Deep Learning" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="8.0.001" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="series:forecasting_performance" compatibility="7.4.000" expanded="true" height="82" name="Performance" width="90" x="179" y="34">
<parameter key="horizon" value="1"/>
</operator>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="generate_concatenation" compatibility="8.0.001" expanded="true" height="82" name="Generate Concatenation (2)" width="90" x="179" y="289">
<parameter key="first_attribute" value="Date"/>
<parameter key="second_attribute" value="Time"/>
<parameter key="separator" value=" "/>
</operator>
<operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes (3)" width="90" x="313" y="289">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="Date||Time"/>
<parameter key="invert_selection" value="true"/>
</operator>
<operator activated="true" class="set_role" compatibility="8.0.001" expanded="true" height="82" name="Set Role (2)" width="90" x="447" y="289">
<parameter key="attribute_name" value="Date Time"/>
<parameter key="target_role" value="id"/>
<list key="set_additional_roles">
<parameter key="Avg" value="dummy"/>
</list>
</operator>
<operator activated="true" class="sort" compatibility="8.0.001" expanded="true" height="82" name="Sort (2)" width="90" x="581" y="289">
<parameter key="attribute_name" value="Date Time"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="715" y="289">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="Avg"/>
<parameter key="invert_selection" value="true"/>
</operator>
<operator activated="true" class="series:windowing" compatibility="7.4.000" expanded="true" height="82" name="Windowing (2)" width="90" x="849" y="187">
<parameter key="window_size" value="1"/>
<parameter key="label_attribute" value="Avg"/>
<parameter key="horizon" value="0"/>
</operator>
<operator activated="true" class="apply_model" compatibility="8.0.001" expanded="true" height="82" name="Apply Model (2)" width="90" x="1184" y="187">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="join" compatibility="8.0.001" expanded="true" height="82" name="Join" width="90" x="1318" y="289">
<list key="key_attributes"/>
</operator>
<connect from_op="Read Excel" from_port="output" to_op="Generate Concatenation" to_port="example set input"/>
<connect from_op="Generate Concatenation" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Nominal to Date" to_port="example set input"/>
<connect from_op="Nominal to Date" from_port="example set output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Read Excel (2)" from_port="output" to_op="Generate Concatenation (2)" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Sort" to_port="example set input"/>
<connect from_op="Sort" from_port="example set output" to_op="Windowing" to_port="example set input"/>
<connect from_op="Windowing" from_port="example set output" to_op="Validation" to_port="training"/>
<connect from_op="Validation" from_port="model" to_op="Apply Model (2)" to_port="model"/>
<connect from_op="Validation" from_port="training" to_port="result 1"/>
<connect from_op="Validation" from_port="averagable 1" to_port="result 2"/>
<connect from_op="Generate Concatenation (2)" from_port="example set output" to_op="Select Attributes (3)" to_port="example set input"/>
<connect from_op="Select Attributes (3)" from_port="example set output" to_op="Set Role (2)" to_port="example set input"/>
<connect from_op="Set Role (2)" from_port="example set output" to_op="Sort (2)" to_port="example set input"/>
<connect from_op="Sort (2)" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
<connect from_op="Select Attributes (2)" from_port="example set output" to_op="Windowing (2)" to_port="example set input"/>
<connect from_op="Select Attributes (2)" from_port="original" to_op="Join" to_port="right"/>
<connect from_op="Windowing (2)" from_port="example set output" to_op="Apply Model (2)" to_port="unlabelled data"/>
<connect from_op="Apply Model (2)" from_port="labelled data" to_op="Join" to_port="left"/>
<connect from_op="Join" from_port="join" to_port="result 3"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
</process>
</operator>
</process>
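For readers unfamiliar with sliding window validation, here is a rough Python sketch of how the Validation operator's window parameters in the process above (training_window_width 10, training_window_step_size 5, test_window_width 20, horizon 5) could carve a sorted series into train/test pairs. The indexing convention used here (the first test row sits `horizon` rows after the last training row) is an assumption and the operator's exact behaviour may differ; `sliding_windows` is a hypothetical helper, not a RapidMiner API.

```python
def sliding_windows(n_rows, train_width=10, step=5, test_width=20, horizon=5):
    """Return (train_rows, test_rows) index ranges for a time-ordered series."""
    windows = []
    start = 0
    while True:
        last_train = start + train_width - 1   # index of the last training row
        first_test = last_train + horizon      # assumed meaning of "horizon"
        end_test = first_test + test_width     # exclusive upper bound
        if end_test > n_rows:
            break                              # not enough rows left for a full test window
        windows.append((range(start, last_train + 1), range(first_test, end_test)))
        start += step                          # slide the training window forward
    return windows


# With 40 rows, only two complete train/test pairs fit.
windows = sliding_windows(40)
for train_rows, test_rows in windows:
    print(f"train {train_rows.start}..{train_rows.stop - 1}, "
          f"test {test_rows.start}..{test_rows.stop - 1}")
```

Because every test window lies strictly after its training window, the model is never scored on rows older than the ones it learned from, which is why the series has to be sorted first.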
Answers
Hi @davidraul36,
Can you share your dataset(s), please?
Regards,
Lionel
Here is the data I use.
I want to build a model that predicts the values of the "Avg" column from all the other columns.
You should check out the "Getting Started" videos on the rapidminer.com webpage; they are designed to help you get started with a basic predictive modeling project such as this one. You will need to define your "label" (the thing you are trying to predict) first.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
@davidraul36 I would do what @Telcontar120 suggests, review some videos and try out the tutorials that are built into Studio itself. Then build a process and if you get stuck, post that XML to the community for help.
I already tried to build a model, but my model uses the previous values of "Avg" to predict the next one.
I don't know how to design the process so that the "Avg" column is only predicted, without the model getting any information from it or its previous values.
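To make the idea concrete outside RapidMiner, here is a minimal Python sketch in which "Avg" is only ever the label: the model reads Open, High, Low and Close, and the row being predicted has no "Avg" value at all. The nearest-neighbour rule is just a stand-in learner chosen because it fits in a few lines; it is not the algorithm used anywhere in this thread, and the numbers are made up.

```python
import math

FEATURES = ("Open", "High", "Low", "Close")  # inputs only; "Avg" is never an input


def predict_avg(train_rows, new_row, k=1):
    """Predict Avg for new_row as the mean Avg of its k nearest training rows."""
    def distance(row):
        return math.dist([row[f] for f in FEATURES],
                         [new_row[f] for f in FEATURES])

    nearest = sorted(train_rows, key=distance)[:k]
    return sum(row["Avg"] for row in nearest) / len(nearest)


# Made-up training rows: "Avg" appears here only as the label to learn.
train = [
    {"Open": 1.0, "High": 2.0, "Low": 0.5, "Close": 1.5, "Avg": 1.2},
    {"Open": 5.0, "High": 6.0, "Low": 4.5, "Close": 5.5, "Avg": 5.1},
]
unseen = {"Open": 1.1, "High": 2.1, "Low": 0.6, "Close": 1.4}  # no Avg column at all
prediction = predict_avg(train, unseen)
print(prediction)
```

In RapidMiner terms, this corresponds to giving "Avg" the label role and not creating lagged copies of it as regular attributes.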
@davidraul36 I see that you set this up as a time series problem. Was there a particular reason to separate the time and date columns?
Since it's a direct time series problem, I tried the time series examples.
I was trying to predict the moving average values instead of the usual lag.
I have also tried another model: selecting "Avg" as the label and all other columns as attributes, then using a learner such as Neural Net or SVM, then applying the model to the test data...
So is that OK?
Sorry for my newbie behaviour.
Here is the XML:
This should work, but your trend accuracy sucks now. What was screwing this up was how you transformed your Avg attribute into the label. I made some small modifications and dropped the Avg column from the test set (because that's what you want to test). If you want to compare the test set Avg with what's predicted, then set the Avg attribute to a 'dummy' role. See the next process below this one.
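The same idea can be sketched in plain Python: the known Avg values are set aside so the model never sees them, then used afterwards to score the predictions. This is only an illustration with made-up numbers; `hold_back` and `mean_absolute_error` are hypothetical helpers, and the midpoint predictor is a placeholder standing in for whatever trained model you apply.

```python
def hold_back(rows, target="Avg"):
    """Split each row into model inputs and the held-back target value."""
    inputs = [{k: v for k, v in row.items() if k != target} for row in rows]
    actuals = [row[target] for row in rows]
    return inputs, actuals


def mean_absolute_error(predictions, actuals):
    """Average absolute gap between predicted and known values."""
    return sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(actuals)


# Made-up test rows that still carry their known Avg.
test_rows = [
    {"Open": 1.0, "High": 2.0, "Low": 0.0, "Close": 1.0, "Avg": 1.0},
    {"Open": 3.0, "High": 5.0, "Low": 3.0, "Close": 4.0, "Avg": 4.5},
]
inputs, actuals = hold_back(test_rows)

# Placeholder predictor (midpoint of High and Low), not a trained model.
predictions = [(row["High"] + row["Low"]) / 2 for row in inputs]
error = mean_absolute_error(predictions, actuals)
print(error)
```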
With Dummy Role
The more I look at this, the more I think you need to use a Sort operator to feed in the time series correctly. I wouldn't split the Date and Time into two units; RapidMiner can easily understand date-time together.
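As a rough illustration of that advice outside RapidMiner, the Python sketch below merges separate date and time strings into one date-time value and sorts the rows by it. The string layout ("2017.10.02" plus "09:00") is an assumption about how the spreadsheet stores these columns, and `to_sorted_series` is a hypothetical helper.

```python
from datetime import datetime


def to_sorted_series(rows, fmt="%Y.%m.%d %H:%M"):
    """Merge separate date and time strings into one datetime, then sort by it."""
    merged = [
        (datetime.strptime(f"{date_text} {time_text}", fmt), value)
        for date_text, time_text, value in rows
    ]
    return sorted(merged, key=lambda pair: pair[0])


# Made-up rows, deliberately out of order: (Date, Time, Avg)
rows = [
    ("2017.10.03", "09:00", 5.0),
    ("2017.10.02", "10:00", 3.0),
    ("2017.10.02", "09:00", 1.0),
]
series = to_sorted_series(rows)
for timestamp, value in series:
    print(timestamp.isoformat(), value)
```

Sorting on the combined value matters because windowing treats row order as time order.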
Thank you so much for spending so much time helping me, I really appreciate that.
Great Software and Great community!
I'm just curious about why the chart doesn't plot smoothly.
However,
Thank you so much,
Kindest regards,
@davidraul36 That's probably because you have Avg values for each hour in your date-time. Rolled up to daily values, you'd get the standard daily moving average. I would use an Aggregate operator for that.
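For anyone who wants to see what that roll-up does, here is a small Python sketch that groups hourly values by calendar day and averages each group. It only mirrors the idea of grouping by date and averaging; the data is made up and `daily_average` is a hypothetical helper, not a description of how the Aggregate operator is implemented.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_average(rows):
    """Group (timestamp, value) pairs by calendar day and average each group."""
    buckets = defaultdict(list)
    for timestamp, value in rows:
        buckets[timestamp.date()].append(value)
    return {day: mean(values) for day, values in sorted(buckets.items())}


# Made-up hourly Avg values spanning two days.
hourly = [
    (datetime(2017, 10, 2, 9), 1.0),
    (datetime(2017, 10, 2, 10), 3.0),
    (datetime(2017, 10, 3, 9), 5.0),
]
daily = daily_average(hourly)
for day, value in daily.items():
    print(day.isoformat(), value)
```

One point per day removes the hour-to-hour jumps, which is what makes the plotted line look smoother.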