Using SVM to predict a new row
Hi,
I would like to ask for help building a different prediction model. SVM or any other learner is fine.
<?xml version="1.0" encoding="UTF-8"?><process version="8.2.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Root">
<parameter key="random_seed" value="1969"/>
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="8.2.001" expanded="true" height="68" name="Retrieve Polynomial" width="90" x="45" y="34">
<parameter key="repository_entry" value="//Samples/data/Polynomial"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="8.2.001" expanded="true" height="82" name="Generate Attributes" width="90" x="179" y="34">
<list key="function_descriptions">
<parameter key="newval" value="a1"/>
</list>
</operator>
<operator activated="true" breakpoints="after" class="transpose" compatibility="8.2.001" expanded="true" height="82" name="Transpose" width="90" x="313" y="34"/>
<operator activated="true" class="concurrency:loop_attributes" compatibility="8.2.001" expanded="true" height="82" name="Loop Attributes" width="90" x="447" y="34">
<parameter key="attribute_filter_type" value="regular_expression"/>
<parameter key="attributes" value="|cluster"/>
<parameter key="regular_expression" value="newval.*"/>
<process expanded="true">
<operator activated="true" class="set_role" compatibility="8.2.001" expanded="true" height="82" name="Set Role" width="90" x="45" y="34">
<parameter key="attribute_name" value="%{loop_attribute}"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="multiply" compatibility="8.2.001" expanded="true" height="103" name="Multiply" width="90" x="179" y="34"/>
<operator activated="true" class="support_vector_machine" compatibility="8.2.001" expanded="true" height="124" name="SVM" width="90" x="514" y="34"/>
<operator activated="true" class="apply_model" compatibility="8.2.001" expanded="true" height="82" name="Apply Model" width="90" x="380" y="289">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="8.2.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="715" y="85">
<parameter key="attribute_filter_type" value="regular_expression"/>
<parameter key="regular_expression" value="newval.*"/>
<parameter key="invert_selection" value="true"/>
<parameter key="include_special_attributes" value="true"/>
</operator>
<connect from_port="input 1" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="SVM" to_port="training set"/>
<connect from_op="Multiply" from_port="output 2" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="SVM" from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Select Attributes (2)" to_port="example set input"/>
<connect from_op="Select Attributes (2)" from_port="example set output" to_port="output 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="append" compatibility="8.2.001" expanded="true" height="82" name="Append" width="90" x="648" y="34"/>
<connect from_op="Retrieve Polynomial" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Transpose" to_port="example set input"/>
<connect from_op="Transpose" from_port="example set output" to_op="Loop Attributes" to_port="input 1"/>
<connect from_op="Loop Attributes" from_port="output 1" to_op="Append" to_port="example set 1"/>
<connect from_op="Append" from_port="merged set" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Above is the process I am currently using. As I understand the SVM learner, it builds a model from the row-wise behaviour of the data set. What I need is a model based on the column-wise behaviour of the data set. I tried transposing my data, but that makes me lose the label I need for prediction. So for the sample process above, I need the predicted result for att_201 based on the behaviour of the data set before the first transpose.
Answers
Hi @hung9022,
I'm not sure I understand, but I assume your problem is a "time-series" study, isn't it?
You want to predict the (N+1)-th value of your attribute "atti" based on the "history" of that attribute, that is to say
the first to the N-th values of "atti"?
If that is the case, you can take a look at the "Time Series" extension (install it from the Marketplace if your RapidMiner version is below 9.0; it is built into RapidMiner 9.0).
If I have misunderstood, can you explain in more detail by sharing your dataset and giving an example of what you want to obtain?
I hope it helps,
Regards,
Lionel
Hi @lionelderkrikor,
I have looked at the time-series extension you mentioned, but it is not quite what I wanted, although it is close. I have uploaded 3 pictures to describe what I intend to do. The first is a screenshot of the transposed Polynomial example data, which represents the data I have. If I feed this transposed data set to a predictive learner, e.g. SVM, the operator will build 7 models, based on the number of rows, to predict a new value, let's say att_10 since it is not in the screenshot, as shown in "Normal model Prediction". What I need is a process that predicts the new attribute based on the behaviour in each column, as shown in "What i want.png". There may be a setup for this in the time-series extension you mentioned, but I am still new to RapidMiner, so I haven't figured out all of its functions yet.
Regards,
Hi @hung9022,
1. "What I need is a process that predicts the new attribute based on the behaviour in each column..."
Based on your screenshot, you want to generate and predict the values of attribute att_10 based
on the values of att_1 to att_9? That's impossible as is.
First, you have to build a model (for example an SVM) on a labeled dataset. That means you need a dataset with the
values of attributes att_1 to att_9 and the associated values of att_10 (which is called the "label").
Once you have built the model, you can predict the att_10 values by applying the model to a new dataset which contains new values of att_1 to att_9.
2. That's why, if I may insist, your description makes me think that you want to perform a time-series study.
In that case, you need to have a timestamp (or maybe just an Id).
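To illustrate why the time-series framing provides the missing label, here is a minimal Python sketch (outside RapidMiner, and not part of any process in this thread). It assumes one attribute's values are ordered by a numeric Id and slides a fixed-size window over them, so that each window of past values becomes a feature row and the value that followed it becomes the label. The helper name `make_windows`, the window size, and the sample values are all hypothetical.

```python
def make_windows(series, window):
    """Turn one ordered series into (features, label) training rows."""
    rows = []
    for start in range(len(series) - window):
        features = series[start:start + window]  # the previous `window` values
        label = series[start + window]           # the value that followed them
        rows.append((features, label))
    return rows


# Hypothetical history of one attribute, ordered by a numeric Id 1..6.
history = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0]

training_rows = make_windows(history, window=3)
for features, label in training_rows:
    print(features, "->", label)

# The most recent window has no label yet: it is the row to predict for.
to_predict = history[-3:]
print("predict the value after:", to_predict)
```

Any regression learner (SVM, kNN, and so on) could then be trained on `training_rows` and applied to `to_predict`.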
To help you better, can you share your original dataset?
I hope it helps,
Regards,
Lionel
Hi @lionelderkrikor,
This is the original data set; the first attribute is imported as the ID. For this data set, I will remove the first row and use it as my target for prediction. If I were to treat this data as a time series, how would I set it up with the time-series extension for prediction?
Regards,
Hi @hung9022,
I can't import your .csv into RapidMiner.
I think it is because you don't have attribute names in the first row (it currently contains only values such as "9.326" and "118.691").
Can you correct it?
Moreover, the ID has to be numeric (Id = 1, 2, 3, 4, 5, etc.) for a time-series problem. (It can't be "Wxxxxx".)
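One possible way to make both corrections before importing is sketched below in Python, using only the standard library. It is only an illustration: the sample rows, the `W00001`-style Ids, and the generated names `Id`, `att_1`, `att_2` are hypothetical, and it assumes the rows are already in chronological order so that the row position can stand in for the Id.

```python
import csv
import io

# Hypothetical headerless export: a text Id followed by numeric values.
raw = "W00001,9.326,118.691\nW00002,9.410,120.004\n"

rows = list(csv.reader(io.StringIO(raw)))
n_values = len(rows[0]) - 1  # number of value columns after the Id

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")

# Header row: an Id column plus generated names att_1, att_2, ...
writer.writerow(["Id"] + [f"att_{i}" for i in range(1, n_values + 1)])

# Replace the text Id with the row's position (1, 2, 3, ...).
for position, row in enumerate(rows, start=1):
    writer.writerow([position] + row[1:])

fixed = out.getvalue()
print(fixed)
```

With a real file, the same logic would read from and write to files opened with `open(..., newline="")` instead of `io.StringIO`.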
Regards,
Lionel
Hi @lionelderkrikor,
How about this?
Hi @hung9022,
As I said previously, I treated your problem as a time-series problem:
- The Id (1, 2, 3, 4, etc.) is used as the timestamp. I arbitrarily chose that one Id step corresponds to one day.
- I selected the kNN model because it suits your data much better than the SVM model (performance measured by RMSE).
- I used a Loop Attributes operator to forecast all of your attributes.
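The three choices above can be sketched in plain Python as follows. This is not the RapidMiner process from this answer, only a rough illustration of the same idea under stated assumptions: the window size, `k`, the holdout size, the helper names (`make_windows`, `knn_predict`, `rmse`), and the sample columns are all hypothetical.

```python
import math


def make_windows(series, window):
    """Build (previous `window` values, next value) pairs from one series."""
    return [(series[i:i + window], series[i + window])
            for i in range(len(series) - window)]


def distance(x, y):
    """Euclidean distance between two equally long windows."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def knn_predict(train, query, k):
    """Average the labels of the k training windows closest to `query`."""
    nearest = sorted(train, key=lambda row: distance(row[0], query))[:k]
    return sum(label for _, label in nearest) / len(nearest)


def rmse(actual, predicted):
    """Root mean squared error between two equally long lists."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical data: each attribute is a column of values ordered by Id.
columns = {
    "att_1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "att_2": [5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0],
}
WINDOW, K, HOLDOUT = 3, 2, 2

forecasts, errors = {}, {}
for name, series in columns.items():  # one model per attribute
    rows = make_windows(series, WINDOW)
    train, test = rows[:-HOLDOUT], rows[-HOLDOUT:]
    # Measure RMSE on the most recent windows, which were not used for training.
    predicted = [knn_predict(train, features, K) for features, _ in test]
    errors[name] = rmse([label for _, label in test], predicted)
    # Forecast the value after the last known Id using every labeled window.
    forecasts[name] = knn_predict(rows, series[-WINDOW:], K)

print(forecasts)
print(errors)
```

Note that kNN averages values it has already seen, so it cannot extrapolate a steadily rising series such as the hypothetical `att_1` beyond its observed range; comparing the RMSE of several learners on held-out rows is how one would check which model suits a given data set.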
Here is a screenshot of the forecast of your first six attributes (see row 1):
The process:
Hi @lionelderkrikor,
Thanks for your help; your solution comes quite close to what I wanted to do. I will try to figure out the rest. Sorry for the late reply, I did not have Internet access until now.
Regards,