[Solved] A logical challenge
Dear all,
today I have a question which I'm sure can be solved with RapidMiner, but I can't work out the logical approach...
I have this kind of table structure:
row predic date v1 v2 tag
1 6211 2006 5861 87 v1
2 6215 2007 6010 91 v1
3 105 2006 5845 100 v2
4 98 2007 5495 88 v2
I am looking for a process that performs the following calculation for every row:
predic = (predic - value of the attribute named in tag) / value of the attribute named in tag
Example with numbers:
for row 1: predic new = (6211 - 5861) / 5861
for row 4: predic new = (98 - 88) / 88
The main challenge for me is how to retrieve the value of an attribute whose name is stored as a value in another attribute (tag).
I need the result of the calculation in the same attribute ("predic") because I want to pivot the data afterwards.
Cheers
Sachs
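The calculation asked for above can be sketched outside RapidMiner to make the indirect lookup explicit. This is a minimal Python sketch, assuming the table is held as a list of dicts (an illustrative representation, not how RapidMiner stores example sets); the key step is reading the row's tag value and then using it as a column name.

```python
def relative_deviation(rows):
    """Return copies of rows with predic replaced by (predic - ref) / ref,
    where ref is the value of the column whose name is stored in 'tag'."""
    result = []
    for row in rows:
        ref = row[row["tag"]]  # indirect lookup: row["v1"] when tag == "v1"
        new_row = dict(row)    # leave the input rows untouched
        new_row["predic"] = (row["predic"] - ref) / ref
        result.append(new_row)
    return result


rows = [
    {"row": 1, "predic": 6211, "date": 2006, "v1": 5861, "v2": 87, "tag": "v1"},
    {"row": 2, "predic": 6215, "date": 2007, "v1": 6010, "v2": 91, "tag": "v1"},
    {"row": 3, "predic": 105, "date": 2006, "v1": 5845, "v2": 100, "tag": "v2"},
    {"row": 4, "predic": 98, "date": 2007, "v1": 5495, "v2": 88, "tag": "v2"},
]

for r in relative_deviation(rows):
    print(r["row"], round(r["predic"], 4))
```

Writing the result back into predic, rather than into a new column, matches the requirement that the data be pivoted on that attribute afterwards.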
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.003">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
<process expanded="true" height="672" width="748">
<operator activated="true" class="generate_data" compatibility="5.2.003" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30"/>
<operator activated="true" class="select_attributes" compatibility="5.2.003" expanded="true" height="76" name="Select Attributes (2)" width="90" x="179" y="30">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="label"/>
<parameter key="invert_selection" value="true"/>
<parameter key="include_special_attributes" value="true"/>
</operator>
<operator activated="true" class="multiply" compatibility="5.2.003" expanded="true" height="94" name="Multiply" width="90" x="313" y="30"/>
<operator activated="true" class="loop_parameters" compatibility="5.2.003" expanded="true" height="94" name="Loop Parameters" width="90" x="447" y="30">
<list key="parameters">
<parameter key="Windowing.label_attribute" value="att1,att2"/>
</list>
<process expanded="true" height="416" width="748">
<operator activated="true" class="series:windowing" compatibility="5.1.002" expanded="true" height="76" name="Windowing (2)" width="90" x="45" y="165">
<parameter key="window_size" value="1"/>
</operator>
<operator activated="true" class="series:windowing" compatibility="5.1.002" expanded="true" height="76" name="Windowing" width="90" x="45" y="30">
<parameter key="horizon" value="1"/>
<parameter key="window_size" value="1"/>
<parameter key="create_label" value="true"/>
<parameter key="label_attribute" value="att2"/>
</operator>
<operator activated="true" class="series:sliding_window_validation" compatibility="5.1.002" expanded="true" height="112" name="Validation" width="90" x="179" y="30">
<parameter key="training_window_width" value="20"/>
<parameter key="training_window_step_size" value="5"/>
<parameter key="test_window_width" value="20"/>
<parameter key="horizon" value="5"/>
<process expanded="true" height="371" width="321">
<operator activated="true" class="support_vector_machine" compatibility="5.2.003" expanded="true" height="112" name="SVM" width="90" x="115" y="30"/>
<connect from_port="training" to_op="SVM" to_port="training set"/>
<connect from_op="SVM" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true" height="371" width="321">
<operator activated="true" class="apply_model" compatibility="5.2.003" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="series:forecasting_performance" compatibility="5.1.002" expanded="true" height="76" name="Performance" width="90" x="183" y="30">
<parameter key="horizon" value="1"/>
</operator>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="apply_model" compatibility="5.2.003" expanded="true" height="76" name="Apply Model (2)" width="90" x="313" y="120">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="5.2.003" expanded="true" height="76" name="Generate Attributes" width="90" x="447" y="120">
<list key="function_descriptions">
<parameter key="Tag" value="param(&quot;Windowing&quot;, &quot;label_attribute&quot;)"/>
</list>
</operator>
<connect from_port="input 1" to_op="Windowing" to_port="example set input"/>
<connect from_port="input 2" to_op="Windowing (2)" to_port="example set input"/>
<connect from_op="Windowing (2)" from_port="example set output" to_op="Apply Model (2)" to_port="unlabelled data"/>
<connect from_op="Windowing" from_port="example set output" to_op="Validation" to_port="training"/>
<connect from_op="Validation" from_port="model" to_op="Apply Model (2)" to_port="model"/>
<connect from_op="Apply Model (2)" from_port="labelled data" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="source_input 3" spacing="0"/>
<portSpacing port="sink_performance" spacing="72"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="append" compatibility="5.2.003" expanded="true" height="76" name="Append" width="90" x="581" y="30"/>
<connect from_op="Generate Data" from_port="output" to_op="Select Attributes (2)" to_port="example set input"/>
<connect from_op="Select Attributes (2)" from_port="example set output" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="Loop Parameters" to_port="input 1"/>
<connect from_op="Multiply" from_port="output 2" to_op="Loop Parameters" to_port="input 2"/>
<connect from_op="Loop Parameters" from_port="result 1" to_op="Append" to_port="example set 1"/>
<connect from_op="Append" from_port="merged set" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
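Stripped of the time-series details, the process above appears to loop over the label attributes att1 and att2 (Loop Parameters on Windowing.label_attribute), forecast each one, write the name of the forecast attribute into a Tag attribute via param("Windowing", "label_attribute"), and append the per-iteration results. That is presumably where the predic and tag columns in the question come from. The rough Python sketch below shows only that loop-tag-append pattern; the naive last-value forecast is a stand-in (an assumption) for the windowed SVM, and the data and column names are made up for illustration.

```python
def naive_forecast(series):
    """Stand-in for the windowed SVM: predict each value as the previous one."""
    return [None] + series[:-1]


def loop_tag_append(data, label_attributes):
    """For each label attribute, forecast it, tag every row with the
    attribute's name, and append all tagged rows into one table."""
    appended = []
    for attr in label_attributes:
        predictions = naive_forecast(data[attr])
        for actual, predic in zip(data[attr], predictions):
            if predic is None:
                continue  # no forecast is possible for the first value
            appended.append({"predic": predic, "actual": actual, "tag": attr})
    return appended


data = {"att1": [1.0, 2.0, 4.0], "att2": [10.0, 8.0, 9.0]}
for row in loop_tag_append(data, ["att1", "att2"]):
    print(row)
```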
Answers
Something like this?... regards
Andrew
Hi!
It works! That's it! 8)
I modified the code a little according to my formula, but it's your code that made this process fly.
I have to admit, though, that I don't yet fully understand what it does...
Anyway... something left to discover for tomorrow :)
Thanks again!
Bye
Sachs
While trying to understand the process, I came across the "rename" operator.
replace_what parameter is: ^(.*)-.*$
replace_by parameter is: $1
Here in the forum I found a link to this article on regular expressions:
http://docs.oracle.com/javase/1.5.0/docs/api/java/util/regex/Pattern.html
Furthermore, the same chapter on regular expressions can be found in the RapidMiner 4.6 tutorial (but not in the current one).
However, I don't fully understand what it does:
replace_what also works without the boundaries ^ and $. Why are they necessary then?
Why is the first .* grouped by brackets?
What does $1 stand for?
What would be $2 and $3 mentioned in the operator description?
*** puzzled *** ???
Regards
Sachs
It's a regular expression that says something like...
Start at the beginning of the line, then match any number of arbitrary characters, a "-", and then any number of arbitrary characters up to the end of the line. The brackets form a capturing group, so all the arbitrary characters before the "-" are placed in it, and this group is referred to as $1 when the replacement is done. The effect is to strip everything from the "-" (inclusive) to the end of the attribute name. Because .* is greedy, a name containing several hyphens keeps everything up to the last one.
The beginning- and end-of-line anchors are probably not needed here; as is often the case with regular expressions, there are many ways of solving a specific problem.
regards
Andrew
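The explanation above can be checked with Python's re module. Note that Python writes the back-reference as \1 in the replacement string, whereas Java (and the RapidMiner replace_by parameter) use $1. The attribute names such as att1-0 are assumed examples of windowed names, not taken from the thread.

```python
import re

PATTERN_ANCHORED = r"^(.*)-.*$"
PATTERN_UNANCHORED = r"(.*)-.*"


def strip_suffix(name, pattern=PATTERN_ANCHORED):
    # r"\1" is Python's spelling of $1: the text captured by the first
    # (and here only) pair of parentheses.
    return re.sub(pattern, r"\1", name)


for name in ["att1-0", "att2-0", "a-b-c", "predic"]:
    print(name, "->", strip_suffix(name), "|", strip_suffix(name, PATTERN_UNANCHORED))
```

Both patterns give the same result for these names, which matches the observation that ^ and $ are not strictly needed here. A name without any hyphen does not match at all and is left unchanged.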
Ahhh, I wasn't aware that $1 refers to the term in brackets. Now it's much clearer.
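Regarding the earlier question about $2 and $3: each further pair of brackets creates another numbered group, counted by the position of its opening bracket. A small Python sketch (again with \2 and \3 in place of $2 and $3; the name att1-lag-0 is a made-up example):

```python
import re

# Three capturing groups: the part before the first "-", the middle part,
# and the part after the last "-".
pattern = r"^([^-]*)-(.*)-([^-]*)$"

name = "att1-lag-0"
m = re.match(pattern, name)
print(m.group(1), m.group(2), m.group(3))

# Groups can be reordered or reused freely in the replacement string.
print(re.sub(pattern, r"\3_\2_\1", name))
```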
Just for information: I was wondering about the behaviour of the "Extract Macro" operator.
Use case 1: I put the name of an arbitrary attribute in there (e.g. v1) and the operator returns the value of that attribute (e.g. 5861).
row predic date v1 v2 tag
1 6211 2006 5861 87 v1
2 6215 2007 6010 91 v1
3 105 2006 5845 100 v2
4 98 2007 5495 88 v2
Use case 2 (see the code in this thread): I put in the name of an attribute (e.g. tag) whose values are the names of other attributes. In that case the values of those other attributes are returned (e.g. again 5861 instead of v1).
I was also surprised that the "attribute name" parameter was simply "Tag" and not %{Tag}.
...a little confusing for beginners, but a handy feature...
Bye
Sachs
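The %{...} syntax is how a macro's value is referenced inside other parameters, while the operator that defines the macro only needs the plain attribute name. A minimal Python sketch of that substitution, assuming a loop over the distinct tag values that sets a macro named tag and then uses %{tag} as a column name for each group (the macro name, helper names and loop structure are hypothetical; Andrew's actual process is not reproduced in this thread):

```python
import re

MACRO_REF = re.compile(r"%\{(\w+)\}")


def expand_macros(text, macros):
    """Replace every %{name} in text with the current value of macro 'name'."""
    return MACRO_REF.sub(lambda m: str(macros[m.group(1)]), text)


def apply_per_tag(rows):
    """Hypothetical macro-driven loop: for each distinct tag value, set the
    macro, expand %{tag} to a column name, and update predic for that group."""
    out = []
    for tag_value in sorted({r["tag"] for r in rows}):
        macros = {"tag": tag_value}                # macro defined by plain name
        column = expand_macros("%{tag}", macros)   # macro referenced as %{tag}
        for r in rows:
            if r["tag"] == tag_value:
                new_row = dict(r)
                new_row["predic"] = (r["predic"] - r[column]) / r[column]
                out.append(new_row)
    return out


print(expand_macros("(predic - %{tag}) / %{tag}", {"tag": "v1"}))
```

Note that the output is grouped by tag value, so the original row order is only preserved when the rows are already sorted by tag, as in the table above.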