Transition Matrix Operator: simple question
Hi!
I've just started with rapidminer and think it's amazing. Being relatively new to data mining and machine learning, I'm starting simple, and so please forgive me if this question is naive.
I created a process (XML later) to generate some nominal data so I could try to understand the "transition matrix" operator.
The code results in the following transition matrix:
value0 0.0 0.3386693346673337 0.0
value1 0.0 0.33316658329164583 0.0
value2 0.0 0.0 0.3281640820410205
Now, I'm sure it's because I don't know what I'm looking at, but I wrote a quick perl script to calculate what I thought was the same thing, and it produced the following result (from the same example set that generated the above transition matrix):
value0 value1 value2
value0 0.325 0.360 0.315
value1 0.366 0.297 0.336
value2 0.323 0.341 0.335
So you can see that my perl code reveals my (perhaps mis-) understanding that the rows of the transition matrix should total 1.
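As a quick sanity check on that expectation, the row totals of both tables can be computed directly. The sketch below is in Python rather than perl and simply hard-codes the numbers shown above; the names `rapidminer`, `perl`, and `row_sums` are made up for illustration and are not part of RapidMiner.

```python
# Values copied from the two tables in this post (hard-coded for illustration).
rapidminer = {
    "value0": [0.0, 0.3386693346673337, 0.0],
    "value1": [0.0, 0.33316658329164583, 0.0],
    "value2": [0.0, 0.0, 0.3281640820410205],
}
perl = {
    "value0": [0.325, 0.360, 0.315],
    "value1": [0.366, 0.297, 0.336],
    "value2": [0.323, 0.341, 0.335],
}


def row_sums(matrix):
    """Return {state: sum of that state's row}."""
    return {state: sum(row) for state, row in matrix.items()}


if __name__ == "__main__":
    # Print one line per row: which table, which state, and the row total.
    for name, matrix in (("RapidMiner", rapidminer), ("perl", perl)):
        for state, total in row_sums(matrix).items():
            print(f"{name}\t{state}\t{total:.3f}")
```

The perl rows come out at about 1 (up to rounding to three decimals), while each RapidMiner row totals only about 0.33.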
It's obvious to me that I don't understand the nuance in the description of the Transition Matrix operator:
"This operator calculates the transition matrix of a specified attribute, i.e. the operator counts how often each possible nominal value follows after each other."
Would some kind soul please put me out of my misery and explain what it is I am seeing when I look at the output of the Transition Matrix operator?
Many thanks!
Here's the XML for the process I created, followed by my perl script:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.014">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.1.014" expanded="true" name="Process">
    <process expanded="true" height="325" width="145">
      <operator activated="true" class="generate_nominal_data" compatibility="5.1.014" expanded="true" height="60" name="Generate Nominal Data" width="90" x="45" y="255">
        <parameter key="number_examples" value="2000"/>
        <parameter key="number_of_attributes" value="1"/>
        <parameter key="number_of_values" value="3"/>
      </operator>
      <operator activated="true" class="write_csv" compatibility="5.1.014" expanded="true" height="60" name="Write CSV" width="90" x="151" y="254">
        <parameter key="csv_file" value="C:\Documents and Settings\MikeN\My Documents\Mike\tmat.csv"/>
        <parameter key="column_separator" value=","/>
        <parameter key="quote_nominal_values" value="false"/>
      </operator>
      <operator activated="true" class="transition_matrix" compatibility="5.1.014" expanded="true" height="76" name="Transition Matrix" width="90" x="333" y="227">
        <parameter key="attribute" value="att1"/>
      </operator>
      <connect from_op="Generate Nominal Data" from_port="output" to_op="Write CSV" to_port="input"/>
      <connect from_op="Write CSV" from_port="through" to_op="Transition Matrix" to_port="example set"/>
      <connect from_op="Transition Matrix" from_port="example set" to_port="result 1"/>
      <connect from_op="Transition Matrix" from_port="transition matrix" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
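#!/usr/bin/perl -w
use strict;

my $curr_state;     # state seen on the previous line
my %trans;          # $trans{from}->{to} = number of from -> to transitions
my %state_counts;   # number of occurrences of each state

<>;                 # read and discard the first line (the CSV header)
while (<>) {
    my ($state, undef) = split /,/;   # first column holds the attribute value
    $state_counts{$state}++;
    if ($curr_state) {
        $trans{$curr_state}->{$state}++;
    }
    $curr_state = $state;
}

# Header row, then one row per "from" state. Each count is divided by the
# number of occurrences of the "from" state, so each row totals about 1.
print "\t", join("\t", (sort keys %state_counts)), "\n";
foreach $curr_state (sort keys %trans) {
    print $curr_state;
    foreach (sort keys %{$trans{$curr_state}}) {
        print "\t", sprintf("%0.3f", $trans{$curr_state}->{$_} / $state_counts{$curr_state});
        #print join(",",$curr_state,$_,$trans{$curr_state}->{$_}/$state_counts{$curr_state}),"\n";
    }
    print "\n";
}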
Answers
There may well be other little problems (e.g. why do I get so many 0's in my matrix?), but I now see that the Transition Matrix operator tries to define a matrix in which each entry [i,j] is the proportion of all transitions that go from state[i] to state[j]. I would have thought the more interesting quantity is the proportion of the transitions leaving state[i] that go to state[j].
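To make the difference between the two definitions concrete, here is a minimal Python sketch of both normalisations. It is not RapidMiner's code; the function names and the toy sequence are invented for illustration. One observation that fits the first reading: the three non-zero entries in the operator's output above add up to 1.0, and each equals a whole number divided by 1999 (677, 666 and 656), which is the number of consecutive pairs in 2000 examples.

```python
from collections import Counter


def transition_counts(seq):
    """Count how often each value is immediately followed by each value."""
    return Counter(zip(seq, seq[1:]))


def joint_matrix(seq):
    """Entry (i, j) = share of ALL transitions that go from i to j."""
    counts = transition_counts(seq)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def conditional_matrix(seq):
    """Entry (i, j) = share of the transitions LEAVING i that go to j."""
    counts = transition_counts(seq)
    row_totals = Counter()
    for (src, _dst), n in counts.items():
        row_totals[src] += n
    return {(src, dst): n / row_totals[src] for (src, dst), n in counts.items()}


if __name__ == "__main__":
    seq = ["a", "b", "a", "a", "b", "c", "a"]  # made-up toy sequence
    print("joint:      ", joint_matrix(seq))
    print("conditional:", conditional_matrix(seq))
```

In the joint version all entries of the whole matrix sum to 1; in the conditional version each row sums to 1, which is what my perl script approximates (it divides by the number of occurrences of the "from" state instead of the number of transitions leaving it, a difference of at most one count).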
From the nature of the result, it might appear that there is another small problem:
From "com.rapidminer.tools.container.Tuple":
From com.rapidminer.operator.visualisation.dependencies.TransitionMatrixOperator:
So that explains why I get only 1 non-zero value in each row.
It seems to me that TransitionMatrixOperator might have at least 1, and possibly 2 bugs.
What is the correct procedure to ask that it be looked into by someone more knowledgeable than me?
Many Thanks!
I have to admit I would expect the same behaviour you have described, but I am sure there is a reason it has been implemented this way.
Unfortunately, I can't just change the way an operator works, even though this might not be the most used operator, because processes that depend on this operator would be corrupted. I've created a bug report at http://bugs.rapid-i.com/ and we will discuss it later with the team.
Thanks for your hint anyway!
Regards,
Nils
and include it into RapidMiner.
Cheers,
Nils