Correlation Matrix does not include label attribute
Hi rapidminers,
Is there a reason why the Correlation Matrix operator leaves out the label attribute (when a dataset has one) and shows only the regular attributes?
Without label:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="7.1.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" breakpoints="after" class="retrieve" compatibility="7.1.001" expanded="true" height="68" name="Retrieve Titanic" width="90" x="112" y="85">
<parameter key="repository_entry" value="//Samples/data/Titanic"/>
</operator>
<operator activated="false" breakpoints="after" class="retrieve" compatibility="7.1.001" expanded="true" height="68" name="Retrieve Titanic Training" width="90" x="112" y="238">
<parameter key="repository_entry" value="//Samples/data/Titanic Training"/>
</operator>
<operator activated="true" class="correlation_matrix" compatibility="7.1.001" expanded="true" height="103" name="Correlation Matrix" width="90" x="246" y="85"/>
<connect from_op="Retrieve Titanic" from_port="output" to_op="Correlation Matrix" to_port="example set"/>
<connect from_op="Correlation Matrix" from_port="matrix" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
With label:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="7.1.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
<process expanded="true">
<operator activated="false" breakpoints="after" class="retrieve" compatibility="7.1.001" expanded="true" height="68" name="Retrieve Titanic" width="90" x="112" y="85">
<parameter key="repository_entry" value="//Samples/data/Titanic"/>
</operator>
<operator activated="true" breakpoints="after" class="retrieve" compatibility="7.1.001" expanded="true" height="68" name="Retrieve Titanic Training" width="90" x="112" y="238">
<parameter key="repository_entry" value="//Samples/data/Titanic Training"/>
</operator>
<operator activated="true" class="correlation_matrix" compatibility="7.1.001" expanded="true" height="103" name="Correlation Matrix" width="90" x="246" y="85"/>
<connect from_op="Retrieve Titanic Training" from_port="output" to_op="Correlation Matrix" to_port="example set"/>
<connect from_op="Correlation Matrix" from_port="matrix" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Answers
One more question about the correlation matrix.
What could cause one attribute to show mostly '?' (NaN) in the matrix, even for its correlation with itself?
Vladimir
http://whatthefraud.wtf
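The thread never pins down the cause of the '?' entries, but one common cause is an attribute with zero variance (every value the same): Pearson's r then divides zero by zero and is undefined, even against itself. The sketch below only illustrates that arithmetic in plain Python. The `pearson` helper and the sample values are made up for illustration and are not RapidMiner's implementation; missing values in an attribute could be another cause.

```python
import math

def pearson(xs, ys):
    """Pearson r for two equal-length numeric lists; NaN if either has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan  # 0/0: the correlation is undefined
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

age = [22.0, 38.0, 26.0, 35.0]       # hypothetical values
constant = [1.0, 1.0, 1.0, 1.0]      # hypothetical attribute with a single value

print(pearson(age, age))             # an ordinary attribute correlates perfectly with itself
print(pearson(constant, age))        # nan
print(pearson(constant, constant))   # nan, even against itself
```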
Hi @kypexin - I don't have a PhD in Data Science like other folks here, but leaving the label out of a correlation matrix makes perfect sense to me. A correlation matrix simply computes r (or r^2 if you check the box) for each of the nC2 pairs of numerical attributes, so you can spot potential correlations pair by pair.
Scott
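To make the "nC2 pairs" idea concrete, here is a minimal sketch in plain Python that walks over every unordered pair of numeric attributes and prints r and r^2. The attribute names and values are hypothetical (not the Titanic sample data), and the `pearson` helper is an illustration, not what the operator runs internally.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson r for two equal-length numeric lists; NaN if either has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical regular (numeric) attributes
data = {
    "Age": [22.0, 38.0, 26.0, 35.0, 54.0],
    "Fare": [7.25, 71.28, 7.93, 53.10, 51.86],
    "Siblings": [1.0, 1.0, 0.0, 1.0, 0.0],
}

pairs = list(combinations(data, 2))  # every unordered pair: n * (n - 1) / 2 of them
for a, b in pairs:
    r = pearson(data[a], data[b])
    print(f"{a} vs {b}: r={r:.3f}, r^2={r * r:.3f}")
```

With three attributes there are three pairs; with n attributes the count grows as n(n-1)/2.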
If you simply want the correlation of your attributes with the label, you can use "Weight by Correlation" to generate that.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
Thanks @sgenzer @Telcontar120, it's all clear.
Maybe I should have put the question differently: couldn't the label be treated automatically as a regular attribute when the matrix is computed, at least as an option? I know I can turn it into a regular attribute manually before building the matrix, so this is more of a philosophical question.
Vladimir
http://whatthefraud.wtf
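The option being asked about could look roughly like the sketch below, written in plain Python rather than as a RapidMiner process. The `include_special` flag, the `roles` mapping, and the 0/1 numeric encoding of the label are all assumptions made for illustration; the Correlation Matrix operator in the processes above does not expose such a parameter.

```python
import math

def pearson(xs, ys):
    """Pearson r for two equal-length numeric lists; NaN if either has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns, roles, include_special=False):
    """columns: name -> numeric list; roles: name -> role such as "regular" or "label".

    include_special is a hypothetical flag: when True, attributes with a
    special role (e.g. the label) are treated like regular ones.
    """
    names = [n for n in columns
             if include_special or roles.get(n, "regular") == "regular"]
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical data; the label is assumed to be encoded as 0/1
columns = {
    "Age": [22.0, 38.0, 26.0, 35.0, 54.0],
    "Fare": [7.25, 71.28, 7.93, 53.10, 51.86],
    "Survived": [0.0, 1.0, 1.0, 1.0, 0.0],
}
roles = {"Survived": "label"}

without_label = correlation_matrix(columns, roles)
with_label = correlation_matrix(columns, roles, include_special=True)
print(sorted(without_label))  # regular attributes only
print(sorted(with_label))     # label included as well
```

Defaulting the flag to False keeps the current behavior, which matches the idea that special attributes are handled separately unless you ask otherwise.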
Ah yes @kypexin, I understand exactly what you mean. It's a valid point. I guess I always look at "labels" in RapidMiner as "special variables that should be treated separately from others", hence all the checkboxes that say "include special attributes". So whether or not Correlation Matrix should have an "include special attributes" option is a good question. I will throw it on my "interesting suggestions from community users to the dev team" list.
Scott