Answers
I am not sure why you want the correlation of categorical variables. Correlation is built on covariance, which is defined for continuous (numerical) variables, so in practice it tells you little about nominal attributes. You can still compute it by applying the Nominal to Numerical operator and then the Correlation Matrix operator. Please find the XML below. To load it, open a new process, open the XML panel (View --> Show Panel --> XML), paste the code into the panel, and click the green check mark so that you can see the process. @mschmitz can say more about the issues with correlation on nominal attributes.
<?xml version="1.0" encoding="UTF-8"?><process version="9.2.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.2.001" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="9.2.001" expanded="true" height="68" name="Retrieve Titanic Unlabeled" width="90" x="112" y="85">
<parameter key="repository_entry" value="//Samples/data/Titanic Unlabeled"/>
</operator>
<operator activated="true" class="nominal_to_numerical" compatibility="9.2.001" expanded="true" height="103" name="Nominal to Numerical" width="90" x="313" y="85">
<parameter key="return_preprocessing_model" value="false"/>
<parameter key="create_view" value="false"/>
<parameter key="attribute_filter_type" value="all"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="nominal"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="file_path"/>
<parameter key="block_type" value="single_value"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="single_value"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="true"/>
<parameter key="coding_type" value="unique integers"/>
<parameter key="use_comparison_groups" value="false"/>
<list key="comparison_groups"/>
<parameter key="unexpected_value_handling" value="all 0 and warning"/>
<parameter key="use_underscore_in_name" value="false"/>
</operator>
<operator activated="true" class="concurrency:correlation_matrix" compatibility="9.2.001" expanded="true" height="103" name="Correlation Matrix" width="90" x="514" y="85">
<parameter key="attribute_filter_type" value="all"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
<parameter key="normalize_weights" value="true"/>
<parameter key="squared_correlation" value="false"/>
</operator>
<connect from_op="Retrieve Titanic Unlabeled" from_port="output" to_op="Nominal to Numerical" to_port="example set input"/>
<connect from_op="Nominal to Numerical" from_port="example set output" to_op="Correlation Matrix" to_port="example set"/>
<connect from_op="Correlation Matrix" from_port="example set" to_port="result 3"/>
<connect from_op="Correlation Matrix" from_port="matrix" to_port="result 1"/>
<connect from_op="Correlation Matrix" from_port="weights" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
</process>
</operator>
</process>
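For anyone who wants to see the idea outside of RapidMiner, here is a minimal Python sketch of what the process above roughly does: replace each nominal value with an integer code, then compute a Pearson correlation on the codes. The helper names `encode` and `pearson` and the toy data are hypothetical, and the way integers are assigned here is an assumption, not necessarily how the "unique integers" coding orders its values.

```python
from math import sqrt

def encode(values, order):
    """Replace each nominal value by its position in `order` (an integer code)."""
    codes = {category: i for i, category in enumerate(order)}
    return [codes[v] for v in values]

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical toy data, not taken from the Titanic sample set.
sex = ["male", "female", "female", "male", "female", "male"]
port = ["S", "C", "C", "S", "Q", "S"]

sex_codes = encode(sex, ["male", "female"])

# Same data, two equally arbitrary integer codings of the port attribute.
r_first = pearson(sex_codes, encode(port, ["S", "C", "Q"]))
r_second = pearson(sex_codes, encode(port, ["S", "Q", "C"]))

print(r_first, r_second)  # the two coefficients differ
```

The data is identical in both calls; only the arbitrary numbering of the categories changes, yet the coefficient changes with it. That is the practical reason to be careful when reading a correlation matrix built on integer-coded nominal attributes.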
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing
"Correlation" as a term is not clearly defined. The most common definition is the Pearson correlation, and the Pearson correlation is not defined for non-numerical data.
So either you do not use the Pearson correlation, or you apply some preprocessing before calculating it, e.g. the Nominal to Numerical operator.
I recommend using another dependency measure that is well defined for nominal data. My two go-to options are the Gini index and entropy (as used in Information Gain).
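As a rough illustration of these two measures, here is a small Python sketch that scores how much knowing one nominal attribute reduces the impurity of another. With entropy as the impurity this is the usual information gain, and with the Gini index it is the Gini-based analogue. The function names and the toy data are hypothetical, and this is not the code behind any RapidMiner operator.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of nominal values."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of nominal values."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def impurity_reduction(feature, target, impurity):
    """Impurity of `target` minus its weighted impurity within each feature value."""
    n = len(target)
    groups = {}
    for f, t in zip(feature, target):
        groups.setdefault(f, []).append(t)
    weighted = sum(len(g) / n * impurity(g) for g in groups.values())
    return impurity(target) - weighted

# Hypothetical toy data: one attribute determines the target, the other does not.
target      = ["yes", "yes", "no", "no"]
dependent   = ["a", "a", "b", "b"]
independent = ["a", "b", "a", "b"]

gain_dependent = impurity_reduction(dependent, target, entropy)
gain_independent = impurity_reduction(independent, target, entropy)
gini_dependent = impurity_reduction(dependent, target, gini)
gini_independent = impurity_reduction(independent, target, gini)

print(gain_dependent, gain_independent, gini_dependent, gini_independent)
```

Neither score depends on how the categories are numbered or ordered, which is what makes them a better fit for nominal data than a correlation on integer codes.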
Best,
Martin
Dortmund, Germany