TFIDF per Class/Label?

Hi Guys,
The process posted below is the one I am working with.
First I will describe my data. The set consists of 1500 examples, each representing a productnr (the ID of a product). The products (examples) are grouped into two different classes/labels. As you can see in my process, I calculated the TFIDF score for every attribute (Error1...Errorx) for every example (product).
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.000">
  <operator activated="true" class="process" compatibility="5.2.000" expanded="true" name="Process">
    <process expanded="true" height="528" width="648">
      <operator activated="true" class="read_excel" compatibility="5.2.000" expanded="true" height="60" name="Read Excel" width="90" x="45" y="75">
        <parameter key="excel_file" value="C:\Dokumente und Einstellungen\rrojas\My Documents\myData\Spreadsheetversion1.0Forum.xls"/>
        <parameter key="imported_cell_range" value="A1:AG1500"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <list key="data_set_meta_data_information">
          <parameter key="0" value=""/>
          <parameter key="1" value="Class.true.binominal.label"/>
          <parameter key="2" value="Error 1.true.integer.attribute"/>
          <parameter key="3" value="Error 2.true.integer.attribute"/>
          <parameter key="4" value="Error 3.true.integer.attribute"/>
          <parameter key="5" value="Error 4.true.integer.attribute"/>
          <parameter key="6" value="Error 5.true.integer.attribute"/>
          <parameter key="7" value="Error 6.true.integer.attribute"/>
          <parameter key="8" value="Error 7.true.integer.attribute"/>
          <parameter key="9" value="Error 8.true.integer.attribute"/>
          <parameter key="10" value="Error 9.true.integer.attribute"/>
          <parameter key="11" value="Error 10.true.integer.attribute"/>
          <parameter key="12" value="Error 11.true.integer.attribute"/>
          <parameter key="13" value="Error 12.true.integer.attribute"/>
          <parameter key="14" value="Error 13.true.integer.attribute"/>
          <parameter key="15" value="Error 14.true.integer.attribute"/>
          <parameter key="16" value="Error 15.true.integer.attribute"/>
          <parameter key="17" value="Error 16.true.integer.attribute"/>
          <parameter key="18" value="Error 17.true.integer.attribute"/>
          <parameter key="19" value="Error 18.true.integer.attribute"/>
          <parameter key="20" value="Error 19.true.integer.attribute"/>
          <parameter key="21" value="Error 20.true.integer.attribute"/>
          <parameter key="22" value="Error 21.true.integer.attribute"/>
          <parameter key="23" value="Error 22.true.integer.attribute"/>
          <parameter key="24" value="Error 23.true.integer.attribute"/>
          <parameter key="25" value="Error 24.true.integer.attribute"/>
          <parameter key="26" value="Error 25.true.integer.attribute"/>
          <parameter key="27" value="Error 26.true.integer.attribute"/>
          <parameter key="28" value="Error 27.true.integer.attribute"/>
          <parameter key="29" value="Error 28.true.integer.attribute"/>
          <parameter key="30" value="Error 29.true.integer.attribute"/>
          <parameter key="31" value="Error 30.true.integer.attribute"/>
          <parameter key="32" value="Total Errors.true.integer.attribute"/>
        </list>
      </operator>
      <operator activated="true" class="generate_tfidf" compatibility="5.2.000" expanded="true" height="76" name="Generate TFIDF" width="90" x="246" y="165"/>
      <operator activated="true" class="data_to_similarity" compatibility="5.2.000" expanded="true" height="76" name="Data to Similarity" width="90" x="380" y="75">
        <parameter key="measure_types" value="NumericalMeasures"/>
        <parameter key="numerical_measure" value="CosineSimilarity"/>
      </operator>
      <connect from_op="Read Excel" from_port="output" to_op="Generate TFIDF" to_port="example set input"/>
      <connect from_op="Generate TFIDF" from_port="example set output" to_op="Data to Similarity" to_port="example set"/>
      <connect from_op="Data to Similarity" from_port="similarity" to_port="result 2"/>
      <connect from_op="Data to Similarity" from_port="example set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
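For readers without RapidMiner at hand, the two steps of the process (per-example TFIDF, then cosine similarity between examples) can be approximated in plain Python. This is only a sketch: it assumes the common definition tf = count / row total and idf = log(N / df), and the Generate TFIDF operator may weight or normalize differently. The names `tfidf`, `cosine_similarity` and the toy data are hypothetical and not part of the process.

```python
import math

def tfidf(rows):
    """Per-example TF-IDF for rows of non-negative counts (one row per product).

    Assumed definition (may differ from RapidMiner's operator):
    tf = count / sum of the row, idf = log(n_examples / document frequency).
    """
    n_examples = len(rows)
    n_attrs = len(rows[0])
    # Document frequency: in how many examples each attribute is non-zero.
    df = [sum(1 for row in rows if row[j] > 0) for j in range(n_attrs)]
    weighted = []
    for row in rows:
        total = sum(row)
        vec = []
        for j, count in enumerate(row):
            if total == 0 or df[j] == 0:
                vec.append(0.0)
            else:
                vec.append((count / total) * math.log(n_examples / df[j]))
        weighted.append(vec)
    return weighted

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical toy data: 3 products, 3 error attributes (counts per product).
toy_rows = [[2, 0, 1], [0, 3, 1], [4, 0, 2]]
toy_tfidf = tfidf(toy_rows)
```

Note that an attribute occurring in every example (the third column in the toy data) gets an idf of zero under this definition and therefore drops out of the similarity entirely.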
In addition, I would like to know two more things:
1. Is it possible to calculate some kind of "TFIDF score" not just for a single example but for a whole class/label, so that I know which attributes are most characteristic of a label/class?
2. I would like to find out which combinations of attributes are significant for a label/class.
2a. I would like to find out which correlations exist between attributes with respect to the class, so that one can make assumptions such as: if an example has a certain combination of attributes, it belongs to a certain class.
Thanks in advance for your time; any help is really appreciated.
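To make question 1 concrete, one possible reading of a class-level score is the mean of the per-example TFIDF values within each class, with attributes then ranked per class. The Python sketch below illustrates that idea only; it assumes the TFIDF values have already been computed (for example by Generate TFIDF), and the function name `class_profiles` and the toy values are hypothetical. Treating each class as a single "document" instead would be degenerate with only two classes, since any attribute present in both classes would get an idf of zero.

```python
from collections import defaultdict

def class_profiles(tfidf_rows, labels, attr_names):
    """Rank attributes per class by their mean TF-IDF value within that class.

    tfidf_rows: one list of TF-IDF values per example (already computed).
    labels:     the class label of each example, in the same order.
    Returns {label: [(attribute name, mean score), ...]} sorted descending.
    """
    sums = defaultdict(lambda: [0.0] * len(attr_names))
    counts = defaultdict(int)
    for row, label in zip(tfidf_rows, labels):
        counts[label] += 1
        for j, value in enumerate(row):
            sums[label][j] += value
    profiles = {}
    for label, totals in sums.items():
        means = [total / counts[label] for total in totals]
        profiles[label] = sorted(
            zip(attr_names, means), key=lambda pair: pair[1], reverse=True
        )
    return profiles

# Hypothetical toy values: 3 products, 3 error attributes, 2 classes.
names = ["Error 1", "Error 2", "Error 3"]
rows = [[0.4, 0.0, 0.1], [0.2, 0.0, 0.3], [0.0, 0.5, 0.1]]
labels = ["A", "A", "B"]
profiles = class_profiles(rows, labels, names)

for label, ranking in profiles.items():
    print(label, [name for name, _ in ranking])
```

The attribute at the top of each ranking is the one with the highest average weight in that class; comparing the rankings of the two classes gives a first impression of which errors are characteristic of which label.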