Calculate TFIDF
Hello,
I would like to calculate the TF-IDF of words in different documents, in order to determine which words are the most significant for each document.
I use the "Create Document" operator for each new document and add an attribute (a name) to each one. I then use a "Documents to Data" operator to generate an example set, followed by a "Process Documents from Data" operator to compute the TF-IDF (which I selected in the parameters panel).
The problem is that I don't get the TF-IDF, only the number of occurrences of each word and the number of documents in which it appears. Moreover, I no longer see the label of each document, so I cannot tell the documents apart.
Can somebody help me?
Thanks a lot,
Barthélémy
Answers
It sounds like you are only looking at the word list output (where the word occurrences are shown). Also take a look at the example set output of the "Process Documents" operator: there you will see the TF-IDF values as well as each document's label.
Rather than chaining "Documents to Data" and "Process Documents from Data", you can use the single "Process Documents" operator.
Best regards
Matthias
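For anyone who wants to sanity-check the values in the example set output, here is a minimal Python sketch of what a TF-IDF table keyed by document label looks like. It is not how RapidMiner computes it internally: it assumes the plain textbook formula tf * log(N / df) with whitespace tokenization, so the exact numbers may differ from the operator's output (which may apply its own normalization). The document labels, the texts, and the helper names `tf_idf` and `top_words` are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return {label: {word: score}} using tf * log(N / df).

    `documents` maps a document label to its raw text (hypothetical input format).
    """
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = {label: text.lower().split() for label, text in documents.items()}
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for words in tokenized.values():
        doc_freq.update(set(words))

    scores = {}
    for label, words in tokenized.items():
        counts = Counter(words)  # raw occurrences, like the word list output
        total = len(words)
        scores[label] = {
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return scores


def top_words(scores, k=3):
    """Return the k highest-scoring words per document (ties broken alphabetically)."""
    return {
        label: sorted(word_scores, key=lambda w: (-word_scores[w], w))[:k]
        for label, word_scores in scores.items()
    }


# Toy corpus with made-up labels and texts.
documents = {
    "doc_a": "the cat sat on the mat",
    "doc_b": "the dog chased the cat",
    "doc_c": "the bird sang",
}

scores = tf_idf(documents)
for label, words in top_words(scores, k=2).items():
    print(label, words)
```

A word that occurs in every document (here "the") gets a score of zero, which is why the raw occurrence counts in the word list are not enough to find the most significant words per document.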