TF-IDF
Hello,
I have manually calculated a TF-IDF of a very simple case:
I have 2 documents:
1 The small cat is black
2 We see a black dog and a black cat
My word list contains:
cat dog black small
My calculations give a TF-IDF of
d1: [-0.707 0 -0.707 0]
d2: [-0.447 0 -0.894 0]
I've made an Excel file with these simplified sentences:
small cat black
black dog black cat
I made a small process:
Read Excel -> Nominal to Text -> Process Documents from Data (with Tokenize inside)
Process Documents from Data is set to TF-IDF. When I run the process, it gives this result:
d1: [0 0 0 1]
d2: [0 1 0 0]
I'm pretty sure that my calculations are right, but I also don't understand the result from RapidMiner: why do "small" and "dog" get the value 1 for documents 1 and 2 respectively, while "cat" and "black" get 0? Something is not right here and I do not know what.
Thanks
-Prentice
Best Answer
sgenzer (Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator; Posts: 2,959; Community Manager)
Hi @Prentice, hmm, I'm puzzled how you did not find this knowledge base article when you searched for this question prior to posting. Anyway, here's what I think you're looking for:
https://community.rapidminer.com/discussion/46333/term-frequencies-and-tf-idf-how-are-these-calculated
Scott
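For readers who cannot follow the link, here is a minimal Python sketch of one common TF-IDF formulation that happens to reproduce the RapidMiner output quoted in the question. It assumes raw term counts for TF, idf = ln(N / df), and scaling each document vector to unit Euclidean length. Whether RapidMiner uses exactly this formula internally is an assumption, not something this thread confirms, and the function name `tfidf_vectors` is made up for illustration.

```python
import math

def tfidf_vectors(docs, vocab):
    """Return one L2-normalised TF-IDF vector per document, in vocab order.

    Assumed formulation (not confirmed to be RapidMiner's):
    weight = raw term count * ln(N / document frequency).
    """
    n_docs = len(docs)
    tokenised = [doc.lower().split() for doc in docs]
    # Document frequency: in how many documents does each term occur?
    df = {term: sum(term in tokens for tokens in tokenised) for term in vocab}

    vectors = []
    for tokens in tokenised:
        weights = []
        for term in vocab:
            tf = tokens.count(term)
            # A term that occurs in every document gets idf = ln(1) = 0.
            idf = math.log(n_docs / df[term]) if df[term] else 0.0
            weights.append(tf * idf)
        # Scale the vector to unit length (left as zeros if all weights are 0).
        norm = math.sqrt(sum(w * w for w in weights))
        vectors.append([w / norm if norm else 0.0 for w in weights])
    return vectors

docs = ["small cat black", "black dog black cat"]
vocab = ["cat", "dog", "black", "small"]
for name, vec in zip(("d1", "d2"), tfidf_vectors(docs, vocab)):
    print(name, [round(w, 3) for w in vec])
```

Under these assumptions, "cat" and "black" occur in both documents, so their idf is ln(2/2) = 0 and their weights vanish. Each document is left with a single non-zero term ("small" in d1, "dog" in d2), which becomes 1 after normalisation, matching d1: [0 0 0 1] and d2: [0 1 0 0].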
Answers