[Solved] Problem with TF-IDF calculation
Hello everyone,
I am currently working on a task where I have to resample some data. Because I am unsure whether it's okay to apply a method like SMOTE to already calculated TF-IDF weights, I wanted to calculate the term occurrences in RapidMiner, export the data and apply SMOTE to it, then import it again and calculate the TF-IDF weights.
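For context on the planned workflow: SMOTE creates a synthetic example by interpolating between an existing minority-class example and one of its nearest minority-class neighbours. The Python sketch below is a simplified, hypothetical version of that step applied to term-occurrence rows. The helper `smote_like_samples` and the toy matrix are illustrative assumptions, not RapidMiner's or any library's implementation. Because the synthetic rows are interpolations, their "counts" are generally fractional, so whatever calculates TF-IDF after re-import will see non-integer occurrences.

```python
import math
import random

# Hypothetical minority-class term-occurrence rows (one row per document).
MINORITY = [
    [3, 1, 0],
    [2, 0, 1],
    [4, 1, 1],
]


def smote_like_samples(minority, n_new, k=2, seed=0):
    """Simplified SMOTE: each synthetic row lies on the segment between a
    randomly chosen minority row and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [row for j, row in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


new_rows = smote_like_samples(MINORITY, n_new=3)
for row in new_rows:
    print([round(v, 3) for v in row])
```

A real SMOTE implementation also decides how many synthetic rows to create per original row and typically uses more neighbours; the sketch keeps only the interpolation idea.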
When I ran a test without the SMOTE step, I came across the following behaviour, which I can't find an explanation for:
1. For the first set of data, I calculate the TF-IDF weights directly with the Process Documents operator.
2. Then, for the same input data, I calculate only the term occurrences, binary term occurrences and term frequency, and use the "Generate TFIDF" operator to calculate the TF-IDF weights from each of them.
However, none of the combinations from step 2 produces the same values as the calculation in step 1. Am I missing something?
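The thread does not state which formula either operator uses, but one way such a mismatch can arise is that TF-IDF has several common variants. The Python sketch below assumes a textbook definition (tf = occurrences divided by the document's total, idf = ln(N / df), where df is the number of documents containing the term) and a hypothetical toy matrix. It shows that identical counts give different weights depending on whether each document vector is L2-normalized and whether binary occurrences are used as the input.

```python
import math

# Hypothetical toy term-occurrence matrix: one row per document,
# columns are the terms "data", "mining", "text".
COUNTS = [
    [3, 1, 0],
    [0, 2, 1],
    [1, 0, 0],
]


def idf(matrix):
    """Per-term inverse document frequency, ln(N / df) (natural log assumed)."""
    n_docs = len(matrix)
    n_terms = len(matrix[0])
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    return [math.log(n_docs / d) if d else 0.0 for d in df]


def tfidf(matrix, l2_normalize=False):
    """TF-IDF with tf = value / row sum, optionally scaled to unit L2 length."""
    weights = idf(matrix)
    result = []
    for row in matrix:
        total = sum(row)
        vec = [(v / total if total else 0.0) * w for v, w in zip(row, weights)]
        if l2_normalize:
            norm = math.sqrt(sum(x * x for x in vec))
            vec = [x / norm if norm else 0.0 for x in vec]
        result.append(vec)
    return result


# Binary term occurrences derived from the same counts.
BINARY = [[1 if v > 0 else 0 for v in row] for row in COUNTS]

raw = tfidf(COUNTS)
normalized = tfidf(COUNTS, l2_normalize=True)
from_binary = tfidf(BINARY)

for name, table in [("raw", raw), ("L2-normalized", normalized), ("from binary", from_binary)]:
    print(name, [round(x, 4) for x in table[0]])
```

Under these assumptions the first document's weights come out as [0.3041, 0.1014, 0.0] raw, [0.9487, 0.3162, 0.0] after normalization and [0.2027, 0.2027, 0.0] from binary input. If two tools differ in any of these choices (or in the logarithm base), their outputs will not match even for identical input.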
Does anyone have an answer to this?
Okay, the problem seems to be that the "Process Documents" and "Generate TFIDF" operators don't work together.