TFIDF Output
Best Answer
Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
TF-IDF values are not term frequencies expressed as percentages, nor are they computed across all documents; each value is a weight for one term in one document. That is why they are not directly comparable to the pruning input parameters, which is what @sgenzer said above. The pruning parameter is the percentage of all documents that you want the term to appear in (either min or max). It should typically be entered as a whole number from 1-100.
@Prentice, as for whether there is a bug when contradictory values are entered, that may be the case, but it would be a separate issue. Can you post an example process with data to show it?
Answers
I could be wrong, but I don't think your post (@sgenzer) solves the question.
First of all, you selected a percentage, so the values should be expressed as percentages: 34 and 80.
However, if I try this, it doesn't work for me either. Even stranger, if I set the prune below percent higher than the prune above percent, I still get some values, even though that should not be possible.
Very peculiar.
Maybe it's a bug?
As far as I know, it first prunes the terms and then calculates the TF-IDF for the remaining ones.
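To make the prune-then-weight order concrete, here is a minimal Python sketch. It is not RapidMiner's implementation: the function names are hypothetical, the inclusive bounds are an assumption, and the TF-IDF formula is the plain textbook one (relative term frequency times log inverse document frequency), which may differ from what the operator actually computes. It only illustrates that the pruning thresholds are percentages of documents, while the output values are per-document weights on a different scale.

```python
import math
from collections import Counter


def document_frequency_percent(docs):
    """Percent of documents (0-100) in which each term appears at least once."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: 100.0 * count / n_docs for term, count in df.items()}


def prune_by_percent(docs, prune_below, prune_above):
    """Keep terms whose document percentage lies in [prune_below, prune_above].

    Inclusive bounds are an assumption made for this sketch.
    """
    df_pct = document_frequency_percent(docs)
    return {t for t, pct in df_pct.items() if prune_below <= pct <= prune_above}


def tf_idf(docs, vocabulary):
    """Textbook TF-IDF over the surviving vocabulary, one dict per document."""
    n_docs = len(docs)
    df = Counter(t for doc in docs for t in set(doc) if t in vocabulary)
    rows = []
    for doc in docs:
        counts = Counter(t for t in doc if t in vocabulary)
        total = sum(counts.values())
        rows.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return rows


# Toy corpus, already tokenized.
docs = [
    "the cat sat".split(),
    "the dog sat".split(),
    "the bird flew".split(),
]

kept = prune_by_percent(docs, 34, 80)  # step 1: prune by document percentage
weights = tf_idf(docs, kept)           # step 2: TF-IDF on what is left
```

In this toy corpus "the" appears in 100% of documents and most other terms in about 33%, so only "sat" (about 67%) survives the 34/80 thresholds. In this sketch, swapping the thresholds (prune below 80, prune above 34) leaves no terms at all, which is the behavior one would expect from contradictory settings.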