TFIDF Output

Flixport Member Posts: 33 Contributor II
Hello RM family,

I have a question about the output of the TF-IDF vector: why does it still give me words with values below 0.34 (see screen 1)? That shouldn't happen, should it?

screen1
screen2

BR

Answers

  • sgenzer Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Hi @Flixport, those numbers are not the same thing at all. I wrote a rather lengthy KB article explaining how TF-IDF is calculated. You can find it here: https://community.rapidminer.com/discussion/46333/term-frequencies-and-tf-idf-how-are-these-calculated
  • Flixport Member Posts: 33 Contributor II
    Hi @sgenzer, I already read it, but the description of the prune method states that values below X should be ignored, and that should apply when I specify custom values. So the output should leave out those values according to the prune method.

  • Prentice Member Posts: 66 Maven
    Hi,

    I could be wrong, but I don't think your post (@sgenzer) answers the question.

    First of all, you selected a percentage, meaning you should express the thresholds as percentages: 34 and 80.
    However, if I try this, it doesn't work for me either. To make it even stranger, if I set the prune below percent higher than the prune above percent, I still get some values, even though that should not be possible.
    Very peculiar.

    Maybe it's a bug?

  • Prentice Member Posts: 66 Maven
    Telcontar120, I actually think I already see what's happening here.
    As far as I know, it first prunes the values and only afterwards calculates the TF-IDF for whatever remains.
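
A minimal sketch of the "prune first, then weight" order described in the last reply. It assumes that the percentage thresholds refer to the share of documents a word occurs in, and it uses a plain textbook TF-IDF (relative term frequency times ln(N/df)) without any vector normalization. The function name `prune_then_tfidf` and the sample documents are made up for illustration; this is not RapidMiner's actual implementation.

```python
import math


def prune_then_tfidf(docs, below_pct, above_pct):
    """Drop words by document-frequency percentage, then weight the rest."""
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]

    # Document frequency: in how many documents does each word occur?
    df = {}
    for tokens in tokenized:
        for term in set(tokens):
            df[term] = df.get(term, 0) + 1

    # Pruning (assumed behaviour): only the share of documents containing
    # the word is compared with the thresholds, never a TF-IDF weight.
    kept = sorted(
        term for term, count in df.items()
        if below_pct <= 100.0 * count / n_docs <= above_pct
    )

    # TF-IDF is computed afterwards, for the surviving words only.
    vectors = []
    for tokens in tokenized:
        length = len(tokens) or 1  # avoid division by zero on empty documents
        vectors.append({
            term: (tokens.count(term) / length) * math.log(n_docs / df[term])
            for term in kept
        })
    return kept, vectors


docs = [
    "apple banana apple",
    "banana cherry",
    "cherry apple banana date",
    "banana date",
]
kept, vectors = prune_then_tfidf(docs, below_pct=34, above_pct=80)
print(kept)        # words that survive pruning
print(vectors[2])  # weights for the third document
```

In this toy data "banana" occurs in every document (100 %) and is pruned, while the remaining words occur in 50 % of the documents and are kept. Their weights in the third document are still well below 0.34, because under this reading the thresholds never look at the TF-IDF values themselves.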