
Term Occurrences and Frequency - I have to be missing something

btibert Member, University Professor Posts: 146 Guru
I am following along with this post to check my intuition, because I am seeing results that don't make sense, to me anyway.
https://community.rapidminer.com/discussion/46333/term-frequencies-and-tf-idf-how-are-these-calculated

The only difference that I see in my process to start is that I am reading in my data from Excel and not creating it by hand.

Here are the term occurrences after converting to lower case, extracting stop words, tokenizing, and counting the tokens.



Just like the post, I am using very simple sentences to keep the vocabulary small.

Now, here is the exact same data; the only difference is that I am now using term frequency within the Process Documents operator.



Of course, there is a very good chance that I am missing a setting along the way, but why is the first example .577 for each of the three words when the basic sentence, unprocessed, was "I like turtles"?

Thanks in advance.

Best Answer

Answers

  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Hi,
    are you sure you don't use TF/IDF?

    BR,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • btibert Member, University Professor Posts: 146 Guru
    See below, and my dataset/process attached. Entirely possible I am missing something obvious, just not sure what it could be.


  • btibert Member, University Professor Posts: 146 Guru
    Absolutely fantastic, thanks! I completely missed (as the title suggested) the normalized part; I just saw the output I expected and stopped reading, like a dummy. Many thanks for the example process as well. I haven't had a chance to wrap my head around looping the way you did it, but it appears straightforward enough.
  • lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    You're welcome, @btibert

    Regards,

    Lionel
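
For anyone landing on this thread later: the .577 from the question is what you would get if the term-frequency vector is normalized to unit Euclidean length, which matches the "normalized" remark above. With three distinct tokens that each occur once, every entry becomes 1/sqrt(3), roughly 0.577. The Python below is only a minimal sketch under that assumption. It is not RapidMiner's implementation, and the function names `term_occurrences` and `normalized_term_frequency` are made up for illustration.

```python
import math
from collections import Counter


def term_occurrences(tokens):
    """Raw count of each token in one document."""
    return Counter(tokens)


def normalized_term_frequency(tokens):
    """Counts scaled so the document vector has Euclidean length 1.

    Assumption: this L2 scaling is what "normalized" refers to in the thread.
    """
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        # An empty document has no terms to scale.
        return {}
    return {term: count / norm for term, count in counts.items()}


# The sentence from the question, lower-cased and split on whitespace.
tokens = "I like turtles".lower().split()

print(dict(term_occurrences(tokens)))
print({t: round(v, 3) for t, v in normalized_term_frequency(tokens).items()})
```

Under this assumption a document with three unique words gives 0.577 for each of them, while the raw occurrence count stays at 1, so the two views of the same data differ only by the scaling step.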