entropy

rafeena Member Posts: 14 Contributor II
If I would like to calculate the entropy for each word, what should I set my word vector to during preprocessing? It would not be advisable to set it to TF-IDF, right?

Answers

  • Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Can you clarify what you mean by calculating the entropy of each word? Vectorization is simple, unsupervised preprocessing of texts, whereas entropy is usually computed with respect to a label. So there is no built-in vector metric that would supply anything like a conventional entropy measure. If you are asking which vector to use so that you can calculate entropy later, then I would think simple term occurrences would be the appropriate one, since that is merely a count of all instances of a given token in a given document.
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • rafeena Member Posts: 14 Contributor II
    I would like to use entropy and TF-IDF as my feature selection methods. I would like to know whether it will affect the entropy result if I set the word vector to TF-IDF.
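
For readers following the thread, here is a minimal Python sketch of one common reading of "entropy per word": the Shannon entropy of how a token's occurrences are spread across the class labels, computed from raw term occurrences as suggested above. Everything in it is an assumption made for illustration: the function names `word_label_entropy` and `l2_normalize`, the toy documents, and the labels are hypothetical, and this is not RapidMiner's implementation. The second half rescales each document vector to unit length, as a stand-in for any weighting that normalizes per document, to show that such rescaling can change the resulting entropy.

```python
import math
from collections import defaultdict


def word_label_entropy(docs, labels):
    """Entropy (in bits) of each token's weight distribution over labels.

    docs   : list of dicts mapping token -> non-negative weight
    labels : list of class labels, one per document
    """
    # Sum each token's weight separately for every label.
    totals = defaultdict(lambda: defaultdict(float))
    for doc, label in zip(docs, labels):
        for token, weight in doc.items():
            totals[token][label] += weight

    result = {}
    for token, per_label in totals.items():
        total = sum(per_label.values())
        if total <= 0:
            continue  # token carries no weight anywhere; entropy undefined
        h = 0.0
        for w in per_label.values():
            if w > 0:
                p = w / total
                h -= p * math.log2(p)
        result[token] = h
    return result


def l2_normalize(doc):
    """Rescale one document vector to unit Euclidean length (illustrative)."""
    norm = math.sqrt(sum(w * w for w in doc.values()))
    return {t: w / norm for t, w in doc.items()} if norm > 0 else dict(doc)


# Hypothetical toy corpus: raw term occurrences per document.
docs = [
    {"good": 2, "movie": 1},
    {"good": 1, "plot": 3},
    {"bad": 2, "movie": 1},
    {"bad": 1, "good": 1},
]
labels = ["pos", "pos", "neg", "neg"]

entropy = word_label_entropy(docs, labels)
normalized = word_label_entropy([l2_normalize(d) for d in docs], labels)

for token in sorted(entropy):
    print(f"{token}: counts={entropy[token]:.3f} normalized={normalized[token]:.3f}")
```

In this toy data, "movie" is split evenly between the two labels and gets the maximum entropy of 1 bit, while "bad" and "plot" each occur under a single label and get 0. The value for "good" differs between the two runs, because per-document rescaling changes how much each document contributes to the per-label sums. A weight that only multiplies a token by a constant across all documents would cancel out in this particular calculation, so whether a given TF-IDF setting changes the entropy depends on whether it also normalizes each document.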