How can I plot the frequency of words?


Hello everyone!
I'm trying to use the Generate Gaussian operator to plot word frequencies, but its results are very different from the ones I calculated manually. I need this to decide which values to discard through pruning. What formula does RapidMiner use to create the Gaussian?
Thank you.
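Word frequencies can also be counted and charted directly, without fitting any distribution. This often makes it easier to see which words a pruning threshold would remove. Below is a minimal Python sketch, not RapidMiner's implementation. It assumes documents are plain strings that can be lowercased and split on whitespace. The function names and the sample documents are hypothetical and only illustrate the idea.

```python
from collections import Counter

def word_frequencies(docs):
    """Return (word, total count) pairs, sorted by descending count, then alphabetically.
    Assumes whitespace tokenization and lowercasing (hypothetical preprocessing)."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

def text_histogram(freqs, width=40):
    """Render a horizontal bar chart as text, with bars scaled to the largest count."""
    if not freqs:
        return ""
    top = freqs[0][1]
    label_w = max(len(word) for word, _ in freqs)
    lines = []
    for word, n in freqs:
        bar = "#" * max(1, round(n / top * width))  # at least one mark per word
        lines.append(f"{word:<{label_w}} {bar} {n}")
    return "\n".join(lines)

if __name__ == "__main__":
    docs = ["the cat sat on the mat", "the dog sat", "a cat and a dog"]
    print(text_histogram(word_frequencies(docs), width=20))
```

The sorted list can also be passed to any plotting library. A long tail of words with a count of 1 or 2 is usually what pruning targets.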
Answers
I am also not clear on how conformity to a hypothetical, purely statistical distribution affects pruning. You might be better off simply setting pruning thresholds by absolute frequency or by percentage at a few different levels and seeing which words are dropped as a consequence. Typically, having many words with only a handful of occurrences does nothing for model performance but can lead to large datasets and long runtimes.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
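The threshold sweep suggested in this answer can be sketched as follows. This is a minimal Python illustration under stated assumptions, not RapidMiner's pruning code. It assumes pruning uses document frequency, meaning the number of documents that contain a word. It also assumes whitespace tokenization. The parameter names `min_count` and `min_percent` are hypothetical. They stand for an absolute lower bound and a percentage lower bound.

```python
from collections import Counter

def document_frequency(docs):
    """Count how many documents each word occurs in (once per document).
    Assumes whitespace tokenization and lowercasing."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))
    return df

def prune(df, n_docs, min_count=None, min_percent=None):
    """Split words into (kept, dropped) lists, both sorted alphabetically.
    min_count and min_percent are hypothetical lower bounds on document frequency."""
    kept, dropped = [], []
    for word, n in sorted(df.items()):
        too_rare = (min_count is not None and n < min_count) or (
            min_percent is not None and 100.0 * n / n_docs < min_percent
        )
        (dropped if too_rare else kept).append(word)
    return kept, dropped

if __name__ == "__main__":
    docs = ["the cat sat on the mat", "the dog sat", "a cat and a dog"]
    df = document_frequency(docs)
    # Try several absolute thresholds and report what each one removes.
    for threshold in (1, 2, 3):
        kept, dropped = prune(df, len(docs), min_count=threshold)
        print(f"min_count={threshold}: keep {len(kept)}, drop {dropped}")
```

With a realistic corpus, the output makes it easy to compare how much the vocabulary shrinks at each level. Checking whether any of the dropped words look important comes before picking a threshold for the actual model.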