Looking for help: Building Models for Text Classification/Topic Modeling/Clustering

Doudr85 · April 2018

Hello all,

I'm just getting familiar with topic modeling and text classification, using clustering or supervised learning to build models. Is there a way to edit or manually create part of a model so that I can force a category, or ensure that keywords that may not have been in the tagged data are included in future runs of the model?

I haven't posted my current process because I don't know where to start with the model building. The test data I am working with is a list of past exam questions. I want to run them through a process that categorizes them based on topic. Is there a way to adjust the model after running a training set of data to ensure that specific key words rank high in the distribution table?

Thanks,

Ryan

Telcontar120 · April 2018

You can certainly create a wordlist from one dataset and then apply it to a future dataset with no problems. That's why the Process Documents operators have a wordlist input. You can even create your own wordlist manually and use that if you like. But a model only learns from the data it is trained on, so if that wordlist isn't part of a dataset that you use to build a model, it won't be incorporated into any machine-learning model that you create using typical techniques such as Naive Bayes, SVM, neural networks, etc.
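To illustrate the wordlist idea outside RapidMiner, here is a minimal Python sketch. It is not how the Process Documents operators are implemented; the helper names (`tokenize`, `build_wordlist`, `vectorize`) and the sample exam questions are made up for the example. It builds a wordlist from one set of documents, then counts only those words in a new document, so anything outside the wordlist is simply ignored.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_wordlist(documents):
    """Collect the sorted vocabulary seen in a set of documents."""
    vocab = set()
    for doc in documents:
        vocab.update(tokenize(doc))
    return sorted(vocab)


def vectorize(document, wordlist):
    """Count how often each wordlist entry occurs in the document.

    Words that are not in the wordlist are dropped.
    """
    counts = Counter(tokenize(document))
    return [counts[word] for word in wordlist]


# Hypothetical training questions used to create the wordlist.
training_questions = [
    "Define the derivative of a function",
    "Name the parts of a cell",
]
wordlist = build_wordlist(training_questions)

# A later question is encoded with the SAME wordlist.
new_question = "Compute the derivative of the integral"
vector = vectorize(new_question, wordlist)

# A manually extended wordlist adds a column for "integral", but a model
# trained on the original columns has no learned weight for that column.
manual_wordlist = wordlist + ["integral"]
manual_vector = vectorize(new_question, manual_wordlist)
```

Here "compute" and "integral" are not in the original wordlist, so they leave no trace in `vector`. Adding "integral" by hand gives it a column in `manual_vector`, but that alone does not teach an already trained model what the word means.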

So you would basically have to build a machine-learning model from an actual dataset of words, and then supplement it manually with a set of rules or overrides. That makes it a multi-step process rather than a single combined model.
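The multi-step process described here can be sketched in Python as follows. This is only an illustration under assumptions: `TinyNaiveBayes` is a toy multinomial Naive Bayes written for this example (not a RapidMiner operator or any library's API), and the training questions, labels, and `KEYWORD_OVERRIDES` table are hypothetical. Step one trains a model on labeled data; step two checks a manual keyword table first and falls back to the model only when no keyword matches.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class TinyNaiveBayes:
    """Toy multinomial Naive Bayes with add-one smoothing (illustration only)."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(documents, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, document):
        # Words never seen in training carry no information for the model.
        tokens = [t for t in tokenize(document) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            total_words = sum(self.word_counts[label].values())
            score = math.log(self.class_counts[label] / total_docs)
            for t in tokens:
                smoothed = self.word_counts[label][t] + 1
                score += math.log(smoothed / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Step 2: manual overrides, kept separate from the trained model (hypothetical).
KEYWORD_OVERRIDES = {"integral": "calculus", "mitosis": "biology"}


def classify(document, model, overrides=KEYWORD_OVERRIDES):
    """Return the override category if a keyword matches, else the model's guess."""
    for token in tokenize(document):
        if token in overrides:
            return overrides[token]
    return model.predict(document)


# Step 1: train on labeled questions (made-up sample data).
questions = [
    "Find the derivative of x squared",
    "Compute the limit of the function",
    "Name the parts of a cell",
    "Describe how a cell membrane works",
]
topics = ["calculus", "calculus", "biology", "biology"]
model = TinyNaiveBayes().fit(questions, topics)

question = "Describe how to set up the integral"
model_only = model.predict(question)      # "integral" was never seen in training
with_rules = classify(question, model)    # the override table catches it
```

On this toy data the model alone labels the question "biology", because "describe" and "how" only appeared in biology questions, while the override table forces "calculus". One trade-off of keeping the rules outside the model: they are easy to audit and change, but they apply unconditionally, so an overly broad keyword will override correct predictions too.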
