Looking for help: Building Models for Text Classification/Topic Modeling/Clustering
Hello all,
I'm just getting familiar with topic modeling and text classification, using clustering or supervised learning to build models. Is there a way to edit or manually create part of a model so that I can force a category, or ensure that keywords that may not have appeared in the tagged data are included in future runs of the model?
I haven't posted my current process because I don't know where to start with the model building. The test data I am working with is a list of past exam questions, and I want to run them through a process that categorizes them by topic. After training on a set of data, is there a way to adjust the model so that specific keywords rank high in the distribution table?
Thanks,
Ryan
Answers
You can certainly create a wordlist from one dataset and then apply it to a future dataset with no problems. That's why the Process Documents operators have a wordlist input. You can even create your own wordlist manually and use that if you like. But if the words in that wordlist don't appear in the dataset you use to build a model, they won't be incorporated into any machine-learning model that you create using typical techniques such as Naive Bayes, SVM, or neural nets.
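To make the wordlist idea concrete outside RapidMiner, here is a minimal sketch in plain Python (not RapidMiner's own implementation). The helper names `build_wordlist` and `vectorize`, the sample questions, and the extra term "mitosis" are all made up for illustration. It shows a wordlist built from one dataset, extended by hand, and then applied to a new document.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_wordlist(documents, extra_terms=()):
    """Collect a sorted vocabulary from documents, plus manually added terms.

    extra_terms are assumed to be lowercase single words.
    """
    vocab = set(extra_terms)
    for doc in documents:
        vocab.update(tokenize(doc))
    return sorted(vocab)

def vectorize(document, wordlist):
    """Count each wordlist term in the document; other words are ignored."""
    counts = Counter(tokenize(document))
    return [counts[term] for term in wordlist]

# Hypothetical training questions (stand-ins for the tagged exam questions).
train_questions = [
    "Solve the quadratic equation",
    "Describe the structure of a cell",
]

# "mitosis" never occurs in the training data; it is added by hand.
wordlist = build_wordlist(train_questions, extra_terms=["mitosis"])

# A later document is encoded against the same fixed wordlist.
new_vector = vectorize("Explain mitosis in a cell", wordlist)
```

The hand-added term gets its own column, but because it never occurs in the training questions, that column is all zeros at training time, so a model fit on this data has nothing to learn from it.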
So you would basically have to build a machine-learning model from an actual dataset of words, and then supplement it manually with a set of rules or overrides. That is a multi-step process, not something combined in a single model.
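The two-step approach could look roughly like the sketch below, again in plain Python rather than as a RapidMiner process. `TinyNaiveBayes`, the `OVERRIDES` table, the `classify` helper, and the sample questions are all hypothetical. The classifier is a bare-bones multinomial Naive Bayes with add-one smoothing, used only as a stand-in for whatever learner you actually train.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        # Words never seen in training carry no information and are dropped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            total_words = sum(self.word_counts[label].values())
            score = math.log(self.class_counts[label] / total_docs)
            for t in tokens:
                count = self.word_counts[label][t]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Step 1: train on the tagged data (hypothetical exam questions).
model = TinyNaiveBayes().fit(
    ["Solve the quadratic equation", "Factor the polynomial equation",
     "Describe the structure of a cell", "Name the parts of a plant cell"],
    ["math", "math", "biology", "biology"],
)

# Step 2: manual keyword rules that take priority over the model.
OVERRIDES = {"mitosis": "biology", "derivative": "math"}

def classify(text, model, overrides=OVERRIDES):
    """Return the forced category if an override keyword appears, else the model's guess."""
    tokens = set(tokenize(text))
    for keyword, category in overrides.items():
        if keyword in tokens:
            return category
    return model.predict(text)
```

In this toy setup, "derivative" never appears in the training data, so the model alone mislabels "Find the derivative of x squared" based on the filler words it does know, while `classify` returns the forced category. The rules stay outside the model, so they can be edited without retraining.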
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts