"Text Mining: Ranking Word Vector Occurrences for Output to OLS Model"

kgilman · February 2014

Background:
I have a file of short (<170 char) text descriptions of chief medical complaints, recorded when a patient is logged at reception of a medical facility. I also have the total service time associated with each patient. There is already an established OLS regression that uses other attributes logged at reception to predict a patient's length of stay (LOS). I wish to see if I can extract a signal from the text field to improve the performance of the OLS model. Initially, there doesn't appear to be much lift from looking at the text field alone. My hypothesis is that while most of the text is just noise, certain n-grams should provide a pretty strong signal for a long (>1 std. dev.) or short (<1 hour) LOS.

Questions:
1) How can I show the performance (contribution) of each word vector in RapidMiner toward predicting the Long or Short LOS label?
2) Specifically, how do I output a weight factor that can then be used in the OLS?
3) Any other ideas for alternative approaches to combining text mining with OLS models?

Thanks!

MariusHelf · February 2014

Hi,

The Linear Regression and SVM (linear) operators in RapidMiner each have a weight output that provides weighting factors.
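For illustration only, here is a minimal Python sketch (outside RapidMiner, and not what its operators compute internally) of how a per-n-gram weight can be derived and ranked. It assumes each n-gram is turned into a 0/1 indicator and regressed against LOS on its own; for a single binary regressor the OLS slope is simply the mean LOS with the n-gram minus the mean LOS without it. The function names, the `min_count` threshold, and the toy records are all made up for the example.

```python
from collections import defaultdict


def ngrams(text, n_max=2):
    """Return the set of word n-grams (n = 1..n_max) in a lowercased text."""
    tokens = text.lower().split()
    grams = set()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.add(" ".join(tokens[i:i + n]))
    return grams


def ngram_weights(records, n_max=2, min_count=2):
    """Compute one weight per n-gram from (complaint_text, los_hours) pairs.

    The weight is the OLS slope from regressing LOS on a 0/1 indicator for
    that n-gram alone, i.e. mean LOS with it minus mean LOS without it.
    """
    total = sum(los for _, los in records)
    n_records = len(records)
    sums = defaultdict(float)   # summed LOS over records containing the n-gram
    counts = defaultdict(int)   # number of records containing the n-gram
    for text, los in records:
        for gram in ngrams(text, n_max):
            sums[gram] += los
            counts[gram] += 1

    weights = {}
    for gram, count in counts.items():
        # Skip n-grams that are too rare, or present in every record
        # (no "without" group to compare against).
        if count < min_count or count == n_records:
            continue
        mean_with = sums[gram] / count
        mean_without = (total - sums[gram]) / (n_records - count)
        weights[gram] = mean_with - mean_without
    return weights


# Hypothetical toy data: (chief complaint text, LOS in hours).
records = [
    ("chest pain", 6.0),
    ("chest pain shortness of breath", 8.0),
    ("sore throat", 1.0),
    ("sore throat cough", 1.0),
]

weights = ngram_weights(records)
# Rank n-grams by the absolute size of their weight.
ranked = sorted(weights.items(), key=lambda kv: abs(kv[1]), reverse=True)
for gram, weight in ranked:
    print(f"{gram}: {weight:+.2f}")
```

These one-at-a-time slopes ignore correlation between n-grams, so they are best read as a ranking aid. Fitting all indicators jointly in a linear model, as the weight output mentioned above does, gives weights that account for the other terms.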

Best regards,
Marius


Altair RapidMiner Community
