
"Text Mining: Ranking Word Vector Occurrences for Output to OLS Model"

kgilman, Member, Posts: 1, Learner III
edited June 2019 in Help
Background:
I have a file of short (<170 characters) text descriptions of the chief medical complaint, recorded when a patient is logged at the reception of a medical facility. I also have the total service time for that patient. There is already an established OLS regression that uses other attributes logged at reception to predict a patient's length of stay. I want to see whether I can extract a signal from the text field that improves the performance of the OLS model. Initially, the text field on its own doesn't appear to provide much lift. My hypothesis is that while most of the text is just noise, certain n-grams should provide a fairly strong signal for a long (>1 std. dev.) or short (<1 hour) length of stay (LOS).
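As a rough illustration of the setup described above, here is a minimal Python sketch (not RapidMiner) that turns each complaint into a set of word n-grams and assigns the Long/Short label. Two assumptions are labeled here and in the comments: the tokenizer simply lowercases and keeps alphanumeric runs, and "long (>1 std. dev.)" is read as more than one standard deviation above the mean LOS. The helper names `complaint_ngrams` and `los_label` are hypothetical.

```python
import re
import statistics


def complaint_ngrams(text, n_max=2):
    """Set of word 1..n_max-grams from a short complaint text.

    Assumed tokenizer: lowercase, keep runs of letters/digits, drop punctuation.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    grams = set()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.add(" ".join(tokens[i:i + n]))
    return grams


def los_label(minutes, mean, std):
    """Assumed labeling rule (hypothetical reading of the thresholds):
    'long' if LOS is more than 1 std above the mean, 'short' if under 60 minutes,
    otherwise None (record left unlabeled)."""
    if minutes > mean + std:
        return "long"
    if minutes < 60:
        return "short"
    return None


if __name__ == "__main__":
    los_minutes = [45, 180, 240, 600]  # hypothetical service times in minutes
    mean, std = statistics.mean(los_minutes), statistics.stdev(los_minutes)
    print(complaint_ngrams("Chest pain, SOB"))
    print([los_label(m, mean, std) for m in los_minutes])
```

Records that fall between the two thresholds get no label in this sketch; whether to drop them or treat them as a third class is a modeling choice.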

Questions:
1)  How can I show the performance (contribution) of each word vector in RapidMiner toward predicting the Long or Short LOS label?
2)  Specifically, how do I output a weight factor that can then be used in the OLS?
3)  Any other ideas for alternative approaches to combining text mining with OLS models?
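One simple way to approach questions 1 and 2 outside of RapidMiner is sketched below. It ranks each term by a smoothed log-odds ratio of appearing in Long versus Short records, then sums those scores per record to get a single numeric text feature that could be added as a column to the existing OLS. This log-odds ranking is a swapped-in technique, not something from the thread; the function names are hypothetical, and unigrams are used for brevity (an n-gram tokenizer could be substituted).

```python
import math
import re
from collections import Counter


def tokens(text):
    """Set of lowercase alphanumeric terms in a complaint (unigrams for brevity)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def log_odds_by_term(texts, labels, alpha=1.0):
    """Smoothed log-odds ratio of each term occurring in 'long' vs 'short' records.

    Positive scores lean toward long LOS, negative toward short. Document
    frequencies get add-alpha smoothing so no probability is exactly 0 or 1.
    Records labeled None are ignored.
    """
    long_df, short_df = Counter(), Counter()
    n_long = n_short = 0
    for text, label in zip(texts, labels):
        if label == "long":
            n_long += 1
            long_df.update(tokens(text))
        elif label == "short":
            n_short += 1
            short_df.update(tokens(text))
    scores = {}
    for term in set(long_df) | set(short_df):
        p_long = (long_df[term] + alpha) / (n_long + 2 * alpha)
        p_short = (short_df[term] + alpha) / (n_short + 2 * alpha)
        scores[term] = math.log(p_long / (1 - p_long)) - math.log(p_short / (1 - p_short))
    return scores


def text_score(text, scores):
    """Hypothetical per-record feature for the OLS: sum of term scores (unseen terms count 0)."""
    return sum(scores.get(t, 0.0) for t in tokens(text))
```

Sorting `scores.items()` by value gives the ranking for question 1. To avoid leaking the label into the regression, the scores should be computed on training records only and then applied to held-out records when `text_score` is used as an OLS input.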

Thanks!

Answers

  • MariusHelf, RapidMiner Certified Expert, Member, Posts: 1,869, Unicorn
    Hi,

    The Linear Regression and SVM (linear) operators in RapidMiner have a weights output that provides the weighting factors.
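    To show what those weights are, here is a small standard-library Python sketch (not RapidMiner's implementation) that fits a linear regression of LOS on binary word-vector columns and returns one weight per term. The helper names are hypothetical, and the optional ridge penalty is an assumption added because word vectors are often collinear (for example "chest" and "pain" always co-occurring), which makes plain least squares singular.

```python
import re


def word_vector_rows(texts, vocab):
    """Hypothetical helper: binary word-vector rows with a leading 1.0 intercept column."""
    rows = []
    for text in texts:
        present = set(re.findall(r"[a-z0-9]+", text.lower()))
        rows.append([1.0] + [1.0 if term in present else 0.0 for term in vocab])
    return rows


def fit_linear_weights(X, y, ridge=0.0):
    """Least-squares weights for y ~ X.

    Solves (X^T X + ridge * I) w = X^T y with Gauss-Jordan elimination and
    partial pivoting. The intercept (column 0) is not penalized.
    """
    k = len(X[0])
    # Augmented normal-equation matrix [X^T X + ridge*I | X^T y]
    A = []
    for i in range(k):
        row = [sum(x[i] * x[j] for x in X) + (ridge if i == j and i > 0 else 0.0)
               for j in range(k)]
        row.append(sum(x[i] * yi for x, yi in zip(X, y)))
        A.append(row)
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular system: use ridge > 0 or drop duplicate columns")
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]
```

    Pairing each term with its weight, e.g. `sorted(zip(vocab, w[1:]), key=lambda t: t[1])`, ranks the terms from "shortens LOS" to "lengthens LOS". Because the weights are estimated jointly, they can differ from per-term rankings when terms overlap.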

    Best regards,
    Marius