"Text Mining: Ranking Word Vector Occurrences for Output to OLS Model"

kgilman Member Posts: 1 Contributor I
edited June 2019 in Help
I have a file of short (<170 char) text descriptions of chief medical complaints, recorded when a patient is logged at reception of a medical facility. I also have the total service time associated with each patient. There is already an established OLS regression that uses other attributes logged at reception to predict a patient's length of stay (LOS). I wish to see if I can extract a signal from the text field to improve the performance of the OLS model. Initially, there doesn't appear to be much lift from looking at the text field alone. My hypothesis is that while most of the text is just noise, certain n-grams should provide a fairly strong signal for a long (>1 std. dev.) or short (<1 hour) LOS.

1)  How can I show the performance (contribution) of each word vector in RapidMiner toward predicting the Long or Short LOS label?
2)  Specifically, how do I output a weight factor that can then be used in the OLS?
3)  Any other ideas for alternative approaches to combining text mining with OLS models?



    MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn

    The Linear Regression and SVM (linear) operators in RapidMiner both have a weight output that provides the weighting factors.
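For anyone who wants to see the idea outside RapidMiner, here is a rough Python sketch of ranking n-grams by how strongly they separate Long from Short LOS records. It does not reproduce the weights output of either operator. It swaps in a simpler technique: a smoothed log ratio of how often each n-gram occurs in Long versus Short records. The function names (`ngrams`, `rank_ngrams`, `text_score`), the `min_count` and `alpha` parameters, and the six toy complaints are all made up for illustration.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Lowercase, split on whitespace, return the set of 1..n_max-grams."""
    tokens = text.lower().split()
    grams = set()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.add(" ".join(tokens[i:i + n]))
    return grams


def rank_ngrams(texts, labels, n_max=2, min_count=2, alpha=1.0):
    """Return (ngram, weight) pairs sorted by |weight|, largest first.

    weight = log of the smoothed share of "long" records containing the
    n-gram divided by the smoothed share of "short" records containing it.
    Positive means the n-gram leans Long, negative means it leans Short.
    """
    long_df, short_df = Counter(), Counter()
    n_long = sum(1 for y in labels if y == "long")
    n_short = len(labels) - n_long
    for text, y in zip(texts, labels):
        target = long_df if y == "long" else short_df
        target.update(ngrams(text, n_max))  # document frequency, not raw count

    weights = {}
    for g in set(long_df) | set(short_df):
        if long_df[g] + short_df[g] < min_count:
            continue  # skip rare n-grams, which are mostly noise
        p_long = (long_df[g] + alpha) / (n_long + 2 * alpha)
        p_short = (short_df[g] + alpha) / (n_short + 2 * alpha)
        weights[g] = math.log(p_long / p_short)
    return sorted(weights.items(), key=lambda kv: abs(kv[1]), reverse=True)


def text_score(text, weights, n_max=2):
    """Sum the weights of the n-grams present in one complaint text."""
    return sum(weights.get(g, 0.0) for g in ngrams(text, n_max))


# Hypothetical toy data, not taken from the original post.
texts = [
    "chest pain radiating to arm",
    "chest pain and shortness of breath",
    "medication refill",
    "refill request",
    "chest pain",
    "suture removal",
]
labels = ["long", "long", "short", "short", "long", "short"]

ranked = rank_ngrams(texts, labels)
weights = dict(ranked)
score = text_score("chest pain today", weights)
```

The per-record `text_score` is a single numeric column, so it could be added to the existing OLS as one more regressor. If you try this, compute the weights on the training split only, otherwise the score leaks the label into the regression.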

    Best regards,