Building a sustainability dictionary

Jonas97
We are beginners in the world of text mining and analysis...
We would like to create a sustainability dictionary. Our data basis consists of sustainability reports and 10-K reports from firms.
We should use 70% of the data for the ridge regression, so that we can find the parameters for our dictionary. In the next step, for the classification, we should use the remaining 30% of the data to train the model and classify reports. We should also show exactly how the ridge regression result is used, i.e. "when one word in a sentence is a sustainability word from our analysis, the sentence is a sustainability sentence". How can we model this in RapidMiner? Are there any tips, models, or templates? We have the data as sentences in Excel.
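To make the idea more concrete, here is a rough sketch of the pipeline as we understand it. It is written in plain Python rather than RapidMiner, only to illustrate the logic. Everything in it is an assumption for illustration: the toy sentences and their labels are made up, the names (fit_ridge, build_dictionary, is_sustainability_sentence), the penalty alpha and the weight threshold are hypothetical, and we use the 30% part only to check the "one dictionary word is enough" rule.

```python
# Made-up example sentences with hand-assigned labels
# (1 = sustainability sentence, 0 = other). Real data would come from Excel.
DATA = [
    ("we reduced carbon emissions this year", 1),
    ("revenue increased this year", 0),
    ("renewable energy lowers carbon emissions", 1),
    ("the board approved the dividend", 0),
    ("our renewable energy use increased", 1),
    ("net revenue and dividend increased", 0),
    ("carbon emissions targets were met", 1),
    ("renewable energy investments continue", 1),
    ("the dividend was paid", 0),
    ("emissions fell again", 1),
]


def tokenize(sentence):
    """Very simple tokenizer: lowercase and split on whitespace."""
    return sentence.lower().split()


def split_70_30(rows):
    """First 70% of the rows for the regression, the rest for checking."""
    cut = int(round(0.7 * len(rows)))
    return rows[:cut], rows[cut:]


def fit_ridge(rows, alpha=0.1, lr=0.05, epochs=2000):
    """Ridge regression on binary word features, fitted by gradient descent.

    Minimises mean squared error + alpha * sum(w^2); the bias is not penalised.
    Returns a {word: weight} mapping and the bias.
    """
    vocab = sorted({w for s, _ in rows for w in tokenize(s)})
    index = {w: j for j, w in enumerate(vocab)}
    # Each sentence is stored as the column indices of the words it contains.
    X = [[index[w] for w in set(tokenize(s))] for s, _ in rows]
    y = [label for _, label in rows]
    w = [0.0] * len(vocab)
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [2 * alpha * wj for wj in w]
        grad_b = 0.0
        for cols, target in zip(X, y):
            err = b + sum(w[j] for j in cols) - target
            grad_b += 2 * err / n
            for j in cols:
                grad_w[j] += 2 * err / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return dict(zip(vocab, w)), b


def build_dictionary(weights, threshold=0.1):
    """Keep the words whose ridge weight exceeds a (hypothetical) threshold."""
    return {word for word, weight in weights.items() if weight > threshold}


def is_sustainability_sentence(sentence, dictionary):
    """Rule from the task: one dictionary word makes the sentence a hit."""
    return any(token in dictionary for token in tokenize(sentence))


train, test = split_70_30(DATA)
weights, bias = fit_ridge(train)
dictionary = build_dictionary(weights)

hits = sum(is_sustainability_sentence(s, dictionary) == bool(label)
           for s, label in test)
accuracy = hits / len(test)
print("dictionary:", sorted(dictionary))
print("accuracy on held-out sentences:", accuracy)
```

With real reports the tokenizer would need stop-word removal and punctuation handling, and alpha and the threshold would have to be tuned. What we are looking for is the equivalent of these steps in RapidMiner.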
Thank you very much for any further information!


    Noel
    Jonas97: Sounds like a really interesting project. Were you able to get assistance? How much progress were you able to make?
