Google Trends in RapidMiner: sentences to value series
Has anyone framed texts from different times as a value series? I would like to do something like Google Trends over a historical corpus of sentences. Now I wonder how to turn the text representation into a value series. Any experiences out there?
I have done something very similar on log files where I treat each line as a document.
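Outside RapidMiner, the idea can be sketched in a few lines of Python. This is only an illustration, not the process I actually used: it assumes each document comes with a date, buckets by calendar month, and tokenizes by splitting on whitespace. The function name `term_series` and the sample sentences are made up for the example.

```python
from collections import Counter, defaultdict
from datetime import date


def term_series(docs, terms):
    """Count how often each term occurs per calendar month.

    docs  : iterable of (date, text) pairs, one pair per line/document
    terms : the words to track
    Returns {term: [(period, count), ...]} with periods sorted ascending.
    """
    counts = defaultdict(Counter)
    for day, text in docs:
        period = f"{day.year:04d}-{day.month:02d}"  # assumed bucket: one month
        counts[period].update(text.lower().split())  # naive whitespace tokens
    periods = sorted(counts)
    # A Counter returns 0 for words that never occurred in a period.
    return {t: [(p, counts[p][t]) for p in periods] for t in terms}


# Hypothetical sample corpus, for illustration only.
docs = [
    (date(1901, 1, 5), "the telegraph office opened"),
    (date(1901, 1, 20), "telegraph lines were extended"),
    (date(1901, 2, 3), "the railway reached the town"),
]
series = term_series(docs, ["telegraph", "railway"])
print(series["telegraph"])
```

Each tracked word ends up with one value per period, which is the value series. Dividing by the total token count of the period would give relative frequencies instead of raw counts.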
One issue is new words: a word that has never been seen before might turn up later, causing a gnashing of teeth and grinding of cog wheels. The solution is a strict word list plus something to spot new words.
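As a rough sketch of the word list idea in Python (again an illustration under assumptions, not a RapidMiner process): keep a fixed vocabulary, count only tokens that are in it, and collect everything else so it can be reviewed. The name `split_known_unknown` and the sample vocabulary are hypothetical.

```python
def split_known_unknown(text, vocabulary):
    """Separate a line's tokens into known words and unseen words.

    vocabulary : the strict word list (a set or frozenset of lowercase words)
    Returns (known, unseen): known keeps token order and duplicates,
    unseen is a sorted list of distinct out-of-vocabulary tokens.
    """
    tokens = text.lower().split()  # naive whitespace tokenization
    known = [t for t in tokens if t in vocabulary]
    unseen = sorted(set(tokens) - vocabulary)
    return known, unseen


# Hypothetical strict word list and log line, for illustration only.
vocabulary = frozenset({"error", "disk", "full"})
known, unseen = split_known_unknown("Disk full error on volume", vocabulary)
print(known, unseen)
```

Only the known tokens feed the counts, so the set of attributes stays stable over time. The unseen list is the "something to spot new words": log it, and decide by hand whether any of those words belong in the word list.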