Time-based term frequency analysis
My data set has three columns:
Date (dd/mm/yyyy format) | Body of Text (text) | Publisher (name)
Each record is one body of text, together with the date it was published and the name of its publisher.
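Because the dates are dd/mm/yyyy, they have to be parsed day-first. Otherwise a value such as 03/04/2010 would be read as 4 March instead of 3 April. Here is a minimal Python sketch of one record, assuming the three columns above. `Record` and `parse_row` are hypothetical names for illustration, not RapidMiner operators.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Record:
    published: date  # from the "Date" column
    body: str        # from the "Body of Text" column
    publisher: str   # from the "Publisher" column


def parse_row(date_str: str, body: str, publisher: str) -> Record:
    # "%d/%m/%Y" is day-first, matching dd/mm/yyyy; a month-first format
    # would silently misread any date whose day is 12 or less.
    published = datetime.strptime(date_str.strip(), "%d/%m/%Y").date()
    return Record(published, body, publisher)
```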
My end goal is to identify the words/terms that first appear in the texts after a given date (in my case 1 January 2010), and then to see how often each of those terms occurs over time after that date (per year is fine).
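Outside RapidMiner, the logic of this goal can be sketched in Python. Record each term's earliest publication date, keep only the terms whose earliest date falls on or after the cutoff, and count their occurrences per year. This is a minimal sketch. It assumes the records are already (date, text) pairs and that a simple lower-case ASCII-letter tokenizer is good enough. `new_term_frequencies` is a hypothetical helper name. Reading "after" as "on or after" is also an assumption; change `>=` to `>` if you mean strictly after.

```python
import re
from collections import Counter, defaultdict
from datetime import date

TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(text):
    """Lower-case the text and return its runs of ASCII letters as tokens."""
    return TOKEN_RE.findall(text.lower())


def new_term_frequencies(records, cutoff):
    """records: iterable of (date, text) pairs.

    Returns {term: {year: count}} for every term whose earliest
    occurrence is on or after `cutoff`.
    """
    first_seen = {}               # term -> earliest date it appears
    yearly = defaultdict(Counter)  # term -> Counter of year -> occurrences
    for day, text in records:
        for term in tokenize(text):
            if term not in first_seen or day < first_seen[term]:
                first_seen[term] = day
            yearly[term][day.year] += 1
    return {
        term: dict(sorted(yearly[term].items()))
        for term, seen in sorted(first_seen.items())
        if seen >= cutoff
    }


records = [
    (date(2008, 5, 1), "Rates rose again"),
    (date(2011, 3, 9), "Blockchain rates"),
    (date(2012, 7, 4), "Blockchain blockchain tablet"),
]
print(new_term_frequencies(records, date(2010, 1, 1)))
```

With these sample records, "rates" is dropped because it already appears in 2008, even though it shows up again in 2011. "blockchain" and "tablet" are returned with their per-year counts. This only works if the data also covers the period before the cutoff. Without earlier texts, every term looks new.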
My current process is: Read Excel → Nominal to Text → Process Documents from Data (tokenizing, filtering and transforming) → Wordlist to Data
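For comparison, the preprocessing chain above can be roughly approximated in plain Python. This sketch assumes the filtering and transforming steps are lower-casing, an English stopword filter and a minimum token length. The `STOPWORDS` set is a small hypothetical placeholder, not RapidMiner's own list. For each word, the function returns the number of documents containing it and its total number of occurrences, counted over the whole data set.

```python
import re
from collections import Counter

# Hypothetical placeholder list; a real stopword filter uses a much larger one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}


def preprocess(text, min_len=3):
    tokens = re.split(r"[^A-Za-z]+", text)     # tokenize on non-letters
    tokens = [t.lower() for t in tokens if t]  # transform: lower-case, drop empties
    # filter: remove stopwords and tokens shorter than min_len
    return [t for t in tokens if len(t) >= min_len and t not in STOPWORDS]


def word_list(texts):
    """Return {word: (documents containing it, total occurrences)}."""
    total = Counter()
    in_docs = Counter()
    for text in texts:
        tokens = preprocess(text)
        total.update(tokens)
        in_docs.update(set(tokens))  # count each word once per document
    return {w: (in_docs[w], total[w]) for w in sorted(total)}
```

Because these counts are pooled over all documents, they cannot show which terms are new after the cutoff. The date information has to be kept and the counts split by publication year.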
I am very new to RapidMiner, so any help would be much appreciated!