Time based term frequency analysis

dawidprozesky
Hi, I explored RapidMiner a while ago and have now returned with a specific analysis I hope to carry out. I have a data set in Excel with the following columns:

Date (dd/mm/yyyy format) | Body of Text (text) | Publisher (name)

So each record in the data set relates to a specific body of text published at a specific date, and the name of the publisher.

My end goal is to identify the words/terms in the texts that first started occurring after a given date (e.g. after 1 January 2010), and then to track how often those words/terms occur over time (per year is fine) after that date.

My current process is: Read Excel → Nominal to Text → Process Documents from Data (tokenizing, filtering and transforming) → WordList to Data
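To show what the tokenize/filter/transform steps and the resulting word list compute, here is a minimal Python sketch (not RapidMiner itself). It assumes letter-only tokenization, lowercasing, a tiny illustrative stopword list and a minimum token length of 3; the names `STOPWORDS`, `tokenize`, `filter_tokens` and `word_list` are hypothetical. For each term it returns the total number of occurrences and the number of documents containing it, which is roughly the kind of summary a word list gives you.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "was"}


def tokenize(text: str) -> list[str]:
    """Split on non-letter characters and lowercase (roughly Tokenize + Transform Cases)."""
    return [t.lower() for t in re.split(r"[^A-Za-z]+", text) if t]


def filter_tokens(tokens: list[str], min_len: int = 3) -> list[str]:
    """Drop stopwords and very short tokens (roughly Filter Stopwords + Filter Tokens by Length)."""
    return [t for t in tokens if t not in STOPWORDS and len(t) >= min_len]


def word_list(texts: list[str]) -> dict[str, tuple[int, int]]:
    """Return {term: (total occurrences, number of documents containing the term)}."""
    total = Counter()
    docs = Counter()
    for text in texts:
        terms = filter_tokens(tokenize(text))
        total.update(terms)       # every occurrence counts
        docs.update(set(terms))   # each document counts at most once per term
    return {term: (total[term], docs[term]) for term in total}
```

For example, `word_list(["The cloud is new", "Cloud storage and cloud computing"])` gives `cloud` a total of 3 occurrences across 2 documents, while `the`, `is` and `and` are filtered out as stopwords.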

I am very new to RapidMiner, so any assistance would be really appreciated!

Answers

  • Telcontar120 (Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert)
    You will probably want to preprocess your date column before the text analysis, to make the later comparisons easier. Try Date to Numerical to convert each date to its month or year. Then, once you have generated your word counts, you can aggregate them by the appropriate time window.
    As for looking at occurrences after a specific date, a simple Filter Examples should be enough to handle that.
    This should get you started.
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
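The suggestions above (derive the year from each date, restrict to records after the cutoff, then aggregate counts per year) can be sketched in plain Python. One extra step is needed for the original goal: keeping only post-cutoff records gives frequencies after the date, but finding terms that *started* occurring after it also requires comparing against the vocabulary seen before the cutoff. This is a minimal sketch under stated assumptions: rows are `(date_string, text, publisher)` tuples with dates as dd/mm/yyyy strings, tokenization is simple lowercase letter runs without stopword filtering, and "after 1 January 2010" is treated as "on or after". The function names are hypothetical.

```python
import re
from collections import Counter, defaultdict
from datetime import date, datetime


def tokenize(text: str) -> list[str]:
    """Lowercase runs of letters (a simple stand-in for the Process Documents step)."""
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]


def parse_ddmmyyyy(value: str) -> date:
    """Parse dd/mm/yyyy explicitly, so '03/04/2010' means 3 April 2010, not 4 March."""
    return datetime.strptime(value, "%d/%m/%Y").date()


def new_term_trends(rows, cutoff: date):
    """Return (new_terms, per_year).

    new_terms: terms that appear on or after `cutoff` but never before it.
    per_year:  {year: Counter of those new terms in that year}, years in ascending order.
    """
    before = set()
    after_by_year = defaultdict(Counter)
    for date_str, text, _publisher in rows:  # publisher is ignored in this sketch
        d = parse_ddmmyyyy(date_str)
        tokens = tokenize(text)
        if d < cutoff:
            before.update(tokens)
        else:
            after_by_year[d.year].update(tokens)  # year plays the role of Date to Numerical

    seen_after = set()
    for counts in after_by_year.values():
        seen_after.update(counts)
    new_terms = seen_after - before  # terms with no occurrence before the cutoff

    per_year = {
        year: Counter({t: c for t, c in counts.items() if t in new_terms})
        for year, counts in sorted(after_by_year.items())
    }
    return new_terms, per_year
```

With rows dated 15/06/2009 ("Mobile phones and laptops"), 03/04/2010 ("Tablets and mobile apps") and 20/11/2011 ("Tablets apps tablets") and a cutoff of `date(2010, 1, 1)`, the new terms are `tablets` and `apps`; `mobile` and `and` are excluded because they already appear in 2009. To break the trends down by publisher as well, you could key the per-year counters on `(year, publisher)` instead of `year`.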