"Historical Twitter data"

LC Member Posts: 1 Contributor I
edited June 2019 in Help
Hi all

I'm relatively new to RapidMiner but am looking at it as an analysis tool for some research I am doing with colleagues. In particular, we want to use historical Twitter data, but the biggest stumbling block seems to be how to import the data into RapidMiner efficiently - our current 'manual' system of creating an Excel sheet with the data separated into categories (ID, date, location, retweets, tweet text etc.) is somewhat labour intensive.

I have looked online but have not found anything that would work for us yet. For example, http://vancouverdata.blogspot.co.uk/2010/11/text-analytics-with-rapidminer-loading.html is a brilliant set of tutorials, but it hasn't greatly helped with getting what we need at the start.

I have had a look at the Twitter connector - fantastic, and the data is ready to go - but I'm not sure it works for historical data. We are looking at tweets from around 18 months ago, and when I limit the results by date nothing comes back. Does anyone know if the Twitter connector would work here, and if so, how? If not, can anyone point me in the right direction on how best to import historical Twitter data into RapidMiner?

Thanks
Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
    Hi,

    I think Twitter limits the API to roughly the last two weeks or so. There is nothing RapidMiner can do about this.

    Cheers,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    I'm assuming Twitter limits access to historical tweets because they sell that data through their subsidiary Gnip. https://www.gnip.com/sources/twitter/historical/

    I don't know their pricing, but you can look at some of the partners plugged into their feed to see if any of those have RapidMiner connectors, Splunk for example:
    https://www.gnip.com/partners/plugged-in/ (Not all of these partners get the same depth of data, though: Alteryx only gets the last 30 days of tweets through its Gnip connector.) Brandwatch was the one I used to use.

    Another potential option is a free API tool like Snapbird (https://github.com/remy/snapbird), which claims to circumvent the limits of the search API. I haven't tested it, I'm afraid.
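
Whichever source the historical tweets end up coming from, the manual Excel step described in the question could probably be scripted. The following is only a minimal sketch, assuming the tweets arrive as one JSON object per line and that each object carries the fields `id_str`, `created_at`, `text`, `retweet_count` and `user.location`; those field names, the column names and the function names are all assumptions and would need adjusting to the actual export. It writes a CSV with the columns listed in the question, which RapidMiner's Read CSV operator should then be able to load.

```python
import csv
import io
import json

# Column names mirror the categories mentioned in the question (assumed layout).
COLUMNS = ["id", "date", "location", "retweets", "text"]


def tweet_to_row(tweet):
    """Flatten one tweet dict into the assumed spreadsheet columns."""
    user = tweet.get("user") or {}
    return {
        "id": tweet.get("id_str", ""),
        "date": tweet.get("created_at", ""),
        "location": user.get("location") or "",
        "retweets": tweet.get("retweet_count", 0),
        # Collapse line breaks and repeated spaces so the text stays compact.
        "text": " ".join((tweet.get("text") or "").split()),
    }


def tweets_to_csv(json_lines, out):
    """Write one CSV row per non-empty JSON line; return the number of rows."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    count = 0
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        writer.writerow(tweet_to_row(json.loads(line)))
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical sample record, used only to show the call.
    sample = [
        '{"id_str": "1", "created_at": "2018-01-01", "text": "hello", '
        '"retweet_count": 2, "user": {"location": "Leeds"}}'
    ]
    buffer = io.StringIO()
    tweets_to_csv(sample, buffer)
    print(buffer.getvalue())
```

For a real export, the same function could be called with an open input file as `json_lines` and an output file opened with `newline=""` as `out`.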