Analyze 77000 tweets
I have to work with a dataset of 77,000 tweets with the following attributes: post_id, username, hash_tag, sent_time, text, user_id, source, is_retweet, is_reply, lang, retweet_count, reply_count, latitude, longitude. I must analyze it using association rules and clustering, but I'm new to RapidMiner (RM), and I hope someone can advise me on how to proceed.
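Of the attributes listed, latitude and longitude are the most direct numeric input for clustering. The following is a minimal k-means sketch in plain Python, only to show what the algorithm does (assign each point to its nearest centroid, recompute centroids, repeat). It assumes tweets without geolocation have already been dropped, and it uses plain Euclidean distance on degrees, which is only a rough approximation that is acceptable for a small region. The function name `kmeans` and its parameters are illustrative, not an RM operator.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude)

def kmeans(points: Sequence[Point], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Plain k-means (Lloyd's algorithm) on 2-D points."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct input points chosen at random.
    centroids: List[Point] = rng.sample(list(points), k)
    labels: List[int] = [0] * len(points)
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for every point.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                ))
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, labels
```

Each resulting cluster can then be read as a geographic area, and comparing attributes such as `retweet_count` or `lang` across clusters is one way to interpret them.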
My first problem is the free license: it only lets me read 10,000 rows. Are there operators that can generate a representative sample?
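One possible workaround, assuming the row limit applies as soon as the data is loaded into RM, is to draw a uniform random sample of 10,000 tweets before importing the file. The sketch below uses reservoir sampling (Algorithm R), which reads the file once without knowing its length in advance. The file paths, the `sample_csv` helper and the assumption that the export has a header row are all hypothetical. If some groups are small (for example replies, or a rare `lang`), a stratified sample per group may be more representative; this sketch does not do that.

```python
import csv
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")

def reservoir_sample(rows: Iterable[T], k: int, seed: int = 42) -> List[T]:
    """Uniform random sample of k items from a stream of unknown length."""
    rng = random.Random(seed)
    sample: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)          # fill the reservoir first
        else:
            j = rng.randint(0, i)       # inclusive: i + 1 possible values
            if j < k:                   # happens with probability k / (i + 1)
                sample[j] = row
    return sample

def sample_csv(in_path: str, out_path: str, k: int = 10000, seed: int = 42) -> None:
    """Hypothetical helper: write a k-row random sample of a CSV, keeping its header."""
    with open(in_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader)
        sample = reservoir_sample(reader, k, seed)
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(sample)
```

For example, `sample_csv("tweets.csv", "tweets_10k.csv")` would produce a file that stays within the 10,000-row limit. Fixing the seed keeps the sample reproducible between runs.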
Second problem: what kind of association rules can I use? I'm thinking of doing a "manual" sentiment analysis (I have seen that there is an Aylien extension, but it has limitations and it doesn't work with Italian): is there a way to find the most important words in the tweets in order to classify them as positive or negative?
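A common way to find the "most important" words is to rank terms by TF-IDF, which favours words that are frequent in some tweets but not spread across all of them. The sketch below does that in plain Python and then applies a very simple lexicon-based polarity score. Both the stopword list and the sentiment lexicon are tiny hypothetical placeholders; a real Italian analysis would need a full stopword list and a proper sentiment lexicon, possibly seeded from the top-ranked terms. Negation ("non") and sarcasm are not handled.

```python
import math
import re
from collections import Counter
from typing import List

# Hypothetical, deliberately tiny Italian stopword list.
STOPWORDS = {"il", "lo", "la", "gli", "le", "un", "una", "di", "da", "in",
             "con", "su", "per", "che", "ma", "mi", "ti", "è"}
# Hypothetical seed lexicon: +1 positive, -1 negative.
LEXICON = {"bello": 1, "ottimo": 1, "grazie": 1, "felice": 1,
           "brutto": -1, "pessimo": -1, "triste": -1, "odio": -1}

URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"[#@]?\w+")  # keeps #hashtags and @mentions whole

def tokenize(text: str) -> List[str]:
    text = URL_RE.sub(" ", text.lower())  # drop links before tokenising
    return [t for t in TOKEN_RE.findall(text)
            if len(t) > 1 and t not in STOPWORDS]

def top_terms(texts: List[str], n: int = 10) -> List[str]:
    """Rank terms by their TF-IDF summed over all tweets."""
    docs = [tokenize(t) for t in texts]
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    n_docs = len(docs)
    scores: Counter = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            idf = math.log(n_docs / df[term])
            scores[term] += (count / len(doc)) * idf
    return [term for term, _ in scores.most_common(n)]

def polarity(text: str) -> str:
    """Sum lexicon scores of the tweet's tokens and map the sign to a label."""
    score = sum(LEXICON.get(t, 0) for t in tokenize(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Reading through the output of `top_terms` and deciding which words carry positive or negative meaning is essentially the "manual" approach described above; the labels from `polarity` can then be used as an attribute for rules or clusters.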
Can you suggest some association rule and/or clustering algorithms that I could use? How should I interpret their results?
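In RapidMiner the usual route to association rules is the FP-Growth operator followed by Create Association Rules. To show what the resulting numbers mean, here is a simplified sketch that treats each tweet's hashtags as a "basket" and derives rules between pairs of hashtags only (a full FP-Growth or Apriori run would also find larger itemsets). The thresholds are illustrative assumptions. Support is the share of tweets containing both hashtags, confidence is how often the consequent appears given the antecedent, and lift above 1 means the two appear together more often than chance would predict.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, NamedTuple

class Rule(NamedTuple):
    antecedent: str
    consequent: str
    support: float     # fraction of tweets containing both items
    confidence: float  # P(consequent | antecedent)
    lift: float        # confidence / P(consequent); > 1 = positive association

def pair_rules(transactions: Iterable[Iterable[str]],
               min_support: float = 0.01,
               min_confidence: float = 0.5) -> List[Rule]:
    """Association rules X -> Y between pairs of items, sorted by lift."""
    baskets = [set(t) for t in transactions]
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(pair for b in baskets
                          for pair in combinations(sorted(b), 2))
    rules: List[Rule] = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair too rare to be interesting
        for x, y in ((a, b), (b, a)):  # try both rule directions
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                lift = confidence / (item_counts[y] / n)
                rules.append(Rule(x, y, support, confidence, lift))
    return sorted(rules, key=lambda r: r.lift, reverse=True)
```

The same function works with other "items", for example the sentiment label, `lang` or `source`, added to each basket alongside the hashtags, so rules such as "#hashtag -> negative" become possible.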
I apologize for all these questions, and I would be very grateful to anyone kind enough to help me!