
WikiLeaks

Marin Member Posts: 19 Contributor II
Anyone doing text mining / information extraction / sentiment analysis on WikiLeaks data?
I believe the Cablegate corpus will be particularly interesting. I'd certainly like to see the results on important political issues. It might not be the highest scientific achievement, but something tells me it will be heavily cited.
Tom, you've been doing sentiment analysis recently; maybe you could try plugging it into this?
Regards,

Marin

Answers

  • IngoRM Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Cool idea, go for it!

    Cheers,
    Ingo
  • el_chief Member Posts: 63 Contributor II
    I was thinking about it. It will require some serious computing power, though...
  • Marin Member Posts: 19 Contributor II
    The Iraq War diaries are a 350 MB CSV file, and the cables being released run to, well, 3,000,000 pages. How many CPU hours would you estimate for the Iraq War diaries?
  • IngoRM Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    On a decent machine with tons of memory, this does not have to be too hard. We have run analyses on a couple of million documents before, although you probably won't want to train an SVM on them, or even run k-means clustering. Stick to a single pass over the data and you should be done within a couple of weeks at most.

    Cheers,
    Ingo
  • Preko Member Posts: 21 Contributor II
    Great idea, keep us posted if anyone makes progress!
  • Richy Member Posts: 20 Contributor II
    Hi,

    I won't have much time over the next two weeks, but from January I can find some time to help you with text mining on WikiLeaks.

    If you're interested, send me a private message.


    Regards,
    Richard.
  • Marin Member Posts: 19 Contributor II
    Dear Neil,
        thumbs up for the initiative.

    I'm in.
    Cablegate might give us a good diplomatic dictionary for starters.
    I am particularly interested in financial series that might start leaking in January.

    Is anyone willing to set up a Community Edition instance of RapidAnalytics so that those who want to can create their own text-mining processes for WikiLeaks?
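Ingo's advice to stick to a single pass can be illustrated with a minimal Python sketch. It is a generic streaming pattern, not a RapidMiner process, and it assumes the diaries are a CSV file with a free-text column; the column name `summary` and the file name `iraq_war_logs.csv` are hypothetical, so check the actual header. Rows are read one at a time and only term and document frequency counts are kept, so memory grows with the vocabulary rather than with the number of documents.

```python
import csv
import io
import re
from collections import Counter
from typing import Dict, Iterable, Iterator, TextIO, Tuple

# Simple lowercase word tokenizer; a real pipeline would add stopwords, stemming, etc.
TOKEN_RE = re.compile(r"[a-z]+")


def stream_rows(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield CSV rows one by one so the whole file is never held in memory."""
    yield from csv.DictReader(handle)


def single_pass_stats(
    rows: Iterable[Dict[str, str]], text_field: str = "summary"  # hypothetical column name
) -> Tuple[Counter, Counter, int]:
    """Update term frequencies, document frequencies and a document count in one pass."""
    term_freq: Counter = Counter()
    doc_freq: Counter = Counter()
    n_docs = 0
    for row in rows:
        tokens = TOKEN_RE.findall((row.get(text_field) or "").lower())
        term_freq.update(tokens)       # every occurrence counts
        doc_freq.update(set(tokens))   # each term counted once per document
        n_docs += 1
    return term_freq, doc_freq, n_docs


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real file.
    sample = io.StringIO(
        "id,summary\n1,Convoy hit IED near Baghdad\n2,IED found near checkpoint\n"
    )
    tf, df, n = single_pass_stats(stream_rows(sample))
    print(n, tf["ied"], df["near"])
```

For the real data you would open the file with `open("iraq_war_logs.csv", newline="", encoding="utf-8")` and pass the handle to `stream_rows`. The resulting counts are enough to build a vocabulary or TF-IDF weights later without a second full read of the corpus.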