What have been the most important things in the news over the past year? Is data mining possible?

vijay_dpgc Member Posts: 5 Contributor I
edited November 2018 in Help

So the idea is something like this. Feed the software a year's worth of content from newspapers, news websites and so on. The software should eliminate common English words and come up with a list of words that have been trending. It should also look into specific important topics that I specify.


Is it possible to connect to the news feed directly from a newspaper website like The Hindu, instead of copying and pasting it daily?

How do I do this?


I am looking to find out things like important people in the news, new technology, international organisations, global happenings, etc.


PS: This is an amateurish question as I am new to this software.




  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder



    Well, this is a bigger project, and probably nobody here will be able to describe the complete solution in a single post :-)


    But here are some hints to get you started:


    For all types of text analysis (including sentiment analysis and others), you will need an extension for RapidMiner:



    You can download these extensions for free from our Marketplace. You can find them in the menu "Extensions" – "Marketplace" by typing "Text", "Aylien", or "Rosette" in the search box. Or you can use the links above. There are also many more extensions on our Marketplace, so make sure that you check them out…


    There is a community member who created a nice set of tutorials for text analysis using the Text Mining extension with RapidMiner: http://vancouverdata.blogspot.com/2010/11/text-analytics-with-rapidminer-loading.html


    And our friends at Aylien have a great series of blog posts explaining the Aylien extension: http://blog.aylien.com/


    With the Web Mining extension (https://marketplace.rapidminer.com/UpdateServer/faces/product_details.xhtml?productId=rmx_web) you will also get the necessary operators to access data from online data sources automatically.


    So the answer is: YES, RapidMiner can do the job. But it will require some time and effort from you to define the right processes for your use case.




  • bhupendra_patil Administrator, Employee, Member Posts: 168 RM Data Scientist

    As Ingo mentioned, this will be a project where you can do everything using RapidMiner.


    I did something similar around the Twitter search term "zero day"; check the video here



    I had a process running periodically to capture tweets, and then I would analyse them.


    It is difficult to get historic data from sites, but most news websites have some sort of RSS feed available. The rest is simple using the RM Text Mining and Web Mining extensions and other third-party extensions.
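    For readers who want to see what the RSS step amounts to outside RapidMiner, here is a minimal Python sketch using only the standard library. It is not the process described in this reply. The feed text, the example.com links and the function name `parse_rss_items` are made up for illustration, and the sketch assumes a plain RSS 2.0 feed with `title`, `link`, `pubDate` and `description` fields.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hand-written RSS 2.0 text standing in for a downloaded feed (illustrative only).
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>First headline</title>
      <link>https://example.com/first</link>
      <pubDate>Tue, 09 Aug 2016 10:00:00 +0530</pubDate>
      <description>Short summary of the first story.</description>
    </item>
    <item>
      <title>Second headline</title>
      <link>https://example.com/second</link>
      <pubDate>Wed, 10 Aug 2016 08:30:00 +0530</pubDate>
      <description>Short summary of the second story.</description>
    </item>
  </channel>
</rss>"""


def parse_rss_items(rss_text):
    """Return one dict per <item>: title, link, published (datetime or None), text."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        raw_date = item.findtext("pubDate")
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # RSS dates use the RFC 822 style that email.utils can parse.
            "published": parsedate_to_datetime(raw_date) if raw_date else None,
            "text": item.findtext("description", default=""),
        })
    return items


if __name__ == "__main__":
    for entry in parse_rss_items(SAMPLE_RSS):
        print(entry["published"], entry["title"])
```

    Run on a schedule against a real feed, something like this would build up the dated collection of articles that the text-mining step needs, since feeds usually only expose recent items.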

  • MCs Member Posts: 8 Contributor II


    As far as I remember, when we had to step back in time we used the "Wayback Machine" API. Google it. Maybe you can use an RM web operator on those results.
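    As a rough illustration of the Wayback Machine idea, the Python sketch below builds a query for the Internet Archive's availability endpoint and reads the closest snapshot out of a response. The function names are hypothetical, and the sample response is hand-written in the shape that endpoint returns rather than fetched from the archive, so the example runs offline.

```python
import json
from urllib.parse import urlencode

WAYBACK_AVAILABILITY = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp):
    """Build the query URL; timestamp is a string such as YYYYMMDD."""
    query = urlencode({"url": page_url, "timestamp": timestamp})
    return WAYBACK_AVAILABILITY + "?" + query


def closest_snapshot(response_text):
    """Return the archived URL of the closest snapshot, or None if there is none."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


# Hand-written sample response (not real archive data).
SAMPLE_RESPONSE = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20160809000000/http://example.com/",
            "timestamp": "20160809000000",
            "status": "200",
        }
    }
})

if __name__ == "__main__":
    print(availability_url("example.com", "20160809"))
    print(closest_snapshot(SAMPLE_RESPONSE))
```

    In real use the response text would come from an HTTP GET on the built URL, and the returned snapshot URL could then be handed to whatever tool downloads and parses the page.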

    Additionally, and in general, a very simple PHP snippet can save the whole page content, which is useful if you want to keep the content for later use. That can definitely be processed with RM. On Linux the saving can be automated; no clue about Windows, sorry.


    Hope it helps a little :)




  • vijay_dpgc Member Posts: 5 Contributor I

    Thank you, Ingo. I have been trying to work this out with Text Mining. Aylien and Rosette are new to me. The tutorials are spot on, and there are some videos out there which help. Will get cracking on this and share the results with people :)

  • vijay_dpgc Member Posts: 5 Contributor I

    Thank you, Patil. Connecting an RSS feed to RM text mining is exactly what I am after. Your YouTube post was a great help!

  • vijay_dpgc Member Posts: 5 Contributor I

    This is very important for me, as I have to go back in time, and the Wayback Machine will be of much help, I guess. Will try and work around this. Also, Linux is for geeks like you and Windows is for amateurs like me ;)

  • vijay_dpgc Member Posts: 5 Contributor I

    It would be great if I could give the English dictionary to the RM software and tell it: "Here, can you ignore all these words and give me the other stuff?" Something like this:


    Under a high Imphal sky on August 9, Irom Chanu Sharmila finally set herself free from an indefinite hunger strike. As the events of the day unfolded in the Manipur capital, it became poignantly clear that the act of breaking the fast was as much an uncommon act of resistance as her long and brave struggle these past 16 years against atrocity. She has done so with a sense of individual agency, writing poetry, and constantly speaking up with moral clarity. Ms. Sharmila had refused to take food or drink since November 5, 2000, till the Armed Forces (Special Powers) Act was rolled back and the security forces were denied the cloak of immunity in suspected human rights violations. Soon enough she had been arrested, force-fed in a Special Ward at the Jawaharlal Nehru Institute of Medical Sciences that served as her prison, to be released every year and re-arrested. She made the occasional journey out of Imphal, but never one to her family home within the city, having promised her mother that they would meet only once AFSPA was repealed. Now, when she chose to call off the fast, to join the electoral process and even try to become Chief Minister of Manipur, she found she had nowhere to go. Driven, variously, by a sense of betrayal and a fear of underground groups, nobody would shelter her on Tuesday. And she found herself back at the hospital.

    The fresh turns in Ms. Sharmila’s story — the ending of the fast, her desire to join the political process and her peculiar isolation — set a mirror to the state and society. As long as she was on fast, she was in a comfortable zone for both. For the state, the Gandhian non-violence implicit in her method allowed a comparison with the violence of others, positioning her as the good protester, as it were. And in fact it was the unflinching protest by Manipuri women at Imphal’s Kangla fort in 2004 that forced New Delhi to withdraw AFSPA from parts of the State. For others, Ms. Sharmila became the representative of a popular desire to hold the highest moral ground, even as they went on with their lives, though all in the face of the kind of government apathy in Manipur that must shame this country. Reading the intent behind Ms. Sharmila’s decision to pick up the threads of a personal life is akin to a Rorschach test. It’s pointless. But the breaking of the fast is a highly political act too. It demands that we respond to the cause she has given her adult life to. For herself, she has chosen to place faith in the electoral process for reform, a far more messy and risky option than the high pedestal of unyielding non-cooperation she had secured.


    If I could do this for two years in the past and one year into the future, it would be golden.

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder



    There is an operator "Filter Stopwords (Dictionary)" in the Text Processing extension which does exactly that. But of course you would need to define this dictionary yourself, which might be a bit too cumbersome?
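    To make the idea of dictionary-based stopword filtering concrete, here is a small Python sketch of the same technique done by hand: tokenize, drop every word found in a user-supplied dictionary, and count what is left. This is not how the RapidMiner operator is implemented. The `STOPWORDS` set is a deliberately tiny, hypothetical dictionary, and `tokenize` and `top_terms` are names invented for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword dictionary. A real one would be much
# longer and could be loaded from a text file with one word per line.
STOPWORDS = {"a", "the", "of", "on", "and", "to", "in", "from", "her", "she", "was", "as"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_terms(documents, stopwords, n=5):
    """Count non-stopword tokens across all documents and return the n most common."""
    counts = Counter()
    for doc in documents:
        counts.update(token for token in tokenize(doc) if token not in stopwords)
    return counts.most_common(n)


if __name__ == "__main__":
    docs = [
        "Sharmila ended her fast in Imphal",
        "The fast of Sharmila was a protest",
        "Imphal and the protest",
    ]
    for term, count in top_terms(docs, STOPWORDS, n=4):
        print(term, count)
```

    Grouping the documents by month before counting would turn the same idea into a rough "what was trending when" list, which is close to what the original question asks for.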



