Google Scholar Citation Extraction

sgenzer · November 2017

Hello RapidMiners -

So today I had the task of extracting and organizing content from a Google Scholar query. Google does a very good job of preventing scraping and crawling, so you have to start "old school": go to each page of your search results and save the HTML as a text file. Once you do that, you can clean it all up and organize it. I did a search for the keyword "rapidminer" (of course), saved the first 100 pages (tedious but not too bad), and then used the attached process to clean it all up. Maybe some of you will find this useful?
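For readers who want to see the "clean it all up" step outside RapidMiner, here is a minimal Python sketch. It is not the attached process. It assumes each saved page marks a result title with an `h3` element of class `gs_rt` and the author/venue line with a `div` of class `gs_a`; check those class names against your own saved files, since Google can change its markup. The names `ScholarResultParser` and `parse_page` are made up for this illustration.

```python
import re
from html.parser import HTMLParser


class ScholarResultParser(HTMLParser):
    """Collect title and byline text for each result in one saved page."""

    # Assumed markup: <h3 class="gs_rt"> holds the title,
    # <div class="gs_a"> holds "authors - venue, year - site".
    FIELDS = {("h3", "gs_rt"): "title", ("div", "gs_a"): "byline"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.results = []    # one dict per search result
        self._field = None   # field currently being captured, if any
        self._tag = None     # tag name that opened the current field
        self._depth = 0      # nesting count of that same tag name
        self._buf = []       # text fragments of the current field

    def handle_starttag(self, tag, attrs):
        if self._field is not None:
            if tag == self._tag:
                self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        for (name, css_class), field in self.FIELDS.items():
            if tag == name and css_class in classes:
                self._field, self._tag = field, tag
                self._depth, self._buf = 1, []
                return

    def handle_data(self, data):
        if self._field is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if self._field is None or tag != self._tag:
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse runs of whitespace left over from the markup.
            text = " ".join("".join(self._buf).split())
            if self._field == "title" or not self.results:
                self.results.append({"title": "", "byline": ""})
            self.results[-1][self._field] = text
            self._field = None


def parse_page(html_text):
    """Return a list of {'title', 'byline', 'year'} dicts for one page."""
    parser = ScholarResultParser()
    parser.feed(html_text)
    parser.close()
    for result in parser.results:
        # First four-digit year found in the byline, or None if absent.
        match = re.search(r"\b(?:19|20)\d{2}\b", result["byline"])
        result["year"] = match.group(0) if match else None
    return parser.results
```

To process a whole folder of saved pages, you could read each file's text and call `parse_page` on it, then concatenate the lists. Titles may still carry prefixes such as "[PDF]" that need a further cleanup pass.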

Scott

puserc · June 2018

Would you please give us the XML version of this model?

I ran into some problems running it in RapidMiner 8.2.001.

sgenzer · June 2018

hi @puserc - the XML is there in the attachment to the article. An ".rmp" file in RapidMiner is exactly the same as the XML you see.

puserc · June 2018

I know. The problem is that I couldn't run it directly; there are issues with some of the nodes. That's why I asked for the XML version.

sgenzer · June 2018

Just open that .rmp in any text editor, then copy and paste the XML into the RapidMiner XML panel. That should do the trick.
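If some nodes still fail after pasting, it can help to see which operators the process uses, for example to spot one that belongs to an extension you have not installed. The following is a rough Python sketch, not an official RapidMiner tool. It assumes the .rmp content is well-formed XML whose `operator` elements carry `class` and `name` attributes; `list_operators` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def list_operators(process_xml):
    """Return (class, name) pairs for every <operator> element in the XML."""
    root = ET.fromstring(process_xml)
    # iter() walks the whole tree, so nested subprocesses are included.
    return [(op.get("class"), op.get("name")) for op in root.iter("operator")]
```

You could pass it the file contents, for example `list_operators(open("process.rmp", "rb").read())`, where `process.rmp` stands in for wherever you saved the attachment.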

19316071 · October 2018

Hi @sgenzer

I am new to RapidMiner and have the same task: I want to extract Google Scholar citations. I have worked through the beginner-level RapidMiner tutorial. Could you please explain a little more about how you built the process, to give me a head start? It would be a great help.

I am also keen to learn text mining in depth in RapidMiner, for extracting information from published research articles. Can you or anyone else please also recommend some good learning resources?

Thanks in anticipation

Mudassar

19316071@student.westernsydney.edu.au
