I have a list of URLs and need to crawl data only from those URLs using XPath

vpkris Member Posts: 1 Contributor I
edited November 2018 in Help
Dear Team,

I am very confused and stuck.

I have 1000 URLs and I need to extract data from all of them.

I have stored the 1000 URLs in a CSV file.

I have also read the tutorials at http://vancouverdata.blogspot.com/2011/04/rapidminer-web-crawling-rapid-miner-web.html and http://vancouverdata.blogspot.com/2011/04/web-scraping-rapidminer-xpath-web.html. They are excellent, but I am not sure where I am getting lost.

I have enabled all the extensions.

Is there a video tutorial that explains the process of importing URLs and extracting the data?

I really want to learn this and I am very interested. Please guide me.

I have been trying this for the past 2 days, but I am missing something.

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi,

    I am not sure where exactly you got stuck, but if your problem is accessing the URLs stored in your file in the first place, the Get Pages operator is for you. Just load your CSV file containing the URLs, then pass that data to Get Pages and specify in the link_attribute parameter which column contains the URLs.

    Best regards,
    Marius
  • alphabeto Member Posts: 8 Contributor II
    Hi,
    Can RapidMiner run an automated, regular search (say daily) for a list of words across a list of URLs, and return the link of each matching page?
    I have a list of words, and I want to regularly get every web link where any of these words appears on any of the sites in my predefined URL list.


    E.g. word list: qwe, rty
    URL list: www.asd.com, www.zxc.com

    What is the process for getting, daily and automatically, each web link where the words "qwe" and/or "rty" appear on www.asd.com and/or www.zxc.com?


    Many thanks
    Dan
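
For readers who want to see the workflow from Marius's answer spelled out (load the CSV, pick the column that holds the links, fetch only those pages, then apply a path expression), here is a minimal sketch in plain Python rather than RapidMiner. It is not how Get Pages is implemented. The function names, the `link` column name, and the injectable `fetcher` argument are hypothetical, chosen only for illustration. It also assumes each page is well-formed XML/XHTML, because the standard library's ElementTree supports only a subset of XPath and does not tolerate broken HTML; real pages usually need a more lenient HTML parser.

```python
import csv
import io
import urllib.request
import xml.etree.ElementTree as ET


def read_urls(csv_text, link_column):
    """Return the non-empty values of the CSV column that holds the URLs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    urls = []
    for row in reader:
        value = (row.get(link_column) or "").strip()
        if value:
            urls.append(value)
    return urls


def fetch(url):
    """Download one page and return its body as text."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def extract(markup, path):
    """Apply an ElementTree path (a subset of XPath) and return matched text."""
    root = ET.fromstring(markup)  # assumption: the markup is well-formed
    return [(node.text or "").strip() for node in root.findall(path)]


def crawl(csv_text, link_column, path, fetcher=fetch):
    """Visit only the listed URLs and collect the matches for each one."""
    results = {}
    for url in read_urls(csv_text, link_column):
        try:
            results[url] = extract(fetcher(url), path)
        except (OSError, ValueError, ET.ParseError):
            # A page that cannot be fetched or parsed yields no matches,
            # so one bad URL does not stop the rest of the list.
            results[url] = []
    return results
```

Calling `crawl(open("urls.csv").read(), "link", ".//h1")` would return a dictionary that maps each URL in the `link` column to the text of its `h1` elements; the file name and column name here are placeholders for your own. Passing a different `fetcher` makes it possible to try the function on saved pages without touching the network.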