
"Web mining Rapidminer robot_filter"

antoine Member Posts: 7 Contributor II
edited June 2019 in Help
Hello all,


I don't know if this is the right place to post my question. I'd like to know how you (RapidMiner users who use it as a web usage mining tool) set up your robot_filter file when importing your web log file.

It works when I just type [g|G]oogle into my robot_filter file, for example. However, I don't really want to do that by hand for a thousand different bots...

So I tried to find a list that I could paste into my file. The website http://www.robotstxt.org/db/all.txt offers the robots list as a downloadable .txt file.
But RapidMiner apparently doesn't like it: I got many errors about bad characters and wrong enclosure...

So what do I have to do to get a proper robots list that RapidMiner can read?


Thank you in advance,


          Antoine

Answers

    land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Antoine,
    What exactly does RapidMiner complain about? Unfortunately I'm not very familiar with the web mining operators, but I assume the file must consist of regular expressions? In that case you would need to escape the special characters of regular expressions; you will find advice on this on Google.

    Greetings,
      Sebastian
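A minimal Python sketch of the approach suggested above, under two assumptions not confirmed in this thread: (1) the robotstxt.org dump stores each robot as `field: value` lines, with the user agent in a `robot-useragent:` field, and (2) RapidMiner's robot_filter file takes one Java regular expression per line. The function names and the rule for skipping suspicious entries are hypothetical, chosen only for illustration.

```python
import re

# Characters with special meaning in java.util.regex (RapidMiner runs on Java).
# A backslash before any of these non-alphabetic characters is valid in both
# Java and Python regular expressions and makes the character match literally.
REGEX_METACHARS = set("\\^$.|?*+()[]{}-")


def extract_user_agents(robots_db_text):
    """Return unique 'robot-useragent:' values in file order.

    Assumption: the dump has one 'field: value' pair per line.
    """
    seen = set()
    agents = []
    for line in robots_db_text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip().lower() == "robot-useragent":
            value = value.strip()
            if value and value not in seen:  # skip empty and duplicate entries
                seen.add(value)
                agents.append(value)
    return agents


def escape_for_regex(literal):
    """Backslash-escape regex metacharacters so the string matches literally."""
    return "".join("\\" + ch if ch in REGEX_METACHARS else ch for ch in literal)


def build_filter_lines(agents):
    """Return one escaped pattern per user agent.

    Hypothetical safety rule: drop values containing double quotes or
    non-ASCII / non-printable characters, guessed as possible causes of the
    'bad character' and 'wrong enclosure' errors.
    """
    lines = []
    for agent in agents:
        if '"' in agent or not (agent.isascii() and agent.isprintable()):
            continue
        lines.append(escape_for_regex(agent))
    return lines


def is_robot(user_agent, patterns):
    """Local check of a log entry's user agent against the generated patterns."""
    return any(re.search(p, user_agent) for p in patterns)
```

To use it, read all.txt with an explicit encoding (latin-1 is a guess, though it never fails to decode), pass the text to `extract_user_agents`, and write the `build_filter_lines` result to the filter file, one pattern per line. Escaped patterns match case exactly; Java regexes accept an inline `(?i)` prefix for case-insensitive matching, if the filter passes patterns through unchanged. Also note that `|` is literal inside a character class, so `[gG]oogle` is enough where `[g|G]oogle` would also match "|oogle".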