RapidMiner

Web Server Log file Mining to identify frequent items

Hi,

I'm trying to use the RapidMiner Web Mining extension to mine an Apache access log file. So far I can read the log file and transform it into sessions. Now I want to know how to do frequent item set mining on those sessions. As far as I know, I have to convert the attributes to binominal first and then apply FP-Growth. But I could not get this to work; it always gives errors. Can somebody please help me?

I cannot publish the original log file, but the sample below is very similar to it.

Sample of the log file:

192.168.56.1 - - [15/Jul/2013:15:41:03 +0530] "HEAD /joomla/templates/tk_gen_free_ii/css/responsive.css HTTP/1.1" 200 313 "http://192.168.56.101/joomla/index.php/about-us" "Wget/1.13.4 (linux-gnu)"
192.168.56.1 - - [15/Jul/2013:15:41:03 +0530] "HEAD /joomla/templates/tk_gen_free_ii/css/print.css HTTP/1.1" 200 311 "http://192.168.56.101/joomla/index.php/about-us" "Wget/1.13.4 (linux-gnu)"
192.168.56.1 - - [15/Jul/2013:15:41:03 +0530] "HEAD /joomla/templates/tk_gen_free_ii/warp/js/warp.js HTTP/1.1" 200 328 "http://192.168.56.101/joomla/index.php/about-us" "Wget/1.13.4 (linux-gnu)"
192.168.56.1 - - [15/Jul/2013:15:41:03 +0530] "HEAD /joomla/templates/tk_gen_free_ii/warp/js/responsive.js HTTP/1.1" 200 327 "http://192.168.56.101/joomla/index.php/about-us" "Wget/1.13.4 (linux-gnu)"
192.168.56.1 - - [15/Jul/2013:15:41:03 +0530] "HEAD /joomla/templates/tk_gen_free_ii/warp/js/accordionmenu.js HTTP/1.1" 200 327 "http://192.168.56.101/joomla/index.php/about-us" "Wget/1.13.4 (linux-gnu)"
192.168.56.1 - - [15/Jul/2013:15:41:03 +0530] "HEAD /joomla/templates/tk_gen_free_ii/warp/js/dropdownmenu.js HTTP/1.1" 200 328 "http://192.168.56.101/joomla/index.php/about-us" "Wget/1.13.4 (linux-gnu)"
192.168.56.1 - - [15/Jul/2013:15:41:03 +0530] "HEAD /joomla/templates/tk_gen_free_ii/js/template.js HTTP/1.1" 200 327 "http://192.168.56.101/joomla/index.php/about-us" "Wget/1.13.4 (linux-gnu)"
192.168.56.1 - - [15/Jul/2013:15:41:03 +0530] "HEAD /joomla/images/online-banking.png HTTP/1.1" 200 295 "http://192.168.56.101/joomla/index.php/about-us" "Wget/1.13.4 (linux-gnu)"
192.168.56.1 - - [15/Jul/2013:15:41:03 +0530] "HEAD /joomla/ HTTP/1.1" 200 374 "http://192.168.56.101/joomla/index.php/about-us" "Wget/1.13.4 (linux-gnu)"
192.168.56.1 - - [15/Jul/2013:15:41:04 +0530] "GET /joomla/ HTTP/1.1" 200 13004 "http://192.168.56.101/joomla/index.php/about-us" "Wget/1.13.4 (linux-gnu)"
192.168.56.1 - - [15/Jul/2013:15:41:04 +0530] "GET /joomla/index.php/our-policy HTTP/1.1" 200 16516 "http://192.168.56.101/joomla/index.php/about-us" "Wget/1.13.4 (linux-gnu)"
192.168.56.1 - - [15/Jul/2013:15:41:04 +0530] "GET /joomla/index.php/our-services HTTP/1.1" 200 8035 "http://192.168.56.101/joomla/index.php/about-us" "Wget/1.13.4 (linux-gnu)"
192.168.56.1 - - [15/Jul/2013:15:41:04 +0530] "HEAD /joomla/index.php/contact-us HTTP/1.1" 200 343 "http://192.168.56.101/joomla/index.php/about-us" "Wget/1.13.4 (linux-gnu)"
192.168.56.1 - - [15/Jul/2013:15:41:05 +0530] "GET /joomla/index.php/contact-us HTTP/1.1" 200 7344 "http://192.168.56.101/joomla/index.php/about-us" "Wget/1.13.4 (linux-gnu)"
192.168.56.1 - - [15/Jul/2013:15:41:05 +0530] "GET /joomla/index.php/test HTTP/1.1" 200 7285 "http://192.168.56.101/joomla/index.php/about-us" "Wget/1.13.4 (linux-gnu)"
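
For reference, the lines above appear to follow the Apache combined log format (host, identity, user, timestamp, request, status, size, referer, user agent). Here is a minimal sketch of how one line splits into fields, in plain Python rather than RapidMiner. The names `LOG_PATTERN` and `parse_line` are only illustrative, and the regular expression assumes every line has exactly this layout.

```python
import re
from datetime import datetime

# Assumed layout (Apache "combined" format):
# host ident user [time] "method path protocol" status size "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Timestamp such as 15/Jul/2013:15:41:04 +0530 (English month names assumed).
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    return record


sample = (
    '192.168.56.1 - - [15/Jul/2013:15:41:04 +0530] '
    '"GET /joomla/index.php/our-policy HTTP/1.1" 200 16516 '
    '"http://192.168.56.101/joomla/index.php/about-us" "Wget/1.13.4 (linux-gnu)"'
)
record = parse_line(sample)
print(record["host"], record["method"], record["path"], record["status"])
```

The `path` field (possibly restricted to GET requests) is the most likely candidate for an item; the host and timestamp are what a sessionizer would normally group on.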

Re: Web Server Log file Mining to identify frequent items

Hi,

What do you want to count, i.e. what should your item be? What do you want to achieve? You can also post the XML of your process, so we can try to figure out why the errors occur.
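
To make the question concrete: if the item is the requested page and a transaction is one session, then the table going into FP-Growth would have one row per session and one true/false (binominal) column per page. Below is a minimal sketch of that idea in plain Python, not RapidMiner. The sessions are made up, the function names are only illustrative, and the brute-force support counting is a simple stand-in that finds the same kind of frequent sets; it is not the FP-Growth algorithm itself.

```python
from itertools import combinations

# Hypothetical sessions: each one is the set of pages requested in a single visit.
sessions = [
    {"/joomla/", "/joomla/index.php/our-policy", "/joomla/index.php/contact-us"},
    {"/joomla/", "/joomla/index.php/our-services"},
    {"/joomla/", "/joomla/index.php/our-policy", "/joomla/index.php/our-services"},
    {"/joomla/index.php/contact-us", "/joomla/index.php/our-policy"},
]


def to_binominal(sessions):
    """Build a true/false table: one row per session, one column per page."""
    items = sorted(set().union(*sessions))
    rows = [[item in session for item in items] for session in sessions]
    return items, rows


def frequent_itemsets(sessions, min_support, max_size=2):
    """Count the support of every item set up to max_size by brute force."""
    items = sorted(set().union(*sessions))
    total = len(sessions)
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            # Support = fraction of sessions containing every page in the candidate.
            count = sum(1 for session in sessions if set(candidate) <= session)
            support = count / total
            if support >= min_support:
                result[candidate] = support
    return result


items, rows = to_binominal(sessions)
frequent = frequent_itemsets(sessions, min_support=0.5)
for itemset, support in sorted(frequent.items(), key=lambda kv: -kv[1]):
    print(f"{support:.2f}  {itemset}")
```

If your sessions table does not look like `rows` (only true/false columns, one per page), that mismatch may be where the errors come from, but I can only tell once I see the process.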

Best
  Marcin