"Web Mining Operators"

fbarth Member Posts: 2 Contributor I
edited May 2019 in Help
Hi all,

I'm trying to use the Reader Server Log operator, but I cannot find an example of the config file (a required parameter of the Reader Server Log operator).

Can anyone tell me where I can find an example? I searched http://polliwog.sourceforge.net/, but I couldn't find one.

Best regards,

Fabrício J. Barth
Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    perhaps this one will help you. Detailed instructions are available at the URL you already posted.

    <!--
    Copyright 2005 - Gary Bentley

    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
    -->

    <!--

      This log format models the Apache Combined Log format (NCSA).
      i.e. "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\""

      Note the order of the field elements IS important.  The fields are read in and the log entry
      processed by getting each field to "consume" the part that it handles.  The remainder of the
      entry is then passed to the next field.
    -->
    <config>

      <!--
        The hostname part of the log file.  (%h)
      -->
      <field class="org.polliwog.fields.HostnameField" />

      <!--
        A blank field used to "skip" that part of the line. (%l)
      -->
      <field blank="true" />

      <!--
        A blank field used to "skip" that part of the line. (%u)
      -->
      <field blank="true" />

      <!--
        Date/time of the entry. (%t)

        Note: If your log file is in a language OTHER THAN English, modify the "locale" param value below. If you are using Apache, the log file (especially the dates) will usually be written in English. The value has two parts: the first is the "language" (one of the constants defined in http://www.loc.gov/standards/iso639-2/englangn.html, from the 639-1 column ONLY), the second is the "country" (one of the constants defined in http://www.iso.ch/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/list-en1.html). The two values are separated by "/", e.g. "en/US" or "fr/FR". Only change this value IF your log file is written in a language other than English.
      -->
      <field class="org.polliwog.fields.DateTimeField"
            openQuote="["
            closeQuote="]">
        <param id="locale"
              value="en/US" />
        <param id="format"
              value="dd/MMM/yyyy:HH:mm:ss Z" />
      </field>

      <!--
        The request line, i.e. what did the browser/search engine ask for. (\"%r\")
      -->
      <field class="org.polliwog.fields.RequestLineField"
              openQuote='"'
              closeQuote='"'
              escapedBy="\" />

      <!--
        The status code returned by the web server. (%>s)
      -->
      <field class="org.polliwog.fields.StatusCodeField" />

      <!--
        The size of the returned document. (%b)
      -->
      <field class="org.polliwog.fields.SizeField" />

      <!--
        The referer page. (\"%{Referer}i\")
      -->
      <field class="org.polliwog.fields.RefererHeaderField"
              openQuote='"'
              closeQuote='"' />

      <!--
        The request header, i.e. what did the browser/search engine announce itself as.
        (\"%{User-agent}i\")
      -->
      <field class="org.polliwog.fields.RequestHeaderField"
              openQuote='"'
              closeQuote='"'
              escapedBy="\">
        <param id="type"
              value="user-agent" />
      </field>

    </config>
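
To make the "order of the field elements IS important" note concrete, here is a minimal sketch in plain Java that splits one Combined Log Format line into the same nine parts, in the same order as the `<field>` elements above. It does not use the polliwog API; the class name `CombinedLogSketch`, the regular expression, and the sample line are assumptions made for illustration only. The timestamp is parsed with the same pattern and locale that the config passes as the "format" and "locale" params.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CombinedLogSketch {

    // One capture group per <field> in the config, in the same order:
    // %h, %l, %u, [%t], "%r", %>s, %b, "%{Referer}i", "%{User-agent}i".
    // Quoted groups accept backslash-escaped characters (compare escapedBy="\").
    static final Pattern COMBINED = Pattern.compile(
            "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"((?:[^\"\\\\]|\\\\.)*)\" (\\d{3}) (\\S+)"
            + " \"((?:[^\"\\\\]|\\\\.)*)\" \"((?:[^\"\\\\]|\\\\.)*)\"$");

    // Same pattern and locale as the "format" and "locale" params of the date field.
    static final DateTimeFormatter TIMESTAMP =
            DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.US);

    /** Returns the nine fields in log order, or null if the line does not match. */
    static String[] split(String line) {
        Matcher m = COMBINED.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1);
        }
        return fields;
    }

    /** Parses the content between the square brackets of the %t field. */
    static OffsetDateTime parseTimestamp(String raw) {
        return OffsetDateTime.parse(raw, TIMESTAMP);
    }

    public static void main(String[] args) {
        // Illustrative sample line, not taken from a real log.
        String sample = "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "
                + "\"GET /apache_pb.gif HTTP/1.0\" 200 2326 "
                + "\"http://www.example.com/start.html\" \"Mozilla/4.08 [en] (Win98; I ;Nav)\"";
        String[] fields = split(sample);
        for (String field : fields) {
            System.out.println(field);
        }
        System.out.println(parseTimestamp(fields[3]));
    }
}
```

If a log line fails to match, checking it group by group against this order is a quick way to see which `<field>` in the config would stop consuming input.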
  • makchishing Member Posts: 6 Contributor II
    I have server logs that are compressed into a .gz file (only 320 MB).
    Unzipped to a text file, they take up around 3 GB.

    Can RapidMiner read server logs in a compressed format?
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    currently RapidMiner can't read zipped log files. Of course, support could be added without much work, but what benefit would result from that? Before handling the data, RapidMiner would have to extract it. So the data would be extracted not just once, but each time you process it...
    If you need to process the data in an online fashion, extracting it each time the process is executed in order to work on the most recent data, just use the execute operator for shell commands.

    Greetings,
      Sebastian
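
The "extract once, then read many times" idea can be sketched outside RapidMiner as well. The following is a minimal Java sketch, not a RapidMiner feature: the class name `ExtractOnce` and the method `extractIfStale` are hypothetical. It decompresses a .gz file to a plain text file only when no up-to-date extracted copy exists, so repeated runs skip the extraction.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;

public class ExtractOnce {

    /**
     * Decompresses gzFile to target unless target already exists and is at
     * least as new as gzFile. Returns true if an extraction was performed.
     */
    static boolean extractIfStale(Path gzFile, Path target) throws IOException {
        if (Files.exists(target)
                && Files.getLastModifiedTime(target)
                        .compareTo(Files.getLastModifiedTime(gzFile)) >= 0) {
            return false; // extracted copy is current, nothing to do
        }
        try (InputStream in = new GZIPInputStream(Files.newInputStream(gzFile))) {
            // Streams the decompressed bytes to disk without loading them into memory.
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ExtractOnce <input.gz> <output.log>
        boolean extracted = extractIfStale(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(extracted ? "extracted" : "already up to date");
    }
}
```

The trade-off is the one described above: this costs disk space for the extracted copy, but the decompression is paid only when the compressed file changes.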
  • makchishing Member Posts: 6 Contributor II
    Thanks, Sebastian.

    It would be very beneficial: it saves hard disk space and the time spent on extraction, since a lot of time is wasted waiting for the extraction.
    As far as I found by searching the web, it is very easy to read a gzipped file in Java.
    I have found a simple program that does this:
    http://www.java2s.com/Code/Java/File-Input-Output/Readsomedatafromagzipfile.htm
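
For reference, reading a gzipped text file line by line with the standard `java.util.zip.GZIPInputStream` can look roughly like the sketch below. It is not the program from the link above; the class name `GzipLogReader` and the method `countLines` are hypothetical, and UTF-8 is an assumed encoding. The file is decompressed on the fly, so no extracted copy is written to disk.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipLogReader {

    /** Counts the lines of a gzip-compressed text file while streaming it. */
    static long countLines(Path gzFile) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(gzFile)),
                StandardCharsets.UTF_8))) { // assumed encoding
            long count = 0;
            while (reader.readLine() != null) {
                count++; // a real reader would parse the log line here
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny compressed sample so the sketch runs on its own.
        Path sample = Files.createTempFile("access", ".log.gz");
        try (GZIPOutputStream out = new GZIPOutputStream(Files.newOutputStream(sample))) {
            out.write("line one\nline two\nline three\n".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(countLines(sample));
        Files.delete(sample);
    }
}
```

Only one buffered line is held in memory at a time, so a 3 GB log would not need to fit in RAM.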
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    I'm fully aware that it is pretty easy to read zipped files. I only doubt that it is useful, since you will have to extract the data anyway. What difference does it make whether you extract it once before reading the data or while reading it? If you execute the process twice, you will have to do the extraction twice. So where does the benefit come from?

    Greetings,
      Sebastian