LogFileSource

keild Member Posts: 10 Contributor II

Hi guys,

After I managed to import around 200,000 examples into RapidMiner, I noticed that the software transforms the log file date into a numeric value. But since I want to include descriptive statistics in the overall analysis, I searched for an operator that rebuilds the timestamp.

When I realized that there are only operators which transform date formats into numeric or nominal values, I got really frustrated. Does anybody know how to transform numeric date values back into a readable timestamp? Please help me, as I don't want to use other open-source log file tools for the descriptive analysis. I would like to build some basic operator chains for this task that I can reuse in the future. :(

Thank you!

Answers

  • RalfKlinkenberg Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member, Unconfirmed, University Professor Posts: 68 RM Founder
    Hi Keild,

    You can transform numerical attributes into date attributes in two steps, using the Numerical2Polynominal and Nominal2Date operators. Here is a small example:

        <operator name="Numerical2Polynominal" class="Numerical2Polynominal">
        </operator>
        <operator name="Nominal2Date" class="Nominal2Date">
            <parameter key="attribute_name" value="date"/>
            <parameter key="date_format" value="yyyyMMdd"/>
        </operator>
    If you want to restrict these transformations to individual attributes, you can achieve this with the AttributeSubsetPreprocessing operator. Extended example:

        <operator name="AttributeSubsetPreprocessing" class="AttributeSubsetPreprocessing" expanded="yes">
            <parameter key="condition_class" value="attribute_name_filter"/>
            <parameter key="attribute_name_regex" value="date"/>
            <operator name="Numerical2Polynominal" class="Numerical2Polynominal">
            </operator>
            <operator name="Nominal2Date" class="Nominal2Date">
                <parameter key="attribute_name" value="date"/>
                <parameter key="date_format" value="yyyyMMdd"/>
            </operator>
        </operator>
    Best regards,
    Ralf
  • keild Member Posts: 10 Contributor II

    Thank you for the fast reply. I tried to fix the problem, but it always stops with an error message saying the date is unparseable somehow.

    Well, I think it would be useful to post some details.

    The log file entries look like this:

    [01/Nov/2008:00:03:26 +0100]

    and the configuration file looks like this:

    <!-- Date and Time-->
      <field class="org.polliwog.fields.DateTimeField"
            openQuote="["
            closeQuote="]">
        <param id="format"
              value="dd/MMM/yyyy:HH:mm:ss Z" />
      </field>


    After the LogFileSource operator has finished, one date looks like this:

    20424903

    and it looks the same after converting it with the Numerical2Polynominal operator.

    I tried this configuration for the Nominal2Date operator:

    attribute_name : "time" <-- the correct one ;D
    date_type : "date_time"
    date_format : dd'/'MM'/'yyyy':'hh':'mm':'ss' 'Z
    time_zone : SYSTEM
    locale : German <-- I am staying in Sweden right now, but the analysis is of log files from a mobile service used in the Alps
    keep_old_attribute : unchecked


  • RalfKlinkenberg Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member, Unconfirmed, University Professor Posts: 68 RM Founder
    keild wrote:

    The log file entries look like this:
    [01/Nov/2008:00:03:26 +0100]

    and the configuration file looks like this:
    ...
        <param id="format"
              value="dd/MMM/yyyy:HH:mm:ss Z" />
    ...

    I tried this configuration for the Nominal2Date operator:
    ...
    date_format : dd'/'MM'/'yyyy':'hh':'mm':'ss' 'Z
    ...
    The date format in the Nominal2Date operator should be identical to the one in the configuration file, i.e. drop the single quotes and use three Ms ("MMM") for the month:

    date_format : dd/MMM/yyyy:hh:mm:ss Z

    or

    date_format : dd/MMM/yyyy:HH:mm:ss Z
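
    As far as I know, the date_format parameter uses the same pattern syntax as Java's SimpleDateFormat class, so you can check a pattern against one sample log entry in a few lines of plain Java. A minimal sketch, independent of RapidMiner:

        import java.text.ParseException;
        import java.text.SimpleDateFormat;
        import java.util.Date;
        import java.util.Locale;

        public class DateFormatCheck {
            public static void main(String[] args) throws ParseException {
                // Same pattern as in the configuration file: three Ms for "Nov", no extra quoting
                SimpleDateFormat fmt = new SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);
                Date date = fmt.parse("01/Nov/2008:00:03:26 +0100");
                // Prints the number of milliseconds since 1970-01-01 00:00 UTC
                System.out.println(date.getTime());
            }
        }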

    Best regards,
    Ralf
  • keild Member Posts: 10 Contributor II
    Thank you again, but I still get the same error message:

    Message: Cannot parse the data in line 1 for attribute time with the date format dd/MMM/yyyy:hh:mm:ss Z: Unparseable date: "20424903"

    The operator chain looks like this:

    Root
        ->operator chain
          ->LogFileSource
          ->AttributeSubsetPreprocessing
              ->Numerical2Polynominal
              ->Nominal2Date
              ... a lot of other adjustments and filters ;D
  • RalfKlinkenberg Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member, Unconfirmed, University Professor Posts: 68 RM Founder
    Hi Keild,

    The LogFileSource operator obviously already transforms the date string into a number, i.e. the number of milliseconds or seconds (or something similar) since a reference date. Hence the date_format parameter of the Nominal2Date operator should be:

        date_format:  S

    or

        date_format:  s

    or similar. For more information on the date and time format strings, select the Nominal2Date operator in the process view of RapidMiner and press F1 to open the help text.
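
    To illustrate what such a value would encode, here is a small plain-Java sketch (not the actual LogFileSource code; whether the reference date is really 1970-01-01 is only an assumption at this point):

        import java.text.SimpleDateFormat;
        import java.util.Date;
        import java.util.Locale;
        import java.util.TimeZone;

        public class EpochValueCheck {
            public static void main(String[] args) {
                long value = 20424903L; // the number LogFileSource produced for one log entry
                SimpleDateFormat fmt = new SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);
                fmt.setTimeZone(TimeZone.getTimeZone("GMT+01:00"));
                // Interpreted as milliseconds since 1970-01-01 00:00 UTC:
                System.out.println(fmt.format(new Date(value)));         // 01/Jan/1970:06:40:24 +0100
                // Interpreted as seconds since 1970-01-01 00:00 UTC:
                System.out.println(fmt.format(new Date(value * 1000L))); // 25/Aug/1970:10:35:03 +0100
            }
        }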

    Best regards,
    Ralf
  • keild Member Posts: 10 Contributor II

    Well, after I tried putting in just "s", "S", "m", ... it started to produce some valid dates, but not the ones in the log file ... it's exhausting :-\

    I will try other inputs. I had already read through that help text, but anyway, thank you for your help!

    If you have any further information or idea, just come up with it.

    Perhaps you could put my date format into one of your test log files and delete the other entries to find a solution?

    But I think you have other things to do ... let's see if I find a solution over the weekend. If so, I will of course post the information in the forum ;)

    to be continued ... 
  • keild Member Posts: 10 Contributor II
    Well, there is still no solution for this issue.  :(

    Does anybody know how exactly the LogFileSource operator transforms the timestamp of a log file?

    Is there any source code available, to get an idea of which date_format value (for the Nominal2Date operator) should be chosen in order to rebuild the timestamp?

    Thank you for any answers to these questions!
  • keild Member Posts: 10 Contributor II
    Hi community!

    I am not sure, but I think I already solved the problem last Friday without realizing it. ::)
    As you can read in one of my previous posts, I had already tried the input "m" for date_format, which should be the correct value for my log file timestamp format.

    I think I was confused because the first entries of the transformed timestamps were not the exact timestamps stored in the log files. Well, this is only true for the first 35 rows of one sample log file with around 600 rows, and I think a deviation of around 0.5 % through the transformation is still a good value to work with. ;)

    Ralf, thank you again for this very useful last hint of yours. The LogFileSource operator has transformed each timestamp into a specific value:

    the number of minutes which have to be added to the timestamp 1970-01-01 00:00 in order to (more or less) get the actual one. :D
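
    You can double-check this interpretation outside of RapidMiner; a minimal plain-Java sketch, assuming the value really counts minutes since 1970-01-01 00:00 UTC:

        import java.text.SimpleDateFormat;
        import java.util.Date;
        import java.util.Locale;
        import java.util.TimeZone;

        public class MinutesSinceEpochCheck {
            public static void main(String[] args) {
                // Value produced by LogFileSource for the log entry [01/Nov/2008:00:03:26 +0100]
                long minutes = 20424903L;
                Date date = new Date(minutes * 60L * 1000L); // minutes -> milliseconds since the epoch
                SimpleDateFormat fmt = new SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);
                fmt.setTimeZone(TimeZone.getTimeZone("GMT+01:00"));
                // Prints 01/Nov/2008:00:03:00 +0100 -- the original timestamp with the seconds cut off
                System.out.println(fmt.format(date));
            }
        }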

    I hope this information will help others to rebuild their timestamps!

    Best regards!
  • keild Member Posts: 10 Contributor II

    I don't want to open a new thread, so I will just ask another question here ;) :

    Since I want to create some basic operator chains to analyse log files in a descriptive way:

    Has anybody already had a similar idea, e.g. using the OLAP Aggregation operator? ???

    Analysing log files by hits per day/month/year, and many other descriptive analyses, would be possible with it (see the small sketch at the end of this post).

    Has anybody already created some sample operator chains? ???

    I would be grateful for any help in this area, as I am still a beginner with RapidMiner. ::)
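
    For the hits-per-day part, here is a minimal plain-Java sketch of the same aggregation, independent of RapidMiner; the file name access.log is only a placeholder, and the regular expression assumes the timestamp format shown earlier in this thread:

        import java.io.IOException;
        import java.nio.file.Files;
        import java.nio.file.Paths;
        import java.time.LocalDate;
        import java.time.format.DateTimeFormatter;
        import java.util.Locale;
        import java.util.Map;
        import java.util.TreeMap;
        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        public class HitsPerDay {
            public static void main(String[] args) throws IOException {
                // Grabs the day part of the bracketed timestamp, e.g. "01/Nov/2008"
                Pattern timestamp = Pattern.compile("\\[(\\d{2}/\\w{3}/\\d{4}):");
                DateTimeFormatter dayFormat = DateTimeFormatter.ofPattern("dd/MMM/yyyy", Locale.ENGLISH);
                Map<LocalDate, Long> hitsPerDay = new TreeMap<>();
                for (String line : Files.readAllLines(Paths.get("access.log"))) {
                    Matcher m = timestamp.matcher(line);
                    if (m.find()) {
                        hitsPerDay.merge(LocalDate.parse(m.group(1), dayFormat), 1L, Long::sum);
                    }
                }
                hitsPerDay.forEach((day, hits) -> System.out.println(day + "\t" + hits));
            }
        }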

  • keild Member Posts: 10 Contributor II

    Hi community,

    perhaps something else: does anybody know which unit of time is used for the default session length in the LogFileSource operator?

    Thank you!

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    I guess the answer to your last question depends on what your server returns. I doubt the operator does more than read that parameter from the log file. The most common unit will be milliseconds, I think.

    Greetings,
      Sebastian
  • keild Member Posts: 10 Contributor II
    Hi Sebastian,

    well, the problem is that I don't know anything about the server configuration of this mobile service. The thing is that the LogFileSource operator works with a default "session_timeout" value of 400000.

    And I agree with you that it should be milliseconds, which makes it a value of around 6.7 minutes. But what is the basis for this value? Is it some kind of average derived from session lengths observed over a couple of years? ???

    Well, I will just try some other values for testing purposes. Let's see whether the results change in the end ...

    Regards!
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    I doubt that there was much thinking behind the default of this parameter. I guess it was more like: "Uhmm, we have milliseconds, let's take a couple of minutes, that makes, uhmmm, okay, let's just round it to 400000, looks nicer now."

    And the session length does not depend on the server anyway. It more or less depends on the user and will differ from user to user. But since you don't know what a user does in one session and when it stops, you have to draw a line. So the default says: if the user didn't do anything for roughly seven minutes (400000 ms), the next access is part of a new session (see the sketch below).
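
    A small plain-Java illustration of that rule (a sketch of the idea only, not the actual LogFileSource implementation; the access times in milliseconds are made up):

        import java.util.Arrays;
        import java.util.List;

        public class SessionSplit {

            // Counts sessions for one user: any gap larger than the timeout starts a new session.
            static int countSessions(List<Long> sortedAccessTimesMillis, long sessionTimeoutMillis) {
                int sessions = sortedAccessTimesMillis.isEmpty() ? 0 : 1;
                for (int i = 1; i < sortedAccessTimesMillis.size(); i++) {
                    if (sortedAccessTimesMillis.get(i) - sortedAccessTimesMillis.get(i - 1) > sessionTimeoutMillis) {
                        sessions++;
                    }
                }
                return sessions;
            }

            public static void main(String[] args) {
                // Three accesses; the third comes more than 400000 ms after the second -> 2 sessions
                List<Long> accessTimes = Arrays.asList(0L, 120_000L, 600_000L);
                System.out.println(countSessions(accessTimes, 400_000L)); // prints 2
            }
        }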

    Greetings,
      Sebastian
  • UKLN8860 Member Posts: 1 Contributor I
    ;)