"Web Usage Mining: Finding Exit Pages"

Are Member Posts: 5 Contributor II
edited May 2019 in Help
Hi Community,

I am currently doing some log file analysis and was wondering whether there is an "easy" way to extract the exit pages from the logs.
Of course I can look at all sessions manually and tell RM to take the URI in row no. X as the exit page for session Y, but having a process do this automatically would be much more convenient.  ;)

The goal is to create a new attribute containing the Exit Page for each session and use this afterwards for classification.

Does anybody have experience in this area and can help me out?




    land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    This is possible if you make creative use of the available operators.

    As a short how-to, I came up with this process:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <operator activated="true" class="process" compatibility="5.0.11" expanded="true" name="Process">
        <process expanded="true" height="161" width="815">
          <!-- Iris stands in for the log data; "label" plays the role of the session -->
          <operator activated="true" class="retrieve" compatibility="5.0.11" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <operator activated="true" class="generate_id" compatibility="5.0.11" expanded="true" height="76" name="Generate ID" width="90" x="179" y="30"/>
          <!-- One iteration per distinct value of "label" -->
          <operator activated="true" class="loop_values" compatibility="5.0.11" expanded="true" height="76" name="Loop Values" width="90" x="358" y="30">
            <parameter key="attribute" value="label"/>
            <process expanded="true" height="491" width="881">
              <operator activated="true" class="filter_examples" compatibility="5.0.11" expanded="true" height="76" name="Filter Examples" width="90" x="45" y="30">
                <parameter key="condition_class" value="attribute_value_filter"/>
                <parameter key="parameter_string" value="label=%{loop_value}"/>
              </operator>
              <!-- Sort by id in decreasing order so the last row of the group comes first -->
              <operator activated="true" class="sort" compatibility="5.0.11" expanded="true" height="76" name="Sort" width="90" x="179" y="30">
                <parameter key="attribute_name" value="id"/>
                <parameter key="sorting_direction" value="decreasing"/>
              </operator>
              <!-- Store the a1 value of the first row after sorting in the macro "page" -->
              <operator activated="true" class="extract_macro" compatibility="5.0.11" expanded="true" height="60" name="Extract Macro" width="90" x="313" y="30">
                <parameter key="macro" value="page"/>
                <parameter key="macro_type" value="data_value"/>
                <parameter key="attribute_name" value="a1"/>
                <parameter key="example_index" value="1"/>
              </operator>
              <operator activated="true" class="generate_attributes" compatibility="5.0.11" expanded="true" height="76" name="Generate Attributes" width="90" x="447" y="30">
                <list key="function_descriptions">
                  <parameter key="Exit_Page" value="%{page}"/>
                </list>
              </operator>
              <connect from_port="example set" to_op="Filter Examples" to_port="example set input"/>
              <connect from_op="Filter Examples" from_port="example set output" to_op="Sort" to_port="example set input"/>
              <connect from_op="Sort" from_port="example set output" to_op="Extract Macro" to_port="example set"/>
              <connect from_op="Extract Macro" from_port="example set" to_op="Generate Attributes" to_port="example set input"/>
              <connect from_op="Generate Attributes" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="append" compatibility="5.0.11" expanded="true" height="76" name="Append" width="90" x="492" y="30"/>
          <connect from_op="Retrieve" from_port="output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Loop Values" to_port="example set"/>
          <connect from_op="Loop Values" from_port="out 1" to_op="Append" to_port="example set 1"/>
          <connect from_op="Append" from_port="merged set" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
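    For each label value of Iris (which stands in for a session here), it takes the a1 value of the last row in that group, i.e. the one with the highest id, and writes it into a new attribute Exit_Page :)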
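
    The same idea can be sketched outside RapidMiner in a few lines of Python. This is only an illustration of the logic, not part of the process above: the function name `add_exit_page` and the field names `session` and `uri` are made up for the example, and the rows are assumed to be in chronological order already.

```python
def add_exit_page(rows, session_key="session", page_key="uri"):
    """Return copies of the rows with an added 'Exit_Page' field.

    Assumes `rows` is a list of dicts in chronological order, so the last
    row seen for a session holds that session's exit page.
    """
    last_page = {}
    for row in rows:
        # Later rows overwrite earlier ones, leaving the final page per session.
        last_page[row[session_key]] = row[page_key]
    # Attach the exit page of the row's session to every row (inputs stay untouched).
    return [dict(row, Exit_Page=last_page[row[session_key]]) for row in rows]


# Hypothetical log excerpt with two interleaved sessions.
log = [
    {"session": "A", "uri": "/home"},
    {"session": "B", "uri": "/home"},
    {"session": "A", "uri": "/products"},
    {"session": "A", "uri": "/checkout"},
    {"session": "B", "uri": "/contact"},
]
result = add_exit_page(log)
```

    Every row of session A then carries `/checkout` as its Exit_Page and every row of session B carries `/contact`, which corresponds to what the loop, sort and macro steps do per group.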

    Are Member Posts: 5 Contributor II
    Hi Sebastian,

    GREAT! Thanks a lot.

    Now I just have to decide whether I want to switch to RM5 or stick with 4.6. Sorry for being unclear.

    I have taken some first steps in RM5 and it feels "different", but it seems that RM5 is much more convenient.


    land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Edin,
    It is. Believe me: it might feel different, and at first I myself doubted that you could develop processes faster this way, but as soon as you have found some of the nice features, like the new search box that lets you filter the operators using camel case, you will love it.
    So go to the search box in the new operator view, type "AMode", and you will understand camel case. Just use the arrow keys to select an operator, press Enter, and it will show up in the process. You can build processes very fast this way, especially if you then use the auto-wire button and only change the wrong connections...
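
    To give an idea of what such a camel-case filter does, here is a rough Python sketch. It is not RapidMiner's actual implementation, only an assumed matching rule: each capitalised chunk of the query has to be a prefix of consecutive words in the operator name. The function name `camel_case_match` is made up for the example, and "Apply Model" is assumed to be the operator that "AMode" is meant to find.

```python
import re


def camel_case_match(query, name):
    """Check whether a camel-case query such as 'AMode' matches an operator name.

    Assumed rule: split the query at capital letters ('AMode' -> 'A', 'Mode')
    and require each chunk to be a prefix of consecutive words in the name.
    """
    chunks = re.findall(r"[A-Z][a-z0-9]*", query)
    words = name.split()
    if not chunks:
        return False
    # Try every position in the name where the run of chunks could start.
    for start in range(len(words) - len(chunks) + 1):
        if all(words[start + i].startswith(chunk) for i, chunk in enumerate(chunks)):
            return True
    return False


operators = ["Apply Model", "Append", "Filter Examples", "Apply Threshold"]
matches = [op for op in operators if camel_case_match("AMode", op)]
```

    With this rule, "AMode" keeps only "Apply Model" from the small list above, while a query like "FE" would keep "Filter Examples".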
