Log File Import

ljorzic Member Posts: 12 Contributor I
Hey Everyone,
I'm trying to use RapidMiner to analyze logged events that I get from a system. Unfortunately, the logs are not well structured and need cleaning up. I have tried the CSV import on my text-file logs, but even with regex matching I find it is not customizable enough to cover all items in the log. Is there a similar extension or plugin with more flexible definitions that I could use to import logs into a table?

Typically, the lines look a bit like
<some Information> :Event :Timestamp : <some item containing "::" sometimes>: Action
but not all of them do; there are simple single-word lines as well.

Do I need to build external data preprocessing, or can this be done within RM?

I tried searching the forum and the extension marketplace as well with limited success. Any recommendations?

Best regards,
Lino

Answers

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Hi @ljorzic,

    Could you share your log file, or at least a sample of it, so that we can
    better understand the problem and try to find the most relevant solution?

    Regards,

    Lionel
  • ljorzic Member Posts: 12 Contributor I
    edited January 2019
    Sure thing. Items look like this:
    controller -- <perform> Client 'xHD-UI-ExU#1-GENERIC' tries to control device '_UI-PROFILER_' which is not in accessProfile
    controller -- Device::UIProfiler::_handleEvent() for _UI-PROFILER_: Event: 1262488099.821: mouse    pressed
    controller -- ClientConnection: ('xHD-UI-ExU#1-GENERIC'/10.10.10.1) <<< <perform action='mouse' id='_UI-PROFILER_' value='pressed'/>
    controller -- <perform> Client 'xHD-UI-ExU#1-GENERIC' tries to control device '_UI-PROFILER_' which is not in accessProfile
    controller -- Device::UIProfiler::_handleEvent() for _UI-PROFILER_: Event: 1262488101.397: mouse    pressed
    controller -- Device::UIProfiler::_handleEvent() for _UI-PROFILER_: Event: 1262488101.460: perform  BLUETOOTH_CONTROL_S4 => connectDevice: xx:xx:xx:xx:xx:xx (2)
    controller -- Device::UIProfiler::_handleEvent() for _UI-PROFILER_: Event: 1262488112.508: perform  BLUETOOTH_SOURCE_S4 => routeRequest: (2)
    I'm looking to pull out the timestamp, both the "mouse pressed" events and the "perform" requests, and, for the latter, the number in parentheses (...) at the end.

    Thanks for the quick response! :smile:
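For readers who prefer to prototype the extraction outside RapidMiner, here is a minimal Python sketch of the parsing described above. The pattern `EVENT_RE` and the function `parse_event` are hypothetical names, and the regular expression is inferred only from the sample lines in this post, so it may need adjusting for other log variants.

```python
import re

# Hypothetical pattern, inferred only from the sample lines in the post above.
EVENT_RE = re.compile(
    r"Event:\s*(?P<timestamp>\d+\.\d+):\s*"   # epoch-style timestamp
    r"(?P<kind>\S+)\s+"                        # e.g. "mouse" or "perform"
    r"(?P<detail>.*?)"                         # remaining action text
    r"(?:\s*\((?P<count>\d+)\))?\s*$"          # optional trailing "(n)"
)


def parse_event(line):
    """Return a dict for an 'Event:' line, or None for any other line."""
    match = EVENT_RE.search(line)
    if match is None:
        return None
    count = match.group("count")
    return {
        "timestamp": float(match.group("timestamp")),
        "kind": match.group("kind"),
        "detail": match.group("detail").strip(),
        "count": int(count) if count is not None else None,
    }
```

Lines without an `Event:` marker (for example the `<perform> Client ...` lines) yield `None`, which mirrors filtering on "contains Event" before splitting.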
  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Hi @ljorzic,

    I built a process that extracts the information like this:

    The process:

    <?xml version="1.0" encoding="UTF-8"?><process version="9.2.000-SNAPSHOT">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.2.000-SNAPSHOT" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="read_excel" compatibility="9.2.000-SNAPSHOT" expanded="true" height="68" name="Read Excel" width="90" x="112" y="34">
            <parameter key="excel_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Log_extraction\log_extraction.xlsx"/>
            <parameter key="sheet_selection" value="sheet number"/>
            <parameter key="sheet_number" value="1"/>
            <parameter key="imported_cell_range" value="A1"/>
            <parameter key="encoding" value="SYSTEM"/>
            <parameter key="first_row_as_names" value="true"/>
            <list key="annotations"/>
            <parameter key="date_format" value=""/>
            <parameter key="time_zone" value="SYSTEM"/>
            <parameter key="locale" value="English (United States)"/>
            <parameter key="read_all_values_as_polynominal" value="false"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="Att1.true.polynominal.attribute"/>
            </list>
            <parameter key="read_not_matching_values_as_missings" value="false"/>
            <parameter key="datamanagement" value="double_array"/>
            <parameter key="data_management" value="auto"/>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="9.2.000-SNAPSHOT" expanded="true" height="103" name="Filter Examples" width="90" x="246" y="34">
            <parameter key="parameter_expression" value=""/>
            <parameter key="condition_class" value="custom_filters"/>
            <parameter key="invert_filter" value="false"/>
            <list key="filters_list">
              <parameter key="filters_entry_key" value="Att1.contains.Event"/>
            </list>
            <parameter key="filters_logic_and" value="true"/>
            <parameter key="filters_check_metadata" value="true"/>
          </operator>
          <operator activated="true" class="split" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Split" width="90" x="380" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Att1"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="nominal"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="file_path"/>
            <parameter key="block_type" value="single_value"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="single_value"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="split_pattern" value="Event:"/>
            <parameter key="split_mode" value="ordered_split"/>
          </operator>
          <operator activated="true" class="multiply" compatibility="9.2.000-SNAPSHOT" expanded="true" height="103" name="Multiply (2)" width="90" x="380" y="136"/>
          <operator activated="true" class="multiply" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Multiply" width="90" x="514" y="85"/>
          <operator activated="true" class="replace" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Replace (2)" width="90" x="648" y="187">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Att1_2"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="nominal"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="file_path"/>
            <parameter key="block_type" value="single_value"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="single_value"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="replace_what" value="perform"/>
          </operator>
          <operator activated="true" class="replace" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Replace (3)" width="90" x="782" y="187">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Att1_2"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="nominal"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="file_path"/>
            <parameter key="block_type" value="single_value"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="single_value"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="replace_what" value="perform"/>
          </operator>
          <operator activated="false" class="text:process_document_from_data" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="715" y="34">
            <parameter key="create_word_vector" value="true"/>
            <parameter key="vector_creation" value="TF-IDF"/>
            <parameter key="add_meta_information" value="true"/>
            <parameter key="keep_text" value="false"/>
            <parameter key="prune_method" value="none"/>
            <parameter key="prune_below_percent" value="3.0"/>
            <parameter key="prune_above_percent" value="30.0"/>
            <parameter key="prune_below_rank" value="0.05"/>
            <parameter key="prune_above_rank" value="0.95"/>
            <parameter key="datamanagement" value="double_sparse_array"/>
            <parameter key="data_management" value="auto"/>
            <parameter key="select_attributes_and_weights" value="false"/>
            <list key="specify_weights"/>
            <process expanded="true">
              <operator activated="true" class="text:extract_information" compatibility="8.1.000" expanded="true" height="68" name="Extract Information" width="90" x="179" y="34">
                <parameter key="query_type" value="Regular Expression"/>
                <list key="string_machting_queries"/>
                <parameter key="attribute_type" value="Nominal"/>
                <list key="regular_expression_queries">
                  <parameter key="action" value="(?&lt;=perform)(.*)(?==&gt;)"/>
                </list>
                <list key="regular_region_queries"/>
                <list key="xpath_queries"/>
                <list key="namespaces"/>
                <parameter key="ignore_CDATA" value="true"/>
                <parameter key="assume_html" value="true"/>
                <list key="index_queries"/>
                <list key="jsonpath_queries"/>
              </operator>
              <operator activated="true" class="text:extract_information" compatibility="8.1.000" expanded="true" height="68" name="Extract Information (2)" width="90" x="380" y="34">
                <parameter key="query_type" value="Regular Expression"/>
                <list key="string_machting_queries"/>
                <parameter key="attribute_type" value="Nominal"/>
                <list key="regular_expression_queries">
                  <parameter key="number" value="\((.*?)\)"/>
                </list>
                <list key="regular_region_queries"/>
                <list key="xpath_queries"/>
                <list key="namespaces"/>
                <parameter key="ignore_CDATA" value="true"/>
                <parameter key="assume_html" value="true"/>
                <list key="index_queries"/>
                <list key="jsonpath_queries"/>
              </operator>
              <connect from_port="document" to_op="Extract Information" to_port="document"/>
              <connect from_op="Extract Information" from_port="document" to_op="Extract Information (2)" to_port="document"/>
              <connect from_op="Extract Information (2)" from_port="document" to_port="document 1"/>
              <portSpacing port="source_document" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="split" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Split (2)" width="90" x="916" y="187">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Att1_2"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="nominal"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="file_path"/>
            <parameter key="block_type" value="single_value"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="single_value"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="split_pattern" value="[(]"/>
            <parameter key="split_mode" value="ordered_split"/>
          </operator>
          <operator activated="true" breakpoints="after" class="replace" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Replace (4)" width="90" x="1050" y="187">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Att1_2_2"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="nominal"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="file_path"/>
            <parameter key="block_type" value="single_value"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="single_value"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="replace_what" value="[)]"/>
          </operator>
          <operator activated="true" breakpoints="after" class="select_attributes" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Select Attributes (2)" width="90" x="1184" y="187">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Att1_2_2"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="split" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Split (3)" width="90" x="514" y="289">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Att1_2"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="nominal"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="file_path"/>
            <parameter key="block_type" value="single_value"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="single_value"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="split_pattern" value=":"/>
            <parameter key="split_mode" value="ordered_split"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Select Attributes" width="90" x="648" y="289">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value="Att1_2_1|Att1_2_2"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="rename" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Rename" width="90" x="782" y="289">
            <parameter key="old_name" value="Att1_2_1"/>
            <parameter key="new_name" value="Timestamp"/>
            <list key="rename_additional_attributes">
              <parameter key="Att1_2_2" value="Action"/>
            </list>
          </operator>
          <operator activated="true" class="generate_id" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Generate ID" width="90" x="1050" y="34">
            <parameter key="create_nominal_ids" value="false"/>
            <parameter key="offset" value="0"/>
          </operator>
          <operator activated="true" class="rename" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Rename (2)" width="90" x="1318" y="187">
            <parameter key="old_name" value="Att1_2_2"/>
            <parameter key="new_name" value="Count"/>
            <list key="rename_additional_attributes"/>
          </operator>
          <operator activated="true" class="generate_id" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Generate ID (2)" width="90" x="1251" y="85">
            <parameter key="create_nominal_ids" value="false"/>
            <parameter key="offset" value="0"/>
          </operator>
          <operator activated="true" class="concurrency:join" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Join" width="90" x="1318" y="34">
            <parameter key="remove_double_attributes" value="true"/>
            <parameter key="join_type" value="inner"/>
            <parameter key="use_id_attribute_as_key" value="true"/>
            <list key="key_attributes"/>
            <parameter key="keep_both_join_attributes" value="false"/>
          </operator>
          <connect from_op="Read Excel" from_port="output" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_op="Split" to_port="example set input"/>
          <connect from_op="Split" from_port="example set output" to_op="Multiply (2)" to_port="input"/>
          <connect from_op="Multiply (2)" from_port="output 1" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply (2)" from_port="output 2" to_op="Split (3)" to_port="example set input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Replace (2)" to_port="example set input"/>
          <connect from_op="Replace (2)" from_port="example set output" to_op="Replace (3)" to_port="example set input"/>
          <connect from_op="Replace (3)" from_port="example set output" to_op="Split (2)" to_port="example set input"/>
          <connect from_op="Split (2)" from_port="example set output" to_op="Replace (4)" to_port="example set input"/>
          <connect from_op="Replace (4)" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
          <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Rename (2)" to_port="example set input"/>
          <connect from_op="Split (3)" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Rename" to_port="example set input"/>
          <connect from_op="Rename" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Join" to_port="left"/>
          <connect from_op="Rename (2)" from_port="example set output" to_op="Generate ID (2)" to_port="example set input"/>
          <connect from_op="Generate ID (2)" from_port="example set output" to_op="Join" to_port="right"/>
          <connect from_op="Join" from_port="join" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    
    Does this process meet your needs?

    Regards, 

    Lionel

  • ljorzic Member Posts: 12 Contributor I
    Hey Lionel,
    yes, this does help quite a lot! Still working on understanding the details, but I will get there.

    As it is now, I have to copy my Logfiles into Excel first to open them, which is in line with my original concern: Is there no "easy" way to just read lines from a simple text/logfile, without creating an Excel or csv document?

    Thanks a bunch and best regards!
    Lino
  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Hi @ljorzic,

    What is the extension / file type of your simple text/log file?

    Regards,

    Lionel
  • jczogalla Employee, Member Posts: 144 RM Engineering

    You can use the Read CSV operator with file extensions other than csv, as long as it is a text file. Another option might be to use the Read Documents operator from the text mining extension (but I think you should be fine with the CSV approach).

    Cheers
    Jan
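If preprocessing outside RapidMiner is an option, reading a plain log file line by line is also straightforward in Python. The following is only a sketch: `read_log_lines` is a hypothetical helper, and the column name `Att1` is borrowed from the process posted earlier in this thread so that each line ends up as a single cell.

```python
from pathlib import Path


def read_log_lines(path, encoding="utf-8"):
    """Read a plain-text log into one-column rows, skipping blank lines."""
    rows = []
    # errors="replace" keeps the import going if the log has odd bytes.
    with Path(path).open(encoding=encoding, errors="replace") as handle:
        for line in handle:
            text = line.strip()
            if text:
                rows.append({"Att1": text})
    return rows
```

The file extension does not matter here; any text file is read the same way.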
  • ljorzic Member Posts: 12 Contributor I
    edited January 2019
    Hey guys,
    thank you both for your suggestions. In the meantime I have managed to get it done by using the CSV import and treating each row as one cell. It works.

    Now to the second part: I want to use RapidMiner to compare the Count variable to an ideal value, depending on the "Action" field, and visualize the resulting statistics. I've tried playing around but am unsure how to approach this in terms of the right operators for the job. Of course I am willing to do the heavy lifting myself, but I would appreciate a hint in the right direction.

    Thanks :smile:
  • ljorzic Member Posts: 12 Contributor I
    Okay, I have now figured out that I might want to use the Generate Attributes operator for this. But is it possible to access values in another row? E.g., subtract the value of the "Count" attribute from the id and use a value from the resulting row in my operation?

    Thanks guys!
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    You can do lag transformations with several of the time series operators, either by windowing (operator now included in the base Studio distribution) or by using the Finance and Economics extension (a free extension available in the marketplace).

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • ljorzic Member Posts: 12 Contributor I
    edited January 2019
    Hi @Telcontar120, I'm afraid that won't work for me, since I'm looking to compare specific events whose distance depends on an attribute in the actual row I'm looking at. All windowing operators I have looked at use static distances. Or did I overlook something?

    So, looking at the example above: I would like to add an attribute to id 3 in the example data of @lionelderkrikor, containing the difference between the time values of id 1 and id 3, written to the row with id 3. Is that possible?

    Thanks :smile:
  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Hi @ljorzic,
    Your static example (with id 1 / id 3) is possible using the Lag Series operator of the Values Series extension (to be installed from the Marketplace).
    But I understand that you are looking for a more general solution: can you describe the "rules" for creating your new attribute and the "rule(s)" for subtracting the time values?

    Regards,

    Lionel
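To make the static case concrete, here is a small Python sketch of a fixed lag, which is the idea behind a lag transformation. It is not the Lag Series operator itself; `lag_difference` is a hypothetical helper, and the timestamps are the four "Event:" lines from the sample log above (ids 1 to 4).

```python
def lag_difference(values, lag):
    """For each position i, return values[i] - values[i - lag], or None."""
    return [
        values[i] - values[i - lag] if i >= lag else None
        for i in range(len(values))
    ]


# Timestamps of the four Event lines in the sample log (ids 1..4).
timestamps = [1262488099.821, 1262488101.397, 1262488101.460, 1262488112.508]
diffs = lag_difference(timestamps, lag=2)
# diffs[2] belongs to id 3 and is roughly 1.639 seconds (id 3 minus id 1).
```

Because the lag is the same for every row, this only covers the id 1 / id 3 example and not a look-back distance that varies per row.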
  • ljorzic Member Posts: 12 Contributor I
    Hey @lionelderkrikor, thank you for getting on this case again. Sure, let's look at the data given above in your example:

    I have a "perform BLUETOOTH_CONTROL_..." action, which is associated with a Count of 2; this corresponds to the two "mouse pressed" actions before it. I want to find the "Timestamp" difference between that perform action and the first of the clicks covered by its "Count" (two here, but it could also be three, four, ten... clicks) and write it to that perform action (id 3) as an additional attribute. I hope this wasn't too confusing.

    Generally speaking: each logged "perform" request is the result of some "mouse pressed" actions on an interface. I am interested in how long it took to get to the request, and I want to write that alongside the request as an attribute.

    Thanks once more!
    Lino
  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Hi again @ljorzic,
     - Are you looking only at the "BLUETOOTH_CONTROL" action?
     - If I understand correctly, the first BLUETOOTH_CONTROL is, for example, associated with the first "mouse pressed",
    and you want to calculate the timestamp difference between these two events and write this value to a new attribute in the "BLUETOOTH_CONTROL" row?
     - Is the "BLUETOOTH_SOURCE" action associated with the second "mouse pressed"?

    Regards,

    Lionel 

  • ljorzic Member Posts: 12 Contributor I
    No, I am looking at any action in general that appears in the field "Action" and has a "Count" > 0. The count given in the table is merely the number of "mouse pressed" actions before the given action was performed, in this case "BLUETOOTH_CONTROL", but the action can differ. So I basically need to take the "Count" number and look into the past for the number of "mouse pressed" actions specified by "Count". Then I take the "mouse pressed" at a distance of "Count" in the past and calculate the difference in timestamps. After that, I append it to the row of the original action.

    If this text doesn't help, I'll resort to drawing a picture :smile: 
    Thanks for your support!
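The rule described above can be sketched in a few lines of Python. This is only an illustration of the look-back logic under stated assumptions, not the script posted in this thread: `add_timestamp_diff` is a hypothetical function, the rows are assumed to be in log order, and the field names `Timestamp`, `Action` and `Count` are assumed to match the attributes produced by the earlier process.

```python
def add_timestamp_diff(rows):
    """Add 'timestamp_diff' to each row whose Count is greater than 0.

    Assumes rows are in log order with hypothetical keys
    'Timestamp' (float), 'Action' (str) and 'Count' (int or None).
    """
    press_times = []  # timestamps of all "mouse pressed" events seen so far
    result = []
    for row in rows:
        out = dict(row)
        out["timestamp_diff"] = None
        count = row.get("Count") or 0
        if row["Action"].split() == ["mouse", "pressed"]:
            press_times.append(row["Timestamp"])
        elif 0 < count <= len(press_times):
            # Go back `count` mouse presses and measure the elapsed time.
            out["timestamp_diff"] = row["Timestamp"] - press_times[-count]
        result.append(out)
    return result
```

With the four sample events, the "perform BLUETOOTH_CONTROL_S4" row (Count 2) would get the time elapsed since the first "mouse pressed" event. Rows whose Count exceeds the number of presses seen so far are left without a value rather than guessed.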
  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Hi @ljorzic,

    OK, your explanations are clear: no need for a drawing  ;)

    Unfortunately, I didn't find a solution with RapidMiner's native operators, so I propose
    a process using a Python script. In the results, you get a new column called "timestamp_diff".

    To execute this process, you need to:
     - install Python on your computer.
     - install the Python Scripting extension (from the Marketplace).

    The process : 

    <?xml version="1.0" encoding="UTF-8"?><process version="9.2.000-SNAPSHOT">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.2.000-SNAPSHOT" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="read_excel" compatibility="9.2.000-SNAPSHOT" expanded="true" height="68" name="Read Excel" width="90" x="45" y="187">
            <parameter key="excel_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Log_extraction\log_extraction.xlsx"/>
            <parameter key="sheet_selection" value="sheet number"/>
            <parameter key="sheet_number" value="1"/>
            <parameter key="imported_cell_range" value="A1"/>
            <parameter key="encoding" value="SYSTEM"/>
            <parameter key="first_row_as_names" value="true"/>
            <list key="annotations"/>
            <parameter key="date_format" value=""/>
            <parameter key="time_zone" value="SYSTEM"/>
            <parameter key="locale" value="English (United States)"/>
            <parameter key="read_all_values_as_polynominal" value="false"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="Att1.true.polynominal.attribute"/>
            </list>
            <parameter key="read_not_matching_values_as_missings" value="false"/>
            <parameter key="datamanagement" value="double_array"/>
            <parameter key="data_management" value="auto"/>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="9.2.000-SNAPSHOT" expanded="true" height="103" name="Filter Examples" width="90" x="179" y="187">
            <parameter key="parameter_expression" value=""/>
            <parameter key="condition_class" value="custom_filters"/>
            <parameter key="invert_filter" value="false"/>
            <list key="filters_list">
              <parameter key="filters_entry_key" value="Att1.contains.Event"/>
            </list>
            <parameter key="filters_logic_and" value="true"/>
            <parameter key="filters_check_metadata" value="true"/>
          </operator>
          <operator activated="true" class="split" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Split" width="90" x="313" y="187">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Att1"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="nominal"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="file_path"/>
            <parameter key="block_type" value="single_value"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="single_value"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="split_pattern" value="Event:"/>
            <parameter key="split_mode" value="ordered_split"/>
          </operator>
          <operator activated="true" class="multiply" compatibility="9.2.000-SNAPSHOT" expanded="true" height="103" name="Multiply (2)" width="90" x="447" y="187"/>
          <operator activated="true" class="multiply" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Multiply" width="90" x="581" y="136"/>
          <operator activated="true" class="replace" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Replace (2)" width="90" x="715" y="136">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Att1_2"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="nominal"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="file_path"/>
            <parameter key="block_type" value="single_value"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="single_value"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="replace_what" value="perform"/>
          </operator>
          <operator activated="true" class="replace" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Replace (3)" width="90" x="849" y="136">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Att1_2"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="nominal"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="file_path"/>
            <parameter key="block_type" value="single_value"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="single_value"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="replace_what" value="perform"/>
          </operator>
          <operator activated="false" class="text:process_document_from_data" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="45" y="34">
            <parameter key="create_word_vector" value="true"/>
            <parameter key="vector_creation" value="TF-IDF"/>
            <parameter key="add_meta_information" value="true"/>
            <parameter key="keep_text" value="false"/>
            <parameter key="prune_method" value="none"/>
            <parameter key="prune_below_percent" value="3.0"/>
            <parameter key="prune_above_percent" value="30.0"/>
            <parameter key="prune_below_rank" value="0.05"/>
            <parameter key="prune_above_rank" value="0.95"/>
            <parameter key="datamanagement" value="double_sparse_array"/>
            <parameter key="data_management" value="auto"/>
            <parameter key="select_attributes_and_weights" value="false"/>
            <list key="specify_weights"/>
            <process expanded="true">
              <operator activated="true" class="text:extract_information" compatibility="8.1.000" expanded="true" height="68" name="Extract Information" width="90" x="179" y="34">
                <parameter key="query_type" value="Regular Expression"/>
                <list key="string_machting_queries"/>
                <parameter key="attribute_type" value="Nominal"/>
                <list key="regular_expression_queries">
                  <parameter key="action" value="(?&lt;=perform)(.*)(?==&gt;)"/>
                </list>
                <list key="regular_region_queries"/>
                <list key="xpath_queries"/>
                <list key="namespaces"/>
                <parameter key="ignore_CDATA" value="true"/>
                <parameter key="assume_html" value="true"/>
                <list key="index_queries"/>
                <list key="jsonpath_queries"/>
              </operator>
              <operator activated="true" class="text:extract_information" compatibility="8.1.000" expanded="true" height="68" name="Extract Information (2)" width="90" x="380" y="34">
                <parameter key="query_type" value="Regular Expression"/>
                <list key="string_machting_queries"/>
                <parameter key="attribute_type" value="Nominal"/>
                <list key="regular_expression_queries">
                  <parameter key="number" value="\((.*?)\)"/>
                </list>
                <list key="regular_region_queries"/>
                <list key="xpath_queries"/>
                <list key="namespaces"/>
                <parameter key="ignore_CDATA" value="true"/>
                <parameter key="assume_html" value="true"/>
                <list key="index_queries"/>
                <list key="jsonpath_queries"/>
              </operator>
              <connect from_port="document" to_op="Extract Information" to_port="document"/>
              <connect from_op="Extract Information" from_port="document" to_op="Extract Information (2)" to_port="document"/>
              <connect from_op="Extract Information (2)" from_port="document" to_port="document 1"/>
              <portSpacing port="source_document" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="split" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Split (2)" width="90" x="983" y="136">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Att1_2"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="nominal"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="file_path"/>
            <parameter key="block_type" value="single_value"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="single_value"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="split_pattern" value="[(]"/>
            <parameter key="split_mode" value="ordered_split"/>
          </operator>
          <operator activated="true" class="replace" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Replace (4)" width="90" x="1117" y="136">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Att1_2_2"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="nominal"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="file_path"/>
            <parameter key="block_type" value="single_value"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="single_value"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="replace_what" value="[)]"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Select Attributes (2)" width="90" x="1251" y="136">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Att1_2_2"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="split" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Split (3)" width="90" x="581" y="238">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Att1_2"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="nominal"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="file_path"/>
            <parameter key="block_type" value="single_value"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="single_value"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="split_pattern" value=":"/>
            <parameter key="split_mode" value="ordered_split"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Select Attributes" width="90" x="715" y="238">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value="Att1_2_1|Att1_2_2"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="rename" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Rename" width="90" x="849" y="238">
            <parameter key="old_name" value="Att1_2_1"/>
            <parameter key="new_name" value="timestamp"/>
            <list key="rename_additional_attributes">
              <parameter key="Att1_2_2" value="action"/>
            </list>
          </operator>
          <operator activated="true" class="generate_id" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Generate ID" width="90" x="983" y="238">
            <parameter key="create_nominal_ids" value="false"/>
            <parameter key="offset" value="0"/>
          </operator>
          <operator activated="true" class="rename" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Rename (2)" width="90" x="1385" y="187">
            <parameter key="old_name" value="Att1_2_2"/>
            <parameter key="new_name" value="count"/>
            <list key="rename_additional_attributes"/>
          </operator>
          <operator activated="true" class="generate_id" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Generate ID (2)" width="90" x="1519" y="187">
            <parameter key="create_nominal_ids" value="false"/>
            <parameter key="offset" value="0"/>
          </operator>
          <operator activated="true" class="concurrency:join" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Join" width="90" x="1653" y="187">
            <parameter key="remove_double_attributes" value="true"/>
            <parameter key="join_type" value="inner"/>
            <parameter key="use_id_attribute_as_key" value="true"/>
            <list key="key_attributes"/>
            <parameter key="keep_both_join_attributes" value="false"/>
          </operator>
          <operator activated="true" class="trim" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Trim" width="90" x="1787" y="187">
            <parameter key="attribute_filter_type" value="all"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="nominal"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="file_path"/>
            <parameter key="block_type" value="single_value"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="single_value"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="python_scripting:execute_python" compatibility="9.1.000" expanded="true" height="103" name="Execute Python" width="90" x="1921" y="187">
            <parameter key="script" value="import pandas as pd&#10;&#10;# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;def rm_main(data):&#10;&#10;  # newest event first, so walking down the frame goes back in time&#10;  data_2 = data.sort_values(by='timestamp', ascending=False)&#10;  data_2.index = pd.RangeIndex(len(data_2.index))&#10;  data_2['timestamp_diff'] = '?'&#10;  # loop over row positions (not action values) so repeated actions are handled per row&#10;  for index_i in range(len(data_2)):&#10;    delta = 0&#10;    for j in range(index_i, len(data_2)):&#10;      if 'mouse' in data_2['action'][j] and 'pressed' in data_2['action'][j]:&#10;        delta += 1&#10;        if delta == data_2['count'][index_i]:&#10;          # .loc writes into data_2 itself instead of a chained-indexing copy&#10;          data_2.loc[index_i, 'timestamp_diff'] = round(data_2['timestamp'][index_i] - data_2['timestamp'][j], 3)&#10;          break&#10;&#10;  data = data_2.sort_values(by='timestamp')&#10;  data.index = pd.RangeIndex(len(data.index))&#10;&#10;  # connect 2 output ports to see the results&#10;  return data"/>
            <parameter key="use_default_python" value="true"/>
            <parameter key="package_manager" value="conda (anaconda)"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="9.2.000-SNAPSHOT" expanded="true" height="82" name="Set Role" width="90" x="2055" y="187">
            <parameter key="attribute_name" value="id"/>
            <parameter key="target_role" value="id"/>
            <list key="set_additional_roles"/>
          </operator>
          <connect from_op="Read Excel" from_port="output" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_op="Split" to_port="example set input"/>
          <connect from_op="Split" from_port="example set output" to_op="Multiply (2)" to_port="input"/>
          <connect from_op="Multiply (2)" from_port="output 1" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply (2)" from_port="output 2" to_op="Split (3)" to_port="example set input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Replace (2)" to_port="example set input"/>
          <connect from_op="Replace (2)" from_port="example set output" to_op="Replace (3)" to_port="example set input"/>
          <connect from_op="Replace (3)" from_port="example set output" to_op="Split (2)" to_port="example set input"/>
          <connect from_op="Split (2)" from_port="example set output" to_op="Replace (4)" to_port="example set input"/>
          <connect from_op="Replace (4)" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
          <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Rename (2)" to_port="example set input"/>
          <connect from_op="Split (3)" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Rename" to_port="example set input"/>
          <connect from_op="Rename" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Join" to_port="left"/>
          <connect from_op="Rename (2)" from_port="example set output" to_op="Generate ID (2)" to_port="example set input"/>
          <connect from_op="Generate ID (2)" from_port="example set output" to_op="Join" to_port="right"/>
          <connect from_op="Join" from_port="join" to_op="Trim" to_port="example set input"/>
          <connect from_op="Trim" from_port="example set output" to_op="Execute Python" to_port="input 1"/>
          <connect from_op="Execute Python" from_port="output 1" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
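
For anyone who would rather prototype this outside RapidMiner, here is a rough Python sketch of what the process above appears to do, as far as it can be read from the XML: keep only the lines that carry an `Event:` record, pull out the timestamp, the action and the optional count in parentheses, then compute the time since the n-th previous "mouse pressed" event. The regular expression is an assumption based on the sample lines posted earlier in this thread, and `EVENT_RE`, `parse_events` and `add_timestamp_diff` are illustrative names, not part of RapidMiner or any library.

```python
import re

# Assumed line shape, taken from the sample lines in this thread:
#   "... Event: <timestamp>: <action> [(<count>)]"
EVENT_RE = re.compile(
    r"Event:\s*(?P<timestamp>\d+\.\d+):\s*(?P<action>.*?)"
    r"(?:\s*\((?P<count>\d+)\))?\s*$"
)


def parse_events(lines):
    """Return one dict per line that carries an 'Event:' record; other lines are skipped."""
    events = []
    for line in lines:
        match = EVENT_RE.search(line)
        if match is None:
            continue
        count = match.group("count")
        events.append({
            "timestamp": float(match.group("timestamp")),
            # collapse runs of spaces, e.g. "mouse    pressed" -> "mouse pressed"
            "action": " ".join(match.group("action").split()),
            "count": int(count) if count is not None else None,
        })
    return events


def add_timestamp_diff(events):
    """For each event with a count n, store the time since the n-th previous 'mouse pressed'."""
    events = sorted(events, key=lambda e: e["timestamp"])
    for i, event in enumerate(events):
        event["timestamp_diff"] = None
        if event["count"] is None:
            continue
        seen = 0
        # walk backwards in time from the event just before this one
        for earlier in reversed(events[:i]):
            if earlier["action"] == "mouse pressed":
                seen += 1
                if seen == event["count"]:
                    event["timestamp_diff"] = round(
                        event["timestamp"] - earlier["timestamp"], 3
                    )
                    break
    return events


# Sample lines copied from the log excerpt in this thread
sample = [
    "controller -- <perform> Client 'xHD-UI-ExU#1-GENERIC' tries to control device '_UI-PROFILER_' which is not in accessProfile",
    "controller -- Device::UIProfiler::_handleEvent() for _UI-PROFILER_: Event: 1262488099.821: mouse    pressed",
    "controller -- Device::UIProfiler::_handleEvent() for _UI-PROFILER_: Event: 1262488101.397: mouse    pressed",
    "controller -- Device::UIProfiler::_handleEvent() for _UI-PROFILER_: Event: 1262488101.460: perform  BLUETOOTH_CONTROL_S4 => connectDevice: xx:xx:xx:xx:xx:xx (2)",
]

events = add_timestamp_diff(parse_events(sample))
for event in events:
    print(event)
```

Lines without an `Event:` record are simply dropped here, which matches the Filter Examples step; if those lines matter for your analysis, they would need their own pattern.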
    

    Hope it helps,

    Regards,

    Lionel


  • ljorzic Member Posts: 12 Contributor I
    Hey @lionelderkrikor, thank you so much for all that effort! I have tried your code and had actually started to whip something up in Python myself. I'm still working on it, since some details are still missing. I will report back once it's fully working or I get stuck again.

    Best regards and thank you!
    Lino
  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Hi @ljorzic,

    Don't hesitate to post any further questions here in the community.

    Good luck with the rest of your project.

    Best regards,
    Lionel 