How to find sentences and to group results

neomzw Member Posts: 1 Newbie
I'm looking for the frequency of certain words and sentences across a directory of files, and I would like to compare them all at once (both the use of the words and the use of the sentences). I have already created regular expressions for the sentences I'm looking for in the text.
My questions are:
(1) How do I search for sentences that match a specific pattern?
I've used Tokenize and Filter Tokens for the words, but I don't know what to use for the sentences.
(2) How do I group results per project (each project is a folder of subfolders and text files) and per group of projects (a directory of zipped folders)?
The results I'm getting so far are tables with one row per file instead of one row per folder or directory.

Thank you

Answers

  • MarlaBot The Friendly RapidMiner Dog Bot Administrator, Moderator, Employee, Member Posts: 57  Community Manager
    Hi @neomzw - this is MarlaBot. I found these great videos on our RapidMiner Academy that you may find helpful:
    Instructional Video: Text Association Rules (Viewing time: ~10m)
    Instructional Video: Loading Text into RapidMiner (Viewing time: ~6m)
    Please LIKE my comment if it helps! 👇

    MarlaBot <3
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,383  Community Manager
    Hi @neomzw, sorry no one has chimed in here. Is this still an issue?

    Scott
  • kayman Member Posts: 357   Unicorn
    If you still need an answer: the file/folder problem can be solved by turning on the 'enable macros' option in the parameters of the Loop Files operator and generating a new field (attribute) that contains the needed values, such as the file name or folder name. From there you can use other operators, such as Loop Values, to aggregate on the newly created folder field.

    As in the attached example:

    <?xml version="1.0" encoding="UTF-8"?><process version="9.3.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.3.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="UTF-8"/>
        <process expanded="true">
          <operator activated="true" class="concurrency:loop_files" compatibility="9.3.001" expanded="true" height="82" name="Loop Files" width="90" x="246" y="34">
            <parameter key="filter_type" value="glob"/>
            <parameter key="recursive" value="false"/>
            <parameter key="enable_macros" value="true"/>
            <parameter key="macro_for_file_name" value="file_name"/>
            <parameter key="macro_for_file_type" value="file_type"/>
            <parameter key="macro_for_folder_name" value="folder_name"/>
            <parameter key="reuse_results" value="false"/>
            <parameter key="enable_parallel_execution" value="true"/>
            <process expanded="true">
              <operator activated="true" class="generate_attributes" compatibility="9.3.001" expanded="true" height="82" name="Generate Attributes" width="90" x="246" y="34">
                <list key="function_descriptions">
                  <parameter key="MyFolder" value="%{folder_name}"/>
                </list>
                <parameter key="keep_all" value="true"/>
              </operator>
              <connect from_port="file object" to_op="Generate Attributes" to_port="example set input"/>
              <connect from_op="Generate Attributes" from_port="example set output" to_port="output 1"/>
              <portSpacing port="source_file object" spacing="0"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
          </operator>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
        </process>
      </operator>
    </process>
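
For readers who want to see the grouping logic spelled out, here is a minimal Python sketch of the same idea outside RapidMiner: tag every file with its project folder (the role the folder_name macro plays above), count regular-expression matches per file, and sum the counts per folder. The function name `count_matches_per_folder`, the `.txt` filter, and the choice of "top-level folder under the root" as the project are assumptions made for illustration, not part of the process above.

```python
import re
from collections import Counter
from pathlib import Path


def count_matches_per_folder(root, patterns):
    """Sum regex match counts per top-level folder under root.

    patterns maps a label (hypothetical, chosen by the caller) to a
    regular expression string.
    """
    root_path = Path(root)
    compiled = {label: re.compile(rx, re.IGNORECASE) for label, rx in patterns.items()}
    totals = {}
    for path in sorted(root_path.rglob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        # Tag the file with its project: the first folder below root.
        # Files lying directly in root are grouped under root's own name.
        rel = path.relative_to(root_path)
        project = rel.parts[0] if len(rel.parts) > 1 else root_path.name
        counts = totals.setdefault(project, Counter())
        for label, rx in compiled.items():
            # One row per folder instead of one row per file.
            counts[label] += len(rx.findall(text))
    return totals
```

Calling `count_matches_per_folder("some/root", {"promise": r"\bwe will\b[^.]*\."})` returns one `Counter` per project folder. Zipped projects would have to be extracted first, which this sketch does not handle.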
    

