[Solved] Problem with Filter Tokens (by Region)

Kallust (Member, Posts: 4, Contributor I)
edited November 2018 in Help
Hi,
I'm currently working on my bachelor thesis, in which I examine how companies report on research and development in their annual reports.
Unfortunately, I'm completely inexperienced with RapidMiner.
My current approach is as follows:
1. Load the annual reports into RapidMiner (I have them as .txt files).
2. Tokenize the reports into sentences, using .:?! as the split characters.
3. Use Filter Tokens (by Region) to extract the sentence before and after a sentence containing a keyword (tokens before = 1 and tokens after = 1).
4. Save the result in an .xlsx file for further editing.
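
For readers who want to see the intended result outside RapidMiner, steps 2 and 3 can be sketched in plain Python. This is only an illustration of the goal, not how the Filter Tokens (by Region) operator is implemented; the function names, the sample text, and the keyword are made up for the example.

```python
import re


def split_sentences(text, delimiters=".:?!"):
    """Split text on any of the delimiter characters and drop empty pieces."""
    pattern = "[" + re.escape(delimiters) + "]"
    return [part.strip() for part in re.split(pattern, text) if part.strip()]


def sentences_around_keyword(sentences, keyword, before=1, after=1):
    """Keep every sentence that contains the keyword, plus its neighbours."""
    keep = set()
    for i, sentence in enumerate(sentences):
        # "contains" check, case-insensitive (an assumption for this sketch)
        if keyword.lower() in sentence.lower():
            start = max(0, i - before)
            end = min(len(sentences), i + after + 1)
            keep.update(range(start, end))
    # Return the kept sentences in document order, without duplicates
    return [sentences[i] for i in sorted(keep)]


# Invented sample text, not taken from a real annual report
report = "We grew revenue. Our research programme expanded. Costs rose! Staff numbers fell."
sentences = split_sentences(report)
selected = sentences_around_keyword(sentences, "research")
print(selected)
```

The selected sentences could then be written to a spreadsheet with any library of your choice, which corresponds to step 4.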

Steps 1 and 2 work as they should; my problem is with step 3.
I don't get any results out of the region filter. It just seems to do nothing. It also forces me to enter both a string and a regular expression, and the difference between the two isn't clear to me (I don't know whether this matters). I think this user had a similar problem: http://rapid-i.com/rapidforum/index.php/topic,6021.0.html

I ran the process through an online XML validator, and it didn't find any errors.
I hope you can help me with my problem.

My process XML looks as follows:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.015">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
   <process expanded="true">
     <operator activated="true" class="text:process_document_from_file" compatibility="5.3.002" expanded="true" height="76" name="Process Documents from Files" width="90" x="179" y="120">
       <list key="text_directories">
         <parameter key="input" value="C:\Users\Administrator\Documents\Studium\Bachelorarbeit\RapidMiner\Test\Quelle"/>
       </list>
       <process expanded="true">
         <operator activated="true" class="text:tokenize" compatibility="5.3.002" expanded="true" height="60" name="Tokenize" width="90" x="179" y="120">
           <parameter key="mode" value="specify characters"/>
           <parameter key="characters" value=".:?!"/>
         </operator>
         <operator activated="true" class="text:filter_tokens_by_regions" compatibility="5.3.002" expanded="true" height="60" name="Filter Tokens (by Region)" width="90" x="380" y="120">
           <parameter key="string" value="Programm"/>
           <parameter key="regular_expression" value="Programm"/>
         </operator>
         <connect from_port="document" to_op="Tokenize" to_port="document"/>
         <connect from_op="Tokenize" from_port="document" to_op="Filter Tokens (by Region)" to_port="document"/>
         <connect from_op="Filter Tokens (by Region)" from_port="document" to_port="document 1"/>
         <portSpacing port="source_document" spacing="0"/>
         <portSpacing port="sink_document 1" spacing="0"/>
         <portSpacing port="sink_document 2" spacing="0"/>
       </process>
     </operator>
     <operator activated="true" class="write_excel" compatibility="5.3.015" expanded="true" height="76" name="Write Excel" width="90" x="380" y="165">
       <parameter key="excel_file" value="C:\Users\Administrator\Documents\Studium\Bachelorarbeit\RapidMiner\Test\Ergebnis.xlsx"/>
       <parameter key="file_format" value="xlsx"/>
       <parameter key="sheet_name" value="RapidMiner Data0"/>
     </operator>
     <connect from_op="Process Documents from Files" from_port="example set" to_op="Write Excel" to_port="input"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
   </process>
 </operator>
</process>

Answers

  • Kallust (Member, Posts: 4, Contributor I)
    I figured out the solution myself. The problem was that I was searching with "equals" and "contains" at the same time. With whole sentences as tokens, the "equals" condition could never match a single word, because the word never equals the entire sentence.
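
    The difference between the matching modes can be shown with a small Python sketch. It does not use RapidMiner's API; the sample sentences are invented, and the three list names are only labels for this example. The keyword "Programm" is the one from the process above.

```python
import re

# Invented sentence-level tokens, as they might look after tokenizing on .:?!
sentence_tokens = [
    "Das Programm wurde 2012 gestartet",
    "Programm",
    "Die Kosten stiegen",
]
keyword = "Programm"

# "equals": the whole token must be identical to the keyword
equals_hits = [t for t in sentence_tokens if t == keyword]

# "contains": the keyword only has to appear somewhere inside the token
contains_hits = [t for t in sentence_tokens if keyword in t]

# Regular expression: here the keyword must appear as a whole word
regex_hits = [t for t in sentence_tokens if re.search(r"\bProgramm\b", t)]

print(equals_hits)
print(contains_hits)
print(regex_hits)
```

    With sentences as tokens, the "equals" test only matches a token that consists of the keyword alone, so it almost never fires on real report text, while the "contains" test and the regular expression both match the full sentence.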