[Solved] Problem with Filter Tokens (by Region)

KallustKallust Member Posts: 4 Contributor I
edited November 2018 in Help
I'm currently working on my bachelor thesis, in which I examine how companies report on research and development in their annual reports.
Unfortunately, I'm completely inexperienced with RapidMiner.
My current approach is as follows:
1. Load the annual reports into RapidMiner (I have them as .txt files).
2. Tokenize the reports into sentences, using .:?! as split characters.
3. Use Filter Tokens (by Region) to extract the sentence before and after a sentence containing a keyword (tokens before = 1 and tokens after = 1).
4. Save the result in an .xlsx file for further editing.
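
For readers who want to see what step 2 produces, here is a minimal Python sketch of splitting a text at the characters .:?!. It only approximates a character-based tokenizer and is not RapidMiner's implementation; the function name `tokenize_sentences` and the sample text are made up for illustration.

```python
import re


def tokenize_sentences(text, characters=".:?!"):
    """Split text at any of the given characters and drop empty pieces."""
    # Build a character class such as [\.:\?!] from the split characters.
    pattern = "[" + re.escape(characters) + "]"
    pieces = re.split(pattern, text)
    # Strip surrounding whitespace and discard empty leftovers.
    return [piece.strip() for piece in pieces if piece.strip()]


# Hypothetical sample text, not taken from a real annual report.
report = (
    "Wir investieren in Forschung. Das Programm startet 2010! "
    "Wie geht es weiter? Ausblick: positiv."
)
sentences = tokenize_sentences(report)
for sentence in sentences:
    print(sentence)
```

Because the split is purely character-based, the colon in "Ausblick: positiv" also produces two separate tokens, and a period inside a number such as "3.5" would be split in the same way.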

Steps 1 and 2 work as they should; my problem is with step 3.
I don't get any results out of the region filter. It just seems to do nothing. It also forces me to enter both a string and a regular expression, and the difference between the two isn't clear to me (I don't know if this is important). I think this guy had a similar problem,6021.0.html

I used an online XML validator and it didn't find an error.
Hopefully you could help me with my problem.

My code looks as follows:
[code]
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.015">
 <operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
   <process expanded="true">
     <operator activated="true" class="text:process_document_from_file" compatibility="5.3.002" expanded="true" height="76" name="Process Documents from Files" width="90" x="179" y="120">
       <list key="text_directories">
         <parameter key="input" value="C:\Users\Administrator\Documents\Studium\Bachelorarbeit\RapidMiner\Test\Quelle"/>
       </list>
       <process expanded="true">
         <operator activated="true" class="text:tokenize" compatibility="5.3.002" expanded="true" height="60" name="Tokenize" width="90" x="179" y="120">
           <parameter key="mode" value="specify characters"/>
           <parameter key="characters" value=".:?!"/>
         </operator>
         <operator activated="true" class="text:filter_tokens_by_regions" compatibility="5.3.002" expanded="true" height="60" name="Filter Tokens (by Region)" width="90" x="380" y="120">
           <parameter key="string" value="Programm"/>
           <parameter key="regular_expression" value="Programm"/>
         </operator>
         <connect from_port="document" to_op="Tokenize" to_port="document"/>
         <connect from_op="Tokenize" from_port="document" to_op="Filter Tokens (by Region)" to_port="document"/>
         <connect from_op="Filter Tokens (by Region)" from_port="document" to_port="document 1"/>
         <portSpacing port="source_document" spacing="0"/>
         <portSpacing port="sink_document 1" spacing="0"/>
         <portSpacing port="sink_document 2" spacing="0"/>
       </process>
     </operator>
     <operator activated="true" class="write_excel" compatibility="5.3.015" expanded="true" height="76" name="Write Excel" width="90" x="380" y="165">
       <parameter key="excel_file" value="C:\Users\Administrator\Documents\Studium\Bachelorarbeit\RapidMiner\Test\Ergebnis.xlsx"/>
       <parameter key="file_format" value="xlsx"/>
       <parameter key="sheet_name" value="RapidMiner Data0"/>
     </operator>
     <connect from_op="Process Documents from Files" from_port="example set" to_op="Write Excel" to_port="input"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
   </process>
 </operator>
</process>
[/code]


    KallustKallust Member Posts: 4 Contributor I
I figured out the solution myself: the problem was that I was searching with "equals" and "contains" at the same time. With whole sentences as tokens, an "equals" match could never find a single word, because the word never equals the whole sentence.
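
To make the difference concrete, here is a small Python sketch of a region filter over sentence tokens. It illustrates the idea only and is not how RapidMiner implements the operator: the names `matches` and `filter_by_region` are hypothetical, the sample sentences are invented, and keeping the matching sentence together with its neighbours is an assumption.

```python
def matches(token, keyword, condition):
    """Check one token against the keyword under the given condition."""
    if condition == "equals":
        return token == keyword      # the whole token must be the keyword
    if condition == "contains":
        return keyword in token      # the keyword may appear anywhere
    raise ValueError("unknown condition: " + condition)


def filter_by_region(tokens, keyword, condition, before=1, after=1):
    """Keep each matching token plus `before`/`after` neighbouring tokens.

    Assumption: the matching token itself is kept as well.
    """
    keep = set()
    for i, token in enumerate(tokens):
        if matches(token, keyword, condition):
            start = max(0, i - before)
            end = min(len(tokens), i + after + 1)
            keep.update(range(start, end))
    return [tokens[i] for i in sorted(keep)]


# Hypothetical sentence tokens, as produced by splitting at .:?!
sentences = [
    "Wir investieren in Forschung",
    "Das Programm startet 2010",
    "Wie geht es weiter",
    "Der Ausblick ist positiv",
]

with_equals = filter_by_region(sentences, "Programm", "equals")
with_contains = filter_by_region(sentences, "Programm", "contains")
print(len(with_equals), len(with_contains))
```

With "equals", no sentence is identical to the single word "Programm", so nothing is kept. With "contains", the second sentence matches and is returned together with the sentence before and after it.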