[Solved] Problem with Filter Tokens (by Region)
Hi,
I'm currently working on my bachelor thesis, in which I examine how companies report on research and development in their annual reports.
Unfortunately, I'm completely inexperienced with RapidMiner.
My current approach is as follows:
1. Load the annual reports into RapidMiner (I have them as .txt files).
2. Tokenize the reports into sentences, using .:?! as the split characters.
3. Use Filter Tokens (by Region) to extract the sentence before and after a sentence containing a keyword (tokens before = 1 and tokens after = 1).
4. Save the result in an .xlsx file for further editing.
Steps 1 and 2 work as they should; my problem is with step 3.
I don't get any results out of the region filter. It just seems to do nothing. It also forces me to enter both a string and a regular expression, and the difference between the two isn't clear to me (I don't know if this is important). I think this user had a similar problem: http://rapid-i.com/rapidforum/index.php/topic,6021.0.html
I ran the process XML through an online XML validator and it didn't find any errors.
I hope you can help me with my problem.
My code looks as follows:
[code]
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.015">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="text:process_document_from_file" compatibility="5.3.002" expanded="true" height="76" name="Process Documents from Files" width="90" x="179" y="120">
        <list key="text_directories">
          <parameter key="input" value="C:\Users\Administrator\Documents\Studium\Bachelorarbeit\RapidMiner\Test\Quelle"/>
        </list>
        <process expanded="true">
          <operator activated="true" class="text:tokenize" compatibility="5.3.002" expanded="true" height="60" name="Tokenize" width="90" x="179" y="120">
            <parameter key="mode" value="specify characters"/>
            <parameter key="characters" value=".:?!"/>
          </operator>
          <operator activated="true" class="text:filter_tokens_by_regions" compatibility="5.3.002" expanded="true" height="60" name="Filter Tokens (by Region)" width="90" x="380" y="120">
            <parameter key="string" value="Programm"/>
            <parameter key="regular_expression" value="Programm"/>
          </operator>
          <connect from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_op="Filter Tokens (by Region)" to_port="document"/>
          <connect from_op="Filter Tokens (by Region)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="write_excel" compatibility="5.3.015" expanded="true" height="76" name="Write Excel" width="90" x="380" y="165">
        <parameter key="excel_file" value="C:\Users\Administrator\Documents\Studium\Bachelorarbeit\RapidMiner\Test\Ergebnis.xlsx"/>
        <parameter key="file_format" value="xlsx"/>
        <parameter key="sheet_name" value="RapidMiner Data0"/>
      </operator>
      <connect from_op="Process Documents from Files" from_port="example set" to_op="Write Excel" to_port="input"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
    </process>
  </operator>
</process>
[/code]
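To make clear what result I expect from steps 2 and 3, here is a minimal Python sketch of the intended behaviour. It is only an illustration outside RapidMiner and not how the Filter Tokens (by Region) operator is actually implemented. The function names `split_sentences` and `keyword_regions` and the sample text are made up for this example, and the keyword is matched as a plain substring.

```python
import re


def split_sentences(text, delimiters=".:?!"):
    """Split text on any of the delimiter characters and drop empty pieces."""
    pattern = "[" + re.escape(delimiters) + "]"
    return [piece.strip() for piece in re.split(pattern, text) if piece.strip()]


def keyword_regions(sentences, keyword, before=1, after=1):
    """For each sentence containing the keyword, return it together with
    up to `before` preceding and `after` following sentences."""
    regions = []
    for index, sentence in enumerate(sentences):
        if keyword in sentence:
            start = max(0, index - before)
            end = min(len(sentences), index + after + 1)
            regions.append(sentences[start:end])
    return regions


if __name__ == "__main__":
    # Invented sample text, only to show the shape of the result.
    sample = "Revenue grew. Das Programm wurde erweitert. Costs fell! Any questions?"
    for region in keyword_regions(split_sentences(sample), "Programm"):
        print(region)
```

With tokens before = 1 and tokens after = 1, each match should therefore give me a group of up to three sentences, and that group is what I would like to end up in the Excel file.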