Remove URL from document
Hello,
I have a problem with my text pre-processing. Maybe someone can help me.
My text looks like this:
T-Mobile US Inc. and two regional carriers, General Communication Inc. in Alaska and CT Cube LP in Texas. The order is subject to review by President Barack Obama.
Commodities
Oil futures rose 67 cents to $93.98 a barrel as U.S. crude supplies dropped, while gold for August delivery climbed $8 to $1,405 an ounce.
Europe
European markets finished sharply lower today with shares in London leading the region. The FTSE 100 was down 2.12% while France's CAC 40 was off 1.87% and Germany's DAX fell lower by 1.20%.
[1]: http://www.proactiveinvestors.com/companies/overview/2245/Salesforce.com [2]: http://www.proactiveinvestors.comcompanies/overview/2245/salesforcecom--2245.html [3]: http://www.proactiveinvestors.com/companies/overview/2397/Goldman+Sachs [4]: http://www.proactiveinvestors.comcompanies/overview/3787/general-motors-company--3787.html [5]: http://www.proactiveinvestors.com/companies/overview/1189/Dell [6]: http://www.proactiveinvestors.comcompanies/overview/1189/dell-1189.html [7]: http://www.proactiveinvestors.com/companies/overview/1189/Dell [8]: http://www.proactiveinvestors.com/companies/overview/2306/Apple [9]: http://www.proactiveinvestors.comcompanies/overview/2306/apple-2306.html [10]: http://www.proactiveinvestors.com/companies/overview/4450/Samsung+Electronics [11]: http://www.proactiveinvestors.com/companies/overview/2306/Apple [12]:
I want to remove the URLs from the text. How can I do this? I think Filter Tokens does not work. Is the solution Remove Document Parts?
I think the solution should be a rule like this: if a word starts with http or www, then delete that word from the text (but only the URLs, nothing else).
Kind regards
Answers
Depending on your setup, you might use "Replace" (for example sets) or "Replace Tokens" (for tokenized documents) with a regex like \[\d*\][^\[\]]* to match all of the URL links in your text input.
Cheers,
Helge
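The effect of Helge's regex can be checked outside RapidMiner. This is a plain-Python sketch (not a RapidMiner operator) run on a shortened version of the text from the question; the pattern consumes each bracketed reference number together with everything up to the next bracket, which removes the whole link block:

```python
import re

# Sample text from the question, with two of the bracketed reference links appended.
text = ("European markets finished sharply lower. "
        "[1]: http://www.proactiveinvestors.com/companies/overview/2245/Salesforce.com "
        "[2]: http://www.proactiveinvestors.com/companies/overview/2306/Apple")

# Helge's pattern: a bracketed number, then everything up to the next bracket.
cleaned = re.sub(r"\[\d*\][^\[\]]*", "", text)
print(cleaned)
```

Running this leaves only the prose ("European markets finished sharply lower.") and strips the entire reference list, which is the behavior the Replace operator should reproduce with the same pattern.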
To solve the problem: my setup only consists of Process Documents from Files, and then I tried Replace on the example set.
I don't tokenize in my setup. (If I tokenize a URL like www.helpme.com, I get www, help, me, com as separate tokens. So if I search for www, I cannot delete the complete URL.)
Thank you for your comments.
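The rule proposed in the question ("if the word starts with http or www, delete the word") can be expressed as a single regex applied before any tokenization, which avoids the problem of the URL being split into www, help, me, com. A plain-Python sketch of that rule (the sample sentence is made up for illustration):

```python
import re

text = "Visit http://www.helpme.com or www.example.org for details."

# Drop any whitespace-delimited token that starts with http(s):// or www.
cleaned = re.sub(r"(?:https?://|www\.)\S+", "", text)
# Collapse the double spaces left behind by the removal.
cleaned = re.sub(r"\s{2,}", " ", cleaned).strip()
print(cleaned)  # -> "Visit or for details."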
You don't need to tokenize first. Please have a look at the process below. Cheers,
Helge
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.0.003">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="6.0.003" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="text:process_document_from_file" compatibility="5.3.002" expanded="true" height="76" name="Process Documents from Files" width="90" x="45" y="30">
<list key="text_directories">
<parameter key="test" value="C:\Users\chris_000\Desktop\Master Doks\Arbeitsstand\Dictionary\General Inquirer\Beispiele"/>
</list>
<parameter key="use_file_extension_as_type" value="false"/>
<parameter key="create_word_vector" value="false"/>
<parameter key="keep_text" value="true"/>
<process expanded="true">
<connect from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="set_role" compatibility="6.0.003" expanded="true" height="76" name="Set Role" width="90" x="179" y="30">
<parameter key="attribute_name" value="text"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="multiply" compatibility="6.0.003" expanded="true" height="94" name="Multiply" width="90" x="313" y="30"/>
<operator activated="true" class="text:process_document_from_file" compatibility="5.3.002" expanded="true" height="76" name="Process Documents from Files (2)" width="90" x="45" y="255">
<list key="text_directories">
<parameter key="positive" value="C:\Users\chris_000\Desktop\Master Doks\Arbeitsstand\Dictionary\General Inquirer\Positive"/>
</list>
<parameter key="vector_creation" value="Binary Term Occurrences"/>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="5.3.002" expanded="true" height="60" name="Tokenize" width="90" x="313" y="30"/>
<operator activated="true" class="text:stem_porter" compatibility="5.3.002" expanded="true" height="60" name="Stem (2)" width="90" x="447" y="30"/>
<connect from_port="document" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_op="Stem (2)" to_port="document"/>
<connect from_op="Stem (2)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="text:process_document_from_data" compatibility="5.3.002" expanded="true" height="76" name="Process Documents from Data" width="90" x="179" y="255">
<parameter key="vector_creation" value="Term Occurrences"/>
<parameter key="keep_text" value="true"/>
<list key="specify_weights"/>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="5.3.002" expanded="true" height="60" name="Tokenize (2)" width="90" x="179" y="30"/>
<operator activated="true" class="text:stem_porter" compatibility="5.3.002" expanded="true" height="60" name="Stem (Porter)" width="90" x="447" y="30"/>
<connect from_port="document" to_op="Tokenize (2)" to_port="document"/>
<connect from_op="Tokenize (2)" from_port="document" to_op="Stem (Porter)" to_port="document"/>
<connect from_op="Stem (Porter)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="generate_aggregation" compatibility="6.0.003" expanded="true" height="76" name="Generate Aggregation" width="90" x="313" y="255">
<parameter key="attribute_name" value="positive"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="6.0.003" expanded="true" height="76" name="Select Attributes" width="90" x="447" y="255">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="metadata_path|text|positive|label|metadata_date|metadata_file"/>
</operator>
<operator activated="true" class="text:process_document_from_file" compatibility="5.3.002" expanded="true" height="76" name="Process Documents from Files (3)" width="90" x="45" y="345">
<list key="text_directories">
<parameter key="negative" value="C:\Users\chris_000\Desktop\Master Doks\Arbeitsstand\Dictionary\General Inquirer\Negative"/>
</list>
<parameter key="vector_creation" value="Binary Term Occurrences"/>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="5.3.002" expanded="true" height="60" name="Tokenize (3)" width="90" x="180" y="30"/>
<operator activated="true" class="text:stem_porter" compatibility="5.3.002" expanded="true" height="60" name="Stem (3)" width="90" x="416" y="30"/>
<connect from_port="document" to_op="Tokenize (3)" to_port="document"/>
<connect from_op="Tokenize (3)" from_port="document" to_op="Stem (3)" to_port="document"/>
<connect from_op="Stem (3)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="text:process_document_from_data" compatibility="5.3.002" expanded="true" height="76" name="Process Documents from Data (2)" width="90" x="179" y="345">
<parameter key="vector_creation" value="Term Occurrences"/>
<parameter key="keep_text" value="true"/>
<list key="specify_weights"/>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="5.3.002" expanded="true" height="60" name="Tokenize (4)" width="90" x="180" y="30"/>
<operator activated="true" class="text:stem_porter" compatibility="5.3.002" expanded="true" height="60" name="Stem (4)" width="90" x="484" y="30"/>
<connect from_port="document" to_op="Tokenize (4)" to_port="document"/>
<connect from_op="Tokenize (4)" from_port="document" to_op="Stem (4)" to_port="document"/>
<connect from_op="Stem (4)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="generate_aggregation" compatibility="6.0.003" expanded="true" height="76" name="Generate Aggregation (2)" width="90" x="313" y="345">
<parameter key="attribute_name" value="negative"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="6.0.003" expanded="true" height="76" name="Select Attributes (2)" width="90" x="447" y="345">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="metadata_path|text|negative|label|metadata_date|metadata_file"/>
</operator>
<operator activated="true" class="join" compatibility="6.0.003" expanded="true" height="76" name="Join" width="90" x="581" y="300">
<parameter key="use_id_attribute_as_key" value="false"/>
<list key="key_attributes">
<parameter key="metadata_path" value="metadata_path"/>
</list>
</operator>
<operator activated="true" class="generate_attributes" compatibility="6.0.003" expanded="true" height="76" name="Generate Attributes" width="90" x="715" y="300">
<list key="function_descriptions">
<parameter key="Sentiment" value="(positive-negative)/(positive+negative)"/>
</list>
</operator>
<operator activated="true" class="write_excel" compatibility="6.0.003" expanded="true" height="76" name="Write Excel" width="90" x="715" y="435">
<parameter key="excel_file" value="C:\Users\chris_000\Desktop\Output.xls"/>
<parameter key="file_format" value="xlsx"/>
<parameter key="sheet_name" value="RapidMiner Test"/>
</operator>
<connect from_op="Process Documents from Files" from_port="example set" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="Process Documents from Data" to_port="example set"/>
<connect from_op="Multiply" from_port="output 2" to_op="Process Documents from Data (2)" to_port="example set"/>
<connect from_op="Process Documents from Files (2)" from_port="word list" to_op="Process Documents from Data" to_port="word list"/>
<connect from_op="Process Documents from Data" from_port="example set" to_op="Generate Aggregation" to_port="example set input"/>
<connect from_op="Generate Aggregation" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Join" to_port="left"/>
<connect from_op="Process Documents from Files (3)" from_port="word list" to_op="Process Documents from Data (2)" to_port="word list"/>
<connect from_op="Process Documents from Data (2)" from_port="example set" to_op="Generate Aggregation (2)" to_port="example set input"/>
<connect from_op="Generate Aggregation (2)" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
<connect from_op="Select Attributes (2)" from_port="example set output" to_op="Join" to_port="right"/>
<connect from_op="Join" from_port="join" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Write Excel" to_port="input"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
</process>
But I still have a problem implementing your suggestion in my process, because I have other operators. Load text doesn't work for my setup, because I have lots of texts.
The purpose of my process is to find "positive" or "negative" words in text documents (.txt).
Another question: if I want to filter out any documents containing the term "via twitter", how can I do this? I tried Filter Examples. My setup only works for one word, but not for two words or a phrase.
Best
Here is your process with an alternative input chain showing how to attach the filter techniques. It is useful not to convert your data to the example-set format too early, as this limits your options for filtering and replacing tokens or documents. Cheers,
Helge
I tried to add other text filters to this setup. I want to filter texts containing GROK-126315. How can I implement this? I tried some approaches (Multiply, the same process again) but none has worked so far. [Update: I found a basic way to solve this.]
Besides, my implementation of the new input chain doesn't work.
Can you send me a process or provide more information about your current issues? If you want to filter out more tokens, just add another filter to your process (like the one for Twitter). You can also use regular expressions in your filters to reduce the number of operators.
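Helge's "Filter Documents (by Content)" with the invert-condition option keeps only documents that do not contain the phrase. The logic can be sketched in plain Python (the sample documents are made up for illustration); a case-insensitive pattern with alternation (e.g. via\s+twitter|some other term) would replace several filter operators with one:

```python
import re

docs = [
    "Apple shares rose today",
    "Breaking news via twitter from the exchange",
    "Oil futures dropped via Twitter feed",
]

# One pattern instead of several filter operators; IGNORECASE catches "Twitter" too.
drop = re.compile(r"via\s+twitter", re.IGNORECASE)
kept = [d for d in docs if not drop.search(d)]
print(kept)
```

Only the first document survives, mirroring what the inverted content filter does to the document stream.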
Cheers,
Helge
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.0.003">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="6.0.003" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="text:process_document_from_file" compatibility="5.3.002" expanded="true" height="76" name="Process Documents from Files (2)" width="90" x="45" y="255">
<list key="text_directories">
<parameter key="positive" value="C:\Users\chris_000\Desktop\Master Doks\Arbeitsstand\Dictionary\General Inquirer\Positive"/>
</list>
<parameter key="vector_creation" value="Binary Term Occurrences"/>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="5.3.002" expanded="true" height="60" name="Tokenize" width="90" x="313" y="30"/>
<operator activated="true" class="text:stem_porter" compatibility="5.3.002" expanded="true" height="60" name="Stem (2)" width="90" x="447" y="30"/>
<connect from_port="document" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_op="Stem (2)" to_port="document"/>
<connect from_op="Stem (2)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="text:process_document_from_file" compatibility="5.3.002" expanded="true" height="76" name="Process Documents from Files (3)" width="90" x="45" y="345">
<list key="text_directories">
<parameter key="negative" value="C:\Users\chris_000\Desktop\Master Doks\Arbeitsstand\Dictionary\General Inquirer\Negative"/>
</list>
<parameter key="vector_creation" value="Binary Term Occurrences"/>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="5.3.002" expanded="true" height="60" name="Tokenize (3)" width="90" x="180" y="30"/>
<operator activated="true" class="text:stem_porter" compatibility="5.3.002" expanded="true" height="60" name="Stem (3)" width="90" x="416" y="30"/>
<connect from_port="document" to_op="Tokenize (3)" to_port="document"/>
<connect from_op="Tokenize (3)" from_port="document" to_op="Stem (3)" to_port="document"/>
<connect from_op="Stem (3)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="loop_files" compatibility="6.0.003" expanded="true" height="76" name="Loop Files" width="90" x="45" y="30">
<parameter key="directory" value="C:\Users\chris_000\Desktop\Master Doks\Arbeitsstand\Textdaten\Pre\Apple\Split"/>
<process expanded="true">
<portSpacing port="source_file object" spacing="0"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="text:filter_documents_by_content" compatibility="5.3.002" expanded="true" height="76" name="Filter Documents (by Content)" width="90" x="45" y="75">
<parameter key="string" value="via twitter"/>
<parameter key="invert condition" value="true"/>
</operator>
<operator activated="true" class="text:filter_documents_by_content" compatibility="5.3.002" expanded="true" height="76" name="Filter Documents (2)" width="90" x="45" y="120">
<parameter key="string" value="via twitter"/>
<parameter key="invert condition" value="true"/>
</operator>
<operator activated="true" class="text:process_documents" compatibility="5.3.002" expanded="true" height="94" name="Process Documents" width="90" x="246" y="30">
<parameter key="keep_text" value="true"/>
<process expanded="true">
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
</process>
</operator>
<operator activated="true" class="set_role" compatibility="6.0.003" expanded="true" height="76" name="Set Role" width="90" x="447" y="30">
<parameter key="attribute_name" value="text"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="multiply" compatibility="6.0.003" expanded="true" height="94" name="Multiply" width="90" x="581" y="30"/>
<operator activated="true" class="text:process_document_from_data" compatibility="5.3.002" expanded="true" height="76" name="Process Documents from Data" width="90" x="179" y="255">
<parameter key="vector_creation" value="Term Occurrences"/>
<parameter key="keep_text" value="true"/>
<list key="specify_weights"/>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="5.3.002" expanded="true" height="60" name="Tokenize (2)" width="90" x="179" y="30"/>
<operator activated="true" class="text:stem_porter" compatibility="5.3.002" expanded="true" height="60" name="Stem (Porter)" width="90" x="447" y="30"/>
<connect from_port="document" to_op="Tokenize (2)" to_port="document"/>
<connect from_op="Tokenize (2)" from_port="document" to_op="Stem (Porter)" to_port="document"/>
<connect from_op="Stem (Porter)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="generate_aggregation" compatibility="6.0.003" expanded="true" height="76" name="Generate Aggregation" width="90" x="313" y="255">
<parameter key="attribute_name" value="positive"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="6.0.003" expanded="true" height="76" name="Select Attributes" width="90" x="447" y="255">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="metadata_path|text|positive|label|metadata_date|metadata_file"/>
</operator>
<operator activated="true" class="text:process_document_from_data" compatibility="5.3.002" expanded="true" height="76" name="Process Documents from Data (2)" width="90" x="179" y="345">
<parameter key="vector_creation" value="Term Occurrences"/>
<parameter key="keep_text" value="true"/>
<list key="specify_weights"/>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="5.3.002" expanded="true" height="60" name="Tokenize (4)" width="90" x="180" y="30"/>
<operator activated="true" class="text:stem_porter" compatibility="5.3.002" expanded="true" height="60" name="Stem (4)" width="90" x="484" y="30"/>
<connect from_port="document" to_op="Tokenize (4)" to_port="document"/>
<connect from_op="Tokenize (4)" from_port="document" to_op="Stem (4)" to_port="document"/>
<connect from_op="Stem (4)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="generate_aggregation" compatibility="6.0.003" expanded="true" height="76" name="Generate Aggregation (2)" width="90" x="313" y="345">
<parameter key="attribute_name" value="negative"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="6.0.003" expanded="true" height="76" name="Select Attributes (2)" width="90" x="447" y="345">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="metadata_path|text|negative|label|metadata_date|metadata_file"/>
</operator>
<operator activated="true" class="join" compatibility="6.0.003" expanded="true" height="76" name="Join" width="90" x="581" y="300">
<parameter key="use_id_attribute_as_key" value="false"/>
<list key="key_attributes">
<parameter key="metadata_path" value="metadata_path"/>
</list>
</operator>
<operator activated="true" class="generate_attributes" compatibility="6.0.003" expanded="true" height="76" name="Generate Attributes" width="90" x="715" y="300">
<list key="function_descriptions">
<parameter key="Sentiment" value="(positive-negative)/(positive+negative)"/>
</list>
</operator>
<operator activated="true" class="write_excel" compatibility="6.0.003" expanded="true" height="76" name="Write Excel" width="90" x="715" y="435">
<parameter key="excel_file" value="C:\Users\chris_000\Desktop\Output.xls"/>
<parameter key="file_format" value="xlsx"/>
<parameter key="sheet_name" value="RapidMiner Test"/>
</operator>
<connect from_op="Process Documents from Files (2)" from_port="word list" to_op="Process Documents from Data" to_port="word list"/>
<connect from_op="Process Documents from Files (3)" from_port="word list" to_op="Process Documents from Data (2)" to_port="word list"/>
<connect from_op="Loop Files" from_port="out 1" to_op="Filter Documents (by Content)" to_port="documents 1"/>
<connect from_op="Filter Documents (by Content)" from_port="documents" to_op="Filter Documents (2)" to_port="documents 1"/>
<connect from_op="Filter Documents (2)" from_port="documents" to_op="Process Documents" to_port="documents 1"/>
<connect from_op="Process Documents" from_port="example set" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="Process Documents from Data" to_port="example set"/>
<connect from_op="Multiply" from_port="output 2" to_op="Process Documents from Data (2)" to_port="example set"/>
<connect from_op="Process Documents from Data" from_port="example set" to_op="Generate Aggregation" to_port="example set input"/>
<connect from_op="Generate Aggregation" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Join" to_port="left"/>
<connect from_op="Process Documents from Data (2)" from_port="example set" to_op="Generate Aggregation (2)" to_port="example set input"/>
<connect from_op="Generate Aggregation (2)" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
<connect from_op="Select Attributes (2)" from_port="example set output" to_op="Join" to_port="right"/>
<connect from_op="Join" from_port="join" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Write Excel" to_port="input"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
</process>
The problem is implementing the input chain for the text data. I built the setup, but it has an error and I don't really know how to fix it.
Helge, thanks a lot for all your support.
Happy Mining!