Summary of texts in RapidMiner

m_keshavarz_comm_keshavarz_com Member Posts: 28 Contributor I
edited November 2018 in Help
Hello,
I searched the forum but did not find what I was looking for, so I apologize if this question has been asked before.
I want to summarize my articles so that I can then analyze them, but I do not know how to summarize text in RapidMiner. Is this possible?
I know the Aylien package offers sentiment analysis, but I do not know how to summarize.
Can anyone help me?
Or should I use Python or R? Could someone provide a simple example?

Thank you

Answers

  • lionelderkrikorlionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @m_keshavarz_com,

     

    First, I am convinced that summarizing a text is feasible with RapidMiner's native operators (I think with Tokenize / non letters and Tokenize / linguistic sentences...).

    In the meantime, I propose a Python script using the NLTK library.

    A few things are needed to run this process:

    1. Install Python on your computer.

    2. Install NLTK on your computer (pip install nltk).

    3. Download and install the necessary NLTK packages (stopwords etc.): to do this, uncomment and execute these 2 lines of code in the Execute Python operator:

    Summarize_NLTK.png

    After these packages have installed successfully, comment these 2 lines out again.

     

    4. Set your "text attribute" (with quotes) and your "sum up ratio" in the parameters of the Set Macros operator:

    Summarize_NLTK_2.png

    Note: as a guide, you can vary the "sum up ratio" between 0.1 (very short summary) and 10 (very long summary, close to the original text).
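
    A minimal sketch of why the quotes in step 4 matter, assuming macros are expanded as plain text before the script runs. The helper `expand_macros` is hypothetical and only imitates that substitution; it is not part of RapidMiner or of the shared process.

```python
# Hypothetical stand-in for macro expansion: plain text replacement of %{name}.
def expand_macros(script, macros):
    for name, value in macros.items():
        script = script.replace("%{" + name + "}", value)
    return script

# The two macro-driven lines of the shared script.
template = "textAtt = %{textAttribute}\nrat = %{ratio}"

# With quotes in the macro value, the expanded line is a valid string literal.
expanded = expand_macros(template, {"textAttribute": "'Text'", "ratio": "0.9"})
print(expanded)

namespace = {}
exec(expanded, namespace)  # defines textAtt and rat inside `namespace`

# Without quotes the expanded line reads `textAtt = Text`, an undefined name.
unquoted = expand_macros(template, {"textAttribute": "Text", "ratio": "0.9"})
try:
    exec(unquoted, {})
except NameError as err:
    print("NameError:", err)
```

    The quoted value becomes the column name as a Python string, while the unquoted one is read as a variable that does not exist.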

     

    For example, here is the result with a "sum up ratio" of 1:

    Summarize_NLTK_3.png
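
    For intuition about the ratio: in the shared script a sentence is kept when its score exceeds (1/ratio) times the average sentence score. The sketch below applies that rule to made-up scores; the names `scores` and `kept` and the numbers are illustrative assumptions, not output of the process.

```python
# Illustrative sentence scores (made-up numbers, not taken from the thread).
scores = {"s1": 12, "s2": 7, "s3": 3}

# Same averaging as the script: truncated to an integer.
average = int(sum(scores.values()) / len(scores))

def kept(ratio):
    # A sentence survives when its score is strictly above the threshold.
    threshold = (1 / ratio) * average
    return [name for name, score in scores.items() if score > threshold]

for ratio in (0.1, 1, 10):
    print(ratio, kept(ratio))
```

    A small ratio raises the threshold, so few or no sentences pass; a large ratio lowers it, so nearly every sentence is kept.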

     

    I hope it helps,

     

    Regards,

     

    Lionel

     

     

     

  • lionelderkrikorlionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi,

     

    I forgot to share the process:

    <?xml version="1.0" encoding="UTF-8"?><process version="9.0.000-BETA4">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="9.0.000-BETA4" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="text:create_document" compatibility="8.1.000" expanded="true" height="68" name="Create Document" width="90" x="112" y="34">
    <parameter key="text" value="As its name states, EETS was begun as a club, and it retains certain features of that even now. It has no physical location, or even office, no paid staff or editors, but books in the Original Series are published in the first place to satisfy subscriptions paid by individuals or institutions. This means that there is need for a regular sequence of new editions, normally one or two per year; achieving that sequence can pose problems for the Editorial Secretary, who may have too few or too many texts ready for publication at any one time."/>
    </operator>
    <operator activated="true" class="text:documents_to_data" compatibility="8.1.000" expanded="true" height="82" name="Documents to Data" width="90" x="246" y="34">
    <parameter key="text_attribute" value="Text"/>
    <parameter key="label_attribute" value="Text"/>
    </operator>
    <operator activated="true" class="set_macros" compatibility="9.0.000-BETA4" expanded="true" height="82" name="Set Macros" width="90" x="380" y="34">
    <list key="macros">
    <parameter key="textAttribute" value="'Text'"/>
    <parameter key="ratio" value="0.9"/>
    </list>
    </operator>
    <operator activated="true" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="82" name="Execute Python" width="90" x="514" y="34">
    <parameter key="script" value="import pandas as pd&#10;import nltk&#10;#nltk.download(&quot;stopwords&quot;)&#10;#nltk.download('punkt')&#10;from nltk.corpus import stopwords&#10;from nltk.tokenize import word_tokenize, sent_tokenize&#10;&#10;textAtt = %{textAttribute}&#10;rat = %{ratio}&#10;&#10;def summarize(text):&#10;    stopWords = set(stopwords.words(&quot;english&quot;))&#10;    words = word_tokenize(text)&#10;&#10;    # Count how often each non-stopword occurs in the text&#10;    freqTable = dict()&#10;    for word in words:&#10;        word = word.lower()&#10;        if word in stopWords:&#10;            continue&#10;        if word in freqTable:&#10;            freqTable[word] += 1&#10;        else:&#10;            freqTable[word] = 1&#10;&#10;    sentences = sent_tokenize(text)&#10;    sentenceValue = dict()&#10;&#10;    # Score each sentence by the frequencies of the words it contains&#10;    for sentence in sentences:&#10;        for word, wordOccurence in freqTable.items():&#10;            if word in sentence.lower():&#10;                if sentence in sentenceValue:&#10;                    sentenceValue[sentence] += wordOccurence&#10;                else:&#10;                    sentenceValue[sentence] = wordOccurence&#10;&#10;    sumValues = 0&#10;    for sentence in sentenceValue:&#10;        sumValues += sentenceValue[sentence]&#10;&#10;    # Average value of a sentence from original text&#10;    average = int(sumValues / len(sentenceValue))&#10;&#10;    summary = ''&#10;    for sentence in sentences:&#10;        if sentence in sentenceValue and sentenceValue[sentence] &gt; ((1/rat) * average):&#10;            summary = summary + &quot; &quot; + sentence&#10;&#10;    return summary&#10;&#10;# rm_main is a mandatory function,&#10;# the number of arguments has to be the number of input ports (can be none)&#10;def rm_main(data):&#10;    data['Summarize'] = data[textAtt].apply(summarize)&#10;    return data"/>
    </operator>
    <connect from_op="Create Document" from_port="output" to_op="Documents to Data" to_port="documents 1"/>
    <connect from_op="Documents to Data" from_port="example set" to_op="Set Macros" to_port="through 1"/>
    <connect from_op="Set Macros" from_port="through 1" to_op="Execute Python" to_port="input 1"/>
    <connect from_op="Execute Python" from_port="output 1" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
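
    To try the scoring idea outside RapidMiner, here is a rough standalone approximation of the embedded script. It is a sketch, not a drop-in replacement: it swaps NLTK's tokenizers and stopword corpus for a regex splitter and a tiny hand-picked stopword list (both assumptions made so it runs without downloads), matches whole words rather than substrings, and takes the ratio as a function argument instead of a macro.

```python
import re

# Assumption: a tiny hand-picked stopword list instead of NLTK's English list.
STOPWORDS = {"a", "an", "and", "are", "as", "at", "by", "for", "in", "is", "it",
             "of", "on", "or", "that", "the", "this", "to", "was", "with"}

def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def summarize(text, ratio=1.0):
    # Word frequencies over the whole text, ignoring stopwords.
    freq = {}
    for word in re.findall(r"[a-z']+", text.lower()):
        if word not in STOPWORDS:
            freq[word] = freq.get(word, 0) + 1

    sentences = split_sentences(text)
    # Score each sentence by the frequencies of the distinct words it contains.
    # A sentence with no scored words gets 0 here.
    scores = {}
    for sentence in sentences:
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        scores[sentence] = sum(freq[w] for w in words if w in freq)
    if not scores:
        return ""

    average = int(sum(scores.values()) / len(scores))
    threshold = (1 / ratio) * average
    # Iterating over `sentences` keeps the original order of the text.
    return " ".join(s for s in sentences if scores[s] > threshold)

text = ("Cats sleep a lot. Cats and dogs are popular pets. "
        "The weather was fine.")
print(summarize(text, ratio=1.0))
```

    Because the final loop walks the sentences in their original order, the summary reads in the same order as the source text.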

    Regards,

     

    Lionel

  • lionelderkrikorlionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @m_keshavarz_com,

     

    I have good news and bad news:

    the good news is that summarizing a text is theoretically possible with RapidMiner's native operators;

    the bad news is that the sentences of the resulting summary are out of order.

     

    Here is the process:

     

    <?xml version="1.0" encoding="UTF-8"?><process version="9.0.000-BETA4">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="9.0.000-BETA4" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="text:create_document" compatibility="8.1.000" expanded="true" height="68" name="Create Document" width="90" x="45" y="85">
    <parameter key="text" value="my taylor is the factor and the handworker. goddess bless america and europa. "/>
    </operator>
    <operator activated="true" class="set_macros" compatibility="9.0.000-BETA4" expanded="true" height="82" name="Set Macros" width="90" x="179" y="85">
    <list key="macros">
    <parameter key="textAttribute" value="'Text'"/>
    <parameter key="ratio" value="2"/>
    </list>
    </operator>
    <operator activated="true" class="text:documents_to_data" compatibility="8.1.000" expanded="true" height="82" name="Documents to Data" width="90" x="313" y="85">
    <parameter key="text_attribute" value="Text"/>
    <parameter key="label_attribute" value="Text"/>
    </operator>
    <operator activated="true" class="multiply" compatibility="9.0.000-BETA4" expanded="true" height="103" name="Multiply (4)" width="90" x="447" y="85"/>
    <operator activated="true" class="subprocess" compatibility="9.0.000-BETA4" expanded="true" height="82" name="Processing" width="90" x="581" y="34">
    <process expanded="true">
    <operator activated="false" class="numerical_to_polynominal" compatibility="9.0.000-BETA4" expanded="true" height="82" name="Numerical to Polynominal" width="90" x="1117" y="34"/>
    <operator activated="false" class="concurrency:loop_values" compatibility="9.0.000-BETA4" expanded="true" height="103" name="Loop Values" width="90" x="1184" y="442">
    <parameter key="attribute" value="wordOccurence"/>
    <process expanded="true">
    <operator activated="true" class="filter_example_range" compatibility="9.0.000-BETA4" expanded="true" height="82" name="Filter Example Range" width="90" x="45" y="34">
    <parameter key="first_example" value="1"/>
    <parameter key="last_example" value="1"/>
    </operator>
    <operator activated="true" class="branch" compatibility="9.0.000-BETA4" expanded="true" height="103" name="Branch" width="90" x="246" y="34">
    <parameter key="condition_type" value="expression"/>
    <parameter key="condition_value" value="%{loop_value} &gt; average"/>
    <parameter key="expression" value="eval(%{loop_value}) &gt;= eval(%{average})"/>
    <process expanded="true">
    <operator activated="true" breakpoints="before,after" class="generate_attributes" compatibility="9.0.000-BETA4" expanded="true" height="82" name="Generate Attributes" width="90" x="313" y="85">
    <list key="function_descriptions">
    <parameter key="summarize" value="Text"/>
    </list>
    </operator>
    <connect from_port="input 1" to_op="Generate Attributes" to_port="example set input"/>
    <connect from_op="Generate Attributes" from_port="example set output" to_port="input 2"/>
    <portSpacing port="source_condition" spacing="0"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_input 1" spacing="0"/>
    <portSpacing port="sink_input 2" spacing="0"/>
    <portSpacing port="sink_input 3" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" breakpoints="after" class="generate_attributes" compatibility="9.0.000-BETA4" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="179" y="85">
    <list key="function_descriptions">
    <parameter key="summarize" value="&quot;&quot;"/>
    </list>
    </operator>
    <connect from_port="condition" to_port="input 1"/>
    <connect from_port="input 1" to_op="Generate Attributes (2)" to_port="example set input"/>
    <connect from_op="Generate Attributes (2)" from_port="example set output" to_port="input 2"/>
    <portSpacing port="source_condition" spacing="0"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_input 1" spacing="0"/>
    <portSpacing port="sink_input 2" spacing="0"/>
    <portSpacing port="sink_input 3" spacing="0"/>
    </process>
    </operator>
    <connect from_port="input 1" to_op="Filter Example Range" to_port="example set input"/>
    <connect from_op="Filter Example Range" from_port="example set output" to_op="Branch" to_port="input 1"/>
    <connect from_op="Branch" from_port="input 1" to_port="output 1"/>
    <connect from_op="Branch" from_port="input 2" to_port="output 2"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    <portSpacing port="sink_output 3" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="text:process_document_from_data" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="246" y="187">
    <parameter key="vector_creation" value="Term Occurrences"/>
    <list key="specify_weights"/>
    <process expanded="true">
    <operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize" width="90" x="246" y="34">
    <parameter key="mode" value="linguistic sentences"/>
    </operator>
    <connect from_port="document" to_op="Tokenize" to_port="document"/>
    <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
    <portSpacing port="source_document" spacing="0"/>
    <portSpacing port="sink_document 1" spacing="0"/>
    <portSpacing port="sink_document 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="generate_aggregation" compatibility="9.0.000-BETA4" expanded="true" height="82" name="Generate Aggregation (3)" width="90" x="380" y="187">
    <parameter key="attribute_name" value="sentenceCount"/>
    </operator>
    <operator activated="true" class="concurrency:loop_attributes" compatibility="9.0.000-BETA4" expanded="true" height="82" name="Loop Attributes" width="90" x="581" y="238">
    <process expanded="true">
    <operator activated="true" class="text:create_document" compatibility="8.1.000" expanded="true" height="68" name="Create Document (2)" width="90" x="45" y="85">
    <parameter key="text" value="%{loop_attribute}"/>
    </operator>
    <operator activated="true" class="text:documents_to_data" compatibility="8.1.000" expanded="true" height="82" name="Documents to Data (2)" width="90" x="179" y="85">
    <parameter key="text_attribute" value="Text"/>
    <parameter key="label_attribute" value="Text"/>
    </operator>
    <operator activated="true" class="multiply" compatibility="9.0.000-BETA4" expanded="true" height="103" name="Multiply (2)" width="90" x="380" y="85"/>
    <operator activated="true" class="generate_id" compatibility="9.0.000-BETA4" expanded="true" height="82" name="Generate ID" width="90" x="514" y="136"/>
    <operator activated="true" class="text:process_document_from_data" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Data (3)" width="90" x="514" y="34">
    <parameter key="vector_creation" value="Term Occurrences"/>
    <list key="specify_weights"/>
    <process expanded="true">
    <operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize (3)" width="90" x="246" y="34"/>
    <operator activated="true" class="text:filter_stopwords_english" compatibility="8.1.000" expanded="true" height="68" name="Filter Stopwords (2)" width="90" x="447" y="34"/>
    <connect from_port="document" to_op="Tokenize (3)" to_port="document"/>
    <connect from_op="Tokenize (3)" from_port="document" to_op="Filter Stopwords (2)" to_port="document"/>
    <connect from_op="Filter Stopwords (2)" from_port="document" to_port="document 1"/>
    <portSpacing port="source_document" spacing="0"/>
    <portSpacing port="sink_document 1" spacing="0"/>
    <portSpacing port="sink_document 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="generate_aggregation" compatibility="9.0.000-BETA4" expanded="true" height="82" name="Generate Aggregation" width="90" x="648" y="34">
    <parameter key="attribute_name" value="wordOccurence"/>
    </operator>
    <operator activated="true" class="generate_id" compatibility="9.0.000-BETA4" expanded="true" height="82" name="Generate ID (2)" width="90" x="782" y="85"/>
    <operator activated="true" class="concurrency:join" compatibility="9.0.000-BETA4" expanded="true" height="82" name="Join" width="90" x="916" y="85">
    <parameter key="use_id_attribute_as_key" value="false"/>
    <list key="key_attributes">
    <parameter key="id" value="id"/>
    </list>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="9.0.000-BETA4" expanded="true" height="82" name="Select Attributes" width="90" x="1050" y="34">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attributes" value="wordOccurence|Text"/>
    </operator>
    <connect from_op="Create Document (2)" from_port="output" to_op="Documents to Data (2)" to_port="documents 1"/>
    <connect from_op="Documents to Data (2)" from_port="example set" to_op="Multiply (2)" to_port="input"/>
    <connect from_op="Multiply (2)" from_port="output 1" to_op="Process Documents from Data (3)" to_port="example set"/>
    <connect from_op="Multiply (2)" from_port="output 2" to_op="Generate ID" to_port="example set input"/>
    <connect from_op="Generate ID" from_port="example set output" to_op="Join" to_port="left"/>
    <connect from_op="Process Documents from Data (3)" from_port="example set" to_op="Generate Aggregation" to_port="example set input"/>
    <connect from_op="Generate Aggregation" from_port="example set output" to_op="Generate ID (2)" to_port="example set input"/>
    <connect from_op="Generate ID (2)" from_port="example set output" to_op="Join" to_port="right"/>
    <connect from_op="Join" from_port="join" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="append" compatibility="9.0.000-BETA4" expanded="true" height="82" name="Append" width="90" x="715" y="238"/>
    <operator activated="true" class="extract_macro" compatibility="9.0.000-BETA4" expanded="true" height="68" name="Extract Macro (2)" width="90" x="581" y="136">
    <parameter key="macro" value="sentenceCount"/>
    <parameter key="macro_type" value="data_value"/>
    <parameter key="attribute_name" value="sentenceCount"/>
    <parameter key="example_index" value="1"/>
    <list key="additional_macros"/>
    </operator>
    <operator activated="true" class="multiply" compatibility="9.0.000-BETA4" expanded="true" height="103" name="Multiply" width="90" x="849" y="187"/>
    <operator activated="true" class="aggregate" compatibility="9.0.000-BETA4" expanded="true" height="82" name="Aggregate" width="90" x="983" y="289">
    <list key="aggregation_attributes">
    <parameter key="wordOccurence" value="sum"/>
    </list>
    </operator>
    <operator activated="true" class="rename" compatibility="9.0.000-BETA4" expanded="true" height="82" name="Rename" width="90" x="1117" y="289">
    <parameter key="old_name" value="sum(wordOccurence)"/>
    <parameter key="new_name" value="totalWordCount"/>
    <list key="rename_additional_attributes"/>
    </operator>
    <operator activated="true" class="extract_macro" compatibility="9.0.000-BETA4" expanded="true" height="68" name="Extract Macro" width="90" x="1251" y="289">
    <parameter key="macro" value="totalWordOccurence"/>
    <parameter key="macro_type" value="data_value"/>
    <parameter key="attribute_name" value="totalWordCount"/>
    <parameter key="example_index" value="1"/>
    <list key="additional_macros"/>
    </operator>
    <operator activated="true" class="generate_macro" compatibility="9.0.000-BETA4" expanded="true" height="103" name="Generate Macro" width="90" x="983" y="136">
    <list key="function_descriptions">
    <parameter key="average" value="floor(eval(%{totalWordOccurence})/eval(%{sentenceCount}))"/>
    </list>
    </operator>
    <operator activated="true" class="filter_examples" compatibility="9.0.000-BETA4" expanded="true" height="103" name="Filter Examples" width="90" x="1117" y="136">
    <parameter key="parameter_expression" value="wordOccurence &gt; (1/eval(%{ratio}))*eval(%{average})"/>
    <parameter key="condition_class" value="expression"/>
    <list key="filters_list"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="9.0.000-BETA4" expanded="true" height="82" name="Select Attributes (2)" width="90" x="1251" y="85">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Text"/>
    </operator>
    <connect from_port="in 1" to_op="Process Documents from Data" to_port="example set"/>
    <connect from_op="Process Documents from Data" from_port="example set" to_op="Generate Aggregation (3)" to_port="example set input"/>
    <connect from_op="Generate Aggregation (3)" from_port="example set output" to_op="Extract Macro (2)" to_port="example set"/>
    <connect from_op="Generate Aggregation (3)" from_port="original" to_op="Loop Attributes" to_port="input 1"/>
    <connect from_op="Loop Attributes" from_port="output 1" to_op="Append" to_port="example set 1"/>
    <connect from_op="Append" from_port="merged set" to_op="Multiply" to_port="input"/>
    <connect from_op="Extract Macro (2)" from_port="example set" to_op="Generate Macro" to_port="through 1"/>
    <connect from_op="Multiply" from_port="output 1" to_op="Generate Macro" to_port="through 2"/>
    <connect from_op="Multiply" from_port="output 2" to_op="Aggregate" to_port="example set input"/>
    <connect from_op="Aggregate" from_port="example set output" to_op="Rename" to_port="example set input"/>
    <connect from_op="Rename" from_port="example set output" to_op="Extract Macro" to_port="example set"/>
    <connect from_op="Generate Macro" from_port="through 2" to_op="Filter Examples" to_port="example set input"/>
    <connect from_op="Filter Examples" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
    <connect from_op="Select Attributes (2)" from_port="example set output" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="generate_id" compatibility="9.0.000-BETA4" expanded="true" height="82" name="Generate ID (4)" width="90" x="581" y="136"/>
    <operator activated="true" class="generate_id" compatibility="9.0.000-BETA4" expanded="true" height="82" name="Generate ID (3)" width="90" x="715" y="34"/>
    <operator activated="true" class="transpose" compatibility="9.0.000-BETA4" expanded="true" height="82" name="Transpose" width="90" x="849" y="34"/>
    <operator activated="true" class="generate_aggregation" compatibility="9.0.000-BETA4" expanded="true" height="82" name="Generate Aggregation (2)" width="90" x="1050" y="85">
    <parameter key="attribute_name" value="Summarize_concat"/>
    <parameter key="aggregation_function" value="concatenation"/>
    </operator>
    <operator activated="true" class="multiply" compatibility="9.0.000-BETA4" expanded="true" height="82" name="Multiply (3)" width="90" x="1184" y="85"/>
    <operator activated="true" class="generate_id" compatibility="9.0.000-BETA4" expanded="true" height="82" name="Generate ID (5)" width="90" x="1318" y="85"/>
    <operator activated="true" class="concurrency:join" compatibility="9.0.000-BETA4" expanded="true" height="82" name="Join (2)" width="90" x="1452" y="85">
    <list key="key_attributes"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="9.0.000-BETA4" expanded="true" height="82" name="Select Attributes (3)" width="90" x="1586" y="85">
    <parameter key="attribute_filter_type" value="regular_expression"/>
    <parameter key="regular_expression" value="att.*"/>
    <parameter key="invert_selection" value="true"/>
    </operator>
    <connect from_op="Create Document" from_port="output" to_op="Set Macros" to_port="through 1"/>
    <connect from_op="Set Macros" from_port="through 1" to_op="Documents to Data" to_port="documents 1"/>
    <connect from_op="Documents to Data" from_port="example set" to_op="Multiply (4)" to_port="input"/>
    <connect from_op="Multiply (4)" from_port="output 1" to_op="Processing" to_port="in 1"/>
    <connect from_op="Multiply (4)" from_port="output 2" to_op="Generate ID (4)" to_port="example set input"/>
    <connect from_op="Processing" from_port="out 1" to_op="Generate ID (3)" to_port="example set input"/>
    <connect from_op="Generate ID (4)" from_port="example set output" to_op="Join (2)" to_port="left"/>
    <connect from_op="Generate ID (3)" from_port="example set output" to_op="Transpose" to_port="example set input"/>
    <connect from_op="Transpose" from_port="example set output" to_op="Generate Aggregation (2)" to_port="example set input"/>
    <connect from_op="Generate Aggregation (2)" from_port="example set output" to_op="Multiply (3)" to_port="input"/>
    <connect from_op="Multiply (3)" from_port="output 1" to_op="Generate ID (5)" to_port="example set input"/>
    <connect from_op="Generate ID (5)" from_port="example set output" to_op="Join (2)" to_port="right"/>
    <connect from_op="Join (2)" from_port="join" to_op="Select Attributes (3)" to_port="example set input"/>
    <connect from_op="Select Attributes (3)" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    You can adapt it to work with Twitter

     

    Regards,

     

    Lionel

     

  • lionelderkrikorlionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Dear all,

     

    In my previous post, I shared a process to "sum up" a text using only RapidMiner's operators.

    The sentences of the resulting summary are out of order (not in the same order as in the original text, which is... unfortunate).

    After investigation, the "guilty party" is the Process Documents from Data operator (combined with Tokenize (linguistic sentences)).

    Indeed, after processing, this operator sorts the "sentence attributes" alphabetically, so the original order is lost.

     

    So my question is: is there a way to preserve the original order of the sentences? In other words, can these two operators return their results (the "sentence attributes") in the same order as the original text?

     

    Thank you for your answers.

     

    Regards,

     

    Lionel

     

    NB: I am, of course, open to alternative ways to "sum up" a text...

    NB2: If needed, the process is shared in my previous post.
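
    One possible workaround for the ordering, sketched in Python: re-sort the selected sentences by where they first appear in the original text. This assumes the sentences come back as exact substrings of the original text, which may not hold if the tokenizer alters punctuation or whitespace; `restore_order` is a hypothetical helper, not an existing operator.

```python
def restore_order(original_text, sentences):
    # Sort sentences by their first position in the original text;
    # sentences that cannot be found are pushed to the end.
    def position(sentence):
        index = original_text.find(sentence)
        return index if index >= 0 else len(original_text)
    return sorted(sentences, key=position)

original = "Zebras graze at dawn. Ants work all day. Monkeys nap at noon."
# Alphabetical order, as described for the sentence attributes.
alphabetical = ["Ants work all day.", "Monkeys nap at noon.", "Zebras graze at dawn."]
print(restore_order(original, alphabetical))
```

    Repeated sentences would all map to the position of the first occurrence, so this sketch suits texts without duplicates.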

     

     
