
N-grams do not sort in ascending/descending order

Vincent_de_Vrie Member Posts: 3 Contributor I
edited November 2018 in Help

Hello,

I have an example set that I converted to a word list with the Process Documents from Data operator, in order to count the term and document occurrences in my dataset. I also have a duplicate of this process that generates n-grams. The plain word list (without n-grams) can be sorted in ascending or descending order by clicking on a column header such as "term occurrences" or "document occurrences"; however, this does not work on the result that contains the n-grams. Does anyone know why the results with the n-grams cannot be sorted?

 

 

Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @Vincent_de_Vrie You're making it really hard to visualize. Can you post screenshots, your process XML, and sample data?

  • Vincent_de_Vrie Member Posts: 3 Contributor I

    Hello Thomas,

    Thanks for your response. I am basically trying to use a very simple function of the RapidMiner Results tab: clicking on the header of an attribute to sort it in ascending or descending order. In this case I am trying to sort a set of bigrams I created (see XML for process). However, clicking on the attribute headers (see screenshots) does not do anything at all. I am wondering why this is, because now I have to export my example set to Excel and use Excel's "smallest to largest" function to sort the bigrams.

    I hope this clarifies some things. I am sorry, but I can't provide the sample data since it is classified; the data is actually just text.

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @Vincent_de_Vrie I ran a modified version of this process using a Search Twitter operator, and sorting the word list coming out of the WordList to Data operator works for me. Are you running a Mac? It might be a Mac thing, since I'm on Windows.

    Also, you can save yourself the effort of exporting to XLS to sort. You can just use the Sort operator and select the attribute you want to sort by.
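
    For reference, here is a rough sketch of how a Sort operator might be wired in after Filter Examples. It is a fragment rather than a complete process. It assumes the 8.x parameter keys `attribute_name` and `sorting_direction` (later Studio versions may use different keys) and an attribute called `total`, as listed in the Results tab for the WordList to Data output. The operator name and the x/y layout values are placeholders.

    ```xml
    <!-- Hypothetical fragment: sort the bigram table by total occurrences, largest first. -->
    <!-- Assumes the example set has an attribute named "total" (from WordList to Data). -->
    <operator activated="true" class="sort" compatibility="8.1.001" expanded="true" height="82" name="Sort" width="90" x="715" y="34">
    <parameter key="attribute_name" value="total"/>
    <parameter key="sorting_direction" value="decreasing"/>
    </operator>
    <!-- Feed the filtered bigrams into Sort and deliver the sorted table as the result. -->
    <connect from_op="Filter Examples" from_port="example set output" to_op="Sort" to_port="example set input"/>
    <connect from_op="Sort" from_port="example set output" to_port="result 1"/>
    ```

    Setting `attribute_name` to "in documents" instead would sort by document occurrences. If the attribute drop-down in the Parameters panel is empty, the name can usually still be typed in by hand.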

     

     

    <?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.1.001" expanded="true" name="Process">
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
    <operator activated="false" class="read_excel" compatibility="8.1.000" expanded="true" height="68" name="Read Excel" width="90" x="45" y="34">
    <parameter key="excel_file" value="C:\Users\vriesv\Documents\afstuderen\Library\Databases\COMPLETE_no_dup.xlsx"/>
    <parameter key="sheet_number" value="2"/>
    <parameter key="imported_cell_range" value="A1:BY18209"/>
    <parameter key="encoding" value="SYSTEM"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="KEY.true.polynominal.attribute"/>
    <parameter key="1" value="ECCAIRSNUMBER.true.integer.attribute"/>
    <parameter key="2" value="RESPONSIBLEENTITY.true.integer.attribute"/>
    <parameter key="3" value="File number.true.integer.attribute"/>
    <parameter key="4" value="Responsible entity.true.polynominal.attribute"/>
    <parameter key="5" value="Report identification.true.polynominal.attribute"/>
    <parameter key="6" value="Local date.true.polynominal.attribute"/>
    <parameter key="7" value="UTC date.true.polynominal.attribute"/>
    <parameter key="8" value="Local time.true.polynominal.attribute"/>
    <parameter key="9" value="UTC time.true.polynominal.attribute"/>
    <parameter key="10" value="Occurrence class.true.polynominal.attribute"/>
    <parameter key="11" value="Occurrence category.true.polynominal.attribute"/>
    <parameter key="12" value="Headline.true.polynominal.attribute"/>
    <parameter key="13" value="Narrative text.true.polynominal.attribute"/>
    <parameter key="14" value="Narrative text (2).true.polynominal.attribute"/>
    <parameter key="15" value="Narrative text (3).true.attribute_value.attribute"/>
    <parameter key="16" value="Narrative text (4).true.polynominal.attribute"/>
    <parameter key="17" value="Narrative text (5).true.attribute_value.attribute"/>
    <parameter key="18" value="Event type.true.polynominal.attribute"/>
    <parameter key="19" value="Event type [Level 1].true.polynominal.attribute"/>
    <parameter key="20" value="Event type [Level 2].true.polynominal.attribute"/>
    <parameter key="21" value="Event type [Level 3].true.polynominal.attribute"/>
    <parameter key="22" value="Event type [Level 4].true.polynominal.attribute"/>
    <parameter key="23" value="Event type [Level 5].true.attribute_value.attribute"/>
    <parameter key="24" value="Event justification.true.attribute_value.attribute"/>
    <parameter key="25" value="Phase [Level 1: Aircraft Category].true.polynominal.attribute"/>
    <parameter key="26" value="Phase [Level 2: Phase of Flight].true.polynominal.attribute"/>
    <parameter key="27" value="Phase [Level 3: Sub-Phases].true.polynominal.attribute"/>
    <parameter key="28" value="Descr factor subject.true.polynominal.attribute"/>
    <parameter key="29" value="Descr modifier.true.polynominal.attribute"/>
    <parameter key="30" value="Expl factor subject.true.attribute_value.attribute"/>
    <parameter key="31" value="Expl factor modifier.true.attribute_value.attribute"/>
    <parameter key="32" value="Phase.true.polynominal.attribute"/>
    <parameter key="33" value="Flight phase.true.polynominal.attribute"/>
    <parameter key="34" value="State/area of occ.true.polynominal.attribute"/>
    <parameter key="35" value="Location name.true.polynominal.attribute"/>
    <parameter key="36" value="Latitude of occ.true.attribute_value.attribute"/>
    <parameter key="37" value="Longitude of occ.true.attribute_value.attribute"/>
    <parameter key="38" value="Location indicator.true.polynominal.attribute"/>
    <parameter key="39" value="Aircraft category.true.polynominal.attribute"/>
    <parameter key="40" value="Aircraft registration.true.polynominal.attribute"/>
    <parameter key="41" value="Mass group.true.polynominal.attribute"/>
    <parameter key="42" value="State of registry.true.polynominal.attribute"/>
    <parameter key="43" value="Propulsion type.true.polynominal.attribute"/>
    <parameter key="44" value="Number of engines.true.integer.attribute"/>
    <parameter key="45" value="Operator.true.polynominal.attribute"/>
    <parameter key="46" value="Operation type.true.polynominal.attribute"/>
    <parameter key="47" value="Schedule type.true.polynominal.attribute"/>
    <parameter key="48" value="Risk level.true.attribute_value.attribute"/>
    <parameter key="49" value="Risk grade.true.attribute_value.attribute"/>
    <parameter key="50" value="Aircraft damage.true.attribute_value.attribute"/>
    <parameter key="51" value="Highest damage.true.polynominal.attribute"/>
    <parameter key="52" value="Injury level.true.polynominal.attribute"/>
    <parameter key="53" value="Total number fatalities.true.attribute_value.attribute"/>
    <parameter key="54" value="Total no injuries.true.attribute_value.attribute"/>
    <parameter key="55" value="Wake turb\. category.true.polynominal.attribute"/>
    <parameter key="56" value="Flight number.true.polynominal.attribute"/>
    <parameter key="57" value="Call sign.true.polynominal.attribute"/>
    <parameter key="58" value="Last departure point.true.polynominal.attribute"/>
    <parameter key="59" value="Planned destination.true.polynominal.attribute"/>
    <parameter key="60" value="ANSP name.true.polynominal.attribute"/>
    <parameter key="61" value="Airspace class.true.polynominal.attribute"/>
    <parameter key="62" value="Location on aerodrome.true.attribute_value.attribute"/>
    <parameter key="63" value="Airspace name.true.polynominal.attribute"/>
    <parameter key="64" value="Airspace type.true.polynominal.attribute"/>
    <parameter key="65" value="FIR/UIR name.true.polynominal.attribute"/>
    <parameter key="66" value="Occurrence status.true.polynominal.attribute"/>
    <parameter key="67" value="Occurrence moderator.true.polynominal.attribute"/>
    <parameter key="68" value="Third party damage.true.polynominal.attribute"/>
    <parameter key="69" value="ATM contribution.true.polynominal.attribute"/>
    <parameter key="70" value="Effect on ATM service.true.polynominal.attribute"/>
    <parameter key="71" value="Schedule type (2).true.polynominal.attribute"/>
    <parameter key="72" value="Ops range.true.polynominal.attribute"/>
    <parameter key="73" value="Risk classification.true.attribute_value.attribute"/>
    <parameter key="74" value="Manufacturer/model [Level 1].true.polynominal.attribute"/>
    <parameter key="75" value="Manufacturer/model [Level 2].true.polynominal.attribute"/>
    <parameter key="76" value="Manufacturer/model [Level 3].true.integer.attribute"/>
    </list>
    </operator>
    <operator activated="false" class="subprocess" compatibility="8.1.001" expanded="true" height="82" name="Merging Narratives to Master" width="90" x="179" y="34">
    <process expanded="true">
    <operator activated="true" class="trim" compatibility="8.1.001" expanded="true" height="82" name="Trim" width="90" x="45" y="34"/>
    <operator activated="true" class="generate_attributes" compatibility="8.1.001" expanded="true" height="82" name="Generate Attributes" width="90" x="179" y="34">
    <list key="function_descriptions">
    <parameter key="Narrative_Master" value="replaceAll(concat(Headline,&quot;. &quot;,[Narrative text],&quot;. &quot;,[Narrative text (2)],&quot;. &quot;,[Narrative text (3)],&quot;. &quot;,[Narrative text (4)],&quot;. &quot;,[Narrative text (5)]),&quot;_x\\w\\w\\w\\w_&quot;,&quot;&quot;)"/>
    </list>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.1.001" expanded="true" height="82" name="Select Attributes" width="90" x="313" y="34">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attributes" value="Narrative_Master"/>
    </operator>
    <operator activated="true" class="nominal_to_text" compatibility="8.1.001" expanded="true" height="82" name="Nominal to Text" width="90" x="447" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Narrative_Master"/>
    </operator>
    <connect from_port="in 1" to_op="Trim" to_port="example set input"/>
    <connect from_op="Trim" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
    <connect from_op="Generate Attributes" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Nominal to Text" to_port="example set input"/>
    <connect from_op="Nominal to Text" from_port="example set output" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="false" class="read_excel" compatibility="8.1.000" expanded="true" height="68" name="Read Excel (2)" width="90" x="45" y="187">
    <parameter key="excel_file" value="C:\Users\vriesv\Documents\afstuderen\Library\Databases\COMPLETE_no_dup.xlsx"/>
    <parameter key="imported_cell_range" value="A1:BP47479"/>
    <parameter key="encoding" value="SYSTEM"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="KEY.true.polynominal.attribute"/>
    <parameter key="1" value="ECCAIRSNUMBER.true.polynominal.attribute"/>
    <parameter key="2" value="RESPONSIBLEENTITY.true.integer.attribute"/>
    <parameter key="3" value="File number.true.polynominal.attribute"/>
    <parameter key="4" value="Responsible entity.true.polynominal.attribute"/>
    <parameter key="5" value="Report identification.true.polynominal.attribute"/>
    <parameter key="6" value="Local date.true.polynominal.attribute"/>
    <parameter key="7" value="UTC date.true.polynominal.attribute"/>
    <parameter key="8" value="Local time.true.polynominal.attribute"/>
    <parameter key="9" value="UTC time.true.polynominal.attribute"/>
    <parameter key="10" value="Occurrence class.true.polynominal.attribute"/>
    <parameter key="11" value="Occurrence category.true.polynominal.attribute"/>
    <parameter key="12" value="Headline.true.polynominal.attribute"/>
    <parameter key="13" value="Narrative text.true.polynominal.attribute"/>
    <parameter key="14" value="Narrative text (2).true.attribute_value.attribute"/>
    <parameter key="15" value="Narrative text (3).true.attribute_value.attribute"/>
    <parameter key="16" value="Narrative text (4).true.attribute_value.attribute"/>
    <parameter key="17" value="Narrative text (5).true.attribute_value.attribute"/>
    <parameter key="18" value="Event type.true.attribute_value.attribute"/>
    <parameter key="19" value="Descr factor subject.true.attribute_value.attribute"/>
    <parameter key="20" value="Descr modifier.true.attribute_value.attribute"/>
    <parameter key="21" value="Expl factor subject.true.attribute_value.attribute"/>
    <parameter key="22" value="Expl factor modifier.true.attribute_value.attribute"/>
    <parameter key="23" value="Phase.true.attribute_value.attribute"/>
    <parameter key="24" value="Flight phase.true.polynominal.attribute"/>
    <parameter key="25" value="State/area of occ.true.polynominal.attribute"/>
    <parameter key="26" value="Location name.true.polynominal.attribute"/>
    <parameter key="27" value="Latitude of occ.true.attribute_value.attribute"/>
    <parameter key="28" value="Longitude of occ.true.attribute_value.attribute"/>
    <parameter key="29" value="Location indicator.true.polynominal.attribute"/>
    <parameter key="30" value="Aircraft category.true.polynominal.attribute"/>
    <parameter key="31" value="Aircraft registration.true.polynominal.attribute"/>
    <parameter key="32" value="Mass group.true.polynominal.attribute"/>
    <parameter key="33" value="State of registry.true.polynominal.attribute"/>
    <parameter key="34" value="Propulsion type.true.polynominal.attribute"/>
    <parameter key="35" value="Number of engines.true.integer.attribute"/>
    <parameter key="36" value="Operator.true.polynominal.attribute"/>
    <parameter key="37" value="Operation type.true.polynominal.attribute"/>
    <parameter key="38" value="Schedule type.true.polynominal.attribute"/>
    <parameter key="39" value="Risk level.true.attribute_value.attribute"/>
    <parameter key="40" value="Risk grade.true.attribute_value.attribute"/>
    <parameter key="41" value="Aircraft damage.true.attribute_value.attribute"/>
    <parameter key="42" value="Highest damage.true.polynominal.attribute"/>
    <parameter key="43" value="Injury level.true.polynominal.attribute"/>
    <parameter key="44" value="Total number fatalities.true.attribute_value.attribute"/>
    <parameter key="45" value="Total no injuries.true.attribute_value.attribute"/>
    <parameter key="46" value="Wake turb\. category.true.polynominal.attribute"/>
    <parameter key="47" value="Flight number.true.polynominal.attribute"/>
    <parameter key="48" value="Call sign.true.polynominal.attribute"/>
    <parameter key="49" value="Last departure point.true.polynominal.attribute"/>
    <parameter key="50" value="Planned destination.true.polynominal.attribute"/>
    <parameter key="51" value="ANSP name.true.polynominal.attribute"/>
    <parameter key="52" value="Airspace class.true.polynominal.attribute"/>
    <parameter key="53" value="Location on aerodrome.true.attribute_value.attribute"/>
    <parameter key="54" value="Airspace name.true.polynominal.attribute"/>
    <parameter key="55" value="Airspace type.true.polynominal.attribute"/>
    <parameter key="56" value="FIR/UIR name.true.attribute_value.attribute"/>
    <parameter key="57" value="Occurrence status.true.polynominal.attribute"/>
    <parameter key="58" value="Occurrence moderator.true.attribute_value.attribute"/>
    <parameter key="59" value="Third party damage.true.polynominal.attribute"/>
    <parameter key="60" value="ATM contribution.true.polynominal.attribute"/>
    <parameter key="61" value="Effect on ATM service.true.polynominal.attribute"/>
    <parameter key="62" value="Schedule type (2).true.polynominal.attribute"/>
    <parameter key="63" value="Ops range.true.polynominal.attribute"/>
    <parameter key="64" value="Risk classification.true.attribute_value.attribute"/>
    <parameter key="65" value="Manufacturer/model [Level 1].true.polynominal.attribute"/>
    <parameter key="66" value="Manufacturer/model [Level 2].true.polynominal.attribute"/>
    <parameter key="67" value="Manufacturer/model [Level 3].true.polynominal.attribute"/>
    </list>
    </operator>
    <operator activated="false" class="subprocess" compatibility="8.1.001" expanded="true" height="82" name="Merging Narratives to Master (2)" width="90" x="179" y="187">
    <process expanded="true">
    <operator activated="true" class="trim" compatibility="8.1.001" expanded="true" height="82" name="Trim (2)" width="90" x="45" y="34"/>
    <operator activated="true" class="generate_attributes" compatibility="8.1.001" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="179" y="34">
    <list key="function_descriptions">
    <parameter key="Narrative_Master" value="replaceAll(concat(Headline,&quot;. &quot;,[Narrative text],&quot;. &quot;,[Narrative text (2)],&quot;. &quot;,[Narrative text (3)],&quot;. &quot;,[Narrative text (4)],&quot;. &quot;,[Narrative text (5)]),&quot;_x\\w\\w\\w\\w_&quot;,&quot;&quot;)"/>
    </list>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.1.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="313" y="34">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attributes" value="Narrative_Master"/>
    </operator>
    <operator activated="true" class="nominal_to_text" compatibility="8.1.001" expanded="true" height="82" name="Nominal to Text (2)" width="90" x="447" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Narrative_Master"/>
    </operator>
    <connect from_port="in 1" to_op="Trim (2)" to_port="example set input"/>
    <connect from_op="Trim (2)" from_port="example set output" to_op="Generate Attributes (2)" to_port="example set input"/>
    <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
    <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Nominal to Text (2)" to_port="example set input"/>
    <connect from_op="Nominal to Text (2)" from_port="example set output" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="false" class="text:process_document_from_data" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Data (4)" width="90" x="313" y="187">
    <parameter key="create_word_vector" value="false"/>
    <list key="specify_weights"/>
    <process expanded="true">
    <operator activated="true" class="text:transform_cases" compatibility="8.1.000" expanded="true" height="68" name="Transform Cases (4)" width="90" x="45" y="34"/>
    <operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize (4)" width="90" x="179" y="34"/>
    <operator activated="true" class="text:filter_by_length" compatibility="8.1.000" expanded="true" height="68" name="Filter Tokens (4)" width="90" x="313" y="34">
    <parameter key="min_chars" value="3"/>
    </operator>
    <operator activated="true" class="text:filter_stopwords_english" compatibility="8.1.000" expanded="true" height="68" name="Filter Stopwords (4)" width="90" x="447" y="34"/>
    <operator activated="true" class="text:generate_n_grams_terms" compatibility="8.1.000" expanded="true" height="68" name="Generate n-Grams (2)" width="90" x="581" y="34"/>
    <connect from_port="document" to_op="Transform Cases (4)" to_port="document"/>
    <connect from_op="Transform Cases (4)" from_port="document" to_op="Tokenize (4)" to_port="document"/>
    <connect from_op="Tokenize (4)" from_port="document" to_op="Filter Tokens (4)" to_port="document"/>
    <connect from_op="Filter Tokens (4)" from_port="document" to_op="Filter Stopwords (4)" to_port="document"/>
    <connect from_op="Filter Stopwords (4)" from_port="document" to_op="Generate n-Grams (2)" to_port="document"/>
    <connect from_op="Generate n-Grams (2)" from_port="document" to_port="document 1"/>
    <portSpacing port="source_document" spacing="0"/>
    <portSpacing port="sink_document 1" spacing="0"/>
    <portSpacing port="sink_document 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="false" class="text:wordlist_to_data" compatibility="8.1.000" expanded="true" height="82" name="WordList to Data (2)" width="90" x="447" y="238"/>
    <operator activated="false" class="filter_examples" compatibility="8.1.001" expanded="true" height="103" name="Filter Examples (2)" width="90" x="581" y="238">
    <list key="filters_list">
    <parameter key="filters_entry_key" value="word.contains._"/>
    </list>
    </operator>
    <operator activated="false" class="write_excel" compatibility="8.1.001" expanded="true" height="82" name="Write Excel" width="90" x="849" y="136">
    <parameter key="excel_file" value="C:\Users\vriesv\Documents\RapidMiner_text_mining\Data\Bi-Gram_nodup_2007-2008.xlsx"/>
    <parameter key="encoding" value="SYSTEM"/>
    </operator>
    <operator activated="false" class="multiply" compatibility="8.1.001" expanded="true" height="82" name="Multiply (2)" width="90" x="715" y="238"/>
    <operator activated="false" class="write_excel" compatibility="8.1.001" expanded="true" height="82" name="Write Excel (2)" width="90" x="849" y="340">
    <parameter key="excel_file" value="C:\Users\vriesv\Documents\RapidMiner_text_mining\Data\Bi-gram_nodup_2014-2016.xlsx"/>
    <parameter key="encoding" value="SYSTEM"/>
    </operator>
    <operator activated="true" breakpoints="after" class="social_media:search_twitter" compatibility="8.1.000" expanded="true" height="68" name="Search Twitter" width="90" x="45" y="85">
    <parameter key="connection" value="Twitter - Studio Connection"/>
    <parameter key="query" value="rapidminer"/>
    </operator>
    <operator activated="true" class="nominal_to_text" compatibility="8.1.001" expanded="true" height="82" name="Nominal to Text (3)" width="90" x="179" y="85">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Text"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.1.001" expanded="true" height="82" name="Select Attributes (3)" width="90" x="313" y="85">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Text"/>
    </operator>
    <operator activated="true" class="text:process_document_from_data" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Data (3)" width="90" x="313" y="34">
    <parameter key="create_word_vector" value="false"/>
    <list key="specify_weights"/>
    <process expanded="true">
    <operator activated="true" class="text:transform_cases" compatibility="8.1.000" expanded="true" height="68" name="Transform Cases (3)" width="90" x="45" y="34"/>
    <operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize (3)" width="90" x="179" y="34"/>
    <operator activated="true" class="text:filter_by_length" compatibility="8.1.000" expanded="true" height="68" name="Filter Tokens (2)" width="90" x="313" y="34">
    <parameter key="min_chars" value="3"/>
    </operator>
    <operator activated="true" class="text:filter_stopwords_english" compatibility="8.1.000" expanded="true" height="68" name="Filter Stopwords (3)" width="90" x="447" y="34"/>
    <operator activated="true" class="text:generate_n_grams_terms" compatibility="8.1.000" expanded="true" height="68" name="Generate n-Grams (Terms)" width="90" x="581" y="34"/>
    <connect from_port="document" to_op="Transform Cases (3)" to_port="document"/>
    <connect from_op="Transform Cases (3)" from_port="document" to_op="Tokenize (3)" to_port="document"/>
    <connect from_op="Tokenize (3)" from_port="document" to_op="Filter Tokens (2)" to_port="document"/>
    <connect from_op="Filter Tokens (2)" from_port="document" to_op="Filter Stopwords (3)" to_port="document"/>
    <connect from_op="Filter Stopwords (3)" from_port="document" to_op="Generate n-Grams (Terms)" to_port="document"/>
    <connect from_op="Generate n-Grams (Terms)" from_port="document" to_port="document 1"/>
    <portSpacing port="source_document" spacing="0"/>
    <portSpacing port="sink_document 1" spacing="0"/>
    <portSpacing port="sink_document 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="text:wordlist_to_data" compatibility="8.1.000" expanded="true" height="82" name="WordList to Data" width="90" x="447" y="34"/>
    <operator activated="true" class="filter_examples" compatibility="8.1.001" expanded="true" height="103" name="Filter Examples" width="90" x="581" y="34">
    <list key="filters_list">
    <parameter key="filters_entry_key" value="word.contains._"/>
    </list>
    </operator>
    <operator activated="true" class="multiply" compatibility="8.1.001" expanded="true" height="103" name="Multiply" width="90" x="715" y="34"/>
    <connect from_op="Read Excel" from_port="output" to_op="Merging Narratives to Master" to_port="in 1"/>
    <connect from_op="Merging Narratives to Master (2)" from_port="out 1" to_op="Process Documents from Data (4)" to_port="example set"/>
    <connect from_op="Process Documents from Data (4)" from_port="word list" to_op="WordList to Data (2)" to_port="word list"/>
    <connect from_op="WordList to Data (2)" from_port="example set" to_op="Filter Examples (2)" to_port="example set input"/>
    <connect from_op="Filter Examples (2)" from_port="example set output" to_op="Multiply (2)" to_port="input"/>
    <connect from_op="Multiply (2)" from_port="output 1" to_op="Write Excel (2)" to_port="input"/>
    <connect from_op="Search Twitter" from_port="output" to_op="Nominal to Text (3)" to_port="example set input"/>
    <connect from_op="Nominal to Text (3)" from_port="example set output" to_op="Select Attributes (3)" to_port="example set input"/>
    <connect from_op="Select Attributes (3)" from_port="example set output" to_op="Process Documents from Data (3)" to_port="example set"/>
    <connect from_op="Process Documents from Data (3)" from_port="word list" to_op="WordList to Data" to_port="word list"/>
    <connect from_op="WordList to Data" from_port="example set" to_op="Filter Examples" to_port="example set input"/>
    <connect from_op="Filter Examples" from_port="example set output" to_op="Multiply" to_port="input"/>
    <connect from_op="Multiply" from_port="output 2" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    </process>
    </operator>
    </process>
  • Vincent_de_Vrie Member Posts: 3 Contributor I

    Hello Thomas,

    I am also running Windows, so perhaps the problem lies with my input data. I am not able to sort the example set coming from the WordList to Data operator, nor the word list coming from this operator. I tried the Sort operator, but I am not able to select any attribute in the attribute name field. Do you know why my example set does not seem to contain any attributes that are selectable within the Sort operator (see screenshot)? The example set does actually contain the attributes Row.no, word, in documents and total, as can be seen in the Results tab (see screenshot in my previous reply).

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @Vincent_de_Vrie Lately there have been some weird metadata propagation problems (tagging @sgenzer). The workaround is to toggle on the "Sync Metadata with Real Data" option in the Process pull-down menu, then run the process and let it error out; after that the metadata will be available.

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Hi, yes, there are a few metadata propagation "oddities" that I believe are going to be sorted out in Studio 8.2. If you have a replicable process that illustrates this, please post it in Product Feedback and I will pass it along.


    Scott

     

  • MEMM Member Posts: 5 Contributor II
    I seem to have the same problem using RapidMiner 10.3 on Windows and Mac. If there are more than about 100,000 rows I cannot sort, but sorting works if there are fewer than approximately 100,000 rows.