
Cross validation together with the "Extract Topics from Data (LDA)" operator

Joos Member Posts: 11 Newbie
Hi

I am trying to set up cross validation together with LDA. I feed the output of the Process Documents operator into the Cross Validation operator. The training side of the cross validation contains the LDA operator, and its model is passed on to the testing side. There I apply the model and connect the result to a Performance operator. When I run the process, the Apply Model operator returns the error "The attribute documentid was already present in the example set." Any idea how to resolve this?



Sent to Engineering · Last Updated

FT-155

Comments

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
    Hi @Joos,
    I am the author of the operator, and I have to say that this looks like a bug. A workaround is to use Materialize Data right before Apply Model and right before LDA, like this:
    <?xml version="1.0" encoding="UTF-8"?><process version="9.2.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.2.001" expanded="true" name="Process" origin="GENERATED_TUTORIAL">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="concurrency:loop" compatibility="8.2.000" expanded="true" height="82" name="Loop" origin="GENERATED_TUTORIAL" width="90" x="45" y="34">
            <parameter key="number_of_iterations" value="5"/>
            <parameter key="iteration_macro" value="iteration"/>
            <parameter key="reuse_results" value="false"/>
            <parameter key="enable_parallel_execution" value="true"/>
            <process expanded="true">
              <operator activated="false" class="text:create_document" compatibility="9.1.000-SNAPSHOT" expanded="true" height="68" name="Create Document" origin="GENERATED_TUTORIAL" width="90" x="45" y="136">
                <parameter key="text" value="Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.   &#10;&#10;Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat.   &#10;&#10;Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat. Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi.   &#10;&#10;Nam liber tempor cum soluta nobis eleifend option congue nihil imperdiet doming id quod mazim placerat facer possim assum. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat.   &#10;&#10;Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis.   &#10;&#10;At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, At accusam aliquyam diam diam dolore dolores duo eirmod eos erat, et nonumy sed tempor et et invidunt justo labore Stet clita ea et gubergren, kasd magna no rebum. sanctus sea sed takimata ut vero voluptua. est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur"/>
                <parameter key="add label" value="false"/>
                <parameter key="label_type" value="nominal"/>
              </operator>
              <operator activated="true" class="generate_data_user_specification" compatibility="9.2.001" expanded="true" height="68" name="Generate Data by User Specification" origin="GENERATED_TUTORIAL" width="90" x="45" y="34">
                <list key="attribute_values">
                  <parameter key="text" value="&quot;Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.   &#13;&#10;&#13;&#10;Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat.   &#13;&#10;&#13;&#10;Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat. Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi.   &#13;&#10;&#13;&#10;Nam liber tempor cum soluta nobis eleifend option congue nihil imperdiet doming id quod mazim placerat facer possim assum. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat.   &#13;&#10;&#13;&#10;Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis.   &#13;&#10;&#13;&#10;At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, At accusam aliquyam diam diam dolore dolores duo eirmod eos erat, et nonumy sed tempor et et invidunt justo labore Stet clita ea et gubergren, kasd magna no rebum. sanctus sea sed takimata ut vero voluptua. est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur&quot;"/>
                </list>
                <list key="set_additional_roles"/>
              </operator>
              <operator activated="true" class="split" compatibility="9.2.001" expanded="true" height="82" name="Split" origin="GENERATED_TUTORIAL" width="90" x="179" y="34">
                <parameter key="attribute_filter_type" value="all"/>
                <parameter key="attribute" value=""/>
                <parameter key="attributes" value=""/>
                <parameter key="use_except_expression" value="false"/>
                <parameter key="value_type" value="nominal"/>
                <parameter key="use_value_type_exception" value="false"/>
                <parameter key="except_value_type" value="file_path"/>
                <parameter key="block_type" value="single_value"/>
                <parameter key="use_block_type_exception" value="false"/>
                <parameter key="except_block_type" value="single_value"/>
                <parameter key="invert_selection" value="false"/>
                <parameter key="include_special_attributes" value="false"/>
                <parameter key="split_pattern" value=","/>
                <parameter key="split_mode" value="ordered_split"/>
              </operator>
              <operator activated="true" class="transpose" compatibility="9.2.001" expanded="true" height="82" name="Transpose" origin="GENERATED_TUTORIAL" width="90" x="380" y="34"/>
              <operator activated="true" class="rename" compatibility="9.2.001" expanded="true" height="82" name="Rename" origin="GENERATED_TUTORIAL" width="90" x="581" y="34">
                <parameter key="old_name" value="att_1"/>
                <parameter key="new_name" value="text"/>
                <list key="rename_additional_attributes"/>
              </operator>
              <connect from_op="Generate Data by User Specification" from_port="output" to_op="Split" to_port="example set input"/>
              <connect from_op="Split" from_port="example set output" to_op="Transpose" to_port="example set input"/>
              <connect from_op="Transpose" from_port="example set output" to_op="Rename" to_port="example set input"/>
              <connect from_op="Rename" from_port="example set output" to_port="output 1"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
            <description align="center" color="transparent" colored="false" width="126">Get Texts</description>
          </operator>
          <operator activated="true" class="append" compatibility="9.2.001" expanded="true" height="82" name="Append" origin="GENERATED_TUTORIAL" width="90" x="179" y="34">
            <parameter key="datamanagement" value="double_array"/>
            <parameter key="data_management" value="auto"/>
            <parameter key="merge_type" value="all"/>
          </operator>
          <operator activated="true" class="concurrency:cross_validation" compatibility="9.2.001" expanded="true" height="145" name="Cross Validation" width="90" x="581" y="34">
            <parameter key="split_on_batch_attribute" value="false"/>
            <parameter key="leave_one_out" value="false"/>
            <parameter key="number_of_folds" value="10"/>
            <parameter key="sampling_type" value="automatic"/>
            <parameter key="use_local_random_seed" value="false"/>
            <parameter key="local_random_seed" value="1992"/>
            <parameter key="enable_parallel_execution" value="false"/>
            <process expanded="true">
              <operator activated="true" class="materialize_data" compatibility="9.2.001" expanded="true" height="82" name="Materialize Data (3)" width="90" x="45" y="34">
                <parameter key="datamanagement" value="double_array"/>
                <parameter key="data_management" value="auto"/>
              </operator>
              <operator activated="true" class="operator_toolbox:lda_exampleset" compatibility="2.3.000-SNAPSHOT" expanded="true" height="124" name="Extract Topics from Data (LDA)" origin="GENERATED_TUTORIAL" width="90" x="313" y="34">
                <parameter key="text_attribute" value="text"/>
                <parameter key="number_of_topics" value="10"/>
                <parameter key="use_alpha_heuristics" value="true"/>
                <parameter key="alpha_sum" value="0.1"/>
                <parameter key="use_beta_heuristics" value="true"/>
                <parameter key="beta" value="0.01"/>
                <parameter key="optimize_hyperparameters" value="true"/>
                <parameter key="optimize_interval_for_hyperparameters" value="10"/>
                <parameter key="top_words_per_topic" value="5"/>
                <parameter key="iterations" value="100"/>
                <parameter key="reproducible" value="false"/>
                <parameter key="enable_logging" value="false"/>
                <parameter key="use_local_random_seed" value="false"/>
                <parameter key="local_random_seed" value="1992"/>
              </operator>
              <connect from_port="training set" to_op="Materialize Data (3)" to_port="example set input"/>
              <connect from_op="Materialize Data (3)" from_port="example set output" to_op="Extract Topics from Data (LDA)" to_port="exa"/>
              <connect from_op="Extract Topics from Data (LDA)" from_port="mod" to_port="model"/>
              <portSpacing port="source_training set" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="materialize_data" compatibility="9.2.001" expanded="true" height="82" name="Materialize Data (2)" width="90" x="45" y="136">
                <parameter key="datamanagement" value="double_array"/>
                <parameter key="data_management" value="auto"/>
              </operator>
              <operator activated="true" class="apply_model" compatibility="9.2.001" expanded="true" height="82" name="Apply Model" width="90" x="179" y="34">
                <list key="application_parameters"/>
                <parameter key="create_view" value="false"/>
              </operator>
              <operator activated="true" class="extract_performance" compatibility="9.2.001" expanded="true" height="82" name="Performance" width="90" x="514" y="34">
                <parameter key="performance_type" value="data_value"/>
                <parameter key="statistics" value="average"/>
                <parameter key="attribute_name" value="confidence(Topic_1)"/>
                <parameter key="example_index" value="1"/>
                <parameter key="optimization_direction" value="maximize"/>
              </operator>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Materialize Data (2)" to_port="example set input"/>
              <connect from_op="Materialize Data (2)" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="example set"/>
              <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_test set results" spacing="0"/>
              <portSpacing port="sink_performance 1" spacing="0"/>
              <portSpacing port="sink_performance 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Loop" from_port="output 1" to_op="Append" to_port="example set 1"/>
          <connect from_op="Append" from_port="merged set" to_op="Cross Validation" to_port="example set"/>
          <connect from_op="Cross Validation" from_port="performance 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

    I am sorry for the inconvenience.
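
A rough way to confirm that a process follows the layout of the workaround above is to inspect its XML outside RapidMiner. The sketch below is only an illustration under stated assumptions: `unguarded_operators` is a hypothetical helper, not part of RapidMiner, and the operator class names are simply the ones that appear in the process XML above. It only looks at direct connections inside each Cross Validation subprocess, so it may miss more indirect wiring.

```python
import xml.etree.ElementTree as ET

# Class names copied from the process XML in this thread (assumed stable).
CROSS_VALIDATION = "concurrency:cross_validation"
MATERIALIZE = "materialize_data"
TARGETS = {"operator_toolbox:lda_exampleset", "apply_model"}


def unguarded_operators(process_xml: str) -> list[str]:
    """Return names of LDA / Apply Model operators inside a Cross Validation
    that are not fed directly by a Materialize Data operator."""
    root = ET.fromstring(process_xml.strip())
    unguarded = []
    for cv in root.iter("operator"):
        if cv.get("class") != CROSS_VALIDATION:
            continue
        # Each direct <process> child is one side (training or testing).
        for sub in cv.findall("process"):
            ops = {op.get("name"): op.get("class") for op in sub.findall("operator")}
            for name, op_class in ops.items():
                if op_class not in TARGETS:
                    continue
                # Operators whose output is wired into this operator.
                feeders = [
                    c.get("from_op")
                    for c in sub.findall("connect")
                    if c.get("to_op") == name
                ]
                if not any(ops.get(f) == MATERIALIZE for f in feeders):
                    unguarded.append(name)
    return unguarded
```

An empty list suggests that every LDA and Apply Model operator inside the cross validation receives its data from a Materialize Data operator. Any names returned are the places where the workaround may still be missing.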

    Best,
    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • Joos Member Posts: 11 Newbie
    Can you help me import the XML?
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Hi @Joos,

    To use the XML, open a new process in RapidMiner. Then go to View --> Show Panel --> XML in the RapidMiner menu bar. Copy the XML code from the post above, paste it into the XML panel, and click the green check mark in that panel. The process posted above will then appear in your RapidMiner.
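
If the pasted XML is rejected, one possible sanity check is to parse the copied text outside RapidMiner first. The sketch below is only an illustration: `list_operators` is a hypothetical helper, not part of RapidMiner, and it uses nothing beyond Python's standard `xml.etree.ElementTree` module. It assumes the process XML has been copied into a string.

```python
import xml.etree.ElementTree as ET


def list_operators(process_xml: str) -> list[tuple[str, str]]:
    """Parse a process XML string and return (name, class) pairs.

    Raises xml.etree.ElementTree.ParseError if the text is not
    well-formed, which usually means something was lost or added
    while copying it from the forum.
    """
    # Leading whitespace before the XML declaration is a parse error,
    # so strip it before parsing.
    root = ET.fromstring(process_xml.strip())
    return [(op.get("name"), op.get("class")) for op in root.iter("operator")]


if __name__ == "__main__":
    sample = """<?xml version="1.0" encoding="UTF-8"?><process version="9.2.001">
      <operator activated="true" class="process" name="Process">
        <process expanded="true">
          <operator activated="true" class="materialize_data" name="Materialize Data"/>
        </process>
      </operator>
    </process>"""
    for name, op_class in list_operators(sample):
        print(f"{name}: {op_class}")
```

If the call raises a `ParseError`, the copied text is probably incomplete or contains stray markup, and copying it again from the post is likely the quickest fix.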

    Thanks,
    Varun
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • Joos Member Posts: 11 Newbie
    I was able to check the XML, but I am still getting the same error message.


    <?xml version="1.0" encoding="UTF-8"?><process version="9.2.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.2.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="read_excel" compatibility="9.2.001" expanded="true" height="68" name="Read Excel (2)" width="90" x="45" y="34">
            <parameter key="excel_file" value="C:\Users\jvandyc2\Documents\PostGraduate\Emails bewerkt20190512.xlsx"/>
            <parameter key="sheet_selection" value="sheet number"/>
            <parameter key="sheet_number" value="2"/>
            <parameter key="imported_cell_range" value="A1"/>
            <parameter key="encoding" value="SYSTEM"/>
            <parameter key="first_row_as_names" value="true"/>
            <list key="annotations"/>
            <parameter key="date_format" value=""/>
            <parameter key="time_zone" value="SYSTEM"/>
            <parameter key="locale" value="English (United States)"/>
            <parameter key="read_all_values_as_polynominal" value="false"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="Mailbox.true.polynominal.attribute"/>
              <parameter key="1" value="Folder.true.polynominal.attribute"/>
              <parameter key="2" value="SubFolder.true.polynominal.attribute"/>
              <parameter key="3" value="Sender.true.polynominal.attribute"/>
              <parameter key="4" value="Subject.true.polynominal.attribute"/>
              <parameter key="5" value="Date.false.real.attribute"/>
              <parameter key="6" value="Size.false.integer.attribute"/>
              <parameter key="7" value="EmailID.false.polynominal.attribute"/>
              <parameter key="8" value="Body.true.polynominal.attribute"/>
              <parameter key="9" value="Category.false.polynominal.attribute"/>
            </list>
            <parameter key="read_not_matching_values_as_missings" value="false"/>
            <parameter key="datamanagement" value="double_array"/>
            <parameter key="data_management" value="auto"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="9.2.001" expanded="true" height="82" name="Select Attributes" width="90" x="45" y="136">
            <parameter key="attribute_filter_type" value="all"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value="Subject|Body"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="map" compatibility="9.2.001" expanded="true" height="82" name="Map" width="90" x="246" y="85">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="SubFolder"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <list key="value_mappings">
              <parameter key="B.1 Pensioenen " value="B.1 Pensioenen "/>
              <parameter key="B1. Pensioenen " value="B.1 Pensioenen "/>
              <parameter key="C. Klantenrekeningen Betwisting " value="C. Klantenrekeningen Betwisting "/>
              <parameter key="C. Klantenrekeningen Inning " value="C. Klantenrekeningen Inning "/>
              <parameter key="D. Knowledge Center " value="D. Knowledge Center "/>
              <parameter key="E. Knowledge Center - Oost-Vlaanderen " value="E. Knowledge Center "/>
              <parameter key="E. Knowledge Center - West-Vlaanderen (niet in submappen) " value="E. Knowledge Center "/>
              <parameter key="E. Knowledge Center - Zelfstandigen A'pen " value="E. Knowledge Center "/>
              <parameter key="E. Knowledge Center- Zelfstandigen Kempen " value="E. Knowledge Center "/>
              <parameter key="E. Knowledge Center- Zelfstandigen Limburg " value="E. Knowledge Center "/>
              <parameter key="E. Verzekeringen - AS Antwerpen " value="E. Verzekeringen "/>
              <parameter key="F. AS KMO Brugge ( Optimalisatie bijdragen ? ) " value="F. KMO "/>
              <parameter key="F. AS KMO Hasselt " value="F. KMO "/>
              <parameter key="F. AS KMO Leuven " value="F. KMO "/>
              <parameter key="F. KMO Aalst " value="F. KMO "/>
              <parameter key="F. KMO Mechelen " value="F. KMO "/>
              <parameter key="F. KMO Turnhout " value="F. KMO "/>
              <parameter key="F. Verzekeringen - Zelfstandigen Antwerpen " value="E. Verzekeringen "/>
              <parameter key="F. Verzekeringen - Zelfstandigen Limburg " value="E. Verzekeringen "/>
              <parameter key="F. Verzekeringen - Zelfstandigen Oost-Vlaanderen " value="E. Verzekeringen "/>
              <parameter key="F. Verzekeringen - Zelfstandigen Vlaams-Brabant " value="E. Verzekeringen "/>
              <parameter key="F. Verzekeringen - Zelfstandigen West-Vlaanderen " value="E. Verzekeringen "/>
              <parameter key="G. AS KMO Brussel " value="F. KMO "/>
              <parameter key="G. AS KMO Roeselare ( Optimalisatie bijdragen ? ) " value="F. KMO "/>
              <parameter key="G. KMO Gent " value="F. KMO "/>
              <parameter key="G. KMO Wilrijk " value="F. KMO "/>
              <parameter key="H. KMO Antwerpen Centrum  " value="F. KMO "/>
              <parameter key="Moederschapsrust " value="Moederschapsrust "/>
              <parameter key="OLK " value="OLK "/>
              <parameter key="SVF " value="SVF "/>
            </list>
            <parameter key="consider_regular_expressions" value="false"/>
            <parameter key="add_default_mapping" value="false"/>
          </operator>
          <operator activated="true" class="replace" compatibility="9.2.001" expanded="true" height="82" name="Replace" width="90" x="313" y="238">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Body"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="nominal"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="file_path"/>
            <parameter key="block_type" value="single_value"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="single_value"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="replace_what" value="_x000D_"/>
            <parameter key="replace_by" value=" "/>
          </operator>
          <operator activated="true" class="nominal_to_text" compatibility="9.2.001" expanded="true" height="82" name="Nominal to Text (2)" width="90" x="447" y="289">
            <parameter key="attribute_filter_type" value="all"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="nominal"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="file_path"/>
            <parameter key="block_type" value="single_value"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="single_value"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="text:data_to_documents" compatibility="8.1.000" expanded="true" height="68" name="Data to Documents" width="90" x="112" y="391">
            <parameter key="select_attributes_and_weights" value="true"/>
            <list key="specify_weights">
              <parameter key="Body" value="2.0"/>
              <parameter key="Subject" value="1.0"/>
            </list>
          </operator>
          <operator activated="true" class="text:process_documents" compatibility="8.1.000" expanded="true" height="103" name="Process Documents" width="90" x="313" y="391">
            <parameter key="create_word_vector" value="true"/>
            <parameter key="vector_creation" value="Binary Term Occurrences"/>
            <parameter key="add_meta_information" value="true"/>
            <parameter key="keep_text" value="true"/>
            <parameter key="prune_method" value="absolute"/>
            <parameter key="prune_below_percent" value="3.0"/>
            <parameter key="prune_above_percent" value="30.0"/>
            <parameter key="prune_below_absolute" value="5"/>
            <parameter key="prune_above_absolute" value="2000"/>
            <parameter key="prune_below_rank" value="0.05"/>
            <parameter key="prune_above_rank" value="0.95"/>
            <parameter key="datamanagement" value="double_sparse_array"/>
            <parameter key="data_management" value="auto"/>
            <process expanded="true">
              <operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize" width="90" x="45" y="34">
                <parameter key="mode" value="non letters"/>
                <parameter key="characters" value=".:"/>
                <parameter key="language" value="English"/>
                <parameter key="max_token_length" value="3"/>
              </operator>
              <operator activated="true" class="text:filter_by_length" compatibility="8.1.000" expanded="true" height="68" name="Filter Tokens (by Length)" width="90" x="112" y="289">
                <parameter key="min_chars" value="2"/>
                <parameter key="max_chars" value="9999"/>
              </operator>
              <operator activated="true" class="text:transform_cases" compatibility="8.1.000" expanded="true" height="68" name="Transform Cases (2)" width="90" x="112" y="442">
                <parameter key="transform_to" value="lower case"/>
              </operator>
              <operator activated="true" class="text:filter_stopwords_dictionary" compatibility="8.1.000" expanded="true" height="82" name="Filter Stopwords (2)" width="90" x="246" y="493">
                <parameter key="file" value="C:\Users\jvandyc2\Documents\PostGraduate\Stopwords Dutch.txt"/>
                <parameter key="case_sensitive" value="false"/>
                <parameter key="encoding" value="SYSTEM"/>
              </operator>
              <operator activated="true" class="text:stem_snowball" compatibility="8.1.000" expanded="true" height="68" name="Stem (2)" width="90" x="380" y="289">
                <parameter key="language" value="Dutch"/>
              </operator>
              <connect from_port="document" to_op="Tokenize" to_port="document"/>
              <connect from_op="Tokenize" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/>
              <connect from_op="Filter Tokens (by Length)" from_port="document" to_op="Transform Cases (2)" to_port="document"/>
              <connect from_op="Transform Cases (2)" from_port="document" to_op="Filter Stopwords (2)" to_port="document"/>
              <connect from_op="Filter Stopwords (2)" from_port="document" to_op="Stem (2)" to_port="document"/>
              <connect from_op="Stem (2)" from_port="document" to_port="document 1"/>
              <portSpacing port="source_document" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="concurrency:cross_validation" compatibility="9.2.001" expanded="true" height="145" name="Cross Validation" width="90" x="313" y="595">
            <parameter key="split_on_batch_attribute" value="false"/>
            <parameter key="leave_one_out" value="false"/>
            <parameter key="number_of_folds" value="10"/>
            <parameter key="sampling_type" value="shuffled sampling"/>
            <parameter key="use_local_random_seed" value="false"/>
            <parameter key="local_random_seed" value="1992"/>
            <parameter key="enable_parallel_execution" value="true"/>
            <process expanded="true">
              <operator activated="true" class="operator_toolbox:lda_exampleset" compatibility="2.0.001" expanded="true" height="124" name="Extract Topics from Data (LDA)" width="90" x="45" y="136">
                <parameter key="text_attribute" value="text"/>
                <parameter key="number_of_topics" value="5"/>
                <parameter key="use_alpha_heuristics" value="true"/>
                <parameter key="alpha_sum" value="0.1"/>
                <parameter key="use_beta_heuristics" value="true"/>
                <parameter key="beta" value="0.01"/>
                <parameter key="optimize_hyperparameters" value="true"/>
                <parameter key="optimize_interval_for_hyperparameters" value="10"/>
                <parameter key="top_words_per_topic" value="7"/>
                <parameter key="iterations" value="1000"/>
                <parameter key="reproducible" value="true"/>
                <parameter key="enable_logging" value="true"/>
                <parameter key="use_local_random_seed" value="true"/>
                <parameter key="local_random_seed" value="1992"/>
              </operator>
              <operator activated="true" class="write_excel" compatibility="9.2.001" expanded="true" height="82" name="Write Excel (2)" width="90" x="179" y="34">
                <parameter key="excel_file" value="C:\Users\jvandyc2\Documents\PostGraduate\output\topic.xlsx"/>
                <parameter key="file_format" value="xlsx"/>
                <parameter key="encoding" value="SYSTEM"/>
                <parameter key="sheet_name" value="RapidMiner Data"/>
                <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
                <parameter key="number_format" value="#.0"/>
              </operator>
              <connect from_port="training set" to_op="Extract Topics from Data (LDA)" to_port="exa"/>
              <connect from_op="Extract Topics from Data (LDA)" from_port="top" to_op="Write Excel (2)" to_port="input"/>
              <connect from_op="Extract Topics from Data (LDA)" from_port="mod" to_port="model"/>
              <portSpacing port="source_training set" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="materialize_data" compatibility="9.2.001" expanded="true" height="82" name="Materialize Data" width="90" x="45" y="340">
                <parameter key="datamanagement" value="double_array"/>
                <parameter key="data_management" value="auto"/>
              </operator>
              <operator activated="true" class="apply_model" compatibility="9.2.001" expanded="true" height="82" name="Apply Model" width="90" x="246" y="289">
                <list key="application_parameters"/>
                <parameter key="create_view" value="false"/>
              </operator>
              <operator activated="true" class="performance" compatibility="9.2.001" expanded="true" height="82" name="Performance" width="90" x="313" y="34">
                <parameter key="use_example_weights" value="true"/>
              </operator>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Materialize Data" to_port="example set input"/>
              <connect from_op="Materialize Data" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
              <connect from_op="Performance" from_port="example set" to_port="test set results"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_test set results" spacing="0"/>
              <portSpacing port="sink_performance 1" spacing="0"/>
              <portSpacing port="sink_performance 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="9.2.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="648" y="646">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value="confidence(Topic_0)|confidence(Topic_1)|confidence(Topic_2)|confidence(Topic_3)|confidence(Topic_4)|documentid|prediction(Topic)|text"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="write_excel" compatibility="9.2.001" expanded="true" height="82" name="Write Excel" width="90" x="771" y="442">
            <parameter key="excel_file" value="C:\Users\jvandyc2\Documents\PostGraduate\output\exa.xlsx"/>
            <parameter key="file_format" value="xlsx"/>
            <parameter key="encoding" value="SYSTEM"/>
            <parameter key="sheet_name" value="RapidMiner Data"/>
            <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
            <parameter key="number_format" value="#.0"/>
          </operator>
          <connect from_op="Read Excel (2)" from_port="output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Map" to_port="example set input"/>
          <connect from_op="Map" from_port="example set output" to_op="Replace" to_port="example set input"/>
          <connect from_op="Replace" from_port="example set output" to_op="Nominal to Text (2)" to_port="example set input"/>
          <connect from_op="Nominal to Text (2)" from_port="example set output" to_op="Data to Documents" to_port="example set"/>
          <connect from_op="Data to Documents" from_port="documents" to_op="Process Documents" to_port="documents 1"/>
          <connect from_op="Process Documents" from_port="example set" to_op="Cross Validation" to_port="example set"/>
          <connect from_op="Cross Validation" from_port="example set" to_op="Select Attributes (2)" to_port="example set input"/>
          <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Write Excel" to_port="input"/>
          <connect from_op="Write Excel" from_port="through" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
