"FP Growth - Runs Indefinitely[?]"

NedPointsman Member Posts: 2 Contributor I
edited June 2019 in Help
I've tried this on multiple machines but have never been able to get this operator to work in any practical scenario. It runs for hours without ever producing any results. The example in the tutorial runs fine, however.

I'm basically copying this video: http://www.youtube.com/watch?v=vhMzUi-FMy0

He's doing the same thing I am, and his process finishes almost immediately. The only difference is that he's using SQL whereas I'm using flat HTML files, but it would be absurd if that were the problem.

I've tried it without the 'Create Association Rules' operator and still got the same problem.

I've tried it with only two HTML files and got the same problem.

Is there some fundamental thing that I've just missed?
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.007">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.007" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="text:process_document_from_file" compatibility="5.3.000" expanded="true" height="76" name="Process Documents from Files" width="90" x="112" y="30">
        <list key="text_directories">
          <parameter key="1" value="C:\Sources"/>
        </list>
        <parameter key="vector_creation" value="Binary Term Occurrences"/>
        <parameter key="add_meta_information" value="false"/>
        <parameter key="keep_text" value="true"/>
        <parameter key="prune_method" value="absolute"/>
        <parameter key="prune_below_absolute" value="2"/>
        <parameter key="prune_above_absolute" value="999"/>
        <process expanded="true">
          <operator activated="true" class="web:extract_html_text_content" compatibility="5.3.000" expanded="true" height="60" name="Extract Content" width="90" x="112" y="30">
            <parameter key="minimum_text_block_length" value="3"/>
          </operator>
          <operator activated="true" class="text:transform_cases" compatibility="5.3.000" expanded="true" height="60" name="Transform Cases" width="90" x="112" y="120"/>
          <operator activated="true" class="text:tokenize" compatibility="5.3.000" expanded="true" height="60" name="Tokenize" width="90" x="112" y="210"/>
          <operator activated="true" class="text:filter_stopwords_english" compatibility="5.3.000" expanded="true" height="60" name="Filter Stopwords (English)" width="90" x="112" y="300"/>
          <operator activated="true" class="text:filter_by_length" compatibility="5.3.000" expanded="true" height="60" name="Filter Tokens (by Length)" width="90" x="313" y="120">
            <parameter key="min_chars" value="2"/>
            <parameter key="max_chars" value="99"/>
          </operator>
          <connect from_port="document" to_op="Extract Content" to_port="document"/>
          <connect from_op="Extract Content" from_port="document" to_op="Transform Cases" to_port="document"/>
          <connect from_op="Transform Cases" from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
          <connect from_op="Filter Stopwords (English)" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/>
          <connect from_op="Filter Tokens (by Length)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="numerical_to_binominal" compatibility="5.3.007" expanded="true" height="76" name="Numerical to Binominal" width="90" x="112" y="165"/>
      <operator activated="true" class="fp_growth" compatibility="5.3.007" expanded="true" height="76" name="FP-Growth" width="90" x="112" y="300">
        <parameter key="min_support" value="0.05"/>
      </operator>
      <operator activated="true" class="create_association_rules" compatibility="5.3.007" expanded="true" height="76" name="Create Association Rules" width="90" x="246" y="300">
        <parameter key="min_confidence" value="0.95"/>
      </operator>
      <connect from_op="Process Documents from Files" from_port="example set" to_op="Numerical to Binominal" to_port="example set input"/>
      <connect from_op="Numerical to Binominal" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
      <connect from_op="FP-Growth" from_port="frequent sets" to_op="Create Association Rules" to_port="item sets"/>
      <connect from_op="Create Association Rules" from_port="rules" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>