Meta data problem

Benedict_von_Ah Member Posts: 8 Contributor II
edited November 2018 in Help

Hey, 

I'm using the extension "dictionary based sentiment analysis" and have a problem with some of the meta data output at the end. Everything works fine, but I cannot see the token number. What I want to do: screen each text and score it, so that the output is negative/positive plus the number of uncovered tokens. To make use of the number of uncovered tokens, I need to know the total number of tokens in each text. I'm using "Extract Token Number", but its value is not displayed at the end.

 

Thanks for your help.

<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize" width="90" x="246" y="34">
<parameter key="mode" value="non letters"/>
<parameter key="characters" value=".:"/>
<parameter key="language" value="English"/>
<parameter key="max_token_length" value="3"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<operator activated="true" class="text:filter_stopwords_german" compatibility="7.5.000" expanded="true" height="68" name="Filter Stopwords (German)" width="90" x="447" y="34">
<parameter key="stop_word_list" value="Standard"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<operator activated="true" class="text:filter_by_length" compatibility="7.5.000" expanded="true" height="68" name="Filter Tokens (by Length)" width="90" x="179" y="187">
<parameter key="min_chars" value="4"/>
<parameter key="max_chars" value="40"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<operator activated="true" class="text:transform_cases" compatibility="7.5.000" expanded="true" height="68" name="Transform Cases (2)" width="90" x="313" y="187">
<parameter key="transform_to" value="lower case"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<operator activated="true" class="text:extract_token_number" compatibility="7.5.000" expanded="true" height="68" name="Extract Token Number" width="90" x="514" y="187">
<parameter key="metadata_key" value="token_number"/>
<parameter key="condition" value="all"/>
<parameter key="case_sensitive" value="false"/>
<parameter key="invert_condition" value="false"/>
</operator>
</process>

Answers

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    Your process XML appears to be malformed and won't render.  Are you sure this is the XML from a single complete process?

    In the meantime, "Extract Token Number" is meant to be used inside "Process Documents" so you'll need to incorporate it there in your workflow.
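
    A minimal sketch of that nesting, assuming the "Process Documents from Data" variant (class text:process_document_from_data) and that its add_meta_information parameter is what carries the extracted value into the output. The inner operators and parameter values are taken from the question; the connections and port names are assumptions based on the single "document" port that text operators usually have, and layout attributes are omitted.

    ```xml
    <!-- Sketch only: the wrapper operator, its parameter, and all port names
         are assumptions; the inner operators come from the question. -->
    <operator activated="true" class="text:process_document_from_data" compatibility="7.5.000" expanded="true" name="Process Documents from Data">
      <parameter key="add_meta_information" value="true"/>
      <process expanded="true">
        <operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" name="Tokenize">
          <parameter key="mode" value="non letters"/>
        </operator>
        <operator activated="true" class="text:filter_stopwords_german" compatibility="7.5.000" expanded="true" name="Filter Stopwords (German)">
          <parameter key="stop_word_list" value="Standard"/>
        </operator>
        <operator activated="true" class="text:filter_by_length" compatibility="7.5.000" expanded="true" name="Filter Tokens (by Length)">
          <parameter key="min_chars" value="4"/>
          <parameter key="max_chars" value="40"/>
        </operator>
        <operator activated="true" class="text:transform_cases" compatibility="7.5.000" expanded="true" name="Transform Cases (2)">
          <parameter key="transform_to" value="lower case"/>
        </operator>
        <!-- Counts the tokens left after filtering and stores the count under token_number -->
        <operator activated="true" class="text:extract_token_number" compatibility="7.5.000" expanded="true" name="Extract Token Number">
          <parameter key="metadata_key" value="token_number"/>
        </operator>
        <connect from_port="document" to_op="Tokenize" to_port="document"/>
        <connect from_op="Tokenize" from_port="document" to_op="Filter Stopwords (German)" to_port="document"/>
        <connect from_op="Filter Stopwords (German)" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/>
        <connect from_op="Filter Tokens (by Length)" from_port="document" to_op="Transform Cases (2)" to_port="document"/>
        <connect from_op="Transform Cases (2)" from_port="document" to_op="Extract Token Number" to_port="document"/>
        <connect from_op="Extract Token Number" from_port="document" to_port="document 1"/>
      </process>
    </operator>
    ```

    Because Extract Token Number sits after the stopword and length filters here, the count would cover only the remaining tokens; to count all tokens in the text, it could be moved directly after Tokenize instead.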

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Sorry but I have a tangential question... @Telcontar120 - this thing seems to happen a lot.  Any idea why people's XML gets corrupted in this way?  @Benedict_von_Ah if you could help me understand how you pasted the XML, this would be helpful.  Thanks!

     

    Scott

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    @sgenzer I really don't know about the XML corruption---I remember discussing this at one point in the past with @Thomas_Ott and I think he thought it was some kind of problem with the Lithium site backend.  

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Yeah, that's what I'm worried about, but I don't see that problem when experienced users post code, only when new users do. I assume this one is not corrupted for you?

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve Iris" width="90" x="112" y="289">
    <parameter key="repository_entry" value="//Samples/data/Iris"/>
    </operator>
    <operator activated="true" class="split_data" compatibility="7.6.001" expanded="true" height="103" name="Split Data" width="90" x="246" y="289">
    <enumeration key="partitions">
    <parameter key="ratio" value="0.9"/>
    <parameter key="ratio" value="0.1"/>
    </enumeration>
    </operator>
    <operator activated="true" class="keras:sequential" compatibility="1.0.003" expanded="true" height="166" name="Keras Model" width="90" x="447" y="187">
    <parameter key="input shape" value="(4,)"/>
    <parameter key="loss" value="categorical_crossentropy"/>
    <parameter key="optimizer" value="Adam"/>
    <parameter key="learning rate" value="0.001"/>
    <enumeration key="metric"/>
    <parameter key="epochs" value="128"/>
    <enumeration key="callbacks">
    <parameter key="callbacks" value="TensorBoard(log_dir='./logs', histogram_freq=0, write_graph=True, write_images=False, embeddings_freq=0, embeddings_layer_names=None, embeddings_metadata=None)"/>
    </enumeration>
    <process expanded="true">
    <operator activated="true" class="keras:core_layer" compatibility="1.0.003" expanded="true" height="82" name="Add Core Layer" width="90" x="179" y="289">
    <parameter key="no_units" value="8"/>
    <parameter key="activation_function" value="'relu'"/>
    <parameter key="target_shape" value="(1, 1)"/>
    <parameter key="dims" value="1.1"/>
    <parameter key="repetition_factor" value="2"/>
    </operator>
    <operator activated="true" class="keras:core_layer" compatibility="1.0.003" expanded="true" height="82" name="Add Core Layer (2)" width="90" x="313" y="289">
    <parameter key="no_units" value="3"/>
    <parameter key="activation_function" value="'softmax'"/>
    <parameter key="target_shape" value="(1, 1)"/>
    <parameter key="dims" value="1.1"/>
    <parameter key="repetition_factor" value="2"/>
    </operator>
    <connect from_op="Add Core Layer" from_port="layers 1" to_op="Add Core Layer (2)" to_port="layers"/>
    <connect from_op="Add Core Layer (2)" from_port="layers 1" to_port="layers 1"/>
    <portSpacing port="sink_layers 1" spacing="0"/>
    <portSpacing port="sink_layers 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="keras:apply" compatibility="1.0.003" expanded="true" height="82" name="Apply Keras Model" width="90" x="648" y="289">
    <parameter key="batch_size" value="16"/>
    </operator>
    <connect from_op="Retrieve Iris" from_port="output" to_op="Split Data" to_port="example set"/>
    <connect from_op="Split Data" from_port="partition 1" to_op="Keras Model" to_port="training set"/>
    <connect from_op="Split Data" from_port="partition 2" to_op="Apply Keras Model" to_port="unlabelled data"/>
    <connect from_op="Keras Model" from_port="model" to_op="Apply Keras Model" to_port="model"/>
    <connect from_op="Apply Keras Model" from_port="labelled data" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    Scott

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    @sgenzer yep, that one's fine for me (nice Keras model, btw)

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Now that more people are posting XML, I think I might have to rethink my original hypothesis. It appears that it is mostly new users who post corrupted XML.
