WordList to Data gives me example set with no attribute names

mizunootomizunooto Member Posts: 9 Contributor II
edited December 2018 in Product Feedback - Resolved

I'm running Process Documents to get a word list which I then convert to data using WordList to Data. All goes well until I try to select, filter or otherwise use the dataset thus created. I cannot see any attribute names in the data. I can manually type them in (e.g. in Select Attributes, but not all operators allow this), but subsequent operators in the chain do not see the names.

I have tried Synchronize Metadata and Materiaize Data, but neither of these helps.

In the example, the final Select Attributes operator does not allow me to select attributes from a list, although I can type "word" into the box when the filter type is "single".

Any ideas?

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="generate_data_user_specification" compatibility="7.6.001" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="112" y="34">
<list key="attribute_values">
<parameter key="text" value="&quot;hello world out there&quot;"/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="nominal_to_text" compatibility="7.6.001" expanded="true" height="82" name="Nominal to Text" width="90" x="246" y="34">
<parameter key="attribute_filter_type" value="all"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="nominal"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="file_path"/>
<parameter key="block_type" value="single_value"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="single_value"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
</operator>
<operator activated="true" class="text:process_document_from_data" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="447" y="34">
<parameter key="create_word_vector" value="true"/>
<parameter key="vector_creation" value="TF-IDF"/>
<parameter key="add_meta_information" value="true"/>
<parameter key="keep_text" value="false"/>
<parameter key="prune_method" value="none"/>
<parameter key="prune_below_percent" value="3.0"/>
<parameter key="prune_above_percent" value="30.0"/>
<parameter key="prune_below_rank" value="0.05"/>
<parameter key="prune_above_rank" value="0.95"/>
<parameter key="datamanagement" value="double_sparse_array"/>
<parameter key="data_management" value="auto"/>
<parameter key="select_attributes_and_weights" value="false"/>
<list key="specify_weights"/>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize" width="90" x="179" y="85">
<parameter key="mode" value="non letters"/>
<parameter key="characters" value=".:"/>
<parameter key="language" value="English"/>
<parameter key="max_token_length" value="3"/>
</operator>
<connect from_port="document" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="text:wordlist_to_data" compatibility="7.5.000" expanded="true" height="82" name="WordList to Data" width="90" x="581" y="34"/>
<operator activated="true" class="select_attributes" compatibility="7.6.001" expanded="true" height="82" name="Select Attributes" width="90" x="715" y="34">
<parameter key="attribute_filter_type" value="all"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
</operator>
<connect from_op="Generate Data by User Specification" from_port="output" to_op="Nominal to Text" to_port="example set input"/>
<connect from_op="Nominal to Text" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
<connect from_op="Process Documents from Data" from_port="word list" to_op="WordList to Data" to_port="word list"/>
<connect from_op="WordList to Data" from_port="example set" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
0
0 votes

Fixed and Released · Last Updated

31 May 2019: no additional comments here. Moving to Resolved. Please note if this is still an ongoing issue. From engineering: "There seems to be a misunderstanding of what "Synchronize Meta Data with Real Data" does. As the tooltip explains, this is purely for replacing 'guessed' meta data with the 'real deal' AFTER operator execution. So after Studio actually saw the output, it will now build Meta Data based on that output, as opposed to the earlier guessing of the output. Just switching between the states without running the process has no effect." Please add additional information to reproduce bug behavior. IC-812

Comments

  • Telcontar120Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Solution Accepted

    Yes, this is an annoying RapidMiner bug.  It's purely a metadata propagation problem.  If you tried to use the "Synchronize Meta Data with Real Data" option but it didn't solve the problem, you can still add the attributes you want manually when selecting the "subset" option.  Just type the name of the attributes into the box in the right hand side and then click the green plus button (see screenshot).  They won't autopopulate but they will actually work when you run the process.

    select attributes.PNG

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • mizunootomizunooto Member Posts: 9 Contributor II

    Thanks - glad it's not just me!

    Given that it's a bug, I've got a workaround which is to generate new attributes (equal to the existing attributes) after the dataset conversion. I can then select the new attributes as per normal and throw away the old, hidden attributes. A bit of a faff, but as the issue propagates all through the workflow, it's easier this way.

  • Telcontar120Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    @mizunooto I'd love to see the process you are using to do that---I am assuming you are doing it in some kind of automated fashion and not just typing in all the new attribute names manually?

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • mizunootomizunooto Member Posts: 9 Contributor II

    Imagination is a wonderful thing... at present I'm only using a couple of attributes so a manual operation is fine.

    e.g. newword = word

    Automation sounds like a great idea but I'll need to learn more to make it work.

  • sgenzersgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    yes that's a metadata propagation error for sure. Pushing this thread to Product Feedback.


    Scott

     

  • sgenzersgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Metadata problem recognized and reported to dev team. Please watch this board for updates.


    SG

     

     

  • sgenzersgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    just to confirm with people watching this issue - toggling the "Synchronize Metadata with Real Data" feature does not resolve this, correct?

  • Telcontar120Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    Correct, the "Sychronize Metadata" option doesn't fix this particular problem.  I've found that it doesn't correct it 100% of the time, there are a few other cases where I have found similar behavior.

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
Sign In or Register to comment.