RapidMiner


WordList to Data gives me example set with no attribute names

Status: Investigating

I'm running Process Documents to get a word list which I then convert to data using WordList to Data. All goes well until I try to select, filter or otherwise use the dataset thus created. I cannot see any attribute names in the data. I can manually type them in (e.g. in Select Attributes, but not all operators allow this), but subsequent operators in the chain do not see the names.

I have tried Synchronize Metadata and Materialize Data, but neither of these helps.

In the example, the final Select Attributes operator does not allow me to select attributes from a list, although I can type "word" into the box when the filter type is "single".

Any ideas?

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="generate_data_user_specification" compatibility="7.6.001" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="112" y="34">
        <list key="attribute_values">
          <parameter key="text" value="&quot;hello world out there&quot;"/>
        </list>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="nominal_to_text" compatibility="7.6.001" expanded="true" height="82" name="Nominal to Text" width="90" x="246" y="34">
        <parameter key="attribute_filter_type" value="all"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="nominal"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="file_path"/>
        <parameter key="block_type" value="single_value"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="single_value"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
      </operator>
      <operator activated="true" class="text:process_document_from_data" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="447" y="34">
        <parameter key="create_word_vector" value="true"/>
        <parameter key="vector_creation" value="TF-IDF"/>
        <parameter key="add_meta_information" value="true"/>
        <parameter key="keep_text" value="false"/>
        <parameter key="prune_method" value="none"/>
        <parameter key="prune_below_percent" value="3.0"/>
        <parameter key="prune_above_percent" value="30.0"/>
        <parameter key="prune_below_rank" value="0.05"/>
        <parameter key="prune_above_rank" value="0.95"/>
        <parameter key="datamanagement" value="double_sparse_array"/>
        <parameter key="data_management" value="auto"/>
        <parameter key="select_attributes_and_weights" value="false"/>
        <list key="specify_weights"/>
        <process expanded="true">
          <operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize" width="90" x="179" y="85">
            <parameter key="mode" value="non letters"/>
            <parameter key="characters" value=".:"/>
            <parameter key="language" value="English"/>
            <parameter key="max_token_length" value="3"/>
          </operator>
          <connect from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="text:wordlist_to_data" compatibility="7.5.000" expanded="true" height="82" name="WordList to Data" width="90" x="581" y="34"/>
      <operator activated="true" class="select_attributes" compatibility="7.6.001" expanded="true" height="82" name="Select Attributes" width="90" x="715" y="34">
        <parameter key="attribute_filter_type" value="all"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
      </operator>
      <connect from_op="Generate Data by User Specification" from_port="output" to_op="Nominal to Text" to_port="example set input"/>
      <connect from_op="Nominal to Text" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
      <connect from_op="Process Documents from Data" from_port="word list" to_op="WordList to Data" to_port="word list"/>
      <connect from_op="WordList to Data" from_port="example set" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
Comments
RM Certified Expert

Yes, this is an annoying RapidMiner bug. It's purely a metadata propagation problem. If the "Synchronize Meta Data with Real Data" option didn't solve it, you can still add the attributes you want manually when selecting the "subset" option. Just type each attribute name into the box on the right-hand side and click the green plus button (see screenshot). The names won't autopopulate, but they will actually work when you run the process.

select attributes.PNG
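For anyone who can't see the screenshot, the workaround amounts to changing the Select Attributes parameters like this. This is a sketch, not a tested process; the "word" attribute name comes from the thread above, and any other names you type (RapidMiner separates subset entries with "|") are whatever your wordlist actually produces:

```xml
<operator activated="true" class="select_attributes" compatibility="7.6.001" expanded="true" name="Select Attributes">
  <!-- switch from "all" to "subset" and type the names manually -->
  <parameter key="attribute_filter_type" value="subset"/>
  <!-- names won't appear in the chooser, but they resolve at runtime -->
  <parameter key="attributes" value="word"/>
</operator>
```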

Contributor I

Thanks - glad it's not just me!

Given that it's a bug, I've got a workaround which is to generate new attributes (equal to the existing attributes) after the dataset conversion. I can then select the new attributes as per normal and throw away the old, hidden attributes. A bit of a faff, but as the issue propagates all through the workflow, it's easier this way.

RM Certified Expert

@mizunooto I'd love to see the process you are using to do that. I assume you are doing it in some kind of automated fashion and not just typing in all the new attribute names manually?

 

Contributor I

Imagination is a wonderful thing... at present I'm only using a couple of attributes so a manual operation is fine.

e.g. newword = word

Automation sounds like a great idea but I'll need to learn more to make it work.
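For reference, the manual version of that workaround would look roughly like the fragment below: a Generate Attributes operator placed after WordList to Data, copying each hidden attribute into a new one that the metadata can see. This is a hedged sketch, not the poster's actual process; "newword" is the illustrative name from the comment above, and the keep_all setting is an assumption about how they discard the old attributes:

```xml
<operator activated="true" class="generate_attributes" compatibility="7.6.001" expanded="true" name="Generate Attributes">
  <list key="function_descriptions">
    <!-- copy the metadata-invisible attribute into a visible one -->
    <parameter key="newword" value="word"/>
  </list>
  <!-- drop the original, hidden attributes so only the new ones remain -->
  <parameter key="keep_all" value="false"/>
</operator>
```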

Community Manager

Yes, that's a metadata propagation error for sure. Pushing this thread to Product Feedback.


Scott

 

Community Manager
Status: Investigating

Metadata problem recognized and reported to dev team. Please watch this board for updates.


SG

 

 

Community Manager

Just to confirm with people watching this issue: toggling the "Synchronize Metadata with Real Data" feature does not resolve this, correct?

RM Certified Expert

Correct, the "Synchronize Metadata" option doesn't fix this particular problem. It doesn't correct metadata 100% of the time; I have found similar behavior in a few other cases as well.