
Nominal and polynominal attributes

colo Member Posts: 236 Maven
edited November 2018 in Help
Hello everybody,

I want to join two example sets on an attribute that holds a file name. My problem is that one of these attributes was generated automatically as metadata by the "Process Documents" operator (and was assigned the type "Polynominal"), while the other one is read from an Excel file and has the type "Nominal". In both example sets these attributes have the id role, but joining them with the "Join" operator fails with this error:

Message: The attribute metadata_file has value type polynominal, should be nominal.

There is no operator to convert one type into the other. Is there a clean and simple way to join on these attributes, or do I need to do something silly like copying the values of the polynominal attribute into a new attribute of type "Nominal"?

Answers

  • haddock Member Posts: 849 Maven
    Hi there Colo,

    Without your process XML it is difficult to be very helpful. I know you want to join on a specific attribute, but you can always get brutal: if both example sets are in the same row order, you could just add an ID attribute to each of them and join on that. Alternatively, things might work if you set the role of the metadata_file attribute to id, like this (the example sets the role of the text attribute):
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="339" width="795">
          <operator activated="true" class="web:read_rss" expanded="true" height="60" name="Read RSS Feed" width="90" x="67" y="41">
            <parameter key="url" value="http://pipes.yahoo.com/pipes/pipe.run?_id=ac1eb8b5a37926f02d3edcb857158e4a&amp;_render=rss"/>
          </operator>
          <operator activated="true" class="text:process_document_from_data" expanded="true" height="76" name="Process Documents from Data" width="90" x="268" y="41">
            <parameter key="vector_creation" value="Term Occurrences"/>
            <parameter key="keep_text" value="true"/>
            <list key="specify_weights"/>
            <process expanded="true" height="357" width="813">
              <operator activated="true" class="text:tokenize" expanded="true" height="60" name="Tokenize" width="90" x="112" y="30"/>
              <operator activated="true" class="text:filter_stopwords_english" expanded="true" height="60" name="Filter Stopwords (English)" width="90" x="447" y="75"/>
              <connect from_port="document" to_op="Tokenize" to_port="document"/>
              <connect from_op="Tokenize" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
              <connect from_op="Filter Stopwords (English)" from_port="document" to_port="document 1"/>
              <portSpacing port="source_document" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="433" y="40">
            <parameter key="name" value="text"/>
            <parameter key="target_role" value="id"/>
          </operator>
          <connect from_op="Read RSS Feed" from_port="output" to_op="Process Documents from Data" to_port="example set"/>
          <connect from_op="Process Documents from Data" from_port="example set" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    Using text as an id relies on the values not being duplicated, I guess, but you get the drift: put up landing lights to make the join easy.
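The brute-force route mentioned first (add an ID attribute to both example sets and join on that) might look roughly like the process below. It is a sketch under assumptions, not a tested process: it assumes the RapidMiner 5.x operator classes `generate_id` (Generate ID) and `join` (Join) with the parameters shown, that the two example sets arrive on the process inputs `input 1` and `input 2`, and that both are already in the same row order. Check the class and parameter names against the operator reference of your version before relying on it.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" expanded="true" name="Process">
    <process expanded="true">
      <!-- Assumption: input 1 and input 2 are the two example sets, already in matching row order -->
      <!-- Give each example set a fresh numeric id attribute -->
      <operator activated="true" class="generate_id" expanded="true" name="Generate ID (left)">
        <parameter key="create_nominal_ids" value="false"/>
        <parameter key="offset" value="0"/>
      </operator>
      <operator activated="true" class="generate_id" expanded="true" name="Generate ID (right)">
        <parameter key="create_nominal_ids" value="false"/>
        <parameter key="offset" value="0"/>
      </operator>
      <!-- Join on the generated id attributes instead of metadata_file -->
      <operator activated="true" class="join" expanded="true" name="Join">
        <parameter key="remove_double_attributes" value="true"/>
        <parameter key="join_type" value="inner"/>
        <parameter key="use_id_attribute_as_key" value="true"/>
        <list key="key_attributes"/>
      </operator>
      <connect from_port="input 1" to_op="Generate ID (left)" to_port="example set input"/>
      <connect from_port="input 2" to_op="Generate ID (right)" to_port="example set input"/>
      <connect from_op="Generate ID (left)" from_port="example set output" to_op="Join" to_port="left"/>
      <connect from_op="Generate ID (right)" from_port="example set output" to_op="Join" to_port="right"/>
      <connect from_op="Join" from_port="join" to_port="result 1"/>
    </process>
  </operator>
</process>
```

Because Generate ID numbers the rows one after another, the ids are unique on each side, which sidesteps the duplication concern of using text as an id. The flip side is that the join is only meaningful if row n on the left really belongs with row n on the right. If metadata_file still carries the id role in either example set, you may need to change that role first so that the generated attribute is the one the Join uses as its key.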

    Have fun.
