[SOLVED] Splitting a nominal attribute that has no separator

Ugo Member Posts: 20 Contributor II
edited October 2019 in Help
Hello.

I have attributes whose values have the form:
  AK234  
  A112
In other words, each value is a set of alphabetical characters
followed by a set of digits. The question is: how can I split
the attribute into two attributes, one containing the
alphabetical characters and the other the digits?

I have attempted to use the Split operator, but I seem to be
able to select only one of the parts, not both.
Is there any way I can do this with an operator?

TIA,
Hugo F.

Answers

  • Ugo Member Posts: 20 Contributor II
    Hello,

    I have just realized that even though the split via regexp
    seems to work in the dialogue box that allows testing, the
    output does _not_ split the attribute value.

    Any help will be appreciated.
    TIA
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Salut Ugo,

    Does your attribute have a special role, e.g. is it defined as label or as id? In that case, you have to check the "include_special_attributes" parameter of the Split operator.

    Otherwise, please post your regular expression so that we can have a look at it.

    Best regards,
    Marius
  • Ugo Member Posts: 20 Contributor II
    All those attributes are marked regular and polynominal.

    I think I am close to a solution.
    My current attempt uses the expression:
      [^a-z]+
    so for a value such as k100, the 100 is highlighted.
    This means I can generate attributes such as:

      att_
      att_g
      att_k
      att_l
      att_s
      etc.

    This looks ok. For the second part I have:
      [a-z]+
    which results in attributes such as:
      att_
      att_100
      att_985
      etc.

    I cannot seem to do the split in a single expression
    (note that I was able to do this when I had values split
    with '/' in another attribute). I am now looking at how I can
    use a "multiply" and then combine those attributes back into
    a single example set. The problem is that I now have duplicate
    attributes.

    This seems way too convoluted. Is there an easier way to do this?

    TIA
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi,

    In fact, the Split operator is useful if you have two values in an attribute that are separated by a fixed string. In your example with A100, AK200 or the like, there is no separating sequence. The case would be different if you had A_100, AK_200 etc. Since you don't, you should use Generate Attributes and create two new attributes with the following definitions:
    a1:    replaceAll(a, "[a-zA-Z]", "")
    a2:    replaceAll(a, "[^a-zA-Z]", "")

    This assumes that your original attribute is called a; a1 then holds the digits and a2 the letters. The process below does what I described above.

    Best regards,
    Marius
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.005">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.005" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="generate_data_user_specification" compatibility="5.3.005" expanded="true" height="60" name="Generate Data by User Specification" width="90" x="45" y="30">
            <list key="attribute_values">
              <parameter key="a" value="&quot;AK100&quot;"/>
            </list>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="5.3.005" expanded="true" height="76" name="Generate Attributes" width="90" x="179" y="30">
            <list key="function_descriptions">
              <parameter key="a1" value="replaceAll(a, &quot;[a-zA-Z]&quot;, &quot;&quot;)"/>
              <parameter key="a2" value="replaceAll(a, &quot;[^a-zA-Z]&quot;, &quot;&quot;)"/>
            </list>
          </operator>
          <connect from_op="Generate Data by User Specification" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
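
For readers outside RapidMiner, the two replaceAll definitions above can be sketched in Python. This is only an illustration: the function name split_letters_digits is hypothetical, and Python's re.sub stands in for RapidMiner's replaceAll on the assumption that both treat these simple character classes the same way.

```python
import re
from typing import Tuple


def split_letters_digits(value: str) -> Tuple[str, str]:
    """Return (a1, a2) as defined in the post above (hypothetical helper)."""
    # a1: delete every ASCII letter, keeping whatever else is in the value
    a1 = re.sub(r"[a-zA-Z]", "", value)
    # a2: delete everything that is not an ASCII letter, keeping the letters
    a2 = re.sub(r"[^a-zA-Z]", "", value)
    return a1, a2


if __name__ == "__main__":
    for value in ["AK100", "AK234", "A112"]:
        print(value, split_letters_digits(value))
```

Note that a1 keeps everything that is not a letter, not only digits, so a value such as A-1 would give -1 for a1. For values of the form shown in the thread (letters followed by digits) the two parts come out cleanly.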
  • Ugo Member Posts: 20 Contributor II
    Ok, I am going to try this.

    Thanks.
  • Ugo Member Posts: 20 Contributor II
    Ok. Worked fine.

    Thanks once again.
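
On the single-expression split asked about earlier in the thread: outside RapidMiner, one pattern with two capture groups can return both parts at once. The Python sketch below is a hedged illustration only. The names PATTERN and split_once are hypothetical, stripping surrounding whitespace is an assumption about the data, and whether a RapidMiner operator accepts an equivalent pattern has not been verified here.

```python
import re
from typing import Optional, Tuple

# Group 1: one or more ASCII letters; group 2: one or more digits.
PATTERN = re.compile(r"([A-Za-z]+)([0-9]+)")


def split_once(value: str) -> Optional[Tuple[str, str]]:
    """Return (letters, digits), or None if the value has another shape."""
    # fullmatch requires the whole (stripped) value to fit the pattern
    match = PATTERN.fullmatch(value.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    for value in ["AK234", "A112", "k100", "100"]:
        print(repr(value), split_once(value))
```

Unlike the two replaceAll calls, this variant rejects values that do not follow the letters-then-digits shape instead of silently returning an empty part, which may or may not be what a given data set needs.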