[SOLVED] Splitting a nominal attribute that has no separator

Ugo Member Posts: 20 Contributor II
edited October 2019 in Help

I have attributes whose values have the form:
In other words, we have a set of alphabetical characters
followed by a set of digits. The question is: how can I split
the attribute into two attributes, one containing the
alphabetical characters and the other the numerical ones?

I have attempted to use the Split operator, but I seem to be
able to select only one of the parts, not both.
Is there any way I can do this with an operator?

Hugo F.



  • Ugo Member Posts: 20 Contributor II

    I have just realized that even though the split via regexp
    seems to work in the dialogue box that allows testing, the
    output does _not_ split the attribute value.

    Any help will be appreciated.
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Salut Ugo,

    Does your attribute have a special role, e.g. is it defined as label or as id? In that case, you have to check the "include_special_attributes" parameter of the Split operator.

    Otherwise, please post your regular expression so that we can have a look at it.

    Best regards,
  • Ugo Member Posts: 20 Contributor II
    All those attributes are marked regular and polynominal.

    I think I am close to a solution.
    My current attempt uses the expression:
    so a k100 will have the 100 highlighted.
    This means I can generate attributes such as:


    This looks ok. For the second part I have:
    which results in attributes such as:

    I cannot seem to do the split in a single expression
    (note that I was able to do this when I had values split
    with '/' in another attribute). I am now looking how I can
    use a "multiply" and then combine those attributes back into
    a single example set. Problem is now I have duplicate

    This seems way too convoluted. Is there an easier way to do this?

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn

    In fact, the Split operator is useful if you have two values in an attribute that are separated by a fixed string. In your example with A100, AK200 or the like, there is no separator to split on. The case would be different if you had A_100, AK_200 etc. Since you don't, you should use Generate Attributes and create two new attributes with the following definitions:
    a1:    replaceAll(a, "[a-zA-Z]", "")
    a2:    replaceAll(a, "[^a-zA-Z]", "")

    This assumes that your original attribute is called a. With these definitions, a1 holds the numeric part and a2 holds the letters. The process below does what I described above.

    Best regards,
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.005">
      <operator activated="true" class="process" compatibility="5.3.005" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="generate_data_user_specification" compatibility="5.3.005" expanded="true" height="60" name="Generate Data by User Specification" width="90" x="45" y="30">
            <list key="attribute_values">
              <parameter key="a" value="&quot;AK100&quot;"/>
            </list>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="5.3.005" expanded="true" height="76" name="Generate Attributes" width="90" x="179" y="30">
            <list key="function_descriptions">
              <parameter key="a1" value="replaceAll(a, &quot;[a-zA-Z]&quot;, &quot;&quot;)"/>
              <parameter key="a2" value="replaceAll(a, &quot;[^a-zA-Z]&quot;, &quot;&quot;)"/>
            </list>
          </operator>
          <connect from_op="Generate Data by User Specification" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
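
For readers who want to check the two regular expressions outside RapidMiner, here is a minimal Python sketch of the same idea. It is not part of the original process: the function names are hypothetical, and `re.sub` is used as a stand-in for `replaceAll`, on the assumption that both treat these simple character classes the same way. The second function shows one possible way to get both parts from a single expression using capture groups.

```python
import re


def split_by_replacement(value):
    """Mirror the a1/a2 definitions: delete letters, then delete non-letters."""
    rest = re.sub(r"[a-zA-Z]", "", value)      # like a1: everything that is not a letter
    letters = re.sub(r"[^a-zA-Z]", "", value)  # like a2: only the letters
    return letters, rest


def split_by_groups(value):
    """Alternative sketch: one expression with two capture groups.

    Returns None when the value is not 'letters followed by digits'.
    """
    match = re.fullmatch(r"([a-zA-Z]+)([0-9]+)", value)
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    for v in ["A100", "AK200", "k100"]:
        print(v, split_by_replacement(v), split_by_groups(v))
```

Note that the replacement approach keeps every non-letter character in the first result, so a value such as A_100 would yield "_100" rather than "100", while the capture-group variant rejects such a value outright.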
  • Ugo Member Posts: 20 Contributor II
    Ok, I am going to try this.

  • Ugo Member Posts: 20 Contributor II
    OK, it worked fine.

    Thanks once again.