Rename by Replacing - REGEX

miked
miked New Altair Community Member
edited November 2024 in Community Q&A
Hi All - struggling to utilize regex to rename columns after pivoting. Ideally trying to remove the "count(IDT)_" and leave the year on the header. Any help would be appreciated. XML and sample data attached. 

<?xml version="1.0" encoding="UTF-8"?><process version="9.0.002">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.0.002" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_excel" compatibility="9.0.002" expanded="true" height="68" name="Read Excel (3)" width="90" x="112" y="1207">
        <parameter key="excel_file" value="C:\Users\mdueber\OneDrive\OneDrive - JAGUAR LAND ROVER\Analytics Department\RapidMiner Support Files\RM on OneDrive Support\Service Retention\Data files\Car Parc\example2.xlsx"/>
        <list key="annotations"/>
        <parameter key="date_format" value="MMM d, yyyy h:mm:ss a z"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="IDT.true.polynominal.attribute"/>
          <parameter key="1" value="ZIP.true.polynominal.attribute"/>
          <parameter key="2" value="COLOR.true.polynominal.attribute"/>
          <parameter key="3" value="MODEL.true.polynominal.attribute"/>
          <parameter key="4" value="SEGMENT.true.polynominal.attribute"/>
          <parameter key="5" value="YEAR.true.integer.attribute"/>
        </list>
        <parameter key="read_not_matching_values_as_missings" value="false"/>
        <description align="center" color="transparent" colored="false" width="126">UIO</description>
      </operator>
      <operator activated="true" class="append" compatibility="9.0.002" expanded="true" height="82" name="Append (3)" width="90" x="313" y="1207"/>
      <operator activated="true" class="filter_examples" compatibility="9.0.002" expanded="true" height="103" name="Filter Examples (4)" width="90" x="447" y="1207">
        <list key="filters_list">
          <parameter key="filters_entry_key" value="YEAR.ge.2012"/>
        </list>
      </operator>
      <operator activated="true" class="numerical_to_polynominal" compatibility="9.0.002" expanded="true" height="82" name="Numerical to Polynominal" width="90" x="581" y="1156">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="YEAR"/>
      </operator>
      <operator activated="true" class="aggregate" compatibility="9.0.002" expanded="true" height="82" name="Aggregate (3)" width="90" x="782" y="1258">
        <list key="aggregation_attributes">
          <parameter key="IDT" value="count"/>
        </list>
        <parameter key="group_by_attributes" value="IDT|YEAR"/>
      </operator>
      <operator activated="true" class="pivot" compatibility="9.0.002" expanded="true" height="82" name="Pivot (3)" width="90" x="1050" y="1156">
        <parameter key="group_attribute" value="IDT"/>
        <parameter key="index_attribute" value="YEAR"/>
        <parameter key="consider_weights" value="false"/>
        <parameter key="skip_constant_attributes" value="false"/>
      </operator>
      <operator activated="true" class="concurrency:join" compatibility="9.0.002" expanded="true" height="82" name="Join (2)" width="90" x="1117" y="1258">
        <list key="key_attributes">
          <parameter key="IDT" value="IDT"/>
        </list>
      </operator>
      <operator activated="true" class="rename_by_replacing" compatibility="9.0.002" expanded="true" height="82" name="Rename by Replacing" width="90" x="1251" y="1207">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="count(IDT)"/>
        <parameter key="regular_expression" value=".*count(VIN)"/>
        <parameter key="include_special_attributes" value="true"/>
        <parameter key="replace_what" value=" count"/>
      </operator>
      <connect from_op="Read Excel (3)" from_port="output" to_op="Append (3)" to_port="example set 1"/>
      <connect from_op="Append (3)" from_port="merged set" to_op="Filter Examples (4)" to_port="example set input"/>
      <connect from_op="Filter Examples (4)" from_port="example set output" to_op="Numerical to Polynominal" to_port="example set input"/>
      <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Aggregate (3)" to_port="example set input"/>
      <connect from_op="Aggregate (3)" from_port="example set output" to_op="Pivot (3)" to_port="example set input"/>
      <connect from_op="Aggregate (3)" from_port="original" to_op="Join (2)" to_port="right"/>
      <connect from_op="Pivot (3)" from_port="example set output" to_op="Join (2)" to_port="left"/>
      <connect from_op="Join (2)" from_port="join" to_op="Rename by Replacing" to_port="example set input"/>
      <connect from_op="Rename by Replacing" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

Tagged:

Welcome!

It looks like you're new here. Sign in or register to get started.

Best Answer

  • BalazsBaranyRM
    BalazsBaranyRM New Altair Community Member
    Answer ✓
    Hi!

    In this case, you need the regexp in two places.
    attribute filter type: regular_expression
    regular expression: count\(IDT\)_.+ (the .+ is necessary because in this context RapidMiner wants to match the entire string)
    replace what: count\(IDT\)_ (here it is fine without the .+ because we really just match this part of the string)

    Regards,
    Balázs

Answers

  • BalazsBaranyRM
    BalazsBaranyRM New Altair Community Member
    Hi!

    The parentheses in count(IDT)_ have a special meaning in regular expressions. This would be the correct one: count\(IDT\)_

    Regards,
    Balázs
  • miked
    miked New Altair Community Member
    @BalazsBarany Thank you... I understand the escape character to recognize the parentheses but still having 2 issues 
    1 - not identifying the attributes
    2 - regex still not recognizing the count\(IDT\)_ pattern to replace with nothing. 
    Any other ideas would be helpful
  • BalazsBaranyRM
    BalazsBaranyRM New Altair Community Member
    Answer ✓
    Hi!

    In this case, you need the regexp in two places.
    attribute filter type: regular_expression
    regular expression: count\(IDT\)_.+ (the .+ is necessary because in this context RapidMiner wants to match the entire string)
    replace what: count\(IDT\)_ (here it is fine without the .+ because we really just match this part of the string)

    Regards,
    Balázs
  • miked
    miked New Altair Community Member
    @BalazsBarany - Awesome thank you! That worked. Does the plus sign in this case just handle the way the "year' is handled in the attribute name? Thanks for the help! 
  • BalazsBaranyRM
    BalazsBaranyRM New Altair Community Member
    Hi!

    Yes, RapidMiner requires matching the full string for the attribute selection, not just a part of the string. The .+ after count\(IDT\)_ matches the variable part, in your case the year. 

    Regards,
    Balázs

Welcome!

It looks like you're new here. Sign in or register to get started.

Welcome!

It looks like you're new here. Sign in or register to get started.