RapidMiner

Trim Not Working

Contributor I edinsda2

Hi All,

 

I am trying to use the Trim operator to remove a space at the start of my attribute values, but it doesn't seem to be working. I am using v7.5.

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.5.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.5.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="7.5.001" expanded="true" height="68" name="Retrieve CountryAndGPName" width="90" x="246" y="85">
        <parameter key="repository_entry" value="../Data/CountryAndGPName"/>
      </operator>
      <operator activated="true" breakpoints="before,after" class="trim" compatibility="7.5.001" expanded="true" height="82" name="Trim" width="90" x="447" y="85">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Country"/>
      </operator>
      <connect from_op="Retrieve CountryAndGPName" from_port="output" to_op="Trim" to_port="example set input"/>
      <connect from_op="Trim" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

 

 

3 REPLIES
RM Certified Expert

Re: Trim Not Working

I'm not on my regular machine, so I can't import the XML, but one note of caution: Trim only works on the polynominal data type. If your values are numbers with surrounding spaces, I'd suggest converting them to polynominal, applying Trim, and then converting back to numerical.

 

Maven

Re: Trim Not Working

Hi,

 

It looks like the whitespace in front of your values is not real whitespace. When I import the data as UTF-8, I get a weird symbol, which indicates a character that is not recognized. Unless you know exactly what this character is, I think the simplest way is to use the "Replace" operator with a regular expression. See if the one below works for you:

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" breakpoints="before" class="replace" compatibility="7.6.001" expanded="true" height="82" name="Replace" width="90" x="313" y="136">
        <parameter key="replace_what" value="[^\u0000-\u007F]+"/>
      </operator>
      <connect from_op="Replace" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
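
For readers who want to see what the regular expression above does outside RapidMiner, here is a minimal Python sketch. The sample country values and the function names are made up for illustration. It also shows a side effect worth knowing about: the pattern removes every non-ASCII character, so accented letters inside a value are dropped too. The narrower pattern at the end is only a suggestion, not something from the posts in this thread. It lists \u00A0 explicitly because in Java-style regex \s does not match the non-breaking space by default.

```python
import re

# Same character class as the replace_what parameter in the process above.
NON_ASCII = re.compile(r"[^\u0000-\u007F]+")

# Suggested narrower alternative (an assumption, not from the thread):
# only strip whitespace and non-breaking spaces at the start or end.
EDGE_SPACE = re.compile(r"^[\s\u00A0]+|[\s\u00A0]+$")


def drop_non_ascii(text: str) -> str:
    """Remove every character outside the ASCII range."""
    return NON_ASCII.sub("", text)


def trim_edges(text: str) -> str:
    """Remove leading/trailing whitespace, including U+00A0."""
    return EDGE_SPACE.sub("", text)


# Hypothetical Country values with a leading non-breaking space (U+00A0).
plain = "\u00a0France"
accented = "\u00a0C\u00f4te d'Ivoire"

cleaned_plain = drop_non_ascii(plain)        # leading character removed
cleaned_accented = drop_non_ascii(accented)  # the accented letter is removed as well
trimmed_accented = trim_edges(accented)      # accented letter is kept
```

If the attribute can contain accented names, the narrower pattern is probably the safer choice.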

Community Manager

Re: Trim Not Working

Haha @FBT, I was working on the same thing at the same time. It's a non-breaking space (&nbsp;, Unicode U+00A0), which shows up as %C2%A0 once the value is URL-encoded. Trim will not take care of this, but the process below will do the trick:

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve CountryAndGPName (2)" width="90" x="246" y="85">
        <parameter key="repository_entry" value="//Google Drive/RapidMiner/CountryAndGPName"/>
      </operator>
      <operator activated="true" class="web:encode_urls" compatibility="7.3.000" expanded="true" height="82" name="Encode URLs" width="90" x="380" y="85">
        <parameter key="url_attribute" value="Country"/>
      </operator>
      <operator activated="true" class="replace" compatibility="7.6.001" expanded="true" height="82" name="Replace" width="90" x="514" y="85">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Country"/>
        <parameter key="replace_what" value="%C2%A0"/>
      </operator>
      <connect from_op="Retrieve CountryAndGPName (2)" from_port="output" to_op="Encode URLs" to_port="example set input"/>
      <connect from_op="Encode URLs" from_port="example set output" to_op="Replace" to_port="example set input"/>
      <connect from_op="Replace" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
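
To illustrate the diagnosis above, here is a small Python sketch that identifies the offending character, shows where %C2%A0 comes from, and removes it directly. The sample value and the helper names are hypothetical. The `java_style_trim` helper assumes that RapidMiner's Trim behaves like Java's `String.trim()`, which only strips characters up to U+0020. That would explain the behaviour reported in this thread, but it is an assumption and not confirmed here.

```python
import unicodedata
from urllib.parse import quote

NBSP = "\u00a0"  # NO-BREAK SPACE, written as &nbsp; in HTML


def java_style_trim(text: str) -> str:
    """Strip only code points <= U+0020 from both ends.

    Assumption: this mirrors Java's String.trim(), which Trim may rely on.
    """
    start, end = 0, len(text)
    while start < end and text[start] <= " ":
        start += 1
    while end > start and text[end - 1] <= " ":
        end -= 1
    return text[start:end]


def strip_nbsp(text: str) -> str:
    """Strip non-breaking spaces and ordinary whitespace from both ends."""
    return text.strip(NBSP + " \t\r\n")


value = NBSP + "United Kingdom"  # hypothetical Country value

first_char_name = unicodedata.name(value[0])  # official Unicode name
encoded = quote(value[0])                     # UTF-8 percent-encoding
after_trim = java_style_trim(value)           # NBSP survives this trim
after_fix = strip_nbsp(value)                 # NBSP removed
```

Removing U+00A0 directly avoids URL-encoding the whole attribute, which would otherwise leave the remaining text in encoded form.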

Scott

Scott Genzer
Senior Community Manager
RapidMiner, Inc.