Trim Not Working

edinsda2 Member Posts: 5 Contributor I
edited December 2018 in Help

Hi All,

 

I am trying to use the Trim operator to remove a space at the start of my attribute values, but it doesn't seem to be working. I am using v7.5. Here is my process:


<?xml version="1.0" encoding="UTF-8"?><process version="7.5.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.5.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="7.5.001" expanded="true" height="68" name="Retrieve CountryAndGPName" width="90" x="246" y="85">
<parameter key="repository_entry" value="../Data/CountryAndGPName"/>
</operator>
<operator activated="true" breakpoints="before,after" class="trim" compatibility="7.5.001" expanded="true" height="82" name="Trim" width="90" x="447" y="85">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="Country"/>
</operator>
<connect from_op="Retrieve CountryAndGPName" from_port="output" to_op="Trim" to_port="example set input"/>
<connect from_op="Trim" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>

 

 

Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    I'm not on my regular machine, so I can't import the XML, but one note of caution: Trim only works with the polynominal data type. If your values are numbers with surrounding spaces, I'd suggest converting them to polynominal, applying Trim, and then converting them back to numerical.

     

  • FBT Member Posts: 106 Unicorn

    Hi,

     

    It looks like the whitespace in front of your data points is not a regular space. When I import the data as UTF-8, I get a strange symbol, which indicates a character that is not recognized. Unless you know exactly what this character is, I think the simplest way would be to use the "Replace" operator with a regular expression that matches any run of non-ASCII characters. See if the one below works for you:

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" breakpoints="before" class="replace" compatibility="7.6.001" expanded="true" height="82" name="Replace" width="90" x="313" y="136">
    <parameter key="replace_what" value="[^\u0000-\u007F]+"/>
    </operator>
    <connect from_op="Replace" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
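
For readers who want to check what the pattern does outside RapidMiner, here is a minimal Python sketch of the same idea. The helper names and sample values are hypothetical, and Python's regex engine stands in for the one RapidMiner uses, so treat it as an illustration rather than a description of the Replace operator itself. It first reports what the leading character is, then deletes every run of non-ASCII characters.

```python
import re
import unicodedata

# Same pattern as replace_what in the process above: one or more
# characters outside the 7-bit ASCII range.
NON_ASCII = re.compile(r"[^\u0000-\u007F]+")


def describe_leading_char(value: str) -> str:
    """Return the code point and Unicode name of the first character."""
    first = value[0]
    return f"U+{ord(first):04X} {unicodedata.name(first, 'UNKNOWN')}"


def strip_non_ascii(value: str) -> str:
    """Delete every run of non-ASCII characters from the value."""
    return NON_ASCII.sub("", value)


if __name__ == "__main__":
    sample = "\u00a0Germany"  # hypothetical value with a leading U+00A0
    print(describe_leading_char(sample))
    print(repr(strip_non_ascii(sample)))
    # Accented letters are non-ASCII too, so they are removed as well.
    print(repr(strip_non_ascii("C\u00f4te d'Ivoire")))
```

One thing to watch: because the pattern removes everything outside ASCII, legitimate accented letters in names are deleted along with the stray character. If the data contains such names, replacing only the specific offending character is the safer choice.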
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Haha @FBT, I was working on the same thing at the same time. It's a non-breaking space (the HTML entity &nbsp;, Unicode U+00A0, which becomes %C2%A0 when URL-encoded as UTF-8). Trim will not take care of this, but this will do the trick:

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve CountryAndGPName (2)" width="90" x="246" y="85">
    <parameter key="repository_entry" value="//Google Drive/RapidMiner/CountryAndGPName"/>
    </operator>
    <operator activated="true" class="web:encode_urls" compatibility="7.3.000" expanded="true" height="82" name="Encode URLs" width="90" x="380" y="85">
    <parameter key="url_attribute" value="Country"/>
    </operator>
    <operator activated="true" class="replace" compatibility="7.6.001" expanded="true" height="82" name="Replace" width="90" x="514" y="85">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Country"/>
    <parameter key="replace_what" value="%C2%A0"/>
    </operator>
    <connect from_op="Retrieve CountryAndGPName (2)" from_port="output" to_op="Encode URLs" to_port="example set input"/>
    <connect from_op="Encode URLs" from_port="example set output" to_op="Replace" to_port="example set input"/>
    <connect from_op="Replace" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    Scott
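
The encode-then-replace idea in the process above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, `urllib.parse.quote` stands in for the Encode URLs operator (whose exact encoding rules may differ), and the final decoding step is an addition that the process above does not perform.

```python
from urllib.parse import quote, unquote

# Percent-encoded UTF-8 bytes of U+00A0 (no-break space), as used in
# the replace_what parameter of the process above.
NBSP_ENCODED = "%C2%A0"


def remove_nbsp_via_url_encoding(value: str) -> str:
    """Percent-encode the value, drop %C2%A0, then decode it again."""
    encoded = quote(value, safe="")           # e.g. "%C2%A0Germany"
    cleaned = encoded.replace(NBSP_ENCODED, "")
    return unquote(cleaned)                   # decoding is an added step


def remove_nbsp_directly(value: str) -> str:
    """Alternative: replace the U+00A0 character itself, no encoding."""
    return value.replace("\u00a0", "")


if __name__ == "__main__":
    sample = "\u00a0Germany"  # hypothetical value with a leading U+00A0
    print(quote(sample, safe=""))
    print(repr(remove_nbsp_via_url_encoding(sample)))
    print(repr(remove_nbsp_directly(sample)))
```

Both helpers remove only the non-breaking space and leave other non-ASCII characters, such as accented letters, untouched. Without a decoding step, the values stay percent-encoded after the replacement, which is worth checking for attribute values that contain spaces or punctuation.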
