
"String/Text value modification"

jaakko1 Member Posts: 8 Contributor II
edited June 2019 in Help
Hi everyone,

I'm struggling to find a solution to this simple task. My data consists of columns with text in them, e.g.:
col1
example text
another - example
third, example

The question is: how can I get rid of the spaces, hyphens, commas, etc., so that the data looks something like this?
col1
example_text
another_example
third_example

Any help would be much appreciated!

-J

Answers

  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,996 RM Engineering
    Hi,

    you can try the following process:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.0.006">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="6.0.007-SNAPSHOT" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="6.0.007-SNAPSHOT" expanded="true" height="60" name="Retrieve toDo" width="90" x="45" y="30">
            <parameter key="repository_entry" value="originalData"/>
          </operator>
          <operator activated="true" class="trim" compatibility="6.0.007-SNAPSHOT" expanded="true" height="76" name="Trim" width="90" x="178" y="30"/>
          <operator activated="true" class="replace" compatibility="6.0.007-SNAPSHOT" expanded="true" height="76" name="Replace comma" width="90" x="313" y="30">
            <parameter key="attribute" value="nom"/>
            <parameter key="replace_what" value=","/>
            <parameter key="replace_by" value=" "/>
          </operator>
          <operator activated="true" class="replace" compatibility="6.0.007-SNAPSHOT" expanded="true" height="76" name="Replace hyphen" width="90" x="447" y="30">
            <parameter key="replace_what" value="-"/>
            <parameter key="replace_by" value=" "/>
          </operator>
          <operator activated="true" class="replace" compatibility="6.0.007-SNAPSHOT" expanded="true" height="76" name="Replace whitespace" width="90" x="580" y="30">
            <parameter key="attribute" value="nom"/>
            <parameter key="replace_what" value=" +"/>
            <parameter key="replace_by" value="_"/>
          </operator>
          <connect from_op="Retrieve toDo" from_port="output" to_op="Trim" to_port="example set input"/>
          <connect from_op="Trim" from_port="example set output" to_op="Replace comma" to_port="example set input"/>
          <connect from_op="Replace comma" from_port="example set output" to_op="Replace hyphen" to_port="example set input"/>
          <connect from_op="Replace hyphen" from_port="example set output" to_op="Replace whitespace" to_port="example set input"/>
          <connect from_op="Replace whitespace" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    Basically, the idea is to first trim the values (i.e. remove leading and trailing whitespace), then replace every unwanted non-whitespace character (here, commas and hyphens) with a space. As the last step, we replace each remaining run of one or more spaces with a single underscore.

    Regards,
    Marco
  • jaakko1 Member Posts: 8 Contributor II
    Marco,

    thank you so much! This solved the problem.

    -J
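
The three steps described in the answer (trim, replace unwanted characters with a space, collapse runs of spaces into an underscore) can also be checked outside RapidMiner. The following is only a minimal Python sketch of the same logic, not part of the RapidMiner process; the function name `normalize` and the list `rows` are hypothetical and chosen for illustration, and the sample values are the ones from the question.

```python
import re


def normalize(text: str) -> str:
    """Mirror the Trim -> Replace comma -> Replace hyphen -> Replace whitespace chain."""
    # Step 1: trim leading and trailing whitespace.
    text = text.strip()
    # Step 2: turn each unwanted character (comma, hyphen) into a space.
    text = text.replace(",", " ").replace("-", " ")
    # Step 3: collapse every run of one or more spaces into one underscore
    # (same pattern as the process: " +").
    return re.sub(r" +", "_", text)


# Sample values taken from the question (hypothetical variable name).
rows = ["example text", "another - example", "third, example"]
cleaned = [normalize(r) for r in rows]
print(cleaned)
```

Because trimming happens before the replacements in this ordering, a value that ends in a comma or hyphen would keep a trailing underscore; if that matters for your data, trimming again after step 2 is one possible adjustment.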