"String/Text value modification"

jaakko1 Member Posts: 8 Contributor II
edited June 2019 in Help
Hi everyone,

I'm struggling to find a solution to this simple task. My data consists of columns with text in them, e.g.:
col1
example text
another - example
third, example

The question is: how can I get rid of the spaces, hyphens, commas, etc., so that the data looks something like this:
col1
example_text
another_example
third_example

Any help would be much appreciated!

-J

Answers

  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,996 RM Engineering
    Hi,

    you can try the following process:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.0.006">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="6.0.007-SNAPSHOT" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="6.0.007-SNAPSHOT" expanded="true" height="60" name="Retrieve toDo" width="90" x="45" y="30">
            <parameter key="repository_entry" value="originalData"/>
          </operator>
          <operator activated="true" class="trim" compatibility="6.0.007-SNAPSHOT" expanded="true" height="76" name="Trim" width="90" x="178" y="30"/>
          <operator activated="true" class="replace" compatibility="6.0.007-SNAPSHOT" expanded="true" height="76" name="Replace comma" width="90" x="313" y="30">
            <parameter key="attribute" value="nom"/>
            <parameter key="replace_what" value=","/>
            <parameter key="replace_by" value=" "/>
          </operator>
          <operator activated="true" class="replace" compatibility="6.0.007-SNAPSHOT" expanded="true" height="76" name="Replace hyphen" width="90" x="447" y="30">
            <parameter key="replace_what" value="-"/>
            <parameter key="replace_by" value=" "/>
          </operator>
          <operator activated="true" class="replace" compatibility="6.0.007-SNAPSHOT" expanded="true" height="76" name="Replace whitespace" width="90" x="580" y="30">
            <parameter key="attribute" value="nom"/>
            <parameter key="replace_what" value=" +"/>
            <parameter key="replace_by" value="_"/>
          </operator>
          <connect from_op="Retrieve toDo" from_port="output" to_op="Trim" to_port="example set input"/>
          <connect from_op="Trim" from_port="example set output" to_op="Replace comma" to_port="example set input"/>
          <connect from_op="Replace comma" from_port="example set output" to_op="Replace hyphen" to_port="example set input"/>
          <connect from_op="Replace hyphen" from_port="example set output" to_op="Replace whitespace" to_port="example set input"/>
          <connect from_op="Replace whitespace" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    The idea is to first trim the values (i.e. remove leading and trailing whitespace), then replace every undesired character that is not a space (here commas and hyphens) with a space. As the last step, the regular expression " +" replaces each remaining run of one or more spaces with a single underscore.
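
    For readers who want to check the logic outside RapidMiner, the same three steps can be sketched in Python. This is only an illustration of the idea, not part of the process above; the function name normalize_text is made up for this example, and the sample values are the ones from the question.

```python
import re


def normalize_text(value: str) -> str:
    """Hypothetical helper mirroring the trim -> replace -> replace steps."""
    # Step 1: remove leading and trailing whitespace (like the Trim operator).
    value = value.strip()
    # Step 2: turn the undesired characters (comma, hyphen) into spaces.
    value = value.replace(",", " ").replace("-", " ")
    # Step 3: collapse each run of one or more spaces into a single underscore,
    # using the same " +" pattern as the last Replace operator.
    return re.sub(r" +", "_", value)


examples = ["example text", "another - example", "third, example"]
results = [normalize_text(v) for v in examples]
print(results)
```

    Replacing the separators with spaces first means that a value such as "another - example" ends up with a single underscore rather than three, because the whole run of spaces is collapsed in one go.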

    Regards,
    Marco
  • jaakko1 Member Posts: 8 Contributor II
    Marco,

    thank you so much! This solved the problem.

    -J