replace hyphen

pb42 Member Posts: 12 Contributor II
I am trying to replace a hyphen in a Grade attribute using the Replace operator. I would like to replace it with text indicating that no value has been entered (e.g., Not indicated). The problem is that the attribute includes values such as - (the hyphen I want to replace), A-, B-, C-. Using the Replace operator replaces all of the hyphens (including those used as minuses). I tried the regular expression \b[-]\b, but that is not working. I also tried \b["-"]\b without success.
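
For reference, the behaviour described in the question can be reproduced outside RapidMiner. The sketch below uses Python's `re` module rather than the Java regex engine RapidMiner runs on; for these simple patterns the two behave the same. The function name `replace_lone_hyphen` and the sample grades are made up for illustration, and it assumes the Replace operator applies the pattern to each value as a whole string. A `\b` only matches between a word character and a non-word character, so `\b[-]\b` requires a letter or digit on both sides of the hyphen and can never match a value that is just `-`. Anchoring the pattern as `^-$` matches a value that consists of the hyphen alone and leaves `A-`, `B-`, `C-` untouched.

```python
import re

# Matches a value that consists of exactly one hyphen and nothing else.
LONE_HYPHEN = re.compile(r"^-$")


def replace_lone_hyphen(grade: str, placeholder: str = "Not indicated") -> str:
    """Replace a grade that is only '-' and leave grades such as 'A-' as they are."""
    return LONE_HYPHEN.sub(placeholder, grade)


# Hypothetical sample values for the Grade attribute.
grades = ["-", "A-", "B-", "C-", "A"]
cleaned = [replace_lone_hyphen(g) for g in grades]

# The pattern from the question needs a word character on both sides of the hyphen,
# so it matches neither the lone hyphen nor a trailing minus.
word_boundary_hits = [bool(re.search(r"\b[-]\b", g)) for g in grades]
```

In the Replace operator this would correspond to entering `^-$` as "replace what" and the placeholder text as "replace by".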

Answers

  • [Deleted User] Posts: 0 Learner III
    @pb42

    Hello

    This is very similar to your question ;) Please take a look at it :)

    https://community.rapidminer.com/discussion/comment/63840#Comment_63840

    I hope this helps
    mbs
  • pb42 Member Posts: 12 Contributor II
    Thank you for the direction. I did read this question, but the solution did not make sense to me.
  • varunm1 Moderator, Member Posts: 1,185   Unicorn
    Hello @pb42

    Can you provide some sample data?
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • pb42 Member Posts: 12 Contributor II
    This is the file
  • sgnarkhede2016 Member Posts: 65 Contributor II
    But in the Replace operator I need to pass a regex, and it is not working for me.
    E.g.:
    Sachin N
    John Clara

    I have passed
    "replace what": \^(\w+ \w+)
    "replace by": \("\w+ \w+")

    I want the above strings as "Sachin N" and "John Clara".
  • Edin_Klapic Moderator, Employee, RMResearcher, Member Posts: 283  RM Data Scientist
    If I understood you correctly, you want each value of the Attribute wrapped in leading and trailing double quotes: Value => "Value"
    In this case you replace:
    ^(.+)$
    by
    "$1"
    Happy Mining,
    Edin

    P.S.:
    The Operator Generate Attributes could also have been used. The expression would have been:
    "\"" + AttributeName + "\""
    where AttributeName is the name of the Attribute whose values you want to change.
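
The replacement above can be tried outside RapidMiner as a quick check. This is only a sketch in Python's `re` module, not RapidMiner itself: RapidMiner uses Java regular expressions, where the captured group is written `$1` in the replacement, while Python writes the same reference as `\1`. The function names `quote_with_regex` and `quote_with_concat` are made up for illustration; the second one mirrors the Generate Attributes expression from the P.S.

```python
import re


def quote_with_regex(value: str) -> str:
    """Capture the whole value with ^(.+)$ and put it back between double quotes."""
    # Python spells the group reference \1; the Java/RapidMiner spelling is $1.
    return re.sub(r"^(.+)$", r'"\1"', value)


def quote_with_concat(value: str) -> str:
    """Same result by plain concatenation, as in the Generate Attributes expression."""
    return '"' + value + '"'


names = ["Sachin N", "John Clara"]
quoted = [quote_with_regex(n) for n in names]
```

One difference between the two variants: `.+` needs at least one character, so the regex version leaves an empty value unchanged, whereas the concatenation turns it into a pair of quotes.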
  • sara20 Member Posts: 20 Contributor II
    edited May 29
    @Edin_Klapic

    Hello

    I am working on data for a store and want to analyze customers' baskets. The column names contain a lot of symbols that RM is not able to understand, and I cannot replace all of them because they are of different kinds. Could you please tell me how I can solve this?

    Also, I think it would be useful if the RM team could solve this problem in the next version of RM (feature request).

    Thank you in advance
    sara
  • Edin_Klapic Moderator, Employee, RMResearcher, Member Posts: 283  RM Data Scientist
    Hi @sara20,

    Although your problem is somewhat similar to the "hyphen" issue above, it affects the names of Attributes and not Attribute values.
    Thus, I suggest that in future you open a new thread when the answers in an existing thread don't provide the help you need. That also makes it easier to find for users who might have a similar problem later.

    You can use "Rename by Replacing" to replace patterns described by Regular Expressions, but only one pattern at a time.
    So, unfortunately, the solution to your problem is not yet (as of version 9.6) a single-Operator solution. Please find attached a quick solution that uses "Rename by Replacing" in loops together with a self-created dictionary, with which you should hopefully be able to achieve your goal.

    Happy Mining,
    Edin

    <?xml version="1.0" encoding="UTF-8"?><process version="9.5.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.5.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.5.001" expanded="true" height="68" name="Retrieve Golf" width="90" x="179" y="34">
            <parameter key="repository_entry" value="//Samples/data/Golf"/>
          </operator>
          <operator activated="true" class="concurrency:loop_attributes" compatibility="9.5.001" expanded="true" height="82" name="Loop Attributes" width="90" x="313" y="34">
            <parameter key="attribute_filter_type" value="all"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="attribute_name_macro" value="loop_attribute"/>
            <parameter key="reuse_results" value="true"/>
            <parameter key="enable_parallel_execution" value="true"/>
            <process expanded="true">
              <operator activated="true" class="utility:create_exampleset" compatibility="9.5.001" expanded="true" height="68" name="Create ExampleSet" width="90" x="112" y="85">
                <parameter key="generator_type" value="comma separated text"/>
                <parameter key="number_of_examples" value="100"/>
                <parameter key="use_stepsize" value="false"/>
                <list key="function_descriptions"/>
                <parameter key="add_id_attribute" value="false"/>
                <list key="numeric_series_configuration"/>
                <list key="date_series_configuration"/>
                <list key="date_series_configuration (interval)"/>
                <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
                <parameter key="time_zone" value="SYSTEM"/>
                <parameter key="input_csv_text" value="old,new&#10;o,-&#10;i,%"/>
                <parameter key="column_separator" value=","/>
                <parameter key="parse_all_as_nominal" value="true"/>
                <parameter key="decimal_point_character" value="."/>
                <parameter key="trim_attribute_names" value="true"/>
              </operator>
              <operator activated="true" class="extract_macro" compatibility="9.5.001" expanded="true" height="68" name="Extract Macro (4)" width="90" x="246" y="85">
                <parameter key="macro" value="number_of_examples"/>
                <parameter key="macro_type" value="number_of_examples"/>
                <parameter key="statistics" value="average"/>
                <parameter key="attribute_name" value=""/>
                <list key="additional_macros"/>
              </operator>
              <operator activated="true" class="concurrency:loop" compatibility="9.5.001" expanded="true" height="103" name="Loop (2)" width="90" x="380" y="187">
                <parameter key="number_of_iterations" value="%{number_of_examples}"/>
                <parameter key="iteration_macro" value="iteration"/>
                <parameter key="reuse_results" value="true"/>
                <parameter key="enable_parallel_execution" value="false"/>
                <process expanded="true">
                  <operator activated="true" class="extract_macro" compatibility="9.5.001" expanded="true" height="68" name="Extract Macro (5)" width="90" x="112" y="34">
                    <parameter key="macro" value="old_character"/>
                    <parameter key="macro_type" value="data_value"/>
                    <parameter key="statistics" value="average"/>
                    <parameter key="attribute_name" value="old"/>
                    <parameter key="example_index" value="%{iteration}"/>
                    <list key="additional_macros"/>
                  </operator>
                  <operator activated="true" class="extract_macro" compatibility="9.5.001" expanded="true" height="68" name="Extract Macro (6)" width="90" x="246" y="34">
                    <parameter key="macro" value="new_character"/>
                    <parameter key="macro_type" value="data_value"/>
                    <parameter key="statistics" value="average"/>
                    <parameter key="attribute_name" value="new"/>
                    <parameter key="example_index" value="%{iteration}"/>
                    <list key="additional_macros"/>
                  </operator>
                  <operator activated="true" class="delay" compatibility="9.5.001" expanded="true" height="103" name="only to ensure execution order (2)" width="90" x="447" y="85">
                    <parameter key="delay" value="none"/>
                    <parameter key="delay_amount" value="1000"/>
                    <parameter key="min_delay_amount" value="0"/>
                    <parameter key="max_delay_amount" value="1000"/>
                  </operator>
                  <operator activated="true" class="rename_by_replacing" compatibility="9.5.001" expanded="true" height="82" name="Rename by Replacing (2)" width="90" x="581" y="136">
                    <parameter key="attribute_filter_type" value="all"/>
                    <parameter key="attribute" value=""/>
                    <parameter key="attributes" value=""/>
                    <parameter key="use_except_expression" value="false"/>
                    <parameter key="value_type" value="attribute_value"/>
                    <parameter key="use_value_type_exception" value="false"/>
                    <parameter key="except_value_type" value="time"/>
                    <parameter key="block_type" value="attribute_block"/>
                    <parameter key="use_block_type_exception" value="false"/>
                    <parameter key="except_block_type" value="value_matrix_row_start"/>
                    <parameter key="invert_selection" value="false"/>
                    <parameter key="include_special_attributes" value="false"/>
                    <parameter key="replace_what" value="%{old_character}"/>
                    <parameter key="replace_by" value="%{new_character}"/>
                  </operator>
                  <connect from_port="input 1" to_op="Extract Macro (5)" to_port="example set"/>
                  <connect from_port="input 2" to_op="only to ensure execution order (2)" to_port="through 2"/>
                  <connect from_op="Extract Macro (5)" from_port="example set" to_op="Extract Macro (6)" to_port="example set"/>
                  <connect from_op="Extract Macro (6)" from_port="example set" to_op="only to ensure execution order (2)" to_port="through 1"/>
                  <connect from_op="only to ensure execution order (2)" from_port="through 1" to_port="output 1"/>
                  <connect from_op="only to ensure execution order (2)" from_port="through 2" to_op="Rename by Replacing (2)" to_port="example set input"/>
                  <connect from_op="Rename by Replacing (2)" from_port="example set output" to_port="output 2"/>
                  <portSpacing port="source_input 1" spacing="0"/>
                  <portSpacing port="source_input 2" spacing="0"/>
                  <portSpacing port="source_input 3" spacing="0"/>
                  <portSpacing port="sink_output 1" spacing="0"/>
                  <portSpacing port="sink_output 2" spacing="0"/>
                  <portSpacing port="sink_output 3" spacing="0"/>
                </process>
              </operator>
              <connect from_port="input 1" to_op="Loop (2)" to_port="input 2"/>
              <connect from_op="Create ExampleSet" from_port="output" to_op="Extract Macro (4)" to_port="example set"/>
              <connect from_op="Extract Macro (4)" from_port="example set" to_op="Loop (2)" to_port="input 1"/>
              <connect from_op="Loop (2)" from_port="output 2" to_port="output 1"/>
              <portSpacing port="source_input 1" spacing="147"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve Golf" from_port="output" to_op="Loop Attributes" to_port="input 1"/>
          <connect from_op="Loop Attributes" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
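
The idea behind the process above (a small old/new dictionary applied to the attribute names one entry at a time) can also be sketched outside RapidMiner. The Python below is only an illustration of that idea, not a translation of the process: the function name `rename_by_dictionary` is made up, the dictionary is the `old,new` table from the Create ExampleSet operator, and the names are the regular attributes of the Golf sample data (the label is left out, since the process does not include special attributes).

```python
import re


def rename_by_dictionary(names, dictionary):
    """Apply each (old, new) pair in turn to every attribute name."""
    renamed = list(names)
    for old, new in dictionary:
        # `old` is treated as a regular expression, so a symbol with a special
        # meaning in regex (e.g. "." or "+") would have to be escaped first.
        renamed = [re.sub(old, new, name) for name in renamed]
    return renamed


# Dictionary taken from the input_csv_text parameter: o -> -, i -> %
dictionary = [("o", "-"), ("i", "%")]
golf_attributes = ["Outlook", "Temperature", "Humidity", "Wind"]

new_names = rename_by_dictionary(golf_attributes, dictionary)
print(new_names)  # ['Outl--k', 'Temperature', 'Hum%d%ty', 'W%nd']
```

The matching is case-sensitive, which is why the capital "O" in "Outlook" is kept. For real column names, the dictionary would list each unwanted symbol in the `old` column and its replacement in the `new` column.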

  • sara20 Member Posts: 20 Contributor II
    Thank you very much
