replace hyphen

pb42 Member Posts: 16 Contributor II
I am trying to replace a hyphen in a Grade attribute using the Replace operator. I would like to replace it with text indicating that no value has been entered (i.e., Not indicated). The problem is that the attribute includes values such as - (the hyphen I want to replace), A-, B-, and C-. The Replace operator replaces all of the hyphens, including those used as minus signs. I tried the regular expression \b[-]\b, but that does not work. I also tried \b["-"]\b, without success.
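
One likely reason the `\b` patterns fail: `\b` only matches between a word character and a non-word character, and a lone hyphen has no word character on either side. Anchoring the pattern to the whole value with `^-$` is one way around this. The sketch below illustrates the idea in Python rather than in RapidMiner. It assumes the Replace operator handles `^` and `$` the same way for single-line values; the function name and the sample grades are made up for illustration.

```python
import re

# Hypothetical sample values for the Grade attribute.
GRADES = ["-", "A-", "B-", "C-", "A"]

def replace_lone_hyphen(value, placeholder="Not indicated"):
    """Replace a value only when the entire value is a single hyphen."""
    return re.sub(r"^-$", placeholder, value)

cleaned = [replace_lone_hyphen(g) for g in GRADES]
print(cleaned)

# \b needs a word character next to the hyphen, so it never matches a lone "-".
print(re.search(r"\b-\b", "-"))
```

Under the same assumption, the corresponding Replace settings would be "replace what" = `^-$` and "replace by" = `Not indicated`.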

Answers

  • [Deleted User] Posts: 0 Learner III
    @pb42

    Hello

    This is very similar to your question ;) Please take a look at it :)

    https://community.rapidminer.com/discussion/comment/63840#Comment_63840

    I hope this helps
    mbs
  • pb42 Member Posts: 16 Contributor II
    Thank you for the direction. I did read this question, but the solution did not make sense to me.
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Hello @pb42

    Can you provide some sample data?
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • pb42 Member Posts: 16 Contributor II
    This is the file
  • sgnarkhede2016 Member Posts: 152 Contributor II
    But in the Replace operator I need to pass a regex, and it is not working for me.
    E.g.:
    Sachin N
    John Clara

    I passed:
    "replace what": \^(\w+ \w+)
    "replace by": \("\w+ \w+")

    I want the strings above to become "Sachin N" and "John Clara".
  • Edin_Klapic Moderator, Employee, RMResearcher, Member Posts: 299 RM Data Scientist
    If I understood you correctly, you want the values of the Attribute wrapped in leading and trailing double quotes: Value => "Value"
    In this case, replace:
    ^(.+)$
    by
    "$1"
    Happy Mining,
    Edin

    P.S.:
    The Operator Generate Attributes could also have been used. The expression would have been:
    "\"" + AttributeName + "\""
    where AttributeName is the name of the Attribute whose values you want to change.
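
As a rough illustration outside RapidMiner, the same capture-group replacement can be sketched in Python. One difference to keep in mind: Python's `re.sub` writes the backreference as `\1`, whereas the `$1` in the post above is the syntax the Replace operator expects. The function names are hypothetical.

```python
import re

def quote_value(value):
    """Wrap a non-empty value in double quotes using a capture group."""
    # ^(.+)$ captures the whole value; \1 inserts it back between the quotes.
    return re.sub(r"^(.+)$", r'"\1"', value)

def quote_value_concat(value):
    """Plain concatenation, analogous to the Generate Attributes expression."""
    return '"' + value + '"'

names = ["Sachin N", "John Clara"]
quoted = [quote_value(n) for n in names]
print(quoted)
```

The two functions agree for non-empty values. They differ on an empty string: `.+` needs at least one character, so `quote_value` leaves an empty value untouched, while the concatenation still adds the quotes.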
  • sara20 Member Posts: 110 Unicorn
    edited May 2020
    @Edin_Klapic

    Hello

    I am working on data for a store and want to analyze customers' baskets. The column names contain a lot of symbols that RM is not able to understand, and I cannot replace all of them because they are of different types. Could you please tell me how I can solve this?

    Also, I think it would be useful if the RM team could solve this problem in the next version of RM (feature request).

    Thank you in advance
    sara
  • Edin_Klapic Moderator, Employee, RMResearcher, Member Posts: 299 RM Data Scientist
    Hi @sara20 ,

    Although your problem is somewhat similar to the above-mentioned "hyphen" issue, it affects the names of Attributes and not Attribute values.
    For the future, I therefore suggest opening a new thread when the answers in an existing thread don't provide the help you need. That also makes it easier to find for users who might have a similar problem later.

    You can use "Rename by Replacing" to replace certain patterns represented by Regular Expressions, but only one at a time.
    So, unfortunately, the solution to your problem is not yet (as of version 9.6) a single-Operator solution. Please find attached a quick solution that uses "Rename by Replacing" in loops together with a self-created dictionary, with which you should hopefully be able to achieve your goal.

    Happy Mining,
    Edin

    <?xml version="1.0" encoding="UTF-8"?><process version="9.5.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.5.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.5.001" expanded="true" height="68" name="Retrieve Golf" width="90" x="179" y="34">
            <parameter key="repository_entry" value="//Samples/data/Golf"/>
          </operator>
          <operator activated="true" class="concurrency:loop_attributes" compatibility="9.5.001" expanded="true" height="82" name="Loop Attributes" width="90" x="313" y="34">
            <parameter key="attribute_filter_type" value="all"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="attribute_name_macro" value="loop_attribute"/>
            <parameter key="reuse_results" value="true"/>
            <parameter key="enable_parallel_execution" value="true"/>
            <process expanded="true">
              <operator activated="true" class="utility:create_exampleset" compatibility="9.5.001" expanded="true" height="68" name="Create ExampleSet" width="90" x="112" y="85">
                <parameter key="generator_type" value="comma separated text"/>
                <parameter key="number_of_examples" value="100"/>
                <parameter key="use_stepsize" value="false"/>
                <list key="function_descriptions"/>
                <parameter key="add_id_attribute" value="false"/>
                <list key="numeric_series_configuration"/>
                <list key="date_series_configuration"/>
                <list key="date_series_configuration (interval)"/>
                <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
                <parameter key="time_zone" value="SYSTEM"/>
                <parameter key="input_csv_text" value="old,new&#10;o,-&#10;i,%"/>
                <parameter key="column_separator" value=","/>
                <parameter key="parse_all_as_nominal" value="true"/>
                <parameter key="decimal_point_character" value="."/>
                <parameter key="trim_attribute_names" value="true"/>
              </operator>
              <operator activated="true" class="extract_macro" compatibility="9.5.001" expanded="true" height="68" name="Extract Macro (4)" width="90" x="246" y="85">
                <parameter key="macro" value="number_of_examples"/>
                <parameter key="macro_type" value="number_of_examples"/>
                <parameter key="statistics" value="average"/>
                <parameter key="attribute_name" value=""/>
                <list key="additional_macros"/>
              </operator>
              <operator activated="true" class="concurrency:loop" compatibility="9.5.001" expanded="true" height="103" name="Loop (2)" width="90" x="380" y="187">
                <parameter key="number_of_iterations" value="%{number_of_examples}"/>
                <parameter key="iteration_macro" value="iteration"/>
                <parameter key="reuse_results" value="true"/>
                <parameter key="enable_parallel_execution" value="false"/>
                <process expanded="true">
                  <operator activated="true" class="extract_macro" compatibility="9.5.001" expanded="true" height="68" name="Extract Macro (5)" width="90" x="112" y="34">
                    <parameter key="macro" value="old_character"/>
                    <parameter key="macro_type" value="data_value"/>
                    <parameter key="statistics" value="average"/>
                    <parameter key="attribute_name" value="old"/>
                    <parameter key="example_index" value="%{iteration}"/>
                    <list key="additional_macros"/>
                  </operator>
                  <operator activated="true" class="extract_macro" compatibility="9.5.001" expanded="true" height="68" name="Extract Macro (6)" width="90" x="246" y="34">
                    <parameter key="macro" value="new_character"/>
                    <parameter key="macro_type" value="data_value"/>
                    <parameter key="statistics" value="average"/>
                    <parameter key="attribute_name" value="new"/>
                    <parameter key="example_index" value="%{iteration}"/>
                    <list key="additional_macros"/>
                  </operator>
                  <operator activated="true" class="delay" compatibility="9.5.001" expanded="true" height="103" name="only to ensure execution order (2)" width="90" x="447" y="85">
                    <parameter key="delay" value="none"/>
                    <parameter key="delay_amount" value="1000"/>
                    <parameter key="min_delay_amount" value="0"/>
                    <parameter key="max_delay_amount" value="1000"/>
                  </operator>
                  <operator activated="true" class="rename_by_replacing" compatibility="9.5.001" expanded="true" height="82" name="Rename by Replacing (2)" width="90" x="581" y="136">
                    <parameter key="attribute_filter_type" value="all"/>
                    <parameter key="attribute" value=""/>
                    <parameter key="attributes" value=""/>
                    <parameter key="use_except_expression" value="false"/>
                    <parameter key="value_type" value="attribute_value"/>
                    <parameter key="use_value_type_exception" value="false"/>
                    <parameter key="except_value_type" value="time"/>
                    <parameter key="block_type" value="attribute_block"/>
                    <parameter key="use_block_type_exception" value="false"/>
                    <parameter key="except_block_type" value="value_matrix_row_start"/>
                    <parameter key="invert_selection" value="false"/>
                    <parameter key="include_special_attributes" value="false"/>
                    <parameter key="replace_what" value="%{old_character}"/>
                    <parameter key="replace_by" value="%{new_character}"/>
                  </operator>
                  <connect from_port="input 1" to_op="Extract Macro (5)" to_port="example set"/>
                  <connect from_port="input 2" to_op="only to ensure execution order (2)" to_port="through 2"/>
                  <connect from_op="Extract Macro (5)" from_port="example set" to_op="Extract Macro (6)" to_port="example set"/>
                  <connect from_op="Extract Macro (6)" from_port="example set" to_op="only to ensure execution order (2)" to_port="through 1"/>
                  <connect from_op="only to ensure execution order (2)" from_port="through 1" to_port="output 1"/>
                  <connect from_op="only to ensure execution order (2)" from_port="through 2" to_op="Rename by Replacing (2)" to_port="example set input"/>
                  <connect from_op="Rename by Replacing (2)" from_port="example set output" to_port="output 2"/>
                  <portSpacing port="source_input 1" spacing="0"/>
                  <portSpacing port="source_input 2" spacing="0"/>
                  <portSpacing port="source_input 3" spacing="0"/>
                  <portSpacing port="sink_output 1" spacing="0"/>
                  <portSpacing port="sink_output 2" spacing="0"/>
                  <portSpacing port="sink_output 3" spacing="0"/>
                </process>
              </operator>
              <connect from_port="input 1" to_op="Loop (2)" to_port="input 2"/>
              <connect from_op="Create ExampleSet" from_port="output" to_op="Extract Macro (4)" to_port="example set"/>
              <connect from_op="Extract Macro (4)" from_port="example set" to_op="Loop (2)" to_port="input 1"/>
              <connect from_op="Loop (2)" from_port="output 2" to_port="output 1"/>
              <portSpacing port="source_input 1" spacing="147"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve Golf" from_port="output" to_op="Loop Attributes" to_port="input 1"/>
          <connect from_op="Loop Attributes" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
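
The logic of the process above can be summarized outside RapidMiner as follows: walk through a small dictionary of old/new pairs and apply each replacement to every attribute name in turn. This is only a Python sketch of that idea, not a translation of the operators. The dictionary mirrors the `old,new` rows in Create ExampleSet, the attribute names are assumed to be those of the Golf sample data, and `rename_by_dictionary` is a hypothetical helper.

```python
import re

# Replacement dictionary, mirroring the "old,new" rows of Create ExampleSet.
REPLACEMENTS = [("o", "-"), ("i", "%")]

# Assumed attribute names (Golf sample data).
ATTRIBUTE_NAMES = ["Outlook", "Temperature", "Humidity", "Wind", "Play"]

def rename_by_dictionary(names, replacements):
    """Apply each (old, new) pair to every name, one pair at a time."""
    renamed = list(names)
    for old, new in replacements:
        # "old" is treated as a regular expression, as in Rename by Replacing.
        # Use re.escape(old) if a symbol such as "+" or "(" should be literal.
        # The function replacement inserts "new" literally, without escapes.
        renamed = [re.sub(old, lambda _m, new=new: new, name) for name in renamed]
    return renamed

result = rename_by_dictionary(ATTRIBUTE_NAMES, REPLACEMENTS)
print(result)
```

To clean up real column names, the dictionary would list each unwanted symbol in the `old` position and its substitute in the `new` position.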



  • sara20 Member Posts: 110 Unicorn
    Thank you very much
