
Generate new attribute using nominal regex

In777 Member Posts: 29 Contributor II
edited February 2020 in Help

I am working with an Excel file that contains several sentences. I would like to generate a new attribute (using the "Generate Attributes" operator) that returns true or false depending on whether a sentence contains a chain of numbers separated by whitespace (e.g. 234 45 56, or 2.3 34 56 5.6, or 2,345 56,67 34 2013, or 23% 34% 56%). A chain can contain different kinds of numbers (e.g. with percent signs or decimal points) and can be of any length. I tried the "match nominal regex" function, match(sentences,"\d+\s+\d"), to do this. However, RapidMiner does not recognize the escape character (\). How do I change my regex to make it work? I would also appreciate help with the regex itself, as I am not sure how to capture the decimal values.
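As a way to try out such a pattern outside RapidMiner, here is a minimal Python sketch of one possible regex for "a chain of numbers separated by whitespace". The names `NUMBER`, `CHAIN`, `contains_number_chain` and `escape_for_expression` are illustrative and not part of RapidMiner. The sketch assumes a number is a run of digits with an optional decimal part (dot or comma) and an optional trailing percent sign. The backslash doubling at the end is also an assumption: a common cause of the escape problem described above is that the backslash is itself an escape character inside a quoted expression string, so each one may need to be written twice.

```python
import re

# One number: digits, optional decimal part with "." or ",", optional "%".
NUMBER = r"\d+(?:[.,]\d+)?%?"

# A chain: one number followed by at least one more, separated by whitespace.
CHAIN = rf"{NUMBER}(?:\s+{NUMBER})+"


def contains_number_chain(sentence: str) -> bool:
    """Return True if the sentence contains two or more numbers in a row."""
    return re.search(CHAIN, sentence) is not None


def escape_for_expression(pattern: str) -> str:
    """Double every backslash (assumption: the target string literal
    treats a single backslash as its own escape character)."""
    return pattern.replace("\\", "\\\\")
```

For example, `contains_number_chain("2,345 56,67 34 2013")` is true, while `contains_number_chain("Revenue grew 5% in 2013.")` is false because the two numbers are separated by a word. If the matching function requires the whole value to match rather than a part of it, the pattern would additionally need `.*` on both sides.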

Answers

  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn

    I think what you are saying is that you want to generate any number of new attributes by splitting your data on whitespace.

    For this, use the Split operator. I am not sure what you mean when you say that escape characters such as \s or \W are not recognised. Try the example below. I went for the simple regex ([\s|\W]+), as it does what you describe without much fuss.

     

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="7.1.001">
    <!-- Connections are not included in this snippet; presumably both example sets feed Append, whose output feeds Split. -->
    <!-- First example row: decimal numbers separated by spaces -->
    <operator activated="true" class="generate_data_user_specification" compatibility="7.1.001" expanded="true" height="68" name="whitespace example1" width="90" x="112" y="136">
    <list key="attribute_values">
    <parameter key="series" value="&quot;2.3 34 56 5.6&quot;"/>
    </list>
    <list key="set_additional_roles"/>
    </operator>
    <!-- Second example row: percentages separated by spaces -->
    <operator activated="true" class="generate_data_user_specification" compatibility="7.1.001" expanded="true" height="68" name="whitespace example2" width="90" x="112" y="238">
    <list key="attribute_values">
    <parameter key="series" value="&quot;23% 34% 56%&quot;"/>
    </list>
    <list key="set_additional_roles"/>
    </operator>
    <!-- Append stacks the two example sets into one -->
    <operator activated="true" class="append" compatibility="7.1.001" expanded="true" height="103" name="Append" width="90" x="246" y="187">
    <parameter key="datamanagement" value="double_array"/>
    <parameter key="merge_type" value="all"/>
    </operator>
    <!-- Split breaks the selected attributes apart at each match of split_pattern -->
    <operator activated="true" class="split" compatibility="7.1.001" expanded="true" height="82" name="Split" width="90" x="447" y="187">
    <parameter key="attribute_filter_type" value="all"/>
    <parameter key="attribute" value=""/>
    <parameter key="attributes" value=""/>
    <parameter key="use_except_expression" value="false"/>
    <parameter key="value_type" value="nominal"/>
    <parameter key="use_value_type_exception" value="false"/>
    <parameter key="except_value_type" value="file_path"/>
    <parameter key="block_type" value="single_value"/>
    <parameter key="use_block_type_exception" value="false"/>
    <parameter key="except_block_type" value="single_value"/>
    <parameter key="invert_selection" value="false"/>
    <parameter key="include_special_attributes" value="false"/>
    <parameter key="split_pattern" value="([\s|\W]+)"/>
    <parameter key="split_mode" value="ordered_split"/>
    </operator>
    </process>
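
To see what a split pattern like the one above does to the two example values, here is a small Python sketch. It is a stand-in for the Split operator, not a reproduction of it: `split_tokens` is a hypothetical helper, the capturing parentheses are dropped because Python's `re.split` would otherwise return the separators as well, and empty pieces are filtered out. One thing it makes visible is that `\W` also matches `.` and `%`, so decimals are broken apart and percent signs are lost, whereas a whitespace-only pattern keeps each number whole.

```python
import re

# Character class from the answer, without the capturing group.
ANSWER_PATTERN = r"[\s|\W]+"

# Alternative for comparison: split on whitespace only.
WHITESPACE_ONLY = r"\s+"


def split_tokens(pattern: str, text: str) -> list[str]:
    """Split text at every match of pattern and drop empty pieces."""
    return [piece for piece in re.split(pattern, text) if piece]


decimals = "2.3 34 56 5.6"
percents = "23% 34% 56%"

# "." counts as a non-word character, so "2.3" becomes "2" and "3".
decimals_by_answer = split_tokens(ANSWER_PATTERN, decimals)

# Splitting on whitespace alone leaves "2.3" and "5.6" intact.
decimals_by_space = split_tokens(WHITESPACE_ONLY, decimals)
```

Here `decimals_by_answer` is `['2', '3', '34', '56', '5', '6']` and `decimals_by_space` is `['2.3', '34', '56', '5.6']`. Which behaviour is wanted depends on whether the decimal and percent markers should survive the split.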