
inconsistent behaviour when using replaceAll

kayman Member Posts: 662 Unicorn
edited December 2018 in Product Feedback - Resolved

When using the replaceAll function, it seems that some nested functions are ignored while others work fine.

 

As an example :

 

replaceAll([myField],"^(.)",upper("$1")) just returns the input unchanged, whereas the expected behaviour would be to get the first character returned in upper case. No error is thrown; the upper call is simply ignored.

 

replaceAll([myField],"^(.)",concat("-","$1","-")) nicely returns a concatenated field, as expected.

 

Any idea why?


Declined · Last Updated

26 Jul 2019 - no votes or activity since December 2018. Changing status to "Declined". If you want to reopen this idea for voting, please comment and cc @sgenzer. PROD-699

Comments

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Hello @kayman - I'm sorry this has sat here for so long. Can you please help with a sample XML and dataset so I can reproduce it?


    Thanks.


    Scott

     

  • kayman Member Posts: 662 Unicorn

    Hi @sgenzer,

     

    Thanks for your attention to this.

    Below is an example. Basically, you will notice that the regex results are stored and used, just not in combination with all of the functions. Hope it helps, it's easier to see than to explain...

     

    <?xml version="1.0" encoding="UTF-8"?><process version="8.1.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.1.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="generate_data_user_specification" compatibility="8.1.000" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="179" y="85">
    <list key="attribute_values">
    <parameter key="sample" value="&quot;just some text&quot;"/>
    </list>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="generate_attributes" compatibility="8.1.000" expanded="true" height="82" name="Generate Attributes" width="90" x="313" y="85">
    <list key="function_descriptions">
    <parameter key="result1" value="replaceAll([sample],&quot;^(.)(.*)$&quot;,concat(upper(&quot;$1&quot;),&quot;$2&quot;))"/>
    <parameter key="result2" value="replaceAll([sample],&quot;^(.)(.*)$&quot;,concat(&quot;-&quot;,&quot;$1&quot;,&quot;-&quot;,&quot;$2&quot;))"/>
    <parameter key="WantedResult" value="replaceAll([sample],&quot;^(.)(.*)$&quot;,concat(upper(&quot;j&quot;),&quot;$2&quot;))"/>
    </list>
    </operator>
    <connect from_op="Generate Data by User Specification" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
    <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Hi @kayman - OK, thanks for the sample. It is really interesting how you're using the replaceAll function from within Generate Attributes. I have never seen RegEx in the third input of this function; the instructions ask for a "nominal replacement", not a "nominal RegEx" as they do for the second term, so I would never have thought to put RegEx there:

     

    [Screenshot: Screen Shot 2018-03-08 at 5.11.02 PM.png]

     

    I'm going to push this around internally and see what people think. My feeling is that you have discovered an undocumented, rather cool piece of functionality in replaceAll that could possibly become a documented feature.

     

    Scott

     

  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn

    Hey @kayman!

     

    Let's think about how this is supposed to work. (Without having access to the source code.)

     

    replaceAll is a function with three parameters: [myField], regular expression to search, replacement text.

     

    When the function is called, the parameters are evaluated before being sent into it. 

     

    So if you use upper("$1"), this evaluates to "$1" (and that is the function parameter). If you use concat("-", "$1", "-"), that will evaluate to "-$1-". This is a correct regexp replacement string, so the $1 will be replaced by the string found by your regexp. 

     

    replaceAll can't magically apply arbitrary functions inside the replacement. It takes a replacement string; instead of manipulating that, just manipulate the result of replaceAll.

     

    Your upper("$1") could also be placed outside of replaceAll: upper(replaceAll([myField],"^(.)","$1"))

     

    But this could be done more easily: upper(prefix([myField], 1))
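    The evaluation order described above can be sketched in plain Java, whose String.replaceAll accepts the same kind of "$1" replacement string. This is only an illustration under assumptions: the class and method names are made up for the example, and it does not claim to mirror RapidMiner's internal implementation.

```java
import java.util.Locale;

// Hypothetical helper class, for illustration only.
class EagerArguments {
    // upper("$1") is evaluated before replaceAll runs. "$1" has no letters,
    // so the replacement string is still "$1" and the input comes back unchanged.
    static String upperInside(String input) {
        String replacement = "$1".toUpperCase(Locale.ROOT);
        return input.replaceAll("^(.)", replacement);
    }

    // concat("-", "$1", "-") evaluates to "-$1-", a valid replacement string,
    // so the captured first character ends up wrapped in dashes.
    static String concatInside(String input) {
        String replacement = "-" + "$1" + "-";
        return input.replaceAll("^(.)", replacement);
    }

    // Applying upper to the result of replaceAll changes the whole string,
    // not only the captured group.
    static String upperOutside(String input) {
        return input.replaceAll("^(.)", "$1").toUpperCase(Locale.ROOT);
    }

    // Capitalizing only the first character without any regex.
    static String capitalize(String input) {
        if (input.isEmpty()) {
            return input;
        }
        return input.substring(0, 1).toUpperCase(Locale.ROOT) + input.substring(1);
    }

    public static void main(String[] args) {
        String sample = "just some text";
        System.out.println(upperInside(sample));
        System.out.println(concatInside(sample));
        System.out.println(upperOutside(sample));
        System.out.println(capitalize(sample));
    }
}
```

    Note that in this sketch the "outside" variant uppercases the entire string, so capitalizing only the first character needs the prefix and the remainder to be handled separately, as in capitalize.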

     

    Regards,

    Balázs

     

  • kayman Member Posts: 662 Unicorn

    Yeah, I guess that's the advantage of not knowing that something isn't supposed to work and just trying stuff :-)

     

    @BalazsBarany, for the given example the prefix option would indeed work, but my use case was a bit more complex, so I would end up with rather long and nasty code chains. That is why I wanted to try the replaceAll option: it was easier to catch my phrase using regex than the static way.

     

    The point is that my function parameter seems to be accepted as a string sometimes (like when using concat) and ignored at other times (like with the upper function), so it would be really cool if a regex nominal could be used as a regular nominal all over the place. It doesn't throw an error, so it at least seems to be accepted and then ignored.

     

    In the end, if I take the result of my regex search, that becomes a nominal. And if that nominal were treated like any other nominal, it would become a pretty powerful option.

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,507 RM Data Scientist

    Hi,

     

    For reference, the source code is public: https://github.com/rapidminer/rapidminer-studio/tree/master/src/main/java/com/rapidminer/tools/expression/internal/function/text

    And it's also relatively easy to add new functions.

     

    Best,

    Martin

     

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn

    Hey @kayman,

     

    My point is that the function arguments are simple, non-magic strings. I'm sure they are like this in any programming language (maybe aside from Perl, there is too much magic going on there).

     

    In any function call, the function arguments are evaluated before their values are passed to the function. This is how our programming languages work. So replaceAll sees "$1" from the upper() and "-$1-" from the concat().

    This fully explains why concat() works in your example but upper() doesn't.

     

    There is no way to specify that the regexp replacement should apply an arbitrary function to the replacement string inside of replaceAll. 

    There are languages like Perl (and libraries like PCRE) that support the "\U$1" syntax in the replacement to apply simple transformations like uppercase to the replacement string. But Java doesn't support this, and therefore RapidMiner doesn't either.
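    For comparison, plain Java (version 9 and later) can run code per match through Matcher.replaceAll with a function argument. The sketch below only shows what such a callback looks like in Java itself; the class and method names are hypothetical, and nothing here suggests that the RapidMiner expression language exposes this.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only (requires Java 9+).
class CallbackReplace {
    private static final Pattern FIRST_CHAR = Pattern.compile("^(.)");

    // The lambda runs once per match, so the captured text itself can be
    // transformed. quoteReplacement keeps "$" and "\" in the result literal.
    static String capitalizeFirst(String input) {
        return FIRST_CHAR.matcher(input).replaceAll(
            match -> Matcher.quoteReplacement(
                match.group(1).toUpperCase(Locale.ROOT)));
    }
}
```

    Here the transformation is applied to the matched text rather than to the literal "$1", which is the difference from passing upper("$1") as a replacement string.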

     

    Regards,

    Balázs

  • kayman Member Posts: 662 Unicorn

    Fair enough @BalazsBarany, I just like a little magic from time to time ;-)

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    OK, I think we're all on the same page here :) I have pushed this to the documentation folks and will let them chew on it. Thanks @kayman for always showing me something new!
