Error in Rename By Replacing with RegEx

batstache611 Member Posts: 45 Guru
edited April 2020 in Help

Hello,

I'm getting the following error while trying to use Rename by Replacing with a regular expression:

"An attribute was already present in the example set which is not allowed because attribute names must be unique."

What does this mean? My attribute names ARE unique. The attribute names come from a %{file_name} macro in Loop Files and contain ".xlsx", since the files are spreadsheet dictionary files, each with its own unique name. It's that ".xlsx" that I'm trying to get rid of with regular expressions, because otherwise my IF() statement in the next operator, Generate Attributes, fails because of the ".x" in the "xlsx".

Thank you for your help!

Best Answer

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Solution Accepted

    Ah, here we have the problem :)

    There are actually two mistakes in your regular expression:

    1. First, you need to escape the dot before the "xlsx".  Otherwise the regular expression parser reads it as a wildcard that matches any single character (just like your first dot).  You can escape it by putting a backslash in front of it, i.e. use the following:
      .*\.xlsx
    2. This expression now matches any name ending in ".xlsx" together with all the text before it.  So that's basically every file, and always the complete name, including the extension.  And then you replace the complete name with... nothing, so every attribute ends up with the same empty name.  Which brings me to the second mistake.  You should use a so-called capture group in your expression and replace the match with the content of this group.  Use the following for the replace parameter:
      (.*)\.xlsx
      and the following for the replace-by parameter:
      $1
      This will now replace the complete match (the first string) by only the content of the first capture group (indicated by $1, the second group would be $2 etc.).  You specify your capture groups with round parentheses.
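
To try the corrected pair of parameters outside of RapidMiner, here is a minimal Java sketch. It assumes the operator behaves like Java's String.replaceAll, which uses the same $1 syntax; the class name, the method name and the file names are made up for illustration.

```java
public class StripExtensionSketch {

    // Hypothetical helper: applies the pattern "(.*)\.xlsx" with the
    // replacement "$1", i.e. the two values suggested in the answer above.
    static String stripExtension(String attributeName) {
        // The backslash is doubled only because this is a Java string literal.
        return attributeName.replaceAll("(.*)\\.xlsx", "$1");
    }

    public static void main(String[] args) {
        // Made-up attribute names standing in for the %{file_name} values.
        String[] names = {"customers.xlsx", "products.xlsx"};
        for (String name : names) {
            System.out.println(name + " -> " + stripExtension(name));
        }
    }
}
```

Each name keeps the text captured by the group and loses only the extension, so two different file names stay different after the rename.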

    This should do the trick.  Learn more about capture groups here:

    http://www.javamex.com/tutorials/regular_expressions/capturing_groups.shtml

    http://docs.oracle.com/javase/tutorial/essential/regex/groups.html

     

    Hope this helps,

    Ingo

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder

    Hi,


    Could it be that your expression is faulty and generates (at least) one more attribute with the same name as another one?  Let's say, for example, that you wanted to keep the first part of each name but accidentally kept only the ".xlsx" part, or something like this?

    Can you please post the expressions you use for the renaming?  This might help to find a potential problem...


    Cheers,

    Ingo

  • batstache611 Member Posts: 45 Guru

    Thank you very much for your reply @IngoRM!! 

     

    The regular expression I'm using is ".*.xlsx", so I'm basically taking everything up to and including the ".xlsx" and replacing it with nothing. I tried replacing it with my own suffix as well, but the problem remains.
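
The effect of this expression on a set of names can be checked with a small Java sketch. It assumes that Rename by Replacing applies the pattern to each attribute name the way Java's String.replaceAll does; the class, the helper methods and the file names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

public class RenameCollisionSketch {

    // Hypothetical helper: applies the same regex replacement to every name.
    static List<String> renameAll(List<String> names, String regex, String replacement) {
        List<String> renamed = new ArrayList<>();
        for (String name : names) {
            renamed.add(name.replaceAll(regex, replacement));
        }
        return renamed;
    }

    // True if at least two entries in the list are equal.
    static boolean hasDuplicates(List<String> names) {
        return new HashSet<>(names).size() < names.size();
    }

    public static void main(String[] args) {
        // Made-up names standing in for the %{file_name} values.
        List<String> names = Arrays.asList("customers.xlsx", "products.xlsx");

        // ".*.xlsx" matches each complete name, so nothing of it survives.
        List<String> emptied = renameAll(names, ".*.xlsx", "");
        System.out.println(emptied + " duplicates: " + hasDuplicates(emptied));

        // Replacing the complete match with a fixed suffix collides as well.
        List<String> suffixed = renameAll(names, ".*.xlsx", "_dict");
        System.out.println(suffixed + " duplicates: " + hasDuplicates(suffixed));
    }
}
```

In both cases every name is replaced as a whole, so all attributes receive the same new name, which would explain the uniqueness error.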

     

    I've attached a sample process that reproduces this problem. Thank you once again.

  • batstache611 Member Posts: 45 Guru

    Thank you very much @IngoRM for the solution and for pointing out my mistakes! The solution works precisely as desired :)
