Data Cleansing Tips: How to Rename Attributes to Lower Case

MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
edited November 2018 in Knowledge Base

The Rename and Rename by Replacing operators are powerful tools for renaming attributes. At some point, however, even these tools are not enough. One example is converting all attribute names to lower case, which some databases and HDFS require. The solution is a very short Groovy script that loops over all attributes and renames each one to its lower-case version.

 

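// The first input port delivers the ExampleSet.
ExampleSet inputData = input[0];
// Rename every attribute to its lower-case form.
for (Attribute a : inputData.getAttributes()) {
    a.setName(a.getName().toLowerCase());
}

return inputData;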

 

If you are working on HDFS, you might also want to replace spaces with underscores. This can be done by adding a small .replace call to the script.

 

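// The first input port delivers the ExampleSet.
ExampleSet inputData = input[0];
// Lower-case every attribute name and replace each space with an underscore.
for (Attribute a : inputData.getAttributes()) {
    a.setName(a.getName().toLowerCase().replace(" ", "_"));
}

return inputData;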
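
The renaming rule itself can be tried outside RapidMiner. The following is a minimal sketch in plain Java, not the script above: the class `AttributeNames` and its methods are hypothetical helpers and not part of the RapidMiner API, and the sample names are only illustrative. It deviates from the script in two labeled ways: it uses the regex `\s+` so that tabs and runs of several spaces collapse into one underscore, and it passes `Locale.ROOT` so the result does not depend on the JVM's default locale.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of the RapidMiner API.
class AttributeNames {

    // Lower-case a name and replace each run of whitespace with one underscore.
    // Locale.ROOT keeps the result independent of the JVM's default locale.
    static String normalize(String name) {
        return name.toLowerCase(Locale.ROOT).replaceAll("\\s+", "_");
    }

    // Apply normalize to every name, preserving the original order.
    static List<String> normalizeAll(List<String> names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            result.add(normalize(name));
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative attribute names, assumed for this example.
        List<String> names = List.of("Passenger Class", "Age", "Ticket  Number");
        System.out.println(normalizeAll(names)); // [passenger_class, age, ticket_number]
    }
}
```

Note that two attributes differing only in case, such as "Age" and "age", would end up with the same name under this rule, so it may be worth checking for duplicates before renaming.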

A process demonstrating this on the Titanic data set is also attached.

- Sr. Director Data Solutions, Altair RapidMiner -
Dortmund, Germany

Comments

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    @mschmitz thanks for that clever workaround! But it seems like this would be nice built-in functionality to add to RapidMiner. It is always a pity to have to resort to Groovy scripts for simple data ETL tasks like this one. Maybe a feature request for the future? Sounds like a mashup between the "transform cases" and the "rename" operators :)

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Hi @Telcontar120,

    most likely the real functionality would be an expression editor similar to Generate Attributes, but for attribute names. That would not be as trivial an operator as this script.

     

    @sgenzer, thoughts?

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    I agree with @Telcontar120 - seems like an Operator Toolbox operator to me :)


    Scott

     

  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn

    I agree, but not just a toolbox operator.  I'd love to be able to do this with RegEx. 

    https://www.regular-expressions.info/replacecase.html

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve Deals" width="90" x="45" y="136">
    <parameter key="repository_entry" value="//Samples/data/Deals"/>
    </operator>
    <operator activated="true" class="rename_by_replacing" compatibility="7.6.001" expanded="true" height="82" name="Rename by Replacing" width="90" x="246" y="136">
    <parameter key="replace_what" value="(\w)"/>
    <parameter key="replace_by" value="\L$1"/>
    <description align="center" color="transparent" colored="false" width="126">Replace with lowercase</description>
    </operator>
    <connect from_op="Retrieve Deals" from_port="output" to_op="Rename by Replacing" to_port="example set input"/>
    <connect from_op="Rename by Replacing" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <description align="left" color="yellow" colored="false" height="92" resized="true" width="419" x="69" y="294">It would be great to be able to do this!&lt;br&gt;&lt;br&gt;https://www.regular-expressions.info/replacecase.html</description>
    </process>
    </operator>
    </process>
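
The process above relies on `\L` in the replacement string. As far as I know, Java's `java.util.regex` replacement syntax has no such case-conversion token (a backslash there only escapes the next character), which is why the process expresses a wish rather than a working solution. A rough sketch of how a similar effect can be had in plain Java 9 or later is shown below, using `Matcher.replaceAll` with a function. The class `RegexLowerCase` and the method `lowerCaseMatches` are hypothetical names for this illustration, not RapidMiner operators or API.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the RapidMiner API.
class RegexLowerCase {

    // Lower-case every match of the regex and leave all other text untouched.
    // quoteReplacement prevents '$' or '\' in a match from being read as
    // replacement syntax.
    static String lowerCaseMatches(String input, String regex) {
        return Pattern.compile(regex)
                .matcher(input)
                .replaceAll(m -> Matcher.quoteReplacement(
                        m.group().toLowerCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        // Lower-case whole words.
        System.out.println(lowerCaseMatches("Deal_ID Amount", "\\w+")); // deal_id amount
        // Lower-case only the first character.
        System.out.println(lowerCaseMatches("ID-Name", "^\\w"));        // iD-Name
    }
}
```

Because the regex decides which parts are converted, the same helper covers both "lower-case everything" and narrower rules such as "lower-case only the first letter".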