Data Cleansing Tips: How to Rename Attributes to Lower Case

mschmitz Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 1,872  RM Data Scientist
edited November 2018 in Knowledge Base

The Rename and Rename by Replacing operators are powerful tools for renaming attributes. At some point, however, even these tools are not enough. One example is converting all attribute names to lower case, which some databases and HDFS require. The solution is a very short Groovy script that loops over all attributes and replaces each name with its lower-case version.

 

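// input[0] is the first ExampleSet passed to the script.
ExampleSet inputData = input[0];

// Rename each attribute to the lower-case version of its name.
for (Attribute a : inputData.getAttributes()) {
    a.setName(a.getName().toLowerCase());
}

return inputData;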
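
One thing the script does not guard against is two attributes whose names differ only in case, such as Age and age, which would both end up as age. The following is a minimal standalone Java sketch, not RapidMiner API code, for checking a list of names for such collisions before renaming. The class name, method name, and sample names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, independent of RapidMiner: finds attribute names
// that would collide once they are converted to lower case.
final class LowerCaseCollisions {

    // Maps each lower-cased name to the original names that produce it,
    // keeping only entries with more than one original name.
    static Map<String, List<String>> find(List<String> names) {
        Map<String, List<String>> byLower = new LinkedHashMap<>();
        for (String name : names) {
            String lower = name.toLowerCase(Locale.ROOT);
            byLower.computeIfAbsent(lower, k -> new ArrayList<>()).add(name);
        }
        // Drop names that stay unique after lower-casing.
        byLower.values().removeIf(originals -> originals.size() < 2);
        return byLower;
    }

    public static void main(String[] args) {
        // Sample names are made up for illustration.
        System.out.println(find(List.of("Age", "Sex", "age", "Passenger Class")));
        // prints {age=[Age, age]}
    }
}
```

An empty result means the lower-case names are still unique and the rename is safe in that respect.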

 

If you are working on HDFS, you might also want to replace spaces with underscores. This can be done by adding a small .replace call to the script.

 

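// input[0] is the first ExampleSet passed to the script.
ExampleSet inputData = input[0];

// Lower-case each attribute name and replace every space with an underscore.
for (Attribute a : inputData.getAttributes()) {
    a.setName(a.getName().toLowerCase().replace(" ", "_"));
}

return inputData;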
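
The renaming rule itself can be tried outside RapidMiner. Below is a minimal Java sketch under two assumptions that go beyond the script above: it passes Locale.ROOT so the result does not depend on the JVM's default locale (in a Turkish locale, for example, toLowerCase() maps I to a dotless ı), and it collapses any run of whitespace, not just single spaces, into one underscore. The class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper, independent of RapidMiner: applies the naming rule
// (lower case, whitespace replaced by underscores) to a single name.
final class AttributeNames {

    static String normalize(String name) {
        // Locale.ROOT keeps the lower-casing independent of the default locale.
        String lower = name.toLowerCase(Locale.ROOT);
        // Assumption: one underscore per run of whitespace (spaces, tabs, ...).
        return lower.replaceAll("\\s+", "_");
    }

    public static void main(String[] args) {
        System.out.println(normalize("Passenger Class")); // prints passenger_class
        System.out.println(normalize("AGE"));             // prints age
    }
}
```

If one underscore per space is preferred, as in the script, the replaceAll call can be swapped for replace(" ", "_").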

Attached is also a process demonstrating this on the Titanic data set.

- Head of Data Science Services at RapidMiner -
Dortmund, Germany

Comments

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 916   Unicorn

    @mschmitz thanks for that clever workaround! But it seems like this would be nice built-in functionality to add to RapidMiner. It is always a pity to have to resort to Groovy scripts for simple data ETL tasks like this one. Maybe a feature request for the future? It sounds like a mashup between the "transform cases" and the "rename" operators :)

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • mschmitz Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 1,872  RM Data Scientist

    Hi @Telcontar120,

    Most likely the real functionality would be an expression editor similar to Generate Attributes, but for attribute names. That is not as trivial an operator as this script.

     

    @sgenzer, thoughts?

    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member Posts: 1,968  Community Manager

    I agree with @Telcontar120 - seems like an Operator Toolbox operator to me :)


    Scott

     

  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 563   Unicorn

    I agree, but not just a toolbox operator.  I'd love to be able to do this with RegEx. 

    https://www.regular-expressions.info/replacecase.html

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve Deals" width="90" x="45" y="136">
    <parameter key="repository_entry" value="//Samples/data/Deals"/>
    </operator>
    <operator activated="true" class="rename_by_replacing" compatibility="7.6.001" expanded="true" height="82" name="Rename by Replacing" width="90" x="246" y="136">
    <parameter key="replace_what" value="(\w)"/>
    <parameter key="replace_by" value="\L$1"/>
    <description align="center" color="transparent" colored="false" width="126">Replace with lowercase</description>
    </operator>
    <connect from_op="Retrieve Deals" from_port="output" to_op="Rename by Replacing" to_port="example set input"/>
    <connect from_op="Rename by Replacing" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <description align="left" color="yellow" colored="false" height="92" resized="true" width="419" x="69" y="294">It would be great to be able to do this!&lt;br&gt;&lt;br&gt;https://www.regular-expressions.info/replacecase.html</description>
    </process>
    </operator>
    </process>
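
The \L token in the replacement above is a case-conversion feature of some regex flavors. Java's java.util.regex replacement syntax has no such token, so, assuming Rename by Replacing is backed by Java regular expressions, the process expresses a wish and not something that works today. In plain Java (9 or later) the same effect can be sketched with a replacement function instead of a replacement string. The class and method names are hypothetical.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: emulates a "\L$1" style replacement in plain Java.
final class RegexLowerCase {

    // Same pattern as replace_what in the process above.
    private static final Pattern WORD_CHAR = Pattern.compile("(\\w)");

    static String lowerCaseMatches(String name) {
        // Matcher.replaceAll(Function) calls the function once per match.
        // quoteReplacement keeps '$' and '\' in the result from being
        // interpreted as replacement syntax.
        return WORD_CHAR.matcher(name).replaceAll(
                m -> Matcher.quoteReplacement(m.group(1).toLowerCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        // Sample name is made up for illustration.
        System.out.println(lowerCaseMatches("Payment Method")); // prints payment method
    }
}
```

Characters that do not match the pattern, such as the space, are left untouched.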