Due to recent updates, all users are required to create an Altair One account to login to the RapidMiner community. Click the Register button to create your account using the same email that you have previously used to login to the RapidMiner community. This will ensure that any previously created content will be synced to your Altair One account. Once you login, you will be asked to provide a username that identifies you to other Community users. Email us at Community with questions.

Reading csv files with % and $ data

vansvans Member Posts: 2 Learner I
I am trying to read a csv file.  The column names and first row of data are below. I need to convert data that contain % or $ to real numbers, not polynomials. How can I do this?

"id";"rank";"name";"employees";"previous";"revenues";"revenuechange";"profits";"profitschange";"assets";"marketvalue"
"2";"1";"Walmart";"2,300,000";"1";"$485,873";"0.8%";"$13,643.0 ";"-7.2%";"$198,825";"$218,619"

I am sure there is an easy way to do this, but I can't find it.  Thanks in advance.

Best Answer

  • jwpfaujwpfau Employee, Member Posts: 303 RM Engineering
    Solution Accepted
    Hi vans,

    delete the Create Document and Write Document operators of Marco Barradas example and choose your local file for the csv file parameter of the Read CSV operator.

    Check the LEARN section at the bottom of this webpage and/or complete the in-product tutorials to get a feeling for the process workflow.

    Greetings,
    Jonas

Answers

  • jwpfaujwpfau Employee, Member Posts: 303 RM Engineering
    Hi vans,

    Just remove the $ or % and convert it into a number you can do this with the the Replace and Nominal to Numerical operator.
    Or you could use the Generate Attributes operator with it's replace and parse functions.

    Greetings,
    Jonas

  • MarcoBarradasMarcoBarradas Administrator, Employee, RapidMiner Certified Analyst, Member Posts: 272 Unicorn
    Hi @vans

    Here is a little example on how you can do what you asked.

    <?xml version="1.0" encoding="UTF-8"?><process version="9.10.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.10.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="-1"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="text:create_document" compatibility="9.3.001" expanded="true" height="68" name="Create Document" width="90" x="112" y="187">
            <parameter key="text" value="&quot;id&quot;;&quot;rank&quot;;&quot;name&quot;;&quot;employees&quot;;&quot;previous&quot;;&quot;revenues&quot;;&quot;revenuechange&quot;;&quot;profits&quot;;&quot;profitschange&quot;;&quot;assets&quot;;&quot;marketvalue&quot;&#10;&quot;2&quot;;&quot;1&quot;;&quot;Walmart&quot;;&quot;2,300,000&quot;;&quot;1&quot;;&quot;$485,873&quot;;&quot;0.8%&quot;;&quot;$13,643.0 &quot;;&quot;-7.2%&quot;;&quot;$198,825&quot;;&quot;$218,619&quot;"/>
            <parameter key="add label" value="false"/>
            <parameter key="label_type" value="nominal"/>
          </operator>
          <operator activated="true" class="text:write_document" compatibility="9.3.001" expanded="true" height="82" name="Write Document" width="90" x="246" y="187">
            <parameter key="overwrite" value="true"/>
            <parameter key="encoding" value="SYSTEM"/>
          </operator>
          <operator activated="true" class="read_csv" compatibility="9.10.001" expanded="true" height="68" name="Read CSV" width="90" x="380" y="187">
            <parameter key="column_separators" value=";"/>
            <parameter key="trim_lines" value="false"/>
            <parameter key="use_quotes" value="true"/>
            <parameter key="quotes_character" value="&quot;"/>
            <parameter key="escape_character" value="\"/>
            <parameter key="skip_comments" value="false"/>
            <parameter key="comment_characters" value="#"/>
            <parameter key="starting_row" value="1"/>
            <parameter key="parse_numbers" value="true"/>
            <parameter key="decimal_character" value="."/>
            <parameter key="grouped_digits" value="false"/>
            <parameter key="grouping_character" value=","/>
            <parameter key="infinity_representation" value=""/>
            <parameter key="date_format" value=""/>
            <parameter key="first_row_as_names" value="true"/>
            <list key="annotations"/>
            <parameter key="time_zone" value="SYSTEM"/>
            <parameter key="locale" value="English (United States)"/>
            <parameter key="encoding" value="SYSTEM"/>
            <parameter key="read_all_values_as_polynominal" value="false"/>
            <list key="data_set_meta_data_information"/>
            <parameter key="read_not_matching_values_as_missings" value="true"/>
          </operator>
          <operator activated="true" class="replace" compatibility="9.10.001" expanded="true" height="82" name="Replace" width="90" x="514" y="187">
            <parameter key="attribute_filter_type" value="all"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="nominal"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="file_path"/>
            <parameter key="block_type" value="single_value"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="single_value"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="replace_what" value="[%$]"/>
            <parameter key="replace_by" value=""/>
          </operator>
          <operator activated="true" class="parse_numbers" compatibility="9.10.001" expanded="true" height="82" name="Parse Numbers" width="90" x="648" y="187">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value="revenuechange|marketvalue|profits|profitschange|revenues"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="nominal"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="file_path"/>
            <parameter key="block_type" value="single_value"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="single_value"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="decimal_character" value="."/>
            <parameter key="grouped_digits" value="true"/>
            <parameter key="grouping_character" value=","/>
            <parameter key="infinity_representation" value=""/>
            <parameter key="unparsable_value_handling" value="fail"/>
          </operator>
          <connect from_op="Create Document" from_port="output" to_op="Write Document" to_port="document"/>
          <connect from_op="Write Document" from_port="file" to_op="Read CSV" to_port="file"/>
          <connect from_op="Read CSV" from_port="output" to_op="Replace" to_port="example set input"/>
          <connect from_op="Replace" from_port="example set output" to_op="Parse Numbers" to_port="example set input"/>
          <connect from_op="Parse Numbers" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>


  • vansvans Member Posts: 2 Learner I
    Thanks. I don't really understand, but I pasted the example into the process window and ran it. It produced the results I wanted for the example row, but I don't know how to get it to parse a file that contains many rows with the same formatting. I tried to link an external file to the process, but couldn't get it to work.  How can I read a text file from the repository or from a directory on my PC?

    What is the best source of information on coding processes like these?

Sign In or Register to comment.