parse numbers output not numerical

dhamptondhampton Member Posts: 14 Contributor II
edited November 2018 in Help

I am reading a .csv file that has some numbers formatted as currency, e.g. $1,000 or $500. RapidMiner reads these as polynominal, so I am using the Replace operator to remove the $ and , characters. Both removals work, but for sums of $999 and below, which never had a comma in them, Parse Numbers gives me an error message: "No Number: according to the specified format, 500 cannot be parsed as a number". As far as I can see there are no spaces or other stray characters. Any ideas what could cause this? Thanks...

Best Answers

  • MartinLiebigMartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
    Solution Accepted

    Hi,

    this sounds a bit odd. Could you provide an example process? And did you try Trim to remove leading and trailing white spaces?

     

    ~Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • dhamptondhampton Member Posts: 14 Contributor II
    Solution Accepted

    Certainly is easy when you know how!  Many thanks Brian, I really like this one-size-fits-all replacement operator and will include it in our training.

     

    For the benefit of others: Brian's killer replacement operator has this in the 'replace what' parameter:  [-!"#$%&'()*+,/:;<=>?@\[\\\]_`{|}~a-zA-Z\s]

     

    David
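For readers who want to see what that character class does outside RapidMiner, here is a minimal Python sketch. It assumes Python's `re` module treats the class the same way for these plain ASCII inputs; the helper name `clean_amount` and the sample values are made up for illustration. Note that the class contains neither digits nor `.`, so decimals survive, while `-` is in the class, so a minus sign would be removed too.

```python
import re

# Character class quoted in the post above: most ASCII punctuation
# (but not '.'), ASCII letters, and whitespace.
STRIP_CHARS = re.compile(r"""[-!"#$%&'()*+,/:;<=>?@\[\\\]_`{|}~a-zA-Z\s]""")

def clean_amount(text: str) -> float:
    """Delete every character matched by STRIP_CHARS, then parse what is left.

    Hypothetical helper for illustration; it is not part of RapidMiner.
    """
    return float(STRIP_CHARS.sub("", text))

print(clean_amount("$1,000"))   # currency symbol and grouping comma removed
print(clean_amount(" $500 "))   # surrounding whitespace removed as well
print(clean_amount("$12.50"))   # the decimal point is kept
```

Because whitespace is part of the class, this single replacement also covers what a separate Trim step would do.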

Answers

  • Thomas_OttThomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Have you tried the Parse Numbers operator and set the separator parameter to a comma?

  • JEdwardJEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn

    You can just be extra cautious and replace all characters that won't parse with the replace operator.  It works for me on your dataset.

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.3.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.3.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_csv" compatibility="7.3.000" expanded="true" height="68" name="Read CSV" width="90" x="179" y="85">
    <parameter key="csv_file" value="C:\Users\think\Downloads\Insurance Preparation.csv"/>
    <parameter key="column_separators" value=","/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="Order.true.integer.attribute"/>
    <parameter key="1" value="No\. Risks.true.integer.attribute"/>
    <parameter key="2" value="Value Insured.true.polynominal.attribute"/>
    <parameter key="3" value="Employees.true.integer.attribute"/>
    <parameter key="4" value="Rent.true.polynominal.attribute"/>
    <parameter key="5" value="Preparation Time.true.real.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="replace" compatibility="7.3.000" expanded="true" height="82" name="Replace" width="90" x="313" y="85">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attributes" value="Rent|Value Insured"/>
    <parameter key="replace_what" value="[-!&quot;#$%&amp;'()*+,/:;&lt;=&gt;?@\[\\\]_`{|}~a-zA-Z\s]"/>
    </operator>
    <operator activated="true" class="parse_numbers" compatibility="7.3.000" expanded="true" height="82" name="Parse Numbers" width="90" x="581" y="85">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attributes" value="Rent|Value Insured"/>
    </operator>
    <connect from_op="Read CSV" from_port="output" to_op="Replace" to_port="example set input"/>
    <connect from_op="Replace" from_port="example set output" to_op="Parse Numbers" to_port="example set input"/>
    <connect from_op="Parse Numbers" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
  • dramhamptondramhampton Member Posts: 9 Contributor II

    Hi JEdward

     

    Really helpful, thank you... but I do not know how to use the xml code you have provided, could you please tell me where to go to learn how to do that?

  • dhamptondhampton Member Posts: 14 Contributor II

    Hi Martin

     

    Many thanks for the speedy response.

     

    My original csv file does not have any spaces in it.  But, your Trim operator suggestion worked!  So, many thanks.  In case you are interested the file is attached but I'm counting this one as solved.
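The thread never establishes which character was actually hiding in the small amounts, so the following Python sketch only illustrates the general effect, assuming a trailing space. `strict_parse` is a hypothetical stand-in for a parser that rejects anything but digits and a decimal point; it is not RapidMiner's implementation.

```python
import re

# Hypothetical strict parser: digits with an optional decimal part, nothing else.
_NUMBER = re.compile(r"\d+(\.\d+)?")

def strict_parse(text: str) -> float:
    if _NUMBER.fullmatch(text) is None:
        raise ValueError(f"cannot parse {text!r} as a number")
    return float(text)

def try_parse(text: str):
    """Return the parsed number, or None when strict_parse rejects the text."""
    try:
        return strict_parse(text)
    except ValueError:
        return None

raw = "$500 "  # assumed cell content: a trailing space after the amount
stripped = raw.replace("$", "").replace(",", "")

print(repr(stripped))               # repr() makes the trailing space visible
print(try_parse(stripped))          # rejected while the space is still there
print(try_parse(stripped.strip()))  # accepted once the value is trimmed
```

Printing values with `repr()` (or an equivalent that quotes the string) is a quick way to check for this kind of invisible character before blaming the parser.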

  • dhamptondhampton Member Posts: 14 Contributor II

    Thanks Thomas, yes I am using the Parse Numbers operator... that's what is giving me the error message. 

     

    I think you were referring to the decimal separator character?  Trouble is that if I change that to a comma, then 1,000,000 becomes 1.000.000, which doesn't read as a number.
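The difference between treating the comma as a decimal mark and treating it as a grouping character can be sketched in a few lines of Python. Both function names are invented for this example, and Python's `float()` stands in for the number parser.

```python
def parse_with_decimal_comma(text: str) -> float:
    # Treat ',' as the decimal mark by mapping it to '.', then parse.
    return float(text.replace(",", "."))

def parse_with_grouping_comma(text: str) -> float:
    # Treat ',' as a thousands separator: drop it, then parse.
    return float(text.replace(",", ""))

print(parse_with_grouping_comma("1,000,000"))

try:
    parse_with_decimal_comma("1,000,000")  # becomes "1.000.000"
except ValueError as exc:
    print("failed:", exc)
```

With more than one comma in the value, only the grouping interpretation yields a number, which matches the behaviour described in the post.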

  • Telcontar120Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    If you enable the XML view in Studio, then you can copy the XML provided and replace the default XML, and then hit the green check mark at the top of the window. That will render the process in the main process view and you will be able to see the operators and their configuration.  Sharing the raw XML is thus an easy way of sharing a RapidMiner process and you will see it commonly done this way on the community forum posts.

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts