RapidMiner

parse numbers output not numerical

SOLVED
Regular Contributor

I am reading a .csv file in which some numbers are formatted as currency, e.g. $1,000 or $500. RapidMiner reads these as polynominal, so I am using the Replace operator to remove the $ and , characters. Both removals work, but for amounts of $999 and below, which never contained a comma, I receive an error message: "No Number: according to the specified format, 500 cannot be parsed as a number". As far as I can tell there are no spaces or other stray characters. Any ideas what could cause this? Thanks.

8 REPLIES
RMStaff

Re: parse numbers output not numerical

Hi,

This sounds a bit odd. Could you provide an example process? And have you tried Trim to remove leading and trailing whitespace?

~Martin

--------------------------------------------------------------------------
Head of Data Science Services at RapidMiner
Community Manager

Re: parse numbers output not numerical

Have you tried the Parse Numbers operator with the separator parameter set to a comma?

Regards,
T-Bone
Twitter: @neuralmarket
Elite III

Re: parse numbers output not numerical

You can just be extra cautious and use the Replace operator to strip every character that won't parse. It works for me on your dataset.


<?xml version="1.0" encoding="UTF-8"?><process version="7.3.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.3.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_csv" compatibility="7.3.000" expanded="true" height="68" name="Read CSV" width="90" x="179" y="85">
        <parameter key="csv_file" value="C:\Users\think\Downloads\Insurance Preparation.csv"/>
        <parameter key="column_separators" value=","/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="Order.true.integer.attribute"/>
          <parameter key="1" value="No\. Risks.true.integer.attribute"/>
          <parameter key="2" value="Value Insured.true.polynominal.attribute"/>
          <parameter key="3" value="Employees.true.integer.attribute"/>
          <parameter key="4" value="Rent.true.polynominal.attribute"/>
          <parameter key="5" value="Preparation Time.true.real.attribute"/>
        </list>
      </operator>
      <operator activated="true" class="replace" compatibility="7.3.000" expanded="true" height="82" name="Replace" width="90" x="313" y="85">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="Rent|Value Insured"/>
        <parameter key="replace_what" value="[-!&quot;#$%&amp;'()*+,/:;&lt;=&gt;?@\[\\\]_`{|}~a-zA-Z\s]"/>
      </operator>
      <operator activated="true" class="parse_numbers" compatibility="7.3.000" expanded="true" height="82" name="Parse Numbers" width="90" x="581" y="85">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="Rent|Value Insured"/>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Replace" to_port="example set input"/>
      <connect from_op="Replace" from_port="example set output" to_op="Parse Numbers" to_port="example set input"/>
      <connect from_op="Parse Numbers" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
-- Training, Consulting, Sales in China, Hong Kong & Taiwan --
www.RapidMinerChina.com
Contributor

Re: parse numbers output not numerical

Hi JEdward,

Really helpful, thank you. However, I do not know how to use the XML code you have provided. Could you please tell me where I can learn how to do that?

Regular Contributor

Re: parse numbers output not numerical

Hi Martin,

Many thanks for the speedy response.

My original CSV file does not appear to have any spaces in it, but your Trim operator suggestion worked! So, many thanks. In case you are interested, the file is attached, but I'm counting this one as solved.
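
For anyone wondering why Trim can help when the file looks clean, here is a minimal Python sketch (not RapidMiner itself). It assumes the cell carries an invisible trailing space; the thread does not establish what the hidden character actually was. `parse_strict` is a hypothetical stand-in for a parser that rejects anything other than digits.

```python
def parse_strict(text):
    """Accept only plain ASCII digits, mimicking a parser that rejects stray characters."""
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"{text!r} cannot be parsed as a number")
    return int(text)


raw = "$500 "  # hypothetical cell; the trailing space is an assumption
without_symbols = raw.replace("$", "").replace(",", "")  # leaves "500 "

try:
    parse_strict(without_symbols)
    failed_before_trim = False
except ValueError:
    failed_before_trim = True

trimmed = without_symbols.strip()  # the equivalent of the Trim step
value = parse_strict(trimmed)

print(failed_before_trim, value)
```

Trailing whitespace would also be hard to spot in an error message, because the offending value prints as what looks like a clean 500.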

Regular Contributor

Re: parse numbers output not numerical

Thanks Thomas. Yes, I am using the Parse Numbers operator; that is what gives me the error message.

I think you were referring to the decimal separator character? The trouble is that if I change that to a comma, then 1,000,000 becomes 1.000.000, which doesn't read as a number.
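
To make the difference concrete, here is a small Python sketch of the two roles a comma can play. `parse_with_separators` and its parameter names are hypothetical and only mirror the idea of a decimal character versus a grouping character; they are not RapidMiner's API.

```python
def parse_with_separators(text, decimal_char=".", grouping_char=None):
    """Drop the grouping character, then map the decimal character to '.'."""
    if grouping_char:
        text = text.replace(grouping_char, "")
    if decimal_char != ".":
        text = text.replace(decimal_char, ".")
    return float(text)


# Comma treated as the decimal character: "1,000,000" turns into "1.000.000".
try:
    parse_with_separators("1,000,000", decimal_char=",")
    decimal_comma_ok = True
except ValueError:
    decimal_comma_ok = False

# Comma treated as a grouping (thousands) character: it is simply removed.
grouped = parse_with_separators("1,000,000", grouping_char=",")

print(decimal_comma_ok, grouped)
```

If your version of Parse Numbers offers a grouped-digits setting with its own grouping character, that is probably the option that plays the second role, but check the operator's parameter panel to confirm.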

Elite II

Re: parse numbers output not numerical

If you enable the XML view in Studio, you can copy the XML provided, paste it over the default XML, and then hit the green check mark at the top of the window. That will render the process in the main process view, where you can see the operators and their configuration. Sharing the raw XML is thus an easy way of sharing a RapidMiner process, and you will commonly see it done this way in community forum posts.

Brian T., Lindon Ventures - www.lindonventures.com
Analytics Consulting by Certified RapidMiner Analysts
Regular Contributor

Re: parse numbers output not numerical

Certainly is easy when you know how!  Many thanks Brian, I really like this one-size-fits-all replacement operator and will include it in our training.

 

For the benefit of others: Brian's killer replacement operator has this in the 'replace what' parameter:  [-!"#$%&'()*+,/:;<=>?@\[\\\]_`{|}~a-zA-Z\s]
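
If you want to check what that character class strips before wiring it into a process, it can be tried outside RapidMiner. Below is a minimal Python sketch; the sample values and the `clean_amount` helper are made up for illustration, and it assumes Python's `re` treats this particular class the same way RapidMiner's regular expressions do.

```python
import re

# Character class copied from the 'replace what' parameter quoted above.
UNPARSEABLE = re.compile(r"""[-!"#$%&'()*+,/:;<=>?@\[\\\]_`{|}~a-zA-Z\s]""")


def clean_amount(text):
    """Remove every character matched by the class, then parse what is left."""
    return float(UNPARSEABLE.sub("", text))


samples = ["$1,000", "$500 ", "USD 1,234.50", "-$20"]  # hypothetical values
cleaned = [clean_amount(s) for s in samples]

print(cleaned)
```

Note that digits and the decimal point survive, but the minus sign and all letters are in the class. A negative amount such as -$20 therefore comes out as 20, and scientific notation such as 1e3 would lose its e, so the pattern suits unsigned currency columns like the ones in this thread.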

 

David