How do I stop RapidMiner from converting large integers to scientific notation?

Keithr Member Posts: 10 Contributor II
edited November 2018 in Help
Hi,

I'm using the K-means algorithm to cluster some data for my company, and the ID I'm using, which comes directly from our database, goes up to 9 digits (e.g. 107,204,426). The problem is that RM converts it to scientific notation (e.g. 1.07204426E8). I realize I can recover the original number by multiplying the mantissa (1.07204426) by 10^8, but I would prefer that RM leave the number as is so that I can easily insert this data back into our database.

The number is way below the maximum for a 32-bit signed int (about 2.1 billion), so RM should be able to handle it easily.

Is there a way to stop RM from converting large numbers to scientific notation?

Thanks in advance.

Keith Robinson

Answers

  • TobiasMalbrecht Moderator, Employee, Member Posts: 294 RM Product Management
    Hi Keith,

    No need to post on multiple boards; we normally look through all the boards for new questions! ;)

    Concerning your question, do you mean the number representation you observe in RapidMiner or when the data is written back to the database? I just checked the number representation in RapidMiner with the value you mentioned and it works correctly. Maybe the behaviour is due to the database you are using?

    Cheers,
    Tobias
  • Keithr Member Posts: 10 Contributor II
    Hi Tobias,

    Sorry about posting my question twice. After posting it the first time I realized that it was in the wrong forum, so I reposted it in the correct forum and tried to delete it from the wrong one, but could not.

    We use Sybase IQ, which is a database specifically for data warehousing. The one downside to it is that loading records using an insert SQL statement is agonizingly slow. However, from a flat file and using their proprietary SQL I can load over a million records in seconds. So my plan is to use RapidMiner to create a CSV that I'll then automatically load into the db using Perl or ksh. However, when I create a CSV file, RM converts the customer ID to scientific notation. Since no significant digits are lost I can easily convert it back to the actual number, but I was wondering if there is a way to have RM keep the actual values.

    Here are a couple of lines from the CSV file I created using a decision tree algorithm. Notice that the second-to-last column, which I'm passing in as an integer ID (107204426), now ends in "E8" (1.07204426E8). How do I stop RM from doing this?

    Thanks

    Keith

    "zsButcherSales","zsMreSales","zsPremiumSales","zsStandardSales","zsTobaccoSales","zsFrozenFoodsSales","zsGrocerySales","hhId","cluster"
    -0.754741,-0.024232,-1.277233,-2.890393,-0.294866,-1.042057,-1.814267,1.07204426E8,"0"
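
Since the CSV is going to be post-processed by a script before loading anyway, one possible workaround is to rewrite the ID column there. This does not change how RM writes the file. The sketch below uses Python rather than Perl or ksh, and the function name `restore_integer_ids` and its `id_column` default are illustrative assumptions, not part of RapidMiner. It parses the value as an exact decimal, so no digits are lost to floating-point rounding.

```python
import csv
import io
from decimal import Decimal


def restore_integer_ids(src, dst, id_column="hhId"):
    """Copy CSV rows from src to dst, rewriting id_column from scientific
    notation (e.g. 1.07204426E8) to a plain integer (e.g. 107204426).

    Hypothetical helper for illustration; src and dst are text file objects.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        value = Decimal(row[id_column])  # exact decimal parse of the exponent form
        if value != value.to_integral_value():
            # Refuse to silently truncate an ID that is not a whole number.
            raise ValueError(f"{row[id_column]!r} is not a whole number")
        row[id_column] = str(int(value))
        writer.writerow(row)


# Usage on a shortened version of the sample row above.
sample = '"zsGrocerySales","hhId","cluster"\n-1.814267,1.07204426E8,"0"\n'
out = io.StringIO()
restore_integer_ids(io.StringIO(sample), out)
print(out.getvalue())
```

Note that the writer uses the `csv` module's default minimal quoting, so the quotes around the header names and the cluster label are dropped in the output; pass a different `quoting` setting to `csv.DictWriter` if the loader expects them.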