Parse Numbers not parsing some attributes - attribute remains nominal

chazzmoney Member Posts: 10 Contributor II
edited November 2018 in Help
Hello all,

I'm starting to get the hang of this - a fully integrated RM and R setup with database access and all.  However, I am struggling to get Parse Numbers to work on all data returned by R.

Some attributes are converted correctly, but others remain nominal although they are numerical text - Parse Numbers doesn't seem to convert them.  I'm wondering if I can determine what is causing this and how to fix it.  Some examples of the data that are not converted:
1	93.8050166776113	0.0454157740586547	0.0443342630670029	1.47691587794877e-05
2 74.3344711293915 0.0468817728698001 0.0448437650275624 1.34038276456967e-05
3 74.3344711293915 0.0474954230324842 0.0453740966285467 1.22608784444541e-05
4 92.6475985034064 0.0491963308363763 0.0461385434701127 1.65735980572911e-05
2476	26.7948477790146	-0.00422690861908714	-0.0094301112003816	1.45483738479543e-05
2477 8.61930163773315 -0.00573894600796043 -0.00869187816189736 1.35195186313268e-05
2478 37.0749500040653 -0.00627755754951664 -0.00820901403942122 1.64033364144268e-05
2479 78.0625143961145 -0.00488596064822655 -0.00754440336118229 1.51513082738902e-05
The only guess I have is that Parse Numbers can't handle the number of digits in the string and somehow fails?  If any of you have thoughts, they would be greatly appreciated.

Thanks!

-Charles

Answers

  • chazzmoney Member Posts: 10 Contributor II
    So after spending some more time with my theory, I found the "Split" and "Cut" operators.  Using these I was able to get more of the numerical attributes to parse.

    For the scientific notation, Parse Numbers appeared unable to deal with it even after splitting, cutting the first part, and regenerating via a concat.  I ended up just leaving the two split variables, assuming that the learning algorithms will be able to figure it out themselves.  Seems like a hack, but it will work.

    However, I still can't get this one to parse, even after cutting it as low as 6 characters:
    2516	3.2152
    2517 1.8000
    2518 0.4943
    2519 59.397
    3087	88.050
    3088 34.398
    3089 47.254
    3090 75.758
    3091 19.514
    3092 41.303
    7724	66.118
    7725 66.118
    7726 81.953
    7727 14.444
    What other kinds of unexpected things will prevent a nominal from being parsed by Parse Numbers besides a long string?

    Thanks!

    -Charles
  • chazzmoney Member Posts: 10 Contributor II
    Oh, and if anyone has any recommendation on getting the scientific notation numbers to parse in directly, that would be great.

    Thanks.
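On the scientific notation question: one option is to do the conversion on the R side before the data reaches RapidMiner, since R's own as.numeric() reads strings such as 1.47691587794877e-05 directly. The sketch below is only an illustration. The vector name raw_values is made up for the example (the sample strings are copied from the data posted above), and it assumes the column arrives in R as text.

```r
# Sample strings copied from the data in this thread (assumed to be text in R).
raw_values <- c("93.8050166776113", "-0.00422690861908714", "1.47691587794877e-05")

# as.numeric() understands both plain decimals and scientific notation.
parsed <- as.numeric(raw_values)

# If the values must stay as text, format() can write them without an exponent.
# Note that format() rounds to the default number of significant digits.
plain_text <- format(parsed, scientific = FALSE, trim = TRUE)

print(parsed)
print(plain_text)
```

Returning the parsed numeric vector, rather than the text, avoids the need for Parse Numbers on that attribute altogether.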
  • chazzmoney Member Posts: 10 Contributor II
    Actually, I exported the data to a file and started executing the exact same functions in R as in the R extension - and the data is in numeric form.  Why would it be exported to RapidMiner in nominal form in the first place?
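A quick way to see what R is actually handing over is to look at the class of every column just before the data frame is returned. This is a minimal sketch: the data frame result and its columns are hypothetical stand-ins for whatever the R script produces.

```r
# Hypothetical stand-in for the data frame returned to RapidMiner.
result <- data.frame(
  id = 1:3,
  value = c("93.805", "74.334", "92.647"),
  stringsAsFactors = FALSE
)

# Class of each column; numeric data should show "numeric" or "integer".
col_classes <- sapply(result, class)
print(col_classes)

# Columns that hold text (character or factor) instead of numbers.
text_cols <- names(result)[vapply(result, function(col) {
  is.character(col) || is.factor(col)
}, logical(1))]

# Of those, the ones where every value reads cleanly as a number.
looks_numeric <- text_cols[vapply(result[text_cols], function(col) {
  !anyNA(suppressWarnings(as.numeric(as.character(col))))
}, logical(1))]
print(looks_numeric)
```

Any column listed in looks_numeric is a candidate for as.numeric() on the R side.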
  • chazzmoney Member Posts: 10 Contributor II
    Bingo!  :D

    1) In R, if you assign the result of as.character() to a column of an xts or zoo object, ALL columns are converted to character.
    2) RapidMiner will not accept data frame columns in character format.  Convert them to factors using as.factor(); they will then come into RapidMiner as nominal.

    Thanks,

    -Charles
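The first point can be reproduced without xts or zoo. As far as I understand, both keep their data in one matrix, and an R matrix can hold only a single type, so writing text into one column turns every column into text. The sketch below uses a plain base R matrix as a stand-in (an assumption made to avoid extra packages; the names m and df are illustrative) and contrasts it with a data frame, where each column keeps its own type.

```r
# A numeric matrix standing in for the data inside an xts/zoo object (assumption).
m <- matrix(c(1.5, 2.1, 3.2, 10, 20, 30), ncol = 2,
            dimnames = list(NULL, c("a", "b")))
type_before <- typeof(m)

# Writing character values into one column coerces the whole matrix.
m[, "a"] <- as.character(m[, "a"])
type_after <- typeof(m)

# Setting the storage mode converts the matrix back to numbers in one step.
m_fixed <- m
storage.mode(m_fixed) <- "double"

# In a data frame only the touched column changes.
df <- data.frame(a = c(1.5, 2.1, 3.2), b = c(10, 20, 30))
df$a <- as.character(df$a)
df_classes <- sapply(df, class)

print(c(before = type_before, after = type_after))
print(df_classes)
```

If the matrix comparison holds for the time series object in question, converting to a data frame first and changing column types there keeps the numeric columns numeric.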
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,525   Unicorn
    Hi Charles,
    thanks for this hint. I'm not an expert in R, and character vectors hadn't occurred to me until now. Could you send me a script that generates such a data frame? I could then improve the import routine for data frames to cover this format, too.

    By the way, if you frequently use RM's R Extension, you might be interested in sharing your experience in our Special Interest Group for R. We want to get feedback from our users to improve the extension further...

    Greetings,
    Sebastian
  • chazzmoney Member Posts: 10 Contributor II
    Hi Sebastian,

    Thanks for the opportunity.  I'm not really an expert in R - I've been using it for about two weeks so far, about half as long as I've been using RapidMiner.  I'm grateful to you for all your hard work in making these amazing products.  I'd be happy to share my experiences, but I'm definitely not an expert.

    As for the script, there is probably a better / faster way to do this but:

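    # Build a one-column data frame and name the numeric column "y"
    x <- as.data.frame(c(1.5, 2.1, 3.2))
    colnames(x) <- "y"
    # Convert the column to character; this is what triggers the error
    x[, "y"] <- as.character(x[, "y"])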
    If you request x back from R as a data frame, RapidMiner will give you an error.  If you remove the last line of the R script, it will succeed.

    I think that in R the preferred form may be to keep things as factors, but there might be some use to leaving things as characters.  Someone who knows R better than I may have an answer.

    -Charles
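Until the import routine handles character columns, one workaround is to convert them on the R side just before returning the data frame. The helper below is a sketch, not part of the R extension: prepare_for_rapidminer is a hypothetical name, and it relies on base R's type.convert(), which turns text that looks numeric (or logical) into the matching type and everything else into factors.

```r
# Hypothetical helper: replace character columns with numeric or factor columns.
prepare_for_rapidminer <- function(df) {
  df[] <- lapply(df, function(col) {
    # as.is = FALSE makes non-numeric text come back as a factor.
    if (is.character(col)) type.convert(col, as.is = FALSE) else col
  })
  df
}

# Reproduce the problem case from the script above, plus a genuine text column.
x <- data.frame(y = c(1.5, 2.1, 3.2), label = c("low", "mid", "high"),
                stringsAsFactors = FALSE)
x[, "y"] <- as.character(x[, "y"])

cleaned <- prepare_for_rapidminer(x)
print(sapply(cleaned, class))
```

Calling the helper as the last step of the script means numeric-looking text arrives as numbers and real text arrives as nominal, without a separate Parse Numbers step.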
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,525   Unicorn
    Hi,
    thanks, I have noted it down and will try to include it in one of the next releases.

    Greetings,
      Sebastian